1. Validity of inpatient electronic health record-based measures of oxygen-related therapy in the United States: Lessons applicable for studying COVID-19 endpoints. Pharmacoepidemiol Drug Saf 2024; 33:e5785. PMID: 38565526; DOI: 10.1002/pds.5785.
Abstract
INTRODUCTION During the COVID-19 pandemic, inpatient electronic health records (EHRs) have been used to conduct public health surveillance and assess treatments and outcomes. Invasive mechanical ventilation (MV) and supplemental oxygen (O2) use are markers of severe illness in hospitalized COVID-19 patients. In a large US system (n = 142 hospitals), we assessed documentation of MV and O2 use during COVID-19 hospitalization in administrative data versus nursing documentation. METHODS We identified 319,553 adult hospitalizations with a COVID-19 diagnosis, February 2020-October 2022, and extracted coded administrative data for MV or O2. Separately, we developed classification rules for MV or O2 supplementation from semi-structured nursing documentation. We assessed MV and O2 supplementation in administrative data versus nursing documentation and calculated ordinal endpoints of decreasing COVID-19 disease severity. Nursing documentation was considered the gold standard in sensitivity and positive predictive value (PPV) analyses. RESULTS In nursing documentation, the prevalence of MV and O2 supplementation among COVID-19 hospitalizations was 14% and 75%, respectively. The sensitivity of administrative data was 83% for MV and 41% for O2, with both PPVs above 91%. Concordance between sources was 97% for MV (κ = 0.85) and 54% for O2 (κ = 0.21). For ordinal endpoints, administrative data accurately identified intensive care and MV but underestimated hospitalizations with O2 requirements (42% vs. 18%). CONCLUSIONS In comparison to nursing documentation, administrative data under-ascertained O2 supplementation but accurately estimated severe endpoints such as MV. Nursing documentation improved ascertainment of O2 among COVID-19 hospitalizations and can capture oxygen requirements in adults hospitalized with COVID-19 or other respiratory illnesses.
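The validation metrics reported above (sensitivity, PPV, and Cohen's kappa against a nursing-documentation gold standard) can all be derived from a 2x2 confusion table. A minimal sketch, using hypothetical counts rather than the study's data:

```python
def validity_metrics(tp, fp, fn, tn):
    """Sensitivity, PPV, and Cohen's kappa from a 2x2 confusion table.

    tp/fp/fn/tn: counts with the gold standard defining true status and
    the administrative source defining the test result.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    observed = (tp + tn) / n
    # Chance agreement expected from the marginal totals of each source
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, ppv, kappa

# Hypothetical counts (NOT the study's data), chosen so sensitivity is 83%
sens, ppv, kappa = validity_metrics(tp=83, fp=5, fn=17, tn=895)
```

High kappa alongside lower sensitivity (as in the O2 results) is possible because kappa also credits agreement on negatives.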
2. Multi-replicas integrity checking scheme with supporting probability audit for cloud-based IoT. PeerJ Comput Sci 2024; 10:e1790. PMID: 38259890; PMCID: PMC10803085; DOI: 10.7717/peerj-cs.1790.
Abstract
Nowadays, more people are choosing to use cloud storage services to save space and reduce costs. To enhance durability and persistence, users opt to store important data in the form of multiple copies on cloud servers. However, outsourcing data to the cloud means that it is not directly under the control of users, raising concerns about security and integrity. Recent research has found that most existing multicopy integrity verification schemes can pass integrity verification even when multiple copies are stored on the same Cloud Service Provider (CSP), which clearly deviates from users' initial intention of storing files on multiple CSPs. With these considerations in mind, this paper proposes a scheme for synchronizing the integrity verification of copies, specifically focusing on privacy-sensitive Internet of Things (IoT) electronic health record (EHR) data. First, the paper addresses the issues present in existing multicopy integrity verification schemes. The scheme incorporates a Cloud Service Manager (CSM) entity to assist in the model construction, and each replica file is accompanied by its corresponding homomorphic verification tag. To handle scenarios where replica files stored on multiple CSPs cannot provide audit proof on time for objective reasons, the paper introduces a novel approach called probability audit. By incorporating a probability audit, the scheme ensures that replica files are indeed stored on different CSPs and guarantees the normal execution of the public auditing phase. The scheme uses identity-based encryption (IBE) in its detailed design, avoiding the additional overhead of dealing with complex certificate management. The proposed scheme withstands forgery, replacement, and replay attacks, demonstrating strong security. The performance analysis demonstrates the feasibility and effectiveness of the scheme.
3. Pitfalls in Analyzing FHIR Data from Different University Hospitals. Stud Health Technol Inform 2023; 307:146-151. PMID: 37697848; DOI: 10.3233/shti230706.
Abstract
The German Medical Informatics Initiative has agreed on a HL7 FHIR-based core data set as the common data model that all 37 university hospitals use for their patient's data. These data are stored locally at the site but are centrally queryable for researchers and accessible upon request. This infrastructure is currently under construction, and its functionality is being tested by so-called Projectathons. In the 6th Projectathon, a clinical hypothesis was formulated, executed in a multicenter scenario, and its results were analyzed. A number of oddities emerged in the analysis of data from different sites. Biometricians, who had previously performed analyses in prospective data collection settings such as clinical trials or cohorts, were not consistently aware of these idiosyncrasies. This field report describes data quality problems that have occurred, although not all are genuine errors. The aim is to point out such circumstances of data generation that may affect statistical analysis.
4. Assessing patterns in cancer screening use by race and ethnicity during the coronavirus pandemic using electronic health record data. Cancer Med 2023; 12:16548-16557. PMID: 37347148; PMCID: PMC10469733; DOI: 10.1002/cam4.6246.
Abstract
BACKGROUND Efforts to prevent the spread of the coronavirus led to dramatic reductions in nonemergency medical care services during the first several months of the COVID-19 pandemic. Delayed or missed screenings can lead to more advanced stage cancer diagnoses with potentially worse health outcomes and exacerbate preexisting racial and ethnic disparities. The objective of this analysis was to examine how the pandemic affected rates of breast and colorectal cancer screenings by race and ethnicity. METHODS We analyzed panels of providers that placed orders in 2019-2020 for mammogram and colonoscopy cancer screenings using electronic health record (EHR) data. We used a difference-in-differences design to examine the extent to which changes in provider-level mammogram and colonoscopy orders declined over the first year of the pandemic and whether these changes differed across race and ethnicity groups. RESULTS We found considerable declines in both types of screenings from March through May 2020, relative to the same months in 2019, for all racial and ethnic groups. Some rebound in screenings occurred in June through December 2020, particularly among White and Black patients; however, use among other groups was still lower than expected. CONCLUSIONS This research suggests that many patients experienced missed or delayed screenings during the first few months of the pandemic, which could lead to detrimental health outcomes. Our findings also underscore the importance of having high-quality data on race and ethnicity to document and understand racial and ethnic disparities in access to care.
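The difference-in-differences design described here compares the pandemic-period change in screening orders for one group against the change for a comparison group, netting out shared trends. A toy numeric sketch with hypothetical order counts, not the study's data:

```python
def diff_in_diff(treat_pre, treat_post, control_pre, control_post):
    """DiD estimate: the treated group's change minus the control group's change."""
    return (treat_post - treat_pre) - (control_post - control_pre)

# Hypothetical mean monthly mammogram orders: pandemic months in 2020 vs.
# the same months in 2019, for one race/ethnicity group vs. a comparison
# group. A negative value = an excess decline beyond the comparison group's.
effect = diff_in_diff(treat_pre=100, treat_post=40,
                      control_pre=110, control_post=70)  # -20
```

In practice the study fits this at the provider-panel level with covariates, but the core contrast is this double difference.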
5. Enhancing early autism prediction based on electronic records using clinical narratives. J Biomed Inform 2023; 144:104390. PMID: 37182592; PMCID: PMC10526711; DOI: 10.1016/j.jbi.2023.104390.
Abstract
Recent work has shown that predictive models can be applied to structured electronic health record (EHR) data to stratify autism likelihood from an early age (<1 year). Integrating clinical narratives (or notes) with structured data has been shown to improve prediction performance in other clinical applications, but the added predictive value of this information in early autism prediction has not yet been explored. In this study, we aimed to enhance the performance of early autism prediction by using both structured EHR data and clinical narratives. We built models based on structured data and clinical narratives separately, and then an ensemble model that integrated both sources of data. Using data from Duke University Health System spanning 14 years, we evaluated ensemble models predicting later autism diagnosis (by age 4 years) from data collected from ages 30 to 360 days. Our sample included 11,750 children followed to at least age 3 years (385 meeting autism diagnostic criteria). The ensemble model for autism prediction showed superior performance and at age 30 days achieved 46.8% sensitivity (95% confidence interval, CI: 22.0%, 52.9%), 28.0% positive predictive value (PPV) at high (90%) specificity (CI: 2.0%, 33.1%), and AUC4 (with at least 4-year follow-up for controls) reaching 0.769 (CI: 0.715, 0.811). Prediction by 360 days achieved 44.5% sensitivity (CI: 23.6%, 62.9%) and 13.7% PPV at high (90%) specificity (CI: 9.6%, 18.9%), with AUC4 reaching 0.797 (CI: 0.746, 0.840). Results show that incorporating clinical narratives in early autism prediction achieved promising accuracy by age 30 days, outperforming models based on structured data only. Furthermore, findings suggest that additional features learned from clinical narratives might be hypothesis generating for understanding early development in autism.
6. Towards electronic health record-based medical knowledge graph construction, completion, and applications: A literature study. J Biomed Inform 2023:104403. PMID: 37230406; DOI: 10.1016/j.jbi.2023.104403.
Abstract
With the growth of data and intelligent technologies, the healthcare sector has opened up numerous technology-enabled services for patients, clinicians, and researchers. One major hurdle in achieving state-of-the-art results in health informatics is domain-specific terminology and its semantic complexity. A knowledge graph crafted from medical concepts, events, and relationships acts as a medical semantic network for extracting new links and hidden patterns from health data sources. Current medical knowledge graph construction studies are limited to generic techniques and opportunities and pay less attention to exploiting real-world data sources in knowledge graph construction. A knowledge graph constructed from Electronic Health Record (EHR) data obtains real-world data from healthcare records. It ensures better results in subsequent tasks like knowledge extraction and inference, knowledge graph completion, and medical knowledge graph applications such as diagnosis prediction, clinical recommendations, and clinical decision support. This review critically analyses existing works on medical knowledge graphs that used EHR data as the data source at the (i) representation level, (ii) extraction level, and (iii) completion level. In this investigation, we found that EHR-based knowledge graph construction involves challenges such as high complexity and dimensionality of data, lack of knowledge fusion, and dynamic updating of the knowledge graph. In addition, the study presents possible ways to tackle the identified challenges. Our findings conclude that future research should focus on knowledge graph integration and knowledge graph completion challenges.
7. Why Are Data Missing in Clinical Data Warehouses? A Simulation Study of How Data Are Processed (and Can Be Lost). Stud Health Technol Inform 2023; 302:202-206. PMID: 37203647; DOI: 10.3233/shti230103.
Abstract
In recent years, the development of clinical data warehouses (CDW) has put Electronic Health Record (EHR) data in the spotlight. More and more innovative technologies for healthcare are based on these EHR data. However, quality assessments of EHR data are fundamental to gaining confidence in the performance of new technologies. The infrastructure developed to access EHR data - the CDW - can affect EHR data quality, but its impact is difficult to measure. We conducted a simulation on the Assistance Publique - Hôpitaux de Paris (AP-HP) infrastructure to assess how a study on breast cancer care pathways could be affected by the complexity of the data flows between the AP-HP Hospital Information System, the CDW, and the analysis platform. A model of the data flows was developed. We retraced the flows of specific data elements for a simulated cohort of 1,000 patients. We estimated that 756 [743;770] and 423 [367;483] patients had all the data elements necessary to reconstruct the care pathway in the analysis platform in the "best case" scenario (losses affect the same patients) and in a random distribution scenario (losses affect patients at random), respectively.
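The two loss scenarios quoted above can be illustrated with a small simulation: if each required data element survives the data flow with some retention rate, the "best case" assumes losses hit the same patients (so retention is bounded by the leakiest element), while the random scenario treats losses as independent across elements. The rates and code below are hypothetical, not the AP-HP model:

```python
import random

def best_case(n_patients, retention_rates):
    """Losses affect the same patients: retention equals the worst
    single element's retention."""
    return round(n_patients * min(retention_rates))

def random_loss(n_patients, retention_rates, seed=0):
    """Losses hit patients independently at random per data element;
    a pathway is reconstructable only if every element survives."""
    rng = random.Random(seed)
    return sum(
        all(rng.random() < r for r in retention_rates)
        for _ in range(n_patients)
    )

rates = [0.95, 0.90, 0.85]     # hypothetical per-element retention rates
kept_best = best_case(1000, rates)      # 850
kept_random = random_loss(1000, rates)  # near 1000 * 0.95*0.90*0.85 ≈ 727
```

The random scenario always keeps fewer (in expectation) than the best case, matching the 756 vs. 423 ordering reported above.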
8. Time interval uncertainty-aware and text-enhanced based disease prediction. J Biomed Inform 2023; 139:104239. PMID: 36356933; DOI: 10.1016/j.jbi.2022.104239.
Abstract
Deep learning methods have achieved success in disease prediction using electronic health record (EHR) data. Most existing methods have some limitations. First, most methods adopt a homogeneous decay approach to handle the effect of time intervals on information from a patient's previous visits. However, the effect of the time interval between a patient's visits is not always negative. For example, although the time interval between visits for patients with chronic diseases is relatively long, the previous visit remains highly relevant to the next, so its effect cannot simply be treated as negative. That is, the effect of the time interval on previous visits is exerted in a nonmonotonic manner: it can be positive, negative, or neutral. In addition, most methods do not take into account the effect of text information on prediction results. The text in EHRs contains descriptions of the patient's past medical history and current symptoms, which are important for prediction. To address these issues, we propose a Time Interval Uncertainty-Aware and Text-Enhanced Based Disease Prediction Model, which utilizes the uncertain effects of time intervals and the patient's text information for disease prediction. First, we apply a cross-attention mechanism to generate a global representation of the patient using the patient's disease and text information from the EHR. Then, we use a key-query attention mechanism to obtain importance weights for the two visit sequences, with and without time intervals, respectively. Finally, we achieve disease prediction by making slight adjustments to the encoder part of the Transformer, a deep learning model based on a self-attention mechanism.
We compare our model with various state-of-the-art models on two publicly available datasets, MIMIC-III and MIMIC-IV, selecting the 10 most frequent diseases in each dataset as the target diseases. On the MIMIC-III dataset, our model outperforms the best baseline by up to three percentage points on the evaluation metrics.
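The key-query attention weighting described above can be illustrated generically. The sketch below is standard scaled dot-product attention in plain Python, an assumed stand-in for the weighting mechanism rather than the paper's actual architecture:

```python
import math

def softmax(xs):
    # Numerically stable softmax: normalized exponentials summing to 1
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Scaled dot-product scores of one query against a key sequence,
    normalized into importance weights that sum to 1."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# The key most aligned with the query receives the largest weight
w = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
```

In the model, such weights would score each previous visit's importance; the two visit sequences (with and without time intervals) each get their own set of weights.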
9. A Comprehensive and Improved Definition for Hospital-Acquired Pressure Injury Classification Based on Electronic Health Records: Comparative Study. JMIR Med Inform 2023; 11:e40672. PMID: 36649481; PMCID: PMC9999254; DOI: 10.2196/40672.
Abstract
BACKGROUND Patients develop pressure injuries (PIs) in the hospital owing to low mobility, exposure to localized pressure, circulatory conditions, and other predisposing factors. Over 2.5 million Americans develop PIs annually. The Centers for Medicare and Medicaid Services considers hospital-acquired PIs (HAPIs) the most frequent preventable event, and they are the second most common claim in lawsuits. With the growing use of electronic health records (EHRs) in hospitals, an opportunity exists to build machine learning models to identify and predict HAPIs rather than relying on occasional manual assessments by human experts. However, accurate computational models rely on high-quality HAPI data labels. Unfortunately, the different data sources within EHRs can provide conflicting information on HAPI occurrence in the same patient. Furthermore, the existing definitions of HAPI disagree with each other, even within the same patient population. The inconsistent criteria make it impossible to benchmark machine learning methods to predict HAPI. OBJECTIVE The objective of this project was threefold. We aimed to identify discrepancies in HAPI sources within EHRs, to develop a comprehensive definition for HAPI classification using data from all EHR sources, and to illustrate the importance of an improved HAPI definition. METHODS We assessed the congruence among HAPI occurrences documented in clinical notes, diagnosis codes, procedure codes, and chart events from the Medical Information Mart for Intensive Care III database. We analyzed the criteria used in the 3 existing HAPI definitions and their adherence to the regulatory guidelines. We proposed the Emory HAPI (EHAPI), an improved and more comprehensive HAPI definition. We then evaluated the importance of the labels in training a HAPI classification model using tree-based and sequential neural network classifiers.
RESULTS We illustrate the complexity of defining HAPI, with <13% of hospital stays having at least 3 PI indications documented across the 4 data sources. Although chart events were the most common indicator, they were the only PI documentation for >49% of the stays. We demonstrate a lack of congruence across the existing HAPI definitions and EHAPI, with only 219 stays having a consensus positive label. Our analysis highlights the importance of our improved HAPI definition, with classifiers trained using our labels outperforming others both on a small set manually labeled by nurse annotators and on a consensus set in which all definitions agreed on the label. CONCLUSIONS Standardized HAPI definitions are important for accurately assessing the HAPI nursing quality metric and determining HAPI incidence for preventive measures. We demonstrate the complexity of defining an occurrence of HAPI, given the conflicting and incomplete EHR data. Our EHAPI definition has favorable properties, making it a suitable candidate for HAPI classification tasks.
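The cross-source congruence analysis described above amounts to counting, per hospital stay, how many EHR sources document a pressure injury. A toy illustration with hypothetical stays (not the MIMIC-III data or the actual EHAPI rules):

```python
# The 4 EHR sources the study compares for PI documentation
SOURCES = ["notes", "diagnosis_codes", "procedure_codes", "chart_events"]

def n_indications(stay):
    """Number of sources documenting a PI indication for one stay."""
    return sum(bool(stay.get(s, False)) for s in SOURCES)

# Hypothetical stays; True means that source documents a PI indication
stays = [
    {"notes": True, "diagnosis_codes": True, "chart_events": True},
    {"chart_events": True},                   # chart events only
    {"notes": True, "diagnosis_codes": True},
]
with_3_plus = sum(n_indications(s) >= 3 for s in stays)
chart_only = sum(bool(s.get("chart_events")) and n_indications(s) == 1
                 for s in stays)
```

Tabulating `with_3_plus` and `chart_only` over all stays yields the kind of congruence statistics the abstract reports (<13% with ≥3 indications; >49% chart-events-only among chart-event stays).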
10. Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C). BMC Med Res Methodol 2023; 23:46. PMID: 36800930; PMCID: PMC9936475; DOI: 10.1186/s12874-023-01839-2.
Abstract
BACKGROUND Multi-institution electronic health records (EHR) are a rich source of real world data (RWD) for generating real world evidence (RWE) regarding the utilization, benefits and harms of medical interventions. They provide access to clinical data from large pooled patient populations in addition to laboratory measurements unavailable in insurance claims-based data. However, secondary use of these data for research requires specialized knowledge and careful evaluation of data quality and completeness. We discuss data quality assessments undertaken during the conduct of prep-to-research, focusing on the investigation of treatment safety and effectiveness. METHODS Using the National COVID Cohort Collaborative (N3C) enclave, we defined a patient population using criteria typical in non-interventional inpatient drug effectiveness studies. We present the challenges encountered when constructing this dataset, beginning with an examination of data quality across data partners. We then discuss the methods and best practices used to operationalize several important study elements: exposure to treatment, baseline health comorbidities, and key outcomes of interest. RESULTS We share our experiences and lessons learned when working with heterogeneous EHR data from over 65 healthcare institutions and 4 common data models. We discuss six key areas of data variability and quality. (1) The specific EHR data elements captured from a site can vary depending on source data model and practice. (2) Data missingness remains a significant issue. (3) Drug exposures can be recorded at different levels and may not contain route of administration or dosage information. (4) Reconstruction of continuous drug exposure intervals may not always be possible. (5) EHR discontinuity is a major concern for capturing history of prior treatment and comorbidities. Lastly, (6) access to EHR data alone limits the potential outcomes which can be used in studies. 
CONCLUSIONS The creation of large-scale, centralized, multi-site EHR databases such as N3C enables a wide range of research aimed at better understanding treatments and health impacts of many conditions, including COVID-19. As with all observational research, it is important that research teams engage with appropriate domain experts to understand the data, in order to define research questions that are both clinically important and feasible to address using these real world data.
11. Effects of a QI intervention on pediatric asthma treatment using patient outcomes and workflow in an emergency department. J Asthma 2023:1-11. PMID: 36562525; DOI: 10.1080/02770903.2022.2162412.
Abstract
OBJECTIVE Evaluate a nurse-initiated quality improvement (QI) intervention aimed at enhancing asthma treatment in a pediatric emergency department (ED), using patient outcomes and workflow. METHODS We evaluated the impact of the QI intervention for pediatric patients presenting to the ED with asthma using a pre-post analysis. A pediatric asthma score (PAS) of >8 indicated moderate to severe asthma. This secondary analysis of electronic health record (EHR) data evaluated 1) patient outcomes (time to clinical treatment, ED length of stay [EDLOS], admissions, and discharges home) and 2) clinical workflow. RESULTS We compared 886 visits occurring between 01/01/2015 and 09/27/2015 (pre-implementation) with 752 visits between 01/01/2016 and 09/27/2016 (post-implementation). Time to first documentation of PAS decreased post-intervention (p<.001) by >30 min (75 ± 57 to 39 ± 54 min). There were significant decreases in time to treatment with both steroid and bronchodilator administration (both p<.001). EDLOS did not change significantly. Patients discharged home from the ED with high acuity (PAS score ≥8) had significant decreases in time to initial PAS, time to steroid and bronchodilator administration, and EDLOS. Among high-acuity patients admitted to the hospital, there was a pre- to post-implementation difference in time to first PAS (p<.05) but not in time to treatment. Workflow visualization provided additional insights and detailed (task-level) comparisons of the timing of ED activities. CONCLUSIONS Nurse-initiated ED interventions can significantly improve the timeliness of pediatric asthma evaluation and treatment. Examining workflow along with outcomes can better inform QI evaluations and clinical management.
12. COVID-19 vaccination and venous thromboembolism risk in older veterans. J Clin Transl Sci 2023; 7:e55. PMID: 37008615; PMCID: PMC10052419; DOI: 10.1017/cts.2022.527.
Abstract
Introduction It is important for SARS-CoV-2 vaccine providers, vaccine recipients, and those not yet vaccinated to be well informed about vaccine side effects. We sought to estimate the risk of post-vaccination venous thromboembolism (VTE) to meet this need. Methods We conducted a retrospective cohort study to quantify the excess VTE risk associated with SARS-CoV-2 vaccination in US veterans age 45 and older using data from the Department of Veterans Affairs (VA) National Surveillance Tool. The vaccinated cohort received at least one dose of a SARS-CoV-2 vaccine at least 60 days prior to 3/06/22 (N = 855,686). The control group comprised those not vaccinated (N = 321,676). All patients had at least one negative COVID-19 test before vaccination. The main outcome was VTE documented by ICD-10-CM codes. Results Vaccinated persons had a VTE rate of 1.3755 (CI: 1.3752-1.3758) per thousand, which was 0.1 percent over the baseline rate of 1.3741 (CI: 1.3738-1.3744) per thousand in the unvaccinated patients, or 1.4 excess cases per 1,000,000. All vaccine types showed a minimally increased rate of VTE (rate of VTE per 1000 was 1.3761 (CI: 1.3754-1.3768) for Janssen, 1.3757 (CI: 1.3754-1.3761) for Pfizer, and 1.3757 (CI: 1.3748-1.3877) for Moderna). The tiny differences in rates comparing either the Janssen or Pfizer vaccine to Moderna were statistically significant (p < 0.001). Adjusting for age, sex, BMI, 2-year Elixhauser score, and race, the vaccinated group had a minimally higher relative risk of VTE compared to controls (1.0009927, CI: 1.0007673-1.0012181; p < 0.001). Conclusion The results provide reassurance that there is only a trivial increased risk of VTE with the current US SARS-CoV-2 vaccines in veterans older than age 45. This risk is significantly less than the VTE risk among hospitalized COVID-19 patients. The risk-benefit ratio favors vaccination, given the VTE rate, mortality, and morbidity associated with COVID-19 infection.
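The headline arithmetic in the results above (rates per 1,000, excess cases per million, and the 0.1 percent relative increase) can be checked directly from the two quoted rates:

```python
vaccinated_rate = 1.3755    # VTE cases per 1,000 vaccinated veterans
unvaccinated_rate = 1.3741  # VTE cases per 1,000 unvaccinated veterans

excess_per_1000 = vaccinated_rate - unvaccinated_rate    # ≈ 0.0014
excess_per_million = excess_per_1000 * 1000              # ≈ 1.4 excess cases
relative_increase = excess_per_1000 / unvaccinated_rate  # ≈ 0.001, i.e. 0.1%
```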
13. Measuring Training Disruptions Using an Informatics Based Tool. Acad Pediatr 2023; 23:7-11. PMID: 35306187; DOI: 10.1016/j.acap.2022.03.006.
Abstract
OBJECTIVE Training disruptions, such as planned curricular adjustments or unplanned global pandemics, impact residency training in ways that are difficult to quantify. Informatics-based medical education tools can help measure these impacts. We tested the ability of a software platform driven by electronic health record data to quantify anticipated changes in trainee clinical experiences during the COVID-19 pandemic. METHODS We previously developed and validated the Trainee Individualized Learning System (TRAILS) to identify pediatric resident clinical experiences (i.e., shifts, resident provider-patient interactions [rPPIs], and diagnoses). We used TRAILS to perform a year-over-year analysis comparing pediatric residents at a large academic children's hospital during March 15-June 15 in 2018 (Control #1), 2019 (Control #2), and 2020 (Exposure). RESULTS Residents in the exposure cohort had fewer shifts than those in both control cohorts (P < .05). rPPIs decreased an average of 43% across all PGY levels, with interns experiencing a 78% decrease in Continuity Clinic. Patient continuity decreased from 23% to 11%. rPPIs with common clinic and emergency department diagnoses decreased substantially during the exposure period. CONCLUSIONS Informatics tools like TRAILS may help program directors understand the impact of training disruptions on resident clinical experiences and target interventions to learners' needs and development.
14. Prevalence of High Weight Status in Children Under 2 Years in NHANES and Statewide Electronic Health Records Data in North Carolina and South Carolina. Acad Pediatr 2022; 22:1353-1359. PMID: 35342033; PMCID: PMC9508281; DOI: 10.1016/j.acap.2022.03.014.
Abstract
OBJECTIVES We evaluated the prevalence of high weight status in children ages 0 to 24 months using data from electronic health records (EHR) and NHANES. We also examined relationships between weight status during infancy and obesity at 24 months of age. METHODS EHR data from 4 institutions in North and South Carolina included patients born January 1, 2013-October 10, 2017 (N = 147,290). NHANES data included study waves from 1999 to 2018 (unweighted N = 5121). We calculated weight-for-length (WFL), weight-for-age (WFA), and body mass index (BMI), excluding implausible values, and categorized weight status (<85th, 85th to <95th, or ≥95th percentile), assessing prevalence at birth, 6, 12, 18, and 24 months. Utilizing individual, longitudinal EHR data, we used separate regression models to assess obesity risk at 24 months based on anthropometrics at birth, 6, 12, and 18 months, adjusting for sex, race/ethnicity, insurance, and health system. RESULTS The prevalence of BMI ≥95th percentile in EHR data at 6, 12, 18, and 24 months was 9.7%, 15.7%, 19.6%, and 20.5%, respectively. In NHANES, the prevalence was 11.6%, 15.0%, 16.0%, and 8.4%. In both sources, the prevalence of high weight status was higher in Hispanic children. In EHR data, high weight status at 6, 12, and 18 months was associated with obesity at 24 months, with stronger associations as BMI category increased and as age increased. CONCLUSIONS High weight status is common in infants and young children, although lower at 24 months in NHANES than in EHR data. In EHR data, high BMI at 6, 12, and 18 months was associated with increased risk of obesity at 24 months.
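The three-level weight-status categorization used above (<85th, 85th to <95th, ≥95th percentile) can be expressed as a small helper. A minimal sketch, where percentile values are assumed to come from an external growth-chart lookup (not shown):

```python
def weight_status(percentile):
    """Categorize an anthropometric percentile (e.g., WFL or BMI-for-age)
    using the cut points <85th, 85th to <95th, and >=95th."""
    if percentile >= 95:
        return ">=95th"
    if percentile >= 85:
        return "85th to <95th"
    return "<85th"

status = weight_status(97.2)  # ">=95th", i.e., "high weight status"
```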
|
15
|
Unsupervised probabilistic models for sequential Electronic Health Records. J Biomed Inform 2022; 134:104163. [PMID: 36038064 PMCID: PMC10588733 DOI: 10.1016/j.jbi.2022.104163] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2022] [Revised: 06/23/2022] [Accepted: 08/11/2022] [Indexed: 11/18/2022]
Abstract
We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.
|
16
|
Integrative analysis of clinical health records, imaging and pathogen genomics identifies personalized predictors of disease prognosis in tuberculosis. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2022:2022.07.20.22277862. [PMID: 35898335 PMCID: PMC9327630 DOI: 10.1101/2022.07.20.22277862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
Tuberculosis (TB) afflicts over 10 million people every year and its global burden is projected to increase dramatically due to multidrug-resistant TB (MDR-TB). The COVID-19 pandemic has resulted in reduced access to TB diagnosis and treatment, reversing decades of progress in disease management globally. It is thus crucial to analyze real-world multi-domain information from patient health records to determine personalized predictors of TB treatment outcome and drug resistance. We conduct a retrospective analysis on electronic health records of 5060 TB patients spanning 10 countries with a high burden of MDR-TB, including Ukraine, Moldova, Belarus and India, available on the NIAID TB Portals database. We analyze over 200 features across multiple host and pathogen modalities representing patient social demographics, disease presentations as seen in chest X-rays and CT scans, and genomic records with drug susceptibility features of the pathogen strain from each patient. Our machine learning model, built with diverse data modalities, outperforms models built using each modality alone in predicting treatment outcomes, with an accuracy of 81% and AUC of 0.768. We determine robust predictors across countries that are associated with unsuccessful treatment outcomes, and validate our predictions on new patient data from TB Portals. Our analysis of drug regimens and drug interactions suggests that synergistic drug combinations and those containing the drugs Bedaquiline, Levofloxacin, Clofazimine and Amoxicillin see more success in treating MDR and XDR TB. Features identified via chest imaging, such as percentage of abnormal volume, size of lung cavitation and bronchial obstruction, are associated significantly with pathogen genomic attributes of drug resistance. Increased disease severity was also observed in patients with lower BMI and with comorbidities.
Our integrated multi-modal analysis thus revealed significant associations between radiological, microbiological, therapeutic, and demographic data modalities, providing a deeper understanding of personalized responses to aid in the clinical management of TB.
|
17
|
Describing the population experiencing COVID-19 vaccine breakthrough following second vaccination in England: a cohort study from OpenSAFELY. BMC Med 2022; 20:243. [PMID: 35791013 PMCID: PMC9255436 DOI: 10.1186/s12916-022-02422-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/09/2021] [Accepted: 05/30/2022] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND While the vaccines against COVID-19 are highly effective, COVID-19 vaccine breakthrough is possible despite being fully vaccinated. With SARS-CoV-2 variants still circulating, describing the characteristics of individuals who have experienced COVID-19 vaccine breakthroughs could be hugely important in helping to determine who may be at greatest risk. METHODS With the approval of NHS England, we conducted a retrospective cohort study using routine clinical data from the OpenSAFELY-TPP database of fully vaccinated individuals, linked to secondary care and death registry data and described the characteristics of those experiencing COVID-19 vaccine breakthroughs. RESULTS As of 1st November 2021, a total of 15,501,550 individuals were identified as being fully vaccinated against COVID-19, with a median follow-up time of 149 days (IQR: 107-179). From within this population, a total of 579,780 (<4%) individuals reported a positive SARS-CoV-2 test. For every 1000 years of patient follow-up time, the corresponding incidence rate (IR) was 98.06 (95% CI 97.93-98.19). There were 28,580 COVID-19-related hospital admissions, 1980 COVID-19-related critical care admissions and 6435 COVID-19-related deaths; corresponding IRs 4.77 (95% CI 4.74-4.80), 0.33 (95% CI 0.32-0.34) and 1.07 (95% CI 1.06-1.09), respectively. The highest rates of breakthrough COVID-19 were seen in those in care homes and in patients with chronic kidney disease, dialysis, transplant, haematological malignancy or who were immunocompromised. CONCLUSIONS While the majority of COVID-19 vaccine breakthrough cases in England were mild, some differences in rates of breakthrough cases have been identified in several clinical groups. 
While these findings are purely descriptive and cannot explain why certain groups have higher rates of COVID-19 breakthrough than others, the emergence of the Omicron variant of SARS-CoV-2, coupled with the number of positive SARS-CoV-2 tests still occurring, is concerning. As the number of fully vaccinated (and boosted) individuals increases and follow-up time lengthens, so too will the number of COVID-19 breakthrough cases. Additional analyses, to assess vaccine waning and rates of breakthrough COVID-19 between different variants, aimed at identifying individuals at higher risk, are needed.
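The incidence rates quoted above are standard person-time rates (events per 1,000 person-years) with confidence intervals; a minimal illustrative sketch follows (not the OpenSAFELY code, using round example numbers rather than the study's actual person-time, and a normal approximation that is only reasonable for large event counts):

```python
import math

def incidence_rate(events: int, person_years: float, per: float = 1000.0):
    """Incidence rate per `per` person-years with an approximate 95% CI.

    Uses the normal approximation rate +/- 1.96 * rate / sqrt(events),
    which assumes a large number of events.
    """
    rate = events / person_years * per
    se = rate / math.sqrt(events)
    return rate, rate - 1.96 * se, rate + 1.96 * se

# Round example numbers (not the study data):
rate, lower, upper = incidence_rate(events=98_060, person_years=1_000_000)
```

With large event counts, as in the cohort above, the interval becomes very narrow, which is why the reported CIs span only fractions of a unit.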
|
18
|
Development and implementation of a prescription opioid registry across diverse health systems. JAMIA Open 2022; 5:ooac030. [PMID: 35651523 PMCID: PMC9150082 DOI: 10.1093/jamiaopen/ooac030] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Revised: 03/21/2022] [Accepted: 04/28/2022] [Indexed: 11/21/2022] Open
Abstract
Objective Develop and implement a prescription opioid registry in 10 diverse health systems across the US and describe trends in prescribed opioids between 2012 and 2018. Materials and Methods Using electronic health record and claims data, we identified patients who had an outpatient fill for any prescription opioid, and/or an opioid use disorder diagnosis, between January 1, 2012 and December 31, 2018. The registry contains distributed files of prescription opioids, benzodiazepines and other select medications, opioid antagonists, clinical diagnoses, procedures, health services utilization, and health plan membership. Rates of outpatient opioid fills over the study period, standardized to health system demographic distributions, are described by age, gender, and race/ethnicity among members without cancer. Results The registry includes 6 249 710 patients and over 40 million outpatient opioid fills. For the combined registry population, opioid fills declined from a high of 0.718 per member-year in 2013 to 0.478 in 2018, and morphine milligram equivalents (MMEs) per fill declined from 985 MMEs per fill in 2012 to 758 MMEs in 2018. MMEs per member declined from 692 MMEs per member in 2012 to 362 MMEs per member in 2018. Conclusion This study established a population-based opioid registry across 10 diverse health systems that can be used to address questions related to opioid use. Initial analyses showed large reductions in overall opioid use per member among the combined health systems. The registry will be used in future studies to answer a broad range of other critical public health issues relating to prescription opioid use.
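The morphine milligram equivalents (MMEs) per fill reported above are conventionally computed as tablet strength × quantity × an opioid-specific conversion factor. A hedged sketch follows; the conversion factors shown are commonly cited CDC values, included for illustration only and not taken from the registry (verify against the registry's own factor table before use):

```python
# Illustrative MME calculation; factors below are commonly cited CDC
# conversion values, shown as an assumption rather than registry data.
CONVERSION_FACTORS = {
    "morphine": 1.0,
    "oxycodone": 1.5,
    "hydrocodone": 1.0,
    "codeine": 0.15,
    "hydromorphone": 4.0,
}

def mme_per_fill(drug: str, strength_mg: float, quantity: int) -> float:
    """Total MMEs dispensed in one fill: strength x quantity x factor."""
    return strength_mg * quantity * CONVERSION_FACTORS[drug]

# e.g. a fill of 60 x 5 mg oxycodone tablets:
total = mme_per_fill("oxycodone", strength_mg=5.0, quantity=60)  # 450.0
```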
|
19
|
Factors Associated With COVID-19 Death in the United States: Cohort Study. JMIR Public Health Surveill 2022; 8:e29343. [PMID: 35377319 PMCID: PMC9132142 DOI: 10.2196/29343] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2021] [Revised: 11/21/2021] [Accepted: 04/01/2022] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Since the initial COVID-19 cases were identified in the United States in February 2020, the United States has experienced a high incidence of the disease. Understanding the risk factors for severe outcomes identifies the most vulnerable populations and helps in decision-making. OBJECTIVE This study aims to assess the factors associated with COVID-19-related deaths from a large, national, individual-level data set. METHODS A cohort study was conducted using data from the Optum de-identified COVID-19 electronic health record (EHR) data set; 1,271,033 adult participants were observed from February 1, 2020, to August 31, 2020, until their deaths due to COVID-19, deaths due to other reasons, or the end of the study. Cox proportional hazards models were constructed to evaluate the risks for each patient characteristic. RESULTS A total of 1,271,033 participants (age: mean 52.6, SD 17.9 years; male: 507,574/1,271,033, 39.93%) were included in the study, and 3315 (0.26%) deaths were attributed to COVID-19. 
Factors associated with COVID-19-related death included older age (80 vs 50-59 years old: hazard ratio [HR] 13.28, 95% CI 11.46-15.39), male sex (HR 1.68, 95% CI 1.57-1.80), obesity (BMI 40 vs <30 kg/m2: HR 1.71, 95% CI 1.50-1.96), race (Hispanic White, African American, Asian vs non-Hispanic White: HR 2.46, 95% CI 2.01-3.02; HR 2.27, 95% CI 2.06-2.50; HR 2.06, 95% CI 1.65-2.57), region (South, Northeast, Midwest vs West: HR 1.62, 95% CI 1.33-1.98; HR 2.50, 95% CI 2.06-3.03; HR 1.35, 95% CI 1.11-1.64), chronic respiratory disease (HR 1.21, 95% CI 1.12-1.32), cardiac disease (HR 1.10, 95% CI 1.01-1.19), diabetes (HR 1.92, 95% CI 1.75-2.10), recent diagnosis of lung cancer (HR 1.70, 95% CI 1.14-2.55), severely reduced kidney function (HR 1.92, 95% CI 1.69-2.19), stroke or dementia (HR 1.25, 95% CI 1.15-1.36), other neurological diseases (HR 1.77, 95% CI 1.59-1.98), organ transplant (HR 1.35, 95% CI 1.09-1.67), and other immunosuppressive conditions (HR 1.21, 95% CI 1.01-1.46). CONCLUSIONS This is one of the largest national cohort studies in the United States; we identified several patient characteristics associated with COVID-19-related deaths, and the results can serve as the basis for policy making. The study also offered directions for future studies, including the effect of other socioeconomic factors on the increased risk for minority groups.
|
20
|
Developing a systematic approach to assessing data quality in secondary use of clinical data based on intended use. Learn Health Syst 2022; 6:e10264. [PMID: 35036548 PMCID: PMC8753309 DOI: 10.1002/lrh2.10264] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2020] [Revised: 02/24/2021] [Accepted: 03/01/2021] [Indexed: 11/10/2022] Open
Abstract
INTRODUCTION Secondary use of electronic health record (EHR) data for research requires that the data are fit for use. Data quality (DQ) frameworks have traditionally focused on structural conformance and completeness of clinical data extracted from source systems. In this paper, we propose a framework for evaluating semantic DQ that will allow researchers to evaluate fitness for use prior to analyses. METHODS We reviewed current DQ literature, as well as experience from recent multisite network studies, and identified gaps in the literature and current practice. Derived principles were used to construct the conceptual framework with attention to both analytic fitness and informatics practice. RESULTS We developed a systematic framework that guides researchers in assessing whether a data source is fit for use for their intended study or project. It combines tools for evaluating clinical context with DQ principles, as well as factoring in the characteristics of the data source, in order to develop semantic DQ checks. CONCLUSIONS Our framework provides a systematic process for DQ development. Further work is needed to codify practices and metadata around both structural and semantic data quality.
|
21
|
Building capacity of community health centers to overcome data challenges with the development of an agile COVID-19 public health registry: a multistate quality improvement effort. J Am Med Inform Assoc 2021; 29:80-88. [PMID: 34648005 PMCID: PMC8524633 DOI: 10.1093/jamia/ocab233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2021] [Revised: 09/02/2021] [Accepted: 10/11/2021] [Indexed: 11/12/2022] Open
Abstract
Objective During the coronavirus disease 2019 (COVID-19) pandemic, federally qualified health centers rapidly mobilized to provide SARS-CoV-2 testing, COVID-19 care, and vaccination to populations at increased risk for COVID-19 morbidity and mortality. We describe the development of a reusable public health data analytics system for reuse of clinical data to evaluate the health burden, disparities, and impact of COVID-19 on populations served by health centers. Materials and Methods The Multistate Data Strategy engaged project partners to assess public health readiness and COVID-19 data challenges. An infrastructure for data capture and sharing procedures between health centers and public health agencies was developed to support existing capabilities and data capacities to respond to the pandemic. Results Between August 2020 and March 2021, project partners evaluated their data capture and sharing capabilities and reported challenges and preliminary data. Major interoperability challenges included poorly aligned federal, state, and local reporting requirements, lack of unique patient identifiers, lack of access to pharmacy, claims and laboratory data, missing data, and proprietary data standards and extraction methods. Discussion Efforts to access and align project partners’ existing health systems data infrastructure in the context of the pandemic highlighted complex interoperability challenges. These challenges remain significant barriers to real-time data analytics and efforts to improve health outcomes and mitigate inequities through data-driven responses. Conclusion The reusable public health data analytics system created in the Multistate Data Strategy can be adapted and scaled for other health center networks to facilitate data aggregation and dashboards for public health, organizational planning, and quality improvement and can inform local, state, and national COVID-19 response efforts.
|
22
|
Preventing unnecessary imaging in patients suspect of coronary artery disease through machine learning of electronic health records. EUROPEAN HEART JOURNAL. DIGITAL HEALTH 2021; 3:11-19. [PMID: 36713995 PMCID: PMC9707976 DOI: 10.1093/ehjdh/ztab103] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/29/2021] [Revised: 11/22/2021] [Accepted: 12/02/2021] [Indexed: 02/01/2023]
Abstract
Aims With the ageing European population, the incidence of coronary artery disease (CAD) is expected to rise, which will likely result in increased use of imaging. Symptom recognition can be complicated, as symptoms caused by CAD can be atypical, particularly in women. Early CAD exclusion may help to optimize the use of diagnostic resources and thus improve the sustainability of the healthcare system. We aimed to develop sex-stratified algorithms, trained on routinely available electronic health records (EHRs), raw electrocardiograms, and haematology data, to exclude CAD in patients upfront. Methods and results We trained XGBoost algorithms on data from patients in the Utrecht Patient-Oriented Database who underwent coronary computed tomography angiography (CCTA), stress cardiac magnetic resonance (CMR) imaging, and/or stress single-photon emission computerized tomography (SPECT) in the UMC Utrecht. Outcomes were extracted from radiology reports. We aimed to maximize negative predictive value (NPV) to minimize the false-negative risk while retaining acceptable specificity. Of 6808 CCTA patients (31% female), 1029 females (48%) and 1908 males (45%) had no diagnosis of CAD. Of 3053 CMR/SPECT patients (45% female), 650 females (47%) and 881 males (48%) had no diagnosis of CAD. On the train and test sets, the CCTA models achieved NPVs and specificities of 0.95 and 0.19 (females) and 0.96 and 0.09 (males). The CMR/SPECT models achieved NPVs and specificities of 0.75 and 0.041 (females) and 0.92 and 0.026 (males). Conclusion Coronary artery disease can be excluded from EHRs with high NPV. Our study demonstrates new possibilities to reduce unnecessary imaging in women and men suspected of CAD.
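Maximizing NPV at acceptable specificity, as described above, amounts to sweeping the classifier's decision threshold and keeping the most permissive rule-out cut-off whose predicted negatives remain sufficiently reliable. A simplified sketch of that idea follows (not the authors' pipeline; function and variable names are hypothetical):

```python
def pick_ruleout_threshold(scores, labels, target_npv=0.95):
    """Scan thresholds; predictions below the threshold are 'rule-outs'.

    labels: 1 = CAD present, 0 = CAD absent.
    Returns (specificity, threshold) for the threshold achieving the
    highest specificity (most disease-free patients ruled out) among
    thresholds whose NPV meets `target_npv`, or None if none qualifies.
    """
    n_healthy = labels.count(0)
    if n_healthy == 0:
        return None
    best = None  # (specificity, threshold)
    for t in sorted(set(scores)):
        neg = [y for s, y in zip(scores, labels) if s < t]  # ruled out
        if not neg:
            continue
        npv = neg.count(0) / len(neg)         # true negatives / all rule-outs
        spec = neg.count(0) / n_healthy       # ruled-out share of disease-free
        if npv >= target_npv and (best is None or spec > best[0]):
            best = (spec, t)
    return best
```

The trade-off reported in the abstract (NPV ≈ 0.95 but specificity ≈ 0.19) reflects exactly this constraint: demanding highly reliable rule-outs leaves only a modest fraction of patients below the cut-off.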
|
23
|
Supporting research, protecting data: one institution's approach to clinical data warehouse governance. J Am Med Inform Assoc 2021; 29:707-712. [PMID: 34871428 PMCID: PMC8922173 DOI: 10.1093/jamia/ocab259] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2021] [Revised: 09/21/2021] [Accepted: 11/11/2021] [Indexed: 12/17/2022] Open
Abstract
Institutions must decide how to manage the use of clinical data to support research while ensuring appropriate protections are in place. Questions about data use and sharing often go beyond what the Health Insurance Portability and Accountability Act of 1996 (HIPAA) considers. In this article, we describe our institution’s governance model and approach. Common questions we consider include (1) Is a request limited to the minimum data necessary to carry the research forward? (2) What plans are there for sharing data externally?, and (3) What impact will the proposed use of data have on patients and the institution? In 2020, 302 of the 319 requests reviewed were approved. The majority of requests were approved in less than 2 weeks, with few or no stipulations. For the remaining requests, the governance committee works with researchers to find solutions to meet their needs while also addressing our collective goal of protecting patients.
|
24
|
Association of Penicillin or Cephalosporin Allergy Documentation and Antibiotic Use in Hospitalized Patients with Pneumonia. THE JOURNAL OF ALLERGY AND CLINICAL IMMUNOLOGY-IN PRACTICE 2021; 9:3060-3068.e1. [PMID: 34029776 DOI: 10.1016/j.jaip.2021.04.071] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/29/2021] [Revised: 03/31/2021] [Accepted: 04/29/2021] [Indexed: 12/19/2022]
Abstract
BACKGROUND Treatment guidelines for pneumonia recommend beta-lactam antibiotic-based therapy. Although reported penicillin allergy is common, more than 90% of patients with reported penicillin allergy are not allergic. OBJECTIVE We evaluated the association of a documented penicillin and/or cephalosporin (P/C) allergy to antibiotic use for the treatment of inpatient pneumonia. METHODS This was a national cross-sectional study conducted among Vizient, Inc., network hospitals that voluntarily contributed data. Among hospitalized patients with pneumonia, we examined the relation of a documented P/C allergy in the electronic health record to prevalence of first-line beta-lactam antibiotic administration and alternative antibiotics using multivariable log-binomial regression with generalized estimating equations. RESULTS Of 2,276 inpatients receiving antibiotics for pneumonia at 95 U.S. hospitals, 450 (20%) had a documented P/C allergy. Compared with pneumonia patients without a documented P/C allergy, patients with a documented P/C allergy had reduced prevalence of first-line beta-lactam antibiotic use (adjusted prevalence ratio [aPR] 0.79; 95% confidence interval [95% CI] 0.69-0.89]). Patients with high-risk P/C reactions (n = 91) had even lower prevalence of first-line beta-lactam antibiotic use (aPR 0.47; 95% CI 0.35-0.64). Alternative antibiotics associated with a higher use in pneumonia patients with a documented P/C allergy included carbapenems (aPR 1.61; 95% CI 1.22-2.13) and fluoroquinolones (aPR 1.52; 95% CI 1.21-1.91). CONCLUSIONS Inpatients with documented P/C allergy and pneumonia were less likely to receive recommended beta-lactams and more likely to receive carbapenems and fluoroquinolones. Inpatient allergy assessment may improve optimal antibiotic therapy for the 20% of inpatients with pneumonia and a documented P/C allergy.
|
25
|
The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. J Am Med Inform Assoc 2021; 28:427-443. [PMID: 32805036 PMCID: PMC7454687 DOI: 10.1093/jamia/ocaa196] [Citation(s) in RCA: 285] [Impact Index Per Article: 95.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2020] [Accepted: 08/14/2020] [Indexed: 01/12/2023] Open
Abstract
Objective Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. Materials and Methods The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. Results Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. Conclusions The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.
|
26
|
Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health. Front Digit Health 2021; 3:620828. [PMID: 33791684 PMCID: PMC8009547 DOI: 10.3389/fdgth.2021.620828] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2020] [Accepted: 02/16/2021] [Indexed: 11/13/2022] Open
Abstract
Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts, such as functional outcomes and social determinants of health, lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of medical information in under-studied domains, and demonstrate its applicability through a case study on physical mobility function. Mobility function is a component of many health measures, from post-acute care and surgical outcomes to chronic frailty and disability, and is represented as one domain of human activity in the International Classification of Functioning, Disability, and Health (ICF). However, mobility and other types of functional activity remain under-studied in the medical informatics literature, and neither the ICF nor commonly-used medical terminologies capture functional status terminology in practice. We investigated two data-driven paradigms, classification and candidate selection, to link narrative observations of mobility status to standardized ICF codes, using a dataset of clinical narratives from physical therapy encounters. Recent advances in language modeling and word embedding were used as features for established machine learning models and a novel deep learning approach, achieving a macro-averaged F-1 score of 84% on linking mobility activity reports to ICF codes. Both classification and candidate selection approaches present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated data set; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems. 
This research has implications for continued development of language technologies to analyze functional status information, and the ongoing growth of NLP tools for a variety of specialized applications in clinical care and research.
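The macro-averaged F1 score reported above is the unweighted mean of per-class F1 scores, so rare ICF codes count as much as frequent ones. A small illustrative implementation (not the authors' evaluation code):

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes present in y_true."""
    classes = sorted(set(y_true))
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```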
|
27
|
Disease network delineates the disease progression profile of cardiovascular diseases. J Biomed Inform 2021; 115:103686. [PMID: 33493631 DOI: 10.1016/j.jbi.2021.103686] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2020] [Revised: 01/14/2021] [Accepted: 01/15/2021] [Indexed: 11/20/2022]
Abstract
OBJECTIVE As Electronic Health Record (EHR) data have accumulated explosively in recent years, the tremendous amount of patient clinical data provides opportunities to discover real-world evidence. In this study, a graphical disease network, named the progressive cardiovascular disease network (progCDN), was built to delineate the progression profiles of cardiovascular diseases (CVD). MATERIALS AND METHODS The EHR data of 14.3 million patients with CVD diagnoses were collected for building the disease network and further analysis. We applied a newly designed method, progression rates (PR), to calculate the progression relationships among different diagnoses. Based on the disease network outcome, 23 disease progression pairs were selected to screen for salient features. RESULTS The network depicted the dominant diseases in CVD development, such as heart failure and coronary arteriosclerosis. Novel progression relationships were also discovered, such as the progression path from long QT syndrome to major depression. In addition, three age-group progCDNs identified a series of age-associated disease progression paths and important successor diseases with age bias. Furthermore, a list of important features with sufficient abundance and high correlation was extracted for building disease risk models. DISCUSSION The PR method designed for identifying progression relationships could be widely applied to any EHR database due to its flexibility and robust functionality. Meanwhile, researchers could use the progCDN network to validate or explore novel disease relationships in real-world data. CONCLUSION The first-time interrogation of such a huge cohort of CVD patients enabled us to explore the general and age-specific disease progression patterns in CVD development.
|
28
|
Noise-tolerant similarity search in temporal medical data. J Biomed Inform 2020; 113:103667. [PMID: 33359112 DOI: 10.1016/j.jbi.2020.103667] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2020] [Revised: 12/12/2020] [Accepted: 12/15/2020] [Indexed: 01/12/2023]
Abstract
Temporal medical data are increasingly integrated into the development of data-driven methods to deliver better healthcare. Searching such data for patterns can improve the detection of disease cases and facilitate the design of preemptive interventions. For example, specific temporal patterns could be used to recognize low-prevalence diseases, which are often under-diagnosed. However, searching these patterns in temporal medical data is challenging, as the data are often noisy, complex, and large in scale. In this work, we propose an effective and efficient solution to search for patients who exhibit conditions that resemble the input query. In our solution, we propose a similarity notion based on the Longest Common Subsequence (LCSS), which is used to measure the similarity between the query and the patient's temporal medical data and to ensure robustness against noise in the data. Our solution adopts locality sensitive hashing techniques to address the high dimensionality of medical data, by embedding the recorded clinical events (e.g., medications and diagnosis codes) into compact signatures. To perform pattern search in large EHR datasets, we propose a filtering approach based on tandem patterns, which effectively identifies candidate matches while discarding irrelevant data. The evaluations conducted using a real-world dataset demonstrate that our solution is highly accurate while significantly accelerating the similarity search.
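The Longest Common Subsequence (LCSS) similarity at the core of this approach can be sketched with the classic dynamic program; the locality sensitive hashing and tandem-pattern filtering described above are omitted here, and the normalization shown is one plausible choice rather than necessarily the paper's:

```python
def lcss_length(a, b):
    """Classic O(len(a) * len(b)) dynamic program for the LCS length.

    Works on any sequences of hashable events, e.g. lists of
    diagnosis codes or medication identifiers.
    """
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def lcss_similarity(a, b):
    """Normalize by the longer sequence: noisy insertions in a patient's
    record only dilute a match rather than break it."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b) / max(len(a), len(b))
```

Because LCSS only requires events to appear in the same order, not contiguously, it tolerates the spurious or missing entries common in EHR event streams, which is the robustness property the abstract highlights.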
|
29
|
ZiMM: A deep learning model for long term and blurry relapses with non-clinical claims data. J Biomed Inform 2020; 110:103531. [PMID: 32818667 DOI: 10.1016/j.jbi.2020.103531] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2020] [Revised: 07/25/2020] [Accepted: 08/09/2020] [Indexed: 11/28/2022]
Abstract
This paper considers the problem of modeling and predicting a long-term and "blurry" relapse that occurs after a medical act, such as surgery. We do not consider a short-term complication related to the act itself, but a long-term relapse that clinicians cannot easily explain, since it depends on unknown sets or sequences of past events that occurred before the act. The relapse is observed only indirectly, in a "blurry" fashion, through longitudinal prescriptions of drugs over a long period after the medical act. We introduce a new model, ZiMM (Zero-inflated Mixture of Multinomial distributions), to capture long-term and blurry relapses. On top of it, we build an end-to-end deep-learning architecture, the ZiMM Encoder-Decoder (ZiMM ED), that can learn from the complex, irregular, highly heterogeneous, and sparse patterns of health events observed in a claims-only database. ZiMM ED is applied to a "non-clinical" claims database that contains only timestamped reimbursement codes for drug purchases, medical procedures, and hospital diagnoses; the only available clinical feature is the patient's age. This setting is more challenging than one where bedside clinical signals are available. Our motivation for using such a non-clinical claims database is its population-wide exhaustiveness, compared with clinical electronic health records coming from a single hospital or a small set of hospitals. Indeed, we consider a dataset containing the claims of almost all French citizens who had surgery for prostatic problems, with histories between 1.5 and 5 years long. We consider a long-term (18-month) relapse (urination problems that persist despite surgery), which is blurry because it is observed only through the reimbursement of a specific set of drugs for urination problems. Our experiments show that ZiMM ED improves on several baselines, including non-deep-learning and deep-learning approaches, and that it allows working on such a dataset with minimal preprocessing.
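The zero-inflation idea behind ZiMM can be illustrated with a deliberately stripped-down observation model: with some probability a patient is a structural "never relapses" case, otherwise the time bucket of the relapse-drug purchase follows a categorical distribution. This single-draw sketch is not the paper's mixture of multinomials (and ignores the encoder-decoder entirely); the names `pi`, `theta`, and `bucket` are illustrative assumptions.

```python
import math


def zic_log_likelihood(bucket, pi, theta):
    """Log-likelihood of one patient under a zero-inflated categorical.

    bucket : None if no relapse drug was ever reimbursed, else the index
             of the time bucket of the relapse-drug purchase
    pi     : probability of the structural-zero state (no relapse)
    theta  : probabilities over time buckets, summing to 1
    """
    if bucket is None:
        return math.log(pi)
    return math.log((1.0 - pi) * theta[bucket])


def fit_mle(observations, n_buckets):
    """Closed-form MLE for this toy model: pi is the fraction of
    non-relapsers; theta is the empirical bucket frequency among
    relapsers, with add-one (Laplace) smoothing."""
    zeros = sum(1 for b in observations if b is None)
    pi = zeros / len(observations)
    counts = [1] * n_buckets  # Laplace smoothing
    for b in observations:
        if b is not None:
            counts[b] += 1
    total = sum(counts)
    theta = [c / total for c in counts]
    return pi, theta
```

In ZiMM proper, the analogue of `pi` and `theta` is produced per patient by the decoder from the encoded claims history, rather than estimated globally as here.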
|
30
|
Distributed learning from multiple EHR databases: Contextual embedding models for medical events. J Biomed Inform 2019; 92:103138. [PMID: 30825539 DOI: 10.1016/j.jbi.2019.103138] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2018] [Revised: 02/15/2019] [Accepted: 02/16/2019] [Indexed: 11/26/2022]
Abstract
Electronic health record (EHR) data provide promising opportunities to explore personalized treatment regimes and to make clinical predictions. Compared with regular clinical data, EHR data are known for their irregularity and complexity. In addition, analyzing EHR data involves privacy issues, and sharing such data among multiple research sites is often infeasible due to regulatory and other hurdles. A recently published work uses contextual embedding models and successfully builds one predictive model for more than seventy common diagnoses. Despite its high predictive power, the model cannot be generalized to other institutions without sharing data. In this work, a novel method is proposed to learn from multiple databases and build predictive models based on Distributed Noise Contrastive Estimation (Distributed NCE), using differential privacy to safeguard the intermediary information sharing. A numerical study with a real dataset demonstrates that the proposed method not only builds predictive models in a distributed manner with privacy protection, but also preserves the model structure well and achieves comparable prediction accuracy. The proposed method has been implemented as a stand-alone Python library, available on GitHub (https://github.com/ziyili20/DistributedLearningPredictor) with installation instructions and use cases.
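The two ingredients named in this entry can be sketched separately: a word2vec-style NCE gradient for a (center event, context event) pair with sampled negatives, and a clip-and-add-Gaussian-noise step before that gradient leaves a site. This is a generic illustration, not the paper's Distributed NCE protocol or its library API; `clip` and `sigma` are placeholder values, not privacy-calibrated ones.

```python
import math
import random


def nce_gradients(center_vec, context_vec, negative_vecs):
    """Gradient w.r.t. the center embedding for one NCE step: the true
    context event is the positive, sampled events are negatives, under
    the usual sigmoid-based logistic objective."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    grad = [0.0] * len(center_vec)
    # Positive pair: pull the center embedding toward the context.
    g = sigmoid(dot(center_vec, context_vec)) - 1.0
    for i in range(len(grad)):
        grad[i] += g * context_vec[i]
    # Negative samples: push the center embedding away.
    for neg in negative_vecs:
        g = sigmoid(dot(center_vec, neg))
        for i in range(len(grad)):
            grad[i] += g * neg[i]
    return grad


def privatize(grad, clip=1.0, sigma=0.5, rng=random):
    """Clip the gradient to a bounded L2 norm and add Gaussian noise
    before sharing it across sites -- the standard Gaussian-mechanism
    recipe for differentially private gradient exchange."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, sigma * clip) for g in grad]
```

In a distributed run, each site would compute `nce_gradients` on its local EHR co-occurrences and transmit only `privatize(grad)`, so raw patient records never leave the institution.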
|
31
|
Abstract
Modern medical research relies on multi-institutional collaborations that enhance knowledge discovery and data reuse. While these collaborations allow researchers to perform analytics otherwise impossible on individual datasets, they often pose significant challenges in the data integration process. In the absence of a unique identifier, data integration solutions often have to rely on patients' protected health information (PHI). In many situations, such information cannot leave the institutions or must be strictly protected, and noisy values in these attributes may result in poor overall utility. While much research has been done to address these challenges, most current solutions are designed for a static setting and do not consider the temporal information in the data (e.g., EHRs). In this work, we propose a novel approach that uses non-PHI data for linking patients' longitudinal records. Specifically, our technique captures diagnosis dependencies using patterns, which are shown to provide important indications for linking patient records. Our solution can be used as a standalone technique to perform temporal record linkage on non-protected health information, or it can be combined with Privacy-Preserving Record Linkage (PPRL) solutions when protected health information is available; in that case, our approach resolves ambiguities in the results. Experimental evaluations on real datasets demonstrate the effectiveness of our technique.
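The linkage idea, matching records by recurring diagnosis-sequence structure rather than by PHI, can be sketched crudely by treating consecutive diagnosis-code pairs as the "patterns" and scoring candidate links by set overlap. The paper's dependency patterns are richer than bigrams, and the Jaccard scoring here is an assumption of this sketch, as are all function names.

```python
def diagnosis_bigrams(record):
    """Ordered pairs of consecutive diagnosis codes -- a crude stand-in
    for the dependency patterns mined in the paper. No PHI is used."""
    return {(record[i], record[i + 1]) for i in range(len(record) - 1)}


def link_score(record_a, record_b):
    """Jaccard overlap of the two records' bigram sets."""
    a, b = diagnosis_bigrams(record_a), diagnosis_bigrams(record_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query_record, candidates):
    """Return (index, score) of the candidate record at another site
    that most plausibly belongs to the same patient."""
    scores = [link_score(query_record, c) for c in candidates]
    i = max(range(len(scores)), key=scores.__getitem__)
    return i, scores[i]
```

A PPRL pipeline could then use such a score exactly as the abstract suggests: as a tie-breaker when encrypted PHI comparisons leave several candidate links ambiguous.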
|
32
|
Patient ranking with temporally annotated data. J Biomed Inform 2017; 78:43-53. [PMID: 29277597 DOI: 10.1016/j.jbi.2017.12.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2017] [Revised: 09/23/2017] [Accepted: 12/13/2017] [Indexed: 11/29/2022]
Abstract
Modern medical information systems enable the collection of massive temporal health data. Although these data have great potential for advancing medical research, exploring them and extracting useful knowledge present significant challenges. In this work, we develop a new pattern-matching technique that aims to facilitate the discovery of clinically useful knowledge from large temporal datasets. Our approach receives as input a set of temporal patterns modeling specific events of interest (e.g., a doctor's knowledge, symptoms of diseases) and returns the data instances matching these patterns (e.g., patients exhibiting the specified symptoms). The resulting instances are ranked according to a significance score based on the p-value. Our experimental evaluations on a real-world dataset demonstrate the efficiency and effectiveness of our approach.
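One concrete way to turn pattern matches into a p-value-based ranking, offered purely as an illustration of the idea, is a binomial null: how likely is it that a patient matches at least this many of the query's events by chance alone? The paper computes significance for full temporal patterns; this per-event binomial null and every name below are simplifying assumptions of the sketch.

```python
from math import comb


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of matching at
    least k of n query events if each matched independently at random
    with base rate p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def rank_patients(matches, n_query_events, base_rate):
    """Rank (patient_id, n_matched) pairs by binomial p-value.

    A smaller p-value means the match is harder to explain by chance,
    so it ranks higher (first)."""
    scored = [(pid, binom_sf(k, n_query_events, base_rate))
              for pid, k in matches]
    return sorted(scored, key=lambda t: t[1])
```

With a two-event query and a 0.5 base rate, a patient matching both events (p = 0.25) outranks one matching a single event (p = 0.75), mirroring the abstract's "significance score based on the p-value".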
|