1
|
Leveraging GPT-4 for Identifying Cancer Phenotypes in Electronic Health Records: A Performance Comparison between GPT-4, GPT-3.5-turbo, Flan-T5 and spaCy's Rule-based & Machine Learning-based methods. bioRxiv 2024:2023.09.27.559788. [PMID: 37808763 PMCID: PMC10557629 DOI: 10.1101/2023.09.27.559788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Objective Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, and two rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 records for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, medspaCy and scispaCy by comparing precision, recall, and micro-F1 scores. Results GPT-4 achieved a higher F1 score, precision, and recall than the Flan-T5-xl, Flan-T5-xxl, medspaCy and scispaCy models. GPT-3.5-turbo performed similarly to GPT-4. GPT and Flan-T5 models were not constrained by explicit rule requirements for contextual pattern recognition. SpaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
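The abstract compares models by precision, recall, and micro-F1. As a minimal sketch of how micro-averaged scores are typically computed (pooling true positives, false positives, and false negatives across phenotype labels before dividing), the following may help. The function name `micro_prf` and the example counts are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple


def micro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1.

    `counts` maps each label to a (true_pos, false_pos, false_neg) tuple.
    Counts are pooled across labels before the ratios are taken.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical per-phenotype counts, for illustration only.
example_counts = {
    "initial_stage": (8, 2, 1),
    "initial_treatment": (6, 1, 3),
    "recurrence": (4, 1, 0),
}
precision, recall, f1 = micro_prf(example_counts)
print(f"precision={precision:.3f} recall={recall:.3f} micro-F1={f1:.3f}")
```

Because the counts are pooled, labels with more instances weigh more heavily in the micro-average than they would in a macro-average.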
|
2
|
Association Between Socioeconomic Factors, Race, and Use of a Specialty Memory Clinic. Neurology 2023; 101:e1424-e1433. [PMID: 37532510 PMCID: PMC10573139 DOI: 10.1212/wnl.0000000000207674] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/09/2022] [Accepted: 06/06/2023] [Indexed: 08/04/2023]
Abstract
BACKGROUND AND OBJECTIVES The capacity of specialty memory clinics in the United States is very limited. If lower socioeconomic status or minoritized racial group is associated with reduced use of memory clinics, this could exacerbate health care disparities, especially if more effective treatments of Alzheimer disease become available. We aimed to understand how use of a memory clinic is associated with neighborhood-level measures of socioeconomic factors and the intersectionality of race. METHODS We conducted an observational cross-sectional study using electronic health record data to compare the neighborhood advantage of patients seen at the Washington University Memory Diagnostic Center with the catchment area using a geographical information system. Furthermore, we compared the severity of dementia at the initial visit between patients who self-identified as Black or White. We used a multinomial logistic regression model to assess the Clinical Dementia Rating at the initial visit and t tests to compare neighborhood characteristics, including Area Deprivation Index, with those of the catchment area. RESULTS A total of 4,824 patients seen at the memory clinic between 2008 and 2018 were included in this study (mean age 72.7 [SD 11.0] years, 2,712 [56%] female, 543 [11%] Black). Most of the memory clinic patients lived in more advantaged neighborhoods within the overall catchment area. The percentage of patients self-identifying as Black (11%) was lower than the average percentage of Black individuals by census tract in the catchment area (16%) (p < 0.001). Black patients lived in less advantaged neighborhoods, and Black patients were more likely than White patients to have moderate or severe dementia at their initial visit (odds ratio 1.59, 95% CI 1.11-2.25). DISCUSSION This study demonstrates that patients living in less affluent neighborhoods were less likely to be seen in one large memory clinic. 
Black patients were under-represented in the clinic, and Black patients had more severe dementia at their initial visit. These findings suggest that patients with a lower socioeconomic status and who identify as Black are less likely to be seen in memory clinics, which are likely to be a major point of access for any new Alzheimer disease treatments that may become available.
|
3
|
Electronic health record data quality assessment and tools: a systematic review. J Am Med Inform Assoc 2023; 30:1730-1740. [PMID: 37390812 PMCID: PMC10531113 DOI: 10.1093/jamia/ocad120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Revised: 05/16/2023] [Accepted: 06/23/2023] [Indexed: 07/02/2023]
Abstract
OBJECTIVE We extended a 2013 literature review on electronic health record (EHR) data quality assessment approaches and tools to determine recent improvements or changes in EHR data quality assessment methodologies. MATERIALS AND METHODS We completed a systematic review of PubMed articles from 2013 to April 2023 that discussed the quality assessment of EHR data. We screened and reviewed papers for the dimensions and methods defined in the original 2013 manuscript. We categorized papers as data quality outcomes of interest, tools, or opinion pieces. We abstracted and defined additional themes and methods through an iterative review process. RESULTS We included 103 papers in the review, of which 73 were data quality outcomes of interest papers, 22 were tools, and 8 were opinion pieces. The most common dimension of data quality assessed was completeness, followed by correctness, concordance, plausibility, and currency. We abstracted conformance and bias as 2 additional dimensions of data quality and structural agreement as an additional methodology. DISCUSSION There has been an increase in EHR data quality assessment publications since the original 2013 review. Consistent dimensions of EHR data quality continue to be assessed across applications. Despite consistent patterns of assessment, there is still no standard approach for assessing EHR data quality. CONCLUSION Guidelines are needed for EHR data quality assessment to improve the efficiency, transparency, comparability, and interoperability of data quality assessment. These guidelines must be both scalable and flexible. Automation could be helpful in generalizing this process.
|
4
|
Extraction of clinical phenotypes for Alzheimer's disease dementia from clinical notes using natural language processing. JAMIA Open 2023; 6:ooad014. [PMID: 36844369 PMCID: PMC9952043 DOI: 10.1093/jamiaopen/ooad014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Revised: 01/27/2023] [Accepted: 02/10/2023] [Indexed: 02/28/2023]
Abstract
Objectives There is much interest in utilizing clinical data for developing prediction models for Alzheimer's disease (AD) risk, progression, and outcomes. Existing studies have mostly utilized curated research registries, image analysis, and structured electronic health record (EHR) data. However, much critical information resides in relatively inaccessible unstructured clinical notes within the EHR. Materials and Methods We developed a natural language processing (NLP)-based pipeline to extract AD-related clinical phenotypes, documenting strategies for success and assessing the utility of mining unstructured clinical notes. We evaluated the pipeline against gold-standard manual annotations performed by 2 clinical dementia experts for AD-related clinical phenotypes including medical comorbidities, biomarkers, neurobehavioral test scores, behavioral indicators of cognitive decline, family history, and neuroimaging findings. Results Documentation rates for each phenotype varied in the structured versus unstructured EHR. Interannotator agreement was high (Cohen's kappa = 0.72-1) and positively correlated with the NLP-based phenotype extraction pipeline's performance (average F1-score = 0.65-0.99) for each phenotype. Discussion We developed an automated NLP-based pipeline to extract informative phenotypes that may improve the performance of eventual machine learning predictive models for AD. In the process, we examined documentation practices for each phenotype relevant to the care of AD patients and identified factors for success. Conclusion Success of our NLP-based phenotype extraction pipeline depended on domain-specific knowledge and focus on a specific clinical domain instead of maximizing generalizability.
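Interannotator agreement in this abstract is reported as Cohen's kappa. As a minimal sketch of that statistic for two annotators labeling the same items, assuming the standard definition (observed agreement corrected for the agreement expected by chance from each annotator's label frequencies): the function name `cohens_kappa` and the example labels are hypothetical, not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("annotation lists must be non-empty and equal length")
    n = len(a)
    # Fraction of items on which the annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical binary annotations (1 = phenotype documented, 0 = not).
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(annotator_1, annotator_2), 2))
```

Here the annotators agree on 8 of 10 items while chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.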
|
5
|
OpenSep: a generalizable open source pipeline for SOFA score calculation and Sepsis-3 classification. JAMIA Open 2022; 5:ooac105. [PMID: 36570030 PMCID: PMC9772813 DOI: 10.1093/jamiaopen/ooac105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/07/2022] [Revised: 11/25/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022]
Abstract
EHR-based sepsis research often uses heterogeneous definitions of sepsis, leading to poor generalizability and difficulty in comparing studies to each other. We have developed OpenSep, an open-source pipeline for sepsis phenotyping according to the Sepsis-3 definition, as well as determination of time of sepsis onset and SOFA scores. The Minimal Sepsis Data Model was developed alongside the pipeline to enable the execution of the pipeline on diverse sources of electronic health record data. The pipeline's accuracy was validated by applying it to the MIMIC-IV version 1.0 data and comparing sepsis onset and SOFA scores to those produced by the pipeline developed by the curators of MIMIC. We demonstrated high reliability for both the sepsis onsets and SOFA scores; however, the use of the Minimal Sepsis Data Model developed for this work allows our pipeline to be applied more broadly to data sources beyond MIMIC.
|
6
|
Modifications to student quarantine policies in K-12 schools implementing multiple COVID-19 prevention strategies restores in-person education without increasing SARS-CoV-2 transmission risk, January-March 2021. PLoS One 2022; 17:e0266292. [PMID: 36264919 PMCID: PMC9584452 DOI: 10.1371/journal.pone.0266292] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/16/2022] [Accepted: 09/15/2022] [Indexed: 11/25/2022]
Abstract
OBJECTIVE To determine whether modified K-12 student quarantine policies that allow some students to continue in-person education during their quarantine period increase schoolwide SARS-CoV-2 transmission risk following the increase in cases in winter 2020-2021. METHODS We conducted a prospective cohort study of COVID-19 cases and close contacts among students and staff (n = 65,621) in 103 Missouri public schools. Participants were offered free, saliva-based RT-PCR testing. The projected number of school-based transmission events among untested close contacts was extrapolated from the percentage of events detected among tested asymptomatic close contacts and summed with the number of detected events for a projected total. An adjusted Cox regression model compared hazard rates of school-based SARS-CoV-2 infections between schools with a modified versus standard quarantine policy. RESULTS From January-March 2021, a projected 23 (1%) school-based transmission events occurred among 1,636 school close contacts. There was no difference in the adjusted hazard rates of school-based SARS-CoV-2 infections between schools with a modified versus standard quarantine policy (hazard ratio = 1.00; 95% confidence interval: 0.97-1.03). DISCUSSION School-based SARS-CoV-2 transmission was rare in 103 K-12 schools implementing multiple COVID-19 prevention strategies. Modified student quarantine policies were not associated with increased school incidence of COVID-19. Modifications to student quarantine policies may be a useful strategy for K-12 schools to safely reduce disruptions to in-person education during times of increased COVID-19 community incidence.
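The projection described in the methods (extrapolating events among untested close contacts from the rate seen in tested asymptomatic contacts, then adding detected events) can be sketched as below. This is one plausible reading of the abstract, not the authors' actual procedure; the function name, parameter names, and all numbers are hypothetical.

```python
def projected_total_events(
    detected_events: int,
    events_in_tested_asymptomatic: int,
    tested_asymptomatic_contacts: int,
    untested_contacts: int,
) -> float:
    """Detected events plus events projected among untested close contacts.

    The rate among tested asymptomatic contacts is assumed to apply
    unchanged to untested contacts (an assumption of this sketch).
    """
    if tested_asymptomatic_contacts <= 0:
        raise ValueError("need at least one tested asymptomatic contact")
    rate = events_in_tested_asymptomatic / tested_asymptomatic_contacts
    projected_untested = rate * untested_contacts
    return detected_events + projected_untested


# Hypothetical inputs: 5 detected events overall, 2 of them among
# 100 tested asymptomatic contacts, and 400 untested contacts.
total = projected_total_events(5, 2, 100, 400)
print(total)  # 5 detected + 0.02 * 400 projected
```

The projection is only as good as the assumption that untested contacts resemble tested asymptomatic ones, which is why the abstract reports the result as a projected total.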
|
7
|
Sepsis Prediction for the General Ward Setting. Front Digit Health 2022; 4:848599. [PMID: 35350226 PMCID: PMC8957791 DOI: 10.3389/fdgth.2022.848599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Accepted: 01/28/2022] [Indexed: 11/13/2022]
Abstract
Objective: To develop and evaluate a sepsis prediction model for the general ward setting and extend the evaluation through a novel pseudo-prospective trial design. Design: Retrospective analysis of data extracted from electronic health records (EHR). Setting: Single, tertiary-care academic medical center in St. Louis, MO, USA. Patients: Adult, non-surgical inpatients admitted between January 1, 2012 and June 1, 2019. Interventions: None. Measurements and Main Results: Of the 70,034 included patient encounters, 3.1% were septic based on the Sepsis-3 criteria. Features were generated from the EHR data and were used to develop a machine learning model to predict sepsis 6-h ahead of onset. The best performing model had an Area Under the Receiver Operating Characteristic curve (AUROC or c-statistic) of 0.862 ± 0.011 and Area Under the Precision-Recall Curve (AUPRC) of 0.294 ± 0.021 compared to that of Logistic Regression (0.857 ± 0.008 and 0.256 ± 0.024) and NEWS 2 (0.699 ± 0.012 and 0.092 ± 0.009). In the pseudo-prospective trial, 388 (69.7%) septic patients were alerted on with a specificity of 81.4%. Within 24 h of crossing the alert threshold, 20.9% had a sepsis-related event occur. Conclusions: A machine learning model capable of predicting sepsis in the general ward setting was developed using the EHR data. The pseudo-prospective trial provided a more realistic estimation of implemented performance and demonstrated a 29.1% Positive Predictive Value (PPV) for sepsis-related intervention or outcome within 48 h.
|
9
|
Comparison of early warning scores for sepsis early identification and prediction in the general ward setting. JAMIA Open 2021; 4:ooab062. [PMID: 34820600 PMCID: PMC8607822 DOI: 10.1093/jamiaopen/ooab062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2021] [Revised: 06/15/2021] [Accepted: 07/12/2021] [Indexed: 11/15/2022]
Abstract
The objective of this study was to directly compare the ability of commonly used early warning scores (EWS) for early identification and prediction of sepsis in the general ward setting. For general ward patients at a large, academic medical center between early-2012 and mid-2018, common EWS and patient acuity scoring systems were calculated from electronic health records (EHR) data for patients that both met and did not meet Sepsis-3 criteria. For identification of sepsis at index time, National Early Warning Score 2 (NEWS 2) had the highest performance (area under the receiver operating characteristic curve: 0.803 [95% confidence interval [CI]: 0.795-0.811], area under the precision recall curves: 0.130 [95% CI: 0.121-0.140]) followed by NEWS, Modified Early Warning Score, and quick Sequential Organ Failure Assessment (qSOFA). Using validated thresholds, NEWS 2 also had the highest recall (0.758 [95% CI: 0.736-0.778]) but qSOFA had the highest specificity (0.950 [95% CI: 0.948-0.952]), positive predictive value (0.184 [95% CI: 0.169-0.198]), and F1 score (0.236 [95% CI: 0.220-0.253]). While NEWS 2 outperformed all other compared EWS and patient acuity scores, due to the low prevalence of sepsis, all scoring systems were prone to false positives (low positive predictive value without drastic sacrifices in sensitivity), thus leaving room for more computationally advanced approaches.
|
10
|
SARS-CoV-2 screening testing in schools for children with intellectual and developmental disabilities. J Neurodev Disord 2021; 13:31. [PMID: 34465306 PMCID: PMC8407928 DOI: 10.1186/s11689-021-09376-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/15/2021] [Accepted: 08/03/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Transmission of SARS-CoV-2 in schools primarily for typically developing children is rare. However, less is known about transmission in schools for children with intellectual and developmental disabilities (IDD), who are often unable to mask or maintain social distancing. The objectives of this study were to determine SARS-CoV-2 positivity and in-school transmission rates using weekly screening tests for school staff and students and describe the concurrent deployment of mitigation strategies in six schools for children with IDD. METHODS From November 23, 2020, to May 28, 2021, weekly voluntary screening for SARS-CoV-2 with a high sensitivity molecular-based saliva test was offered to school staff and students. Weekly positivity rates were determined and compared to local healthcare system and undergraduate student screening data. School-based transmission was assessed among participants quarantined for in-school exposure. School administrators completed a standardized survey to assess school mitigation strategies. RESULTS A total of 59 students and 416 staff participated. An average of 304 school staff and students were tested per week. Of 7289 tests performed, 21 (0.29%) new SARS-CoV-2 positive cases were identified. The highest weekly positivity rate was 1.2% (n = 4) across all schools, which was less than community positivity rates. Two cases of in-school transmission were identified, each among staff, representing 2% (2/103) of participants quarantined for in-school exposure. Mitigation strategies included higher than expected student mask compliance, reduced room capacity, and phased reopening. CONCLUSIONS During 24 weeks that included the peak of the COVID-19 pandemic in winter 2020-21, we found lower rates of SARS-CoV-2 screening test positivity among staff and students of six schools for children with IDD compared to community rates. In-school transmission of SARS-CoV-2 was low among those quarantined for in-school exposure.
However, the impact of the emerging SARS-CoV-2 Delta variant on the effectiveness of these proven mitigation strategies remains unknown. TRIAL REGISTRATION Prior to enrollment, this study was registered at ClinicalTrials.gov on September 25, 2020, identifier NCT04565509, titled Supporting the Health and Well-being of Children with Intellectual and Developmental Disability During COVID-19 Pandemic.
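The percentages reported in this abstract follow directly from the stated counts. A small arithmetic check, using only numbers that appear in the abstract (the helper name `pct` is hypothetical):

```python
def pct(numerator: int, denominator: int, digits: int = 2) -> float:
    """Percentage of numerator over denominator, rounded to `digits` places."""
    return round(100 * numerator / denominator, digits)


# 21 new positive cases out of 7289 tests performed.
overall_positivity = pct(21, 7289)

# 2 in-school transmissions among 103 participants quarantined for exposure.
in_school_transmission = pct(2, 103, digits=1)

print(overall_positivity, in_school_transmission)
```

The first value matches the 0.29% stated in the abstract; the second rounds to the reported 2%.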
|
11
|
SARS-CoV-2 Screening Testing in Schools for Children with Intellectual and Developmental Disabilities. Research Square 2021:rs.3.rs-700296. [PMID: 34312616 PMCID: PMC8312901 DOI: 10.21203/rs.3.rs-700296/v1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/27/2022]
Abstract
BACKGROUND Transmission of SARS-CoV-2 in schools primarily for typically developing children is rare. However, less is known about transmission in schools for children with intellectual and developmental disabilities (IDD), who are often unable to mask or maintain social distancing. The objectives of this study were to determine SARS-CoV-2 positivity and in-school transmission rates using weekly screening tests for school staff and students and describe the concurrent deployment of mitigation strategies in six schools for children with IDD. METHODS From 11/23/20 to 5/28/21, weekly voluntary screening for SARS-CoV-2 with a high sensitivity molecular-based saliva test was offered to school staff and students. Weekly positivity rates were determined and compared to local healthcare system and undergraduate student screening data. School-based transmission was assessed among participants quarantined for in-school exposure. School administrators completed a standardized survey to assess school mitigation strategies. RESULTS A total of 59 students and 416 staff participated. An average of 304 school staff and students were tested per week. Of 7,289 tests performed, 21 (0.29%) new SARS-CoV-2 positive cases were identified. The highest weekly positivity rate was 1.2% (n = 4) across all schools, which was less than community positivity rates. Two cases of in-school transmission were identified, each among staff, representing 2% (2/103) of participants quarantined for in-school exposure. Mitigation strategies included higher than expected student mask compliance, reduced room capacity, and phased reopening. CONCLUSIONS During 24 weeks that included the peak of the COVID-19 pandemic, we found no evidence for elevated SARS-CoV-2 screening test positivity among staff and students of six schools for children with IDD compared to community rates.
In-school transmission of SARS-CoV-2 was low among those quarantined for in-school exposure. CLINICAL TRIAL REGISTRY Prior to enrollment, this study was registered at ClinicalTrials.gov on 9/25/2020, identifier NCT04565509, titled Supporting the Health and Well-being of Children with Intellectual and Developmental Disability During COVID-19 Pandemic (https://clinicaltrials.gov/ct2/show/NCT04565509?term=NCT04565509).
|
12
|
Machine learning for modeling the progression of Alzheimer disease dementia using clinical data: a systematic literature review. JAMIA Open 2021; 4:ooab052. [PMID: 34350389 PMCID: PMC8327375 DOI: 10.1093/jamiaopen/ooab052] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 04/14/2021] [Revised: 06/21/2021] [Accepted: 06/30/2021] [Indexed: 11/17/2022]
Abstract
OBJECTIVE Alzheimer disease (AD) is the most common cause of dementia, a syndrome characterized by cognitive impairment severe enough to interfere with activities of daily life. We aimed to conduct a systematic literature review (SLR) of studies that applied machine learning (ML) methods to clinical data derived from electronic health records in order to model risk for progression of AD dementia. MATERIALS AND METHODS We searched for articles published between January 1, 2010, and May 31, 2020, in PubMed, Scopus, ScienceDirect, IEEE Xplore Digital Library, Association for Computing Machinery Digital Library, and arXiv. We used predefined criteria to select relevant articles and summarized them according to key components of ML analysis such as data characteristics, computational algorithms, and research focus. RESULTS There has been a considerable rise over the past 5 years in the number of research papers using ML-based analysis for AD dementia modeling. We reviewed 64 relevant articles in our SLR. The results suggest that the majority of existing research has focused on predicting progression of AD dementia using publicly available datasets containing both neuroimaging and clinical data (neurobehavioral status exam scores, patient demographics, neuroimaging data, and laboratory test values). DISCUSSION Identifying individuals at risk for progression of AD dementia could potentially help to personalize disease management to plan future care. Clinical data consisting of both structured data tables and clinical notes can be effectively used in ML-based approaches to model risk for AD dementia progression. Data sharing and reproducibility of results can enhance the impact, adaptation, and generalizability of this research.
|
13
|
A Pragmatic Machine Learning Model To Predict Carbapenem Resistance. Antimicrob Agents Chemother 2021; 65:e0006321. [PMID: 33972243 PMCID: PMC8218615 DOI: 10.1128/aac.00063-21] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/18/2021] [Accepted: 04/30/2021] [Indexed: 12/23/2022]
Abstract
Infection caused by carbapenem-resistant (CR) organisms is a rising problem in the United States. While the risk factors for antibiotic resistance are well known, there remains a large need for the early identification of antibiotic-resistant infections. Using machine learning (ML), we sought to develop a prediction model for carbapenem resistance. All patients >18 years of age admitted to a tertiary-care academic medical center between 1 January 2012 and 10 October 2017 with ≥1 bacterial culture were eligible for inclusion. All demographic, medication, vital sign, procedure, laboratory, and culture/sensitivity data were extracted from the electronic health record. Organisms were considered CR if a single isolate was reported as intermediate or resistant. Patients with CR and non-CR organisms were temporally matched to maintain the positive/negative case ratio. Extreme gradient boosting was used for model development. In total, 68,472 patients met inclusion criteria, with 1,088 patients identified as having CR organisms. Sixty-seven features were used for predictive modeling. The most important features were number of prior antibiotic days, recent central venous catheter placement, and inpatient surgery. After model training, the area under the receiver operating characteristic curve was 0.846. The sensitivity of the model was 30%, with a positive predictive value (PPV) of 30% and a negative predictive value of 99%. Using readily available clinical data, we were able to create an ML model capable of predicting CR infections at the time of culture collection with a high PPV.
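Sensitivity, PPV, and negative predictive value all derive from the same confusion matrix, and at low prevalence a modest sensitivity can coexist with a very high negative predictive value. The sketch below illustrates this with hypothetical counts chosen only to mimic a rare outcome; they are not the study's confusion matrix, and the function name is an assumption.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # true positives among predicted positives
        "npv": tn / (tn + fn),          # true negatives among predicted negatives
    }


# Hypothetical low-prevalence example: 100 positives in 10,000 cases.
metrics = classification_metrics(tp=30, fp=70, fn=70, tn=9830)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity and PPV are both 0.30 while the negative predictive value is about 0.99, because true negatives dominate when the outcome is rare.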
|
14
|
Addressing cancer survivors' cardiovascular health using the automated heart health assessment (AH-HA) EHR tool: Initial protocol and modifications to address COVID-19 challenges. Contemp Clin Trials Commun 2021; 22:100808. [PMID: 34189339 PMCID: PMC8220316 DOI: 10.1016/j.conctc.2021.100808] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/09/2020] [Revised: 05/14/2021] [Accepted: 06/13/2021] [Indexed: 11/26/2022]
Abstract
Background The purpose of this paper is to describe the Automated Heart-Health Assessment (AH-HA) study protocol, which demonstrates an agile approach to cancer care delivery research. This study aims to assess the effect of a clinical decision support tool for cancer survivors on cardiovascular health (CVH) discussions, referrals, completed visits with primary care providers and cardiologists, and control of modifiable CVH factors and behaviors. The COVID-19 pandemic has caused widespread disruption to clinical trial accrual and operations. Studies conducted with potentially vulnerable populations, including cancer survivors, must shift towards virtual consent, data collection, and study visits to reduce risk for participants and study staff. Studies examining cancer care delivery innovations may also need to accommodate the increased use of virtual visits. Methods/design This group-randomized, mixed methods study will recruit 600 cancer survivors from 12 National Cancer Institute Community Oncology Research Program (NCORP) practices. Survivors at intervention sites will use the AH-HA tool with their oncology provider; survivors at usual care sites will complete routine survivorship visits. Outcomes will be measured immediately after the study visit, with follow-up at 6 and 12 months. The study was amended during the COVID-19 pandemic to allow for virtual consent, data collection, and intervention options, with the goal of minimizing participant-staff in-person contact and accommodating virtual survivorship visits. Conclusions Changes to the study protocol and procedures allow important cancer care delivery research to continue safely during the COVID-19 pandemic and give sites and survivors flexibility to conduct study activities in-person or remotely. We present a protocol to examine the effectiveness of an electronic health record (EHR)-embedded CVH assessment tool for cancer survivors. 
The protocol was adapted to include virtual data collection and study visits to continue in the COVID-19 era. Flexibility to conduct study activities in-person or remotely supports accrual during the COVID-19 pandemic and beyond.
|
15
|
Pilot Investigation of SARS-CoV-2 Secondary Transmission in Kindergarten Through Grade 12 Schools Implementing Mitigation Strategies - St. Louis County and City of Springfield, Missouri, December 2020. MMWR Morb Mortal Wkly Rep 2021; 70:449-455. [PMID: 33764961 PMCID: PMC7993558 DOI: 10.15585/mmwr.mm7012e4] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Indexed: 11/05/2022]
|
16
|
Spot the difference: comparing results of analyses from real patient data and synthetic derivatives. JAMIA Open 2020; 3:557-566. [PMID: 33623891 PMCID: PMC7886551 DOI: 10.1093/jamiaopen/ooaa060] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 08/14/2020] [Revised: 10/14/2020] [Accepted: 10/20/2020] [Indexed: 12/19/2022]
Abstract
BACKGROUND Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification. OBJECTIVES To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns. METHODS We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses using synthetic derivatives to analyses using the original data using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases with the purpose of conducting analyses at the observation level (Use Case 1), patient cohorts (Use Case 2), and population-level data (Use Case 3). RESULTS For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions. DISCUSSION AND CONCLUSION This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.
|
17
|
When past is not a prologue: Adapting informatics practice during a pandemic. J Am Med Inform Assoc 2020; 27:1142-1146. [PMID: 32333757 PMCID: PMC7188126 DOI: 10.1093/jamia/ocaa073] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/16/2020] [Accepted: 04/21/2020] [Indexed: 11/21/2022] Open
Abstract
Data and information technology are key to every aspect of our response to the current coronavirus disease 2019 (COVID-19) pandemic—including the diagnosis of patients and delivery of care, the development of predictive models of disease spread, and the management of personnel and equipment. The increasing engagement of informaticians at the forefront of these efforts has been a fundamental shift, from an academic to an operational role. However, the past history of informatics as a scientific domain and an area of applied practice provides little guidance or prologue for the incredible challenges that we are now tasked with performing. Building on our recent experiences, we present 4 critical lessons learned that have helped shape our scalable, data-driven response to COVID-19. We describe each of these lessons within the context of specific solutions and strategies we applied in addressing the challenges that we faced.
|
18
|
Transmission dynamics: Data sharing in the COVID-19 era. Learn Health Syst 2020; 5:e10235. [PMID: 32838037 PMCID: PMC7323052 DOI: 10.1002/lrh2.10235] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 05/27/2020] [Revised: 06/10/2020] [Accepted: 06/11/2020] [Indexed: 11/16/2022] Open
Abstract
Problem The current coronavirus disease 2019 (COVID‐19) pandemic underscores the need for building and sustaining public health data infrastructure to support a rapid local, regional, national, and international response. Despite a historical context of public health crises, data sharing agreements and transactional standards do not uniformly exist between institutions, which hampers a foundational infrastructure to meet data sharing and integration needs for the advancement of public health. Approach There is a growing need to apply population health knowledge with technological solutions to data transfer, integration, and reasoning, to improve health in a broader learning health system ecosystem. To achieve this, data must be combined from healthcare provider organizations, public health departments, and other settings. Public health entities are in a unique position to consume these data; however, most do not yet have the infrastructure required to integrate data sources and apply computable knowledge to combat this pandemic. Outcomes Herein, we describe lessons learned and a framework to address these needs, which focus on: (a) identifying and filling technology “gaps”; (b) pursuing collaborative design of data sharing requirements and transmission mechanisms; (c) facilitating cross‐domain discussions involving legal and research compliance; and (d) establishing or participating in multi‐institutional convening or coordinating activities. Next steps While by no means a comprehensive evaluation of such issues, we envision that many of our experiences are universal. We hope those elucidated can serve as the catalyst for a robust community‐wide dialogue on what steps can and should be taken to ensure that our regional and national health care systems can truly learn, in a rapid manner, so as to respond to this and future emergent public health crises.
|
19
|
Novel Visualization of Clostridium difficile Infections in Intensive Care Units. ACI OPEN 2019; 3:e71-e77. [PMID: 33598637 DOI: 10.1055/s-0039-1693651] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/13/2022]
Abstract
BACKGROUND Accurate and timely surveillance and diagnosis of healthcare-facility onset Clostridium difficile infection (HO-CDI) is vital to controlling infections within the hospital, but there are limited tools to assist with timely outbreak investigations. OBJECTIVES To integrate spatiotemporal factors with HO-CDI cases and develop a map-based dashboard to support infection preventionists (IPs) in performing surveillance and outbreak investigations for HO-CDI. METHODS Clinical laboratory results and Admit-Transfer-Discharge data for admitted patients over two years were extracted from the Information Warehouse of a large academic medical center and processed according to Centers for Disease Control and Prevention (CDC) National Healthcare Safety Network (NHSN) definitions to classify Clostridium difficile infection (CDI) cases by onset date. Results were validated against the internal infection surveillance database maintained by IPs in Clinical Epidemiology of this Academic Medical Center (AMC). Hospital floor plans were combined with HO-CDI case data to create a dashboard of intensive care units. Usability testing was performed with a think-aloud session and a survey. RESULTS The simple classification algorithm identified all 265 HO-CDI cases from 1/1/15-11/30/15 with a positive predictive value (PPV) of 96.3%. When applied to data from 2014, the PPV was 94.6%. All users "strongly agreed" that the dashboard would be a positive addition to Clinical Epidemiology and would enable them to present Hospital Acquired Infection (HAI) information to others more efficiently. CONCLUSIONS The CDI dashboard demonstrates the feasibility of mapping clinical data to hospital patient care units for more efficient surveillance and potential outbreak investigations.
|
20
|
Active Use of Electronic Health Records (EHRs) and Personal Health Records (PHRs) for Epidemiologic Research: Sample Representativeness and Nonresponse Bias in a Study of Women During Pregnancy. ACTA ACUST UNITED AC 2017; 5:1263. [PMID: 28303255 PMCID: PMC5340503 DOI: 10.13063/2327-9214.1263] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/01/2022]
Abstract
Introduction: With the growing use of electronic medical records, electronic health records (EHRs), and personal health records (PHRs) for health care delivery, new opportunities have arisen for population health researchers. Our objective was to characterize PHR users and examine sample representativeness and nonresponse bias in a study of pregnant women recruited via the PHR. Design: Demographic characteristics were examined for PHR users and nonusers. Enrolled study participants (responders, n=187) were then compared with nonresponders and a representative sample of the target population. Results: PHR patient portal users (34 percent of eligible persons) were older and more likely to be White, have private health insurance, and develop gestational diabetes than nonusers. Of eligible persons (all PHR users), 11 percent (187/1,713) completed a self-administered PHR-based questionnaire. Participants in the research study were more likely to be non-Hispanic White (90 percent versus 79 percent) and married (85 percent versus 77 percent), and were less likely to be non-Hispanic Black (3 percent versus 12 percent) or Hispanic (3 percent versus 6 percent). Responders and nonresponders were similar regarding age distribution, employment status, and health insurance status. Demographic characteristics were similar between responders and nonresponders. Discussion: Demographic characteristics of the study population differed from the general population, consistent with patterns seen in traditional population-based studies. The PHR may be an efficient method for recruiting and conducting observational research with additional benefits of efficiency and cost-effectiveness.
|
21
|
Automatic data source identification for clinical trial eligibility criteria resolution. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2017; 2016:1149-1158. [PMID: 28269912 PMCID: PMC5333255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Clinical trial coordinators refer to both structured and unstructured sources of data when evaluating a subject for eligibility. While some eligibility criteria can be resolved using structured data, some require manual review of clinical notes. An important step in automating the trial screening process is to be able to identify the right data source for resolving each criterion. In this work, we discuss the creation of an eligibility criteria dataset for clinical trials for patients with two disparate diseases, annotated with the preferred data source for each criterion (i.e., structured or unstructured) by annotators with medical training. The dataset includes 50 heart-failure trials with a total of 766 eligibility criteria and 50 trials for chronic lymphocytic leukemia (CLL) with 677 criteria. Further, we developed machine learning models to predict the preferred data source: kernel methods outperform simpler learning models when used with a combination of lexical, syntactic, semantic, and surface features. Evaluation of these models indicates that the performance is consistent across data from both diagnoses, indicating generalizability of our method. Our findings are an important step towards ongoing efforts for automation of clinical trial screening.
|
22
|
A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2016; 2016:88-97. [PMID: 27570656 PMCID: PMC5001746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
Sentence boundary detection (SBD) is a critical preprocessing task for many natural language processing (NLP) applications. However, there has been little work on evaluating how well existing methods for SBD perform in the clinical domain. We evaluate five popular off-the-shelf NLP toolkits on the task of SBD in various kinds of text using a diverse set of corpora, including the GENIA corpus of biomedical abstracts, a corpus of clinical notes used in the 2010 i2b2 shared task, and two general-domain corpora (the British National Corpus and Switchboard). We find that, with the exception of the cTAKES system, the toolkits we evaluate perform noticeably worse on clinical text than on general-domain text. We identify and discuss major classes of errors, and suggest directions for future work to improve SBD methods in the clinical domain. We also make the code used for SBD evaluation in this paper available for download at http://github.com/drgriffis/SBD-Evaluation.
|
23
|
Electronic health record-based assessment of cardiovascular health: The stroke prevention in healthcare delivery environments (SPHERE) study. Prev Med Rep 2016; 4:303-8. [PMID: 27486559 PMCID: PMC4959947 DOI: 10.1016/j.pmedr.2016.07.006] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/01/2016] [Revised: 06/21/2016] [Accepted: 07/08/2016] [Indexed: 12/30/2022] Open
Abstract
< 3% of Americans have ideal cardiovascular health (CVH). The primary care encounter provides a setting in which to conduct patient-provider discussions of CVH. We implemented a CVH risk assessment, visualization, and decision-making tool that automatically populates with electronic health record (EHR) data during the encounter in order to encourage patient-centered CVH discussions among at-risk, yet under-treated, populations. We quantified five of the seven CVH behaviors and factors that were available in The Ohio State University Wexner Medical Center's EHR at baseline (May–July 2013) and compared values to those ascertained at one-year (May–July 2014) among intervention (n = 109) and control (n = 42) patients. The CVH of women in the intervention clinic improved relative to the metrics of body mass index (16% to 21% ideal) and diabetes (62% to 68% ideal), but not for smoking, total cholesterol, or blood pressure. Meanwhile, the CVH of women in the control clinic either held constant or worsened slightly as measured using those same metrics. Providers need easy-to-use tools at the point-of-care to help patients improve CVH. We demonstrated that the EHR could deliver such a tool using an existing American Heart Association framework, and we noted small improvements in CVH in our patient population. Future work is needed to assess how to best harness the potential of such tools in order to have the greatest impact on the CVH of a larger patient population. Use and adoption of health information technology advances quality in patient care. Healthcare systems need tools to enhance primary prevention at the point-of-care. Providers and patients have shared accountability for population health metrics.
Key Words
- 95% CI, 95% confidence interval
- ACC, American College of Cardiology
- AHA, American Heart Association
- CDS, clinical decision support
- CVH, cardiovascular health
- Disease management
- EHR, electronic health record
- GEE, generalized estimation equation
- Health outcomes
- Medical informatics
- OSUWMC, Ohio State University Wexner Medical Center
- Prevention
- Primary care
- SD, standard deviation
- SPHERE, stroke prevention in healthcare delivery environments
|
24
|
The geographic distribution of cardiovascular health in the stroke prevention in healthcare delivery environments (SPHERE) study. J Biomed Inform 2016; 60:95-103. [PMID: 26828957 DOI: 10.1016/j.jbi.2016.01.013] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/01/2015] [Revised: 01/20/2016] [Accepted: 01/22/2016] [Indexed: 12/25/2022]
Abstract
BACKGROUND Community-level factors have been clearly linked to health outcomes, but are challenging to incorporate into medical practice. Increasing use of electronic health records (EHRs) makes patient-level data available for researchers in a systematic and accessible way, but these data remain siloed from community-level data relevant to health. PURPOSE This study sought to link community and EHR data from an older female patient cohort participating in an ongoing intervention at the Ohio State University Wexner Medical Center to associate community-level data with patient-level cardiovascular health (CVH) as well as to assess the utility of this EHR integration methodology. MATERIALS AND METHODS CVH was characterized among patients using available EHR data collected May through July of 2013. EHR data for 153 patients were linked to United States census-tract level data to explore feasibility and insights gained from combining these disparate data sources. Analyses were conducted in 2014. RESULTS Using the linked data, weekly per capita expenditure on fruits and vegetables was found to be significantly associated with CVH at the p<0.05 level and three other community-level attributes (median income, average household size, and unemployment rate) were associated with CVH at the p<0.10 level. CONCLUSIONS This work paves the way for future integration of community and EHR-based data into patient care as a novel methodology to gain insight into multi-level factors that affect CVH and other health outcomes. Further, our findings demonstrate the specific architectural and functional challenges associated with integrating decision support technologies and geographic information to support tailored and patient-centered decision making therein.
|
25
|
Textual inference for eligibility criteria resolution in clinical trials. J Biomed Inform 2015; 58 Suppl:S211-S218. [PMID: 26376462 DOI: 10.1016/j.jbi.2015.09.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/11/2015] [Revised: 09/02/2015] [Accepted: 09/04/2015] [Indexed: 10/23/2022]
Abstract
Clinical trials are essential for determining whether new interventions are effective. In order to determine the eligibility of patients to enroll into these trials, clinical trial coordinators often perform a manual review of clinical notes in the electronic health record of patients. This is a very time-consuming and exhausting task. Efforts in this process can be expedited if these coordinators are directed toward specific parts of the text that are relevant for eligibility determination. In this study, we describe the creation of a dataset that can be used to evaluate automated methods capable of identifying sentences in a note that are relevant for screening a patient's eligibility in clinical trials. Using this dataset, we also present results for four simple methods in natural language processing that can be used to automate this task. We found that this is a challenging task (maximum F-score=26.25), but it is a promising direction for further research.
|
26
|
Comparison of UMLS terminologies to identify risk of heart disease using clinical notes. J Biomed Inform 2015; 58 Suppl:S103-S110. [PMID: 26375493 DOI: 10.1016/j.jbi.2015.08.025] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/11/2015] [Revised: 08/23/2015] [Accepted: 08/25/2015] [Indexed: 10/23/2022]
Abstract
The second track of the 2014 i2b2 challenge asked participants to automatically identify risk factors for heart disease among diabetic patients using natural language processing techniques for clinical notes. This paper describes a rule-based system developed using a combination of regular expressions, concepts from the Unified Medical Language System (UMLS), and freely-available resources from the community. With a performance (F1=90.7) that is significantly higher than the median (F1=87.20) and close to the top performing system (F1=92.8), it was the best rule-based system of all the submissions in the challenge. We also used this system to evaluate the utility of different terminologies in the UMLS towards the challenge task. Of the 155 terminologies in the UMLS, 129 (76.78%) have no representation in the corpus. The Consumer Health Vocabulary had very good coverage of relevant concepts and was the most useful terminology for the challenge task. While segmenting notes into sections and lists has a significant impact on the performance, identifying negations and experiencer of the medical event results in negligible gain.
|
27
|
Abstract
BACKGROUND Electronic health records (EHRs) have the potential to enhance patient-provider communication and improve patient outcomes. However, in order to impact patient care, clinical decision support (CDS) and communication tools targeting such needs must be integrated into clinical workflow and be flexible with regard to the changing health care landscape. DESIGN The Stroke Prevention in Healthcare Delivery Environments (SPHERE) team developed and implemented the SPHERE tool, an EHR-based CDS visualization, to enhance patient-provider communication around cardiovascular health (CVH) within an outpatient primary care setting of a large academic medical center. IMPLEMENTATION We describe our successful CDS alert implementation strategy and report adoption rates. We also present results of a provider satisfaction survey showing that the SPHERE tool delivers appropriate content in a timely manner. Patient outcomes following implementation of the tool indicate one-year improvements in some CVH metrics, such as body mass index and diabetes. DISCUSSION Clinical decision-making and practices change rapidly and in parallel to simultaneous changes in the health care landscape and EHR usage. Based on these observations and our preliminary results, we have found that an integrated, extensible, and workflow-aware CDS tool is critical to enhancing patient-provider communications and influencing patient outcomes.
|
28
|
Assessment of Life's Simple 7™ in the primary care setting: The Stroke Prevention in Healthcare Delivery EnviRonmEnts (SPHERE) study. Contemp Clin Trials 2014; 38:182-9. [DOI: 10.1016/j.cct.2014.03.007] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/06/2014] [Revised: 03/24/2014] [Accepted: 03/27/2014] [Indexed: 11/26/2022]
|
29
|
Effectively processing medical term queries on the UMLS Metathesaurus by layered dynamic programming. BMC Med Genomics 2014; 7 Suppl 1:S11. [PMID: 25079259 PMCID: PMC4101532 DOI: 10.1186/1755-8794-7-s1-s11] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 01/29/2023] Open
Abstract
Background Mapping medical terms to standardized UMLS concepts is a basic step for leveraging biomedical texts in data management and analysis. However, available methods and tools have major limitations in handling queries over the UMLS Metathesaurus that contain inaccurate query terms, which frequently appear in real world applications. Methods To provide a practical solution for this task, we propose a layered dynamic programming mapping (LDPMap) approach, which can efficiently handle these queries. LDPMap uses indexing and two layers of dynamic programming techniques to efficiently map a biomedical term to a UMLS concept. Results Our empirical study shows that LDPMap achieves much faster query speeds than LCS. In comparison to the UMLS Metathesaurus Browser and MetaMap, LDPMap is much more effective in querying the UMLS Metathesaurus for inaccurately spelled medical terms, long medical terms, and medical terms with special characters. Conclusions These results demonstrate that LDPMap is an efficient and effective method for mapping medical terms to the UMLS Metathesaurus.
|
30
|
How essential are unstructured clinical narratives and information fusion to clinical trial recruitment? AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2014; 2014:218-23. [PMID: 25717416 PMCID: PMC4333685] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2022]
Abstract
Electronic health records capture patient information using structured controlled vocabularies and unstructured narrative text. While structured data typically encodes lab values, encounters and medication lists, unstructured data captures the physician's interpretation of the patient's condition, prognosis, and response to therapeutic intervention. In this paper, we demonstrate that information extraction from unstructured clinical narratives is essential to most clinical applications. We perform an empirical study to validate the argument and show that structured data alone is insufficient in resolving eligibility criteria for recruiting patients onto clinical trials for chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is essential to solving 59% of the CLL trial criteria and 77% of the prostate cancer trial criteria. More specifically, for resolving eligibility criteria with temporal constraints, we show the need for temporal reasoning and information integration with medical events within and across unstructured clinical narratives and structured data.
|
31
|
Abstract
Objective To summarize literature describing approaches aimed at automatically identifying patients with a common phenotype. Materials and methods We performed a review of studies describing systems or reporting techniques developed for identifying cohorts of patients with specific phenotypes. Every full text article published in (1) Journal of American Medical Informatics Association, (2) Journal of Biomedical Informatics, (3) Proceedings of the Annual American Medical Informatics Association Symposium, and (4) Proceedings of Clinical Research Informatics Conference within the past 3 years was assessed for inclusion in the review. Only articles using automated techniques were included. Results Ninety-seven articles met our inclusion criteria. Forty-six used natural language processing (NLP)-based techniques, 24 described rule-based systems, 41 used statistical analyses, data mining, or machine learning techniques, while 22 described hybrid systems. Nine articles described the architecture of large-scale systems developed for determining cohort eligibility of patients. Discussion We observe that there is a rise in the number of studies associated with cohort identification using electronic medical records. Statistical analyses or machine learning, followed by NLP techniques, are gaining popularity over the years in comparison with rule-based systems. Conclusions There are a variety of approaches for classifying patients into a particular phenotype. Different techniques and data sources are used, and good performance is reported on datasets at respective institutions. However, no system makes comprehensive use of electronic medical records addressing all of their known weaknesses.
|
32
|
Inter-annotator reliability of medical events, coreferences and temporal relations in clinical narratives by annotators with varying levels of clinical expertise. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2012; 2012:1366-1374. [PMID: 23304416 PMCID: PMC3540452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
The manual annotation of clinical narratives is an important step for training and validating the performance of automated systems that utilize these clinical narratives. We build an annotation specification to capture medical events, and coreferences and temporal relations between medical events in clinical text. Unfortunately, the process of clinical data annotation is both time consuming and costly. Many annotation efforts have used physicians to annotate the data. We investigate using annotators that are current students or graduates from diverse clinical backgrounds with varying levels of clinical experience. In spite of this diversity, the annotation agreement across our team of annotators is high; the average inter-annotator kappa statistic for medical events, coreferences, temporal relations, and medical event concept unique identifiers was 0.843, 0.859, 0.833, and 0.806, respectively. We describe methods towards leveraging the annotations to support temporal reasoning with medical events.
|
33
|
Time Capture Tool (TimeCaT): development of a comprehensive application to support data capture for Time Motion Studies. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2012; 2012:596-605. [PMID: 23304332 PMCID: PMC3540552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Time Motion Studies (TMS) have proved to be the gold standard method to measure and quantify clinical workflow, and have been widely used to assess the impact of health information systems implementation. Although there are tools available to conduct TMS, they provide different approaches for multitasking, interruptions, inter-observer reliability assessment and task taxonomy, making results across studies not comparable. We postulate that a significant contributing factor towards the standardization and spread of TMS would be the availability and spread of an accessible, scalable and dynamic tool. We present the development of a comprehensive Time Capture Tool (TimeCaT): a web application developed to support data capture for TMS. Ongoing and continuous development of TimeCaT includes the development and validation of a realistic inter-observer reliability scoring algorithm, the creation of an online clinical tasks ontology, and a novel quantitative workflow comparison method.
|
34
|
Applying knowledge-anchored hypothesis discovery methods to advance clinical and translational research: the OAMiner project. J Am Med Inform Assoc 2012; 19:1110-4. [DOI: 10.1136/amiajnl-2011-000736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022] Open
|
35
|
Proof of concept for the role of glycemic control in the early detection of infections in diabetics. Health Informatics J 2012; 18:26-35. [PMID: 22447875 DOI: 10.1177/1460458211428427] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]
Abstract
The relationship of infections and glycemic control in diabetes has been previously investigated but no solid findings have been described. Meanwhile, the detection of any infection at the early stages of disease progression, i.e. during the incubation period, is critical. In order to study this topic, we used the infection evidence and the daily glycemic control data of 248 type-2 diabetics who participated in a large telemedicine study. The results showed that morning blood glucose was significantly elevated and that diabetics performed the measurements at a later time when infected. A simple model for predicting the occurrence of infection based on the glycemic control variables showed good performance (sensitivity: 56%, specificity: 92%). A set of variables that synthesize a diabetic's profile could be included in a dedicated model and facilitate the early detection of infections; other aspects, such as continuous self-monitoring and personalized medical records, should be examined in this direction.
|
36
|
Extracting temporal constraints from clinical research eligibility criteria using conditional random fields. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2011; 2011:843-852. [PMID: 22195142 PMCID: PMC3243135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Temporal constraints are present in 38% of clinical research eligibility criteria and are crucial for screening patients. However, eligibility criteria are often written as free text, which is not amenable for computer processing. In this paper, we present an ontology-based approach to extracting temporal information from clinical research eligibility criteria. We generated temporal labels using a frame-based temporal ontology. We manually annotated 150 free-text eligibility criteria using the temporal labels and trained a parser using Conditional Random Fields (CRFs) to automatically extract temporal expressions from eligibility criteria. An evaluation of an additional 60 randomly selected eligibility criteria using manual review achieved an overall precision of 83%, a recall of 79%, and an F-score of 80%. We illustrate the application of temporal extraction with the use cases of question answering and free-text criteria querying.
|
37
|
Similarity-based disease risk assessment for personal genomes: proof of concept. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2011; 2011:1524-1531. [PMID: 22195217 PMCID: PMC3243222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
The increasing availability of personal genome data has led to escalating needs by consumers to understand the implications of their gene sequences. At present, poorly integrated genetic knowledge has not met these needs. This proof-of-concept study proposes a similarity-based approach to assess the disease risk predisposition for personal genomes. We hypothesize that the semantic similarity between a personal genome and a disease can indicate the disease risks in the person. We developed a knowledge network that integrates existing knowledge of genes, diseases, and symptoms from six sources using the Semantic Web standard, Resource Description Framework (RDF). We then used latent relationships between genes and diseases derived from our knowledge network to measure the semantic similarity between a personal genome and a genetic disease. For demonstration, we showed the feasibility of assessing the disease risks in one personal genome and discussed related methodology issues.
|
38
|
How Human Factors Can Influence the Elderly in the Use of Telemedicine. Telemed J E Health 2010; 16:860-6. [DOI: 10.1089/tmj.2010.9948] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]
|
39
|
Abstract
The objective of this evaluation was to determine the effect of redesigning the Informatics for Diabetes Education and Telemedicine (IDEATel) telemedicine architecture on the average upload delay and on the average number of glucose uploads to a central database. These two measures positively influence our ability to deliver timely and accurate patient care to the study population. The redesign was also undertaken to improve the patients' experience in using the system and thereby increase the frequency and timeliness of their self-monitoring behavior. Using the total number of glucose uploads, we compared the delay in glucose upload times according to the type of home telemedicine unit the study participants used and the region where the participants lived. The participants were Medicare beneficiaries with diabetes living in medically underserved neighborhoods in New York City and rural Upstate New York. The populations in these two regions differed considerably in terms of ethnicity, language spoken (Spanish, English), and education level. Participants who had Generation 2 (Gen 2) (mean = 10.75, SD +/- 7.96) home telemedicine units had significantly shorter upload delay times (p < 0.001) as measured in days than those participants with Generation 1 (Gen 1) (mean = 22.44, SD +/- 11.18) and those who were upgraded from Gen 1 (mean = 20.67, SD +/- 8.85) to Gen 2 (mean = 14.93, SD +/- 9.37). Additionally, the delay was significantly shorter for participants living upstate (mean = 24.14 days, SD +/- 11.95 days) than downstate (mean = 15.30 days, SD +/- 7.87 days), t (975) = 13.98, p < 0.01. The system redesign made a significant impact in reducing glucose upload delays of IDEATel participants. However, upload delays were significantly impacted by the region where the participants resided.
|
40
|
Syndromic surveillance using ambulatory electronic health records. J Am Med Inform Assoc 2009; 16:354-61. [PMID: 19261941 PMCID: PMC2732227 DOI: 10.1197/jamia.m2922] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.7] [Received: 07/11/2008] [Accepted: 01/30/2009] [Indexed: 11/10/2022]
Abstract
OBJECTIVE To assess the performance of electronic health record data for syndromic surveillance and to assess the feasibility of broadly distributed surveillance. DESIGN Two systems were developed to identify influenza-like illness and gastrointestinal infectious disease in ambulatory electronic health record data from a network of community health centers. The first system used queries on structured data and was designed for this specific electronic health record. The second used natural language processing of narrative data, but its queries were developed independently from this health record. Both were compared to influenza isolates and to a verified emergency department chief complaint surveillance system. MEASUREMENTS Lagged cross-correlation and graphs of the three time series. RESULTS For influenza-like illness, both the structured and narrative data correlated well with the influenza isolates and with the emergency department data, achieving cross-correlations of 0.89 (structured) and 0.84 (narrative) for isolates and 0.93 and 0.89 for emergency department data, and having similar peaks during influenza season. For gastrointestinal infectious disease, the structured data correlated fairly well with the emergency department data (0.81) with a similar peak, but the narrative data correlated less well (0.47). CONCLUSIONS It is feasible to use electronic health records for syndromic surveillance. The structured data performed best but required knowledge engineering to match the health record data to the queries. The narrative data illustrated the potential performance of a broadly disseminated system and achieved mixed results.
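The lagged cross-correlation named under MEASUREMENTS can be sketched as follows. This is an illustrative implementation using Pearson correlation at each integer lag, not the authors' code; the function names and the weekly counts are assumptions made up for the example.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_cross_correlation(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for every lag in -max_lag..max_lag."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: n + lag]
        result[lag] = pearson(xs, ys)
    return result


# Hypothetical weekly counts: the second series trails the first by one week.
ehr_counts = [1, 3, 7, 12, 8, 4, 2, 1]
reference_counts = [0, 1, 3, 7, 12, 8, 4, 2]

correlations = lagged_cross_correlation(ehr_counts, reference_counts, max_lag=2)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 2))
```

A positive best lag here means the first series leads the second, which is the kind of timing relationship a surveillance comparison would look for alongside the peak correlation value.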
|
41
|
A randomized trial comparing telemedicine case management with usual care in older, ethnically diverse, medically underserved patients with diabetes mellitus: 5 year results of the IDEATel study. J Am Med Inform Assoc 2009; 16:446-56. [PMID: 19390093 DOI: 10.1197/jamia.m3157] [Citation(s) in RCA: 229] [Impact Index Per Article: 15.3] [Indexed: 01/10/2023]
Abstract
CONTEXT Telemedicine is a promising but largely unproven technology for providing case management services to patients with chronic conditions and lower access to care. OBJECTIVES To examine the effectiveness of a telemedicine intervention to achieve clinical management goals in older, ethnically diverse, medically underserved patients with diabetes. DESIGN, SETTING, AND PATIENTS A randomized controlled trial was conducted, comparing telemedicine case management to usual care, with blinded outcome evaluation, in 1,665 Medicare recipients with diabetes, aged ≥ 55 years, residing in federally designated medically underserved areas of New York State. INTERVENTIONS Home telemedicine unit with nurse case management versus usual care. MAIN OUTCOME MEASURES The primary endpoints assessed over 5 years of follow-up were hemoglobin A1c (HgbA1c), low density lipoprotein (LDL) cholesterol, and blood pressure levels. RESULTS Intention-to-treat mixed models showed that telemedicine achieved net overall reductions over five years of follow-up in the primary endpoints (HgbA1c, p = 0.001; LDL, p < 0.001; systolic and diastolic blood pressure, p = 0.024; p < 0.001). Estimated differences (95% CI) in year 5 were 0.29 (0.12, 0.46)% for HgbA1c, 3.84 (-0.08, 7.77) mg/dL for LDL cholesterol, and 4.32 (1.93, 6.72) mm Hg for systolic and 2.64 (1.53, 3.74) mm Hg for diastolic blood pressure. There were 176 deaths in the intervention group and 169 in the usual care group (hazard ratio 1.01 [0.82, 1.24]). CONCLUSIONS Telemedicine case management resulted in net improvements in HgbA1c, LDL-cholesterol and blood pressure levels over 5 years in medically underserved Medicare beneficiaries. Mortality was not different between the groups, although power was limited. TRIAL REGISTRATION http://clinicaltrials.gov Identifier: NCT00271739.
|
42
|
Fuzzy temporal constraint networks for clinical information. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2008; 2008:374-378. [PMID: 18999106 PMCID: PMC2655952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2008] [Revised: 07/15/2008] [Indexed: 05/27/2023]
Abstract
Modeling the temporal information in the medical record is an important area of research. This paper describes an extension of TimeText, a temporal reasoning system designed to represent, extract, and reason about temporal information in clinical text, to include the use of fuzzy temporal constraints. The addition of fuzzy temporal constraints increases TimeText's ability to handle uncertainty in temporal relations. We use a three-state, staircase possibility distribution function in conjunction with earlier methods of finding solutions to fuzzy temporal constraint networks. We perform analysis to determine the complexity of using this staircase in conjunction with finding solutions to fuzzy temporal constraint satisfaction problems and show that these solutions can be efficiently computed in O(n^3).
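The abstract does not define the staircase function, so the following is only one plausible reading, offered as a sketch: a possibility distribution with three levels, namely 1 inside a core interval, an intermediate level inside a wider support interval, and 0 elsewhere. The function name, the interval parameters, and the default intermediate level of 0.5 are all assumptions for illustration and are not taken from TimeText.

```python
def staircase_possibility(value, core, support, mid_level=0.5):
    """Three-level staircase possibility for a temporal distance.

    Assumed shape (not confirmed by the abstract):
      1.0        if value lies in the core interval,
      mid_level  if value lies in the support interval but outside the core,
      0.0        otherwise.
    """
    core_lo, core_hi = core
    support_lo, support_hi = support
    if not (support_lo <= core_lo <= core_hi <= support_hi):
        raise ValueError("core interval must lie inside the support interval")
    if not 0.0 < mid_level < 1.0:
        raise ValueError("mid_level must be strictly between 0 and 1")
    if core_lo <= value <= core_hi:
        return 1.0
    if support_lo <= value <= support_hi:
        return mid_level
    return 0.0


# Hypothetical constraint for "about three days earlier": 2 to 4 days is fully
# possible, 1 to 7 days is partly possible, anything else is ruled out.
levels = [staircase_possibility(d, core=(2, 4), support=(1, 7)) for d in (0, 1, 3, 6, 9)]
print(levels)
```

Because such a function takes only a fixed number of distinct values, a fuzzy constraint can be handled as a small stack of ordinary interval constraints, one per level, which is one way a staircase shape can keep reasoning tractable.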
|
43
|
Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes? J Am Med Inform Assoc 2008; 16:37-9. [PMID: 18952938 DOI: 10.1197/jamia.m2862] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]
Abstract
Electronic clinical documentation can be useful for activities such as public health surveillance, quality improvement, and research, but existing methods of de-identification may not provide sufficient protection of patient data. The general-purpose natural language processor MedLEE retains medical concepts while excluding the remaining text so, in addition to processing text into structured data, it may be able to provide a secondary benefit of de-identification. Without modifying the system, the authors tested the ability of MedLEE to remove protected health information (PHI) by comparing 100 outpatient clinical notes with the corresponding XML-tagged output. Of 809 instances of PHI, 26 (3.2%) were detected in output as a result of processing and identification errors. However, PHI in the output was highly transformed, much appearing as normalized terms for medical concepts, potentially making re-identification more difficult. The MedLEE processor may be a good enhancement to other de-identification systems, both removing PHI and providing coded data from clinical text.
|
44
|
Abstract
The objective of the study was to develop and implement an architecture for remote training that can be used in the narrowband home telemedicine environment. A remote training architecture, the REmote Patient Education in a Telemedicine Environment (REPETE) architecture, using a remote control protocol (RCP) was developed. A set of design criteria was specified. The developed architecture was integrated into the IDEATel home telemedicine unit (HTU) and evaluated against these design criteria using a combination of technical and expert evaluations. Technical evaluation of the architecture demonstrated that remote cursor movements and positioning displayed on the HTU were smooth and effectively real-time. The trainers were able to observe what the patient sees on their HTU screen with a lag of approximately 2 seconds. Evaluation of the architecture by experts was favorable. Responses to a Likert scale questionnaire indicated that the expert evaluators thought that the audio quality and remote control performance were adequate for remote training. All evaluators strongly agreed that the system would be useful for training patients. The REPETE architecture supports basic training needs over a narrowband dial-up connection. We were able to maintain an audio chat while performing a remote training session, with both acceptable audio quality and remote control performance. The RCP provides a mechanism to provide training without requiring a trainer to go to the patient's home and effectively supports deictic referencing to on-screen objects.
|
45
|
REPETE2: A next generation home telemedicine architecture. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2007:1020. [PMID: 18694118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2007] [Accepted: 10/11/2007] [Indexed: 05/26/2023]
Abstract
As the availability of home broadband increases, there is an increasing need for a broadband-based home telemedicine architecture. A home telemedicine architecture supporting broadband and remote training is presented.
|
46
|
Training digital divide seniors to use a telehealth system: a remote training approach. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2006; 2006:459-63. [PMID: 17238383 PMCID: PMC1839396] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
As the use of health information technologies continues to proliferate amongst seniors, many of whom lack computer experience, there is a need to develop effective training approaches to foster basic competencies. This paper describes the REmote Patient Education in a Telemedicine Environment (REPETE) system, a component of the IDEATel telemedicine architecture. The REPETE architecture supports simultaneous visual and audio teaching modes over low bandwidth connections. This paper presents an in-depth qualitative analysis of two patients being trained to use the IDEATel patient web portal. The results indicate that this method of instruction was useful in facilitating patients' use of the web application. However, the observations suggest that there is a learning curve for the trainer to use the resources effectively to establish common ground and foster competencies in the patient.
|
47
|
Architecture for remote training of home telemedicine patients. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2005; 2005:1015. [PMID: 16779302 PMCID: PMC1560648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
In spite of efforts to develop easy-to-use devices, patients may require multiple training sessions to achieve mastery of advanced telehealth devices, especially those incorporating web-access. In geographically-distributed projects, such repeat training can be costly. A software architecture for simultaneous voice conferencing and remote device control over a single telephone line is presented. Evaluation of the pilot implementation is favorable.
|
48
|
Abstract
This article is intentionally broad in scope, as a result of a collaboration from the fields of primary care sports medicine, orthopedic surgery, and kinesiology. What has been borne out in the process is a true appreciation of the benefits of a multidisciplinary approach toward providing care for the young athlete with a physical disability. To name a few, joint involvement of parents, coaches, trainers, physical therapists, orthotists, prosthetists, wheelchair engineers, neurologists, physiatrists, nutritionists and most importantly, the athletes themselves, should be further encouraged because each discipline provides a unique perspective in the identification and management of health-related issues. It is the intent of this article to provide readers with at least some new insight that they can carry into their future practice.
|
49
|
[Testings of compatibility between diverse active compounds and excipients for tablets (author's transl)]. JOURNAL DE PHARMACIE DE BELGIQUE 1979; 34:96-8. [PMID: 512824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/15/2022]
|