1. Automatic Extraction of Skin and Soft Tissue Infection Status from Clinical Notes. Stud Health Technol Inform 2024; 310:579-583. PMID: 38269875. DOI: 10.3233/shti231031.
Abstract
The reliable identification of skin and soft tissue infections (SSTIs) from electronic health records is important for a number of applications, including quality improvement, clinical guideline construction, and epidemiological analysis. However, in the United States, types of SSTIs (e.g., is the infection purulent or non-purulent?) are not captured reliably in structured clinical data. In this work, we trained and evaluated a rule-based clinical natural language processing system using 6,576 manually annotated clinical notes derived from the United States Veterans Health Administration (VA), with the goal of automatically extracting and classifying SSTI subtypes from clinical notes. The trained system achieved performance metrics ranging from 0.39 to 0.80 for mention-level classification and from 0.49 to 0.98 for document-level classification.
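A toy sketch of the two evaluation levels described above: mention-level labels are aggregated to a document-level label, and an F1 score is computed per class. The label set and the majority-vote aggregation rule are illustrative assumptions, not details of the VA system.

```python
# Illustrative sketch (not the authors' system): aggregate mention-level SSTI
# subtype labels to a document-level label, and score predictions per class.
from collections import Counter

def document_label(mention_labels):
    """A document inherits the majority subtype among its mentions (assumed rule)."""
    if not mention_labels:
        return None
    return Counter(mention_labels).most_common(1)[0][0]

def f1(gold, pred, label):
    """F1 for one class over paired gold/predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if p == label and g != label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

The same `f1` can be applied at either level: to mention labels directly, or to document labels produced by `document_label`.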
2. Designing an Interprofessional Online Course to Foster Learning Health Systems. Stud Health Technol Inform 2024; 310:1241-1245. PMID: 38270013. DOI: 10.3233/shti231163.
Abstract
The Learning Health Systems (LHS) framework demonstrates the potential for iterative interrogation of health data in real time and implementation of insights into practice. Yet the lack of an appropriately skilled workforce results in an inability to leverage existing data to design innovative solutions. We developed a tailored professional development program to foster a skilled workforce. The short course is wholly online and aimed at interdisciplinary professionals working in the digital health arena. To transform healthcare systems, the workforce needs an understanding of LHS principles, data-driven approaches, and the need for diversely skilled learning communities that can tackle these complex problems together.
3. Analyzing the Spread of Informatics with PubMed. Stud Health Technol Inform 2024; 310:289-293. PMID: 38269811. DOI: 10.3233/shti230973.
Abstract
We analyzed PubMed citations since 1988 to explore the dissemination of medical/health informatics concepts between countries and across medical domains. We extracted countries from the PubMed author affiliation field to identify and analyze the top 10 informatics publishing countries. We found that informatics publications are becoming more similar over time and that the rate of exchange across countries has increased with the introduction of e-publishing. Nonetheless, with the exception of machine learning, the impact of core informatics concepts on mainstream medicine and radiology publications remains small.
4. Telehealth in Cystic Fibrosis. A systematic review incorporating a novel scoring system and expert weighting to identify a 'top 10 manuscripts' to inform future best practices implementation. J Cyst Fibros 2023; 22:598-606. PMID: 37230808. PMCID: PMC10204901. DOI: 10.1016/j.jcf.2023.05.012.
Abstract
The ongoing development and integration of telehealth within CF care has been accelerated in response to the Covid-19 pandemic, with many centres publishing their experiences. Now, as the restrictions of the pandemic ease, the use of telehealth appears to be waning, with many centres returning to routine traditional face-to-face services. For most, telehealth is not integrated into clinical care models, and there is a lack of guidance on how to integrate such a service into clinical care. The aims of this systematic review were first to identify manuscripts which may inform best CF telehealth practices, and second to analyse these findings to determine how the CF community may use telehealth to improve care for patients, families, and multidisciplinary teams into the future. To achieve this, the PRISMA review methodology was utilised, in combination with a modified novel scoring system that consolidates expert weighting from key CF stakeholders, allowing the manuscripts to be placed in a hierarchy in accordance with their scientific robustness. Of the 39 manuscripts identified, the top ten are presented and further analysed. These top ten manuscripts are exemplars of where telehealth is used effectively within CF care at this time, and demonstrate specific use cases of its potential best practices. However, there is a lack of guidance for implementation and clinical decision making, which remains an area for improvement. Thus, it is suggested that further work explore and provide guidance for standardised implementation into CF clinical practice.
5. Virtual monitoring in CF - the importance of continuous monitoring in a multi-organ chronic condition. Front Digit Health 2023; 5:1196442. PMID: 37214343. PMCID: PMC10192704. DOI: 10.3389/fdgth.2023.1196442.
Abstract
Cystic Fibrosis (CF) is a chronic life-limiting condition that affects multiple organs within the body. Patients must adhere to strict medication regimens, physiotherapy, diet, and attend regular clinic appointments to manage their condition effectively. This necessary but burdensome requirement has prompted investigations into how different digital health technologies can enhance current care by providing the opportunity to virtually monitor patients. This review explores how virtual monitoring has been harnessed for assessment or performance of physiotherapy/exercise, diet/nutrition, symptom monitoring, medication adherence, and wellbeing/mental-health in people with CF. This review will also briefly discuss the potential future of CF virtual monitoring and some common barriers to its current adoption and implementation within CF. Due to the multifaceted nature of CF, it is anticipated that this review will be relevant to not only the CF community, but also those investigating and developing digital health solutions for the management of other chronic diseases.
6. Developing a deep learning natural language processing algorithm for automated reporting of adverse drug reactions. J Biomed Inform 2023; 137:104265. PMID: 36464227. DOI: 10.1016/j.jbi.2022.104265.
Abstract
The detection of adverse drug reactions (ADRs) is critical to our understanding of the safety and risk-benefit profile of medications. With an incidence that has not changed over the last 30 years, ADRs are a significant source of patient morbidity, responsible for 5%-10% of acute care hospital admissions worldwide. Spontaneous reporting of ADRs has long been the standard method of reporting; however, this approach is known to have high rates of under-reporting, a problem that limits pharmacovigilance efforts. Automated ADR reporting presents an alternative pathway to increase reporting rates, although this may be limited by over-reporting of other drug-related adverse events. We developed a deep learning natural language processing algorithm to identify ADRs in discharge summaries at a single academic hospital centre. Our model was developed in two stages: first, a pre-trained model (DeBERTa) was further pre-trained on 1.1 million unlabelled clinical documents; second, this model was fine-tuned to detect ADR mentions in a corpus of 861 annotated discharge summaries. This model was compared to a version without the pre-training step, and to a previously published RoBERTa model pre-trained on MIMIC-III, which has demonstrated strong performance on other pharmacovigilance tasks. To ensure that our algorithm could differentiate ADRs from other drug-related adverse events, the annotated corpus was enriched for both validated ADR reports and confounding drug-related adverse events. The final model demonstrated good performance, with a ROC-AUC of 0.955 (95% CI 0.933-0.978) for the task of identifying discharge summaries containing ADR mentions, significantly outperforming the two comparator models.
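The document-level metric reported above can be illustrated with a generic ROC-AUC (the probability that a randomly chosen positive document scores above a randomly chosen negative one) plus a percentile-bootstrap confidence interval. This is a standalone sketch, not the authors' evaluation code.

```python
# Generic ROC-AUC (Mann-Whitney formulation) with a percentile bootstrap CI.
# Illustrative only; not the study's evaluation pipeline.
import random

def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC; resamples without both classes are skipped."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:  # AUC undefined without both classes
            continue
        aucs.append(roc_auc(ys, [scores[i] for i in sample]))
    aucs.sort()
    return aucs[int(len(aucs) * alpha / 2)], aucs[int(len(aucs) * (1 - alpha / 2))]
```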
7. Artificial Intelligence and Deep Learning for Rheumatologists: A Primer and Review of the Literature. Arthritis Rheumatol 2022; 74:1893-1905. PMID: 35857865. PMCID: PMC10092842. DOI: 10.1002/art.42296.
Abstract
Deep learning has emerged as the leading method in machine learning, spawning a rapidly growing field of academic research and commercial applications across medicine, and could have particular relevance to rheumatology if utilized correctly. The greatest benefits of deep learning methods are seen with the unstructured data frequently found in rheumatology, such as images and text, where traditional machine learning methods have struggled to unlock the trove of information held within these data formats. The basis for this success comes from the ability of deep learning to learn the structure of the underlying data. It is no surprise that the first areas of medicine that have started to experience impact from deep learning rely heavily on interpreting visual data, such as triaging radiology workflows and computer-assisted colonoscopy. Applications in rheumatology are beginning to emerge, with recent successes in areas as diverse as detecting joint erosions on plain radiography, predicting future rheumatoid arthritis disease activity, and identifying the halo sign on temporal artery ultrasound. Given the important role deep learning methods are likely to play in the future of rheumatology, it is imperative that rheumatologists appreciate the methods and assumptions that underlie the deep learning algorithms in widespread use today, their limitations, and the landscape of deep learning research that will inform algorithm development and clinical decision support tools of the future. The best applications of deep learning in rheumatology must be informed by the clinical experience of rheumatologists, so that algorithms can be developed to tackle the most relevant clinical problems.
8. Information Extraction From Electronic Health Records to Predict Readmission Following Acute Myocardial Infarction: Does Natural Language Processing Using Clinical Notes Improve Prediction of Readmission? J Am Heart Assoc 2022; 11:e024198. PMID: 35322668. PMCID: PMC9075435. DOI: 10.1161/jaha.121.024198.
Abstract
Background: Social risk factors influence rehospitalization rates yet are challenging to incorporate into prediction models. Integration of social risk factors using natural language processing (NLP) and machine learning could improve risk prediction of 30-day readmission following an acute myocardial infarction.

Methods and Results: Patients were enrolled into derivation and validation cohorts. The derivation cohort included inpatient discharges from Vanderbilt University Medical Center between January 1, 2007, and December 31, 2016, with a primary diagnosis of acute myocardial infarction, who were discharged alive, and not transferred from another facility. The validation cohort included patients from Dartmouth-Hitchcock Health Center between April 2, 2011, and December 31, 2016, meeting the same eligibility criteria. Data from both sites were linked to Centers for Medicare & Medicaid Services administrative data to supplement 30-day hospital readmissions. Clinical notes from each cohort were extracted, and an NLP model was deployed, counting mentions of 7 social risk factors. Five machine learning models were run using clinical and NLP-derived variables. Model discrimination and calibration were assessed, and receiver operating characteristic comparison analyses were performed. The 30-day rehospitalization rates among the derivation (n=6165) and validation (n=4024) cohorts were 15.1% (n=934) and 10.2% (n=412), respectively. The derivation models demonstrated no statistical improvement in model performance with the addition of the selected NLP-derived social risk factors.

Conclusions: Social risk factors extracted using NLP did not significantly improve 30-day readmission prediction among hospitalized patients with acute myocardial infarction. Alternative methods are needed to capture social risk factors.
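The NLP step above counted mentions of 7 social risk factors in clinical notes. A minimal keyword-matching sketch, with made-up factor names and patterns rather than the study's actual lexicon:

```python
# Hedged illustration of mention counting for social risk factors.
# The factor names and patterns below are assumptions, not the study's lexicon.
import re

RISK_FACTOR_PATTERNS = {
    "housing_instability": r"\b(homeless(ness)?|unstable housing)\b",
    "social_isolation": r"\b(lives alone|socially isolated)\b",
    "substance_use": r"\b(alcohol abuse|substance (ab)?use)\b",
}

def count_mentions(note_text):
    """Return {factor: number of keyword matches} for one clinical note."""
    text = note_text.lower()
    return {name: len(re.findall(pattern, text))
            for name, pattern in RISK_FACTOR_PATTERNS.items()}
```

The per-note counts would then be fed to the downstream readmission models as candidate features alongside the structured clinical variables.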
9. WITHDRAWN: Designing a professional development online short course to foster Learning Healthcare Systems. Int J Med Inform 2022; 158:104666. PMID: 34971917. DOI: 10.1016/j.ijmedinf.2021.104666.
Abstract
This article has been withdrawn: please see Elsevier Policy on Article Withdrawal (http://www.elsevier.com/locate/withdrawalpolicy). This article has been withdrawn at the request of the editor and publisher. The publisher regrets that an error occurred which led to the premature publication of this paper. This error bears no reflection on the article or its authors. The publisher apologizes to the authors and the readers for this unfortunate error.
10. Establishing a multidisciplinary initiative for interoperable electronic health record innovations at an academic medical center. JAMIA Open 2021; 4:ooab041. PMID: 34345802. PMCID: PMC8325485. DOI: 10.1093/jamiaopen/ooab041.
Abstract
Objective: To establish an enterprise initiative for improving health and health care through interoperable electronic health record (EHR) innovations.

Materials and Methods: We developed a unifying mission and vision, established multidisciplinary governance, and formulated a strategic plan. Key elements of our strategy include establishing a world-class team; creating shared infrastructure to support individual innovations; developing and implementing innovations with high anticipated impact and a clear path to adoption; incorporating best practices such as the use of Fast Healthcare Interoperability Resources (FHIR) and related interoperability standards; and maximizing synergies across research and operations and with partner organizations.

Results: University of Utah Health launched the ReImagine EHR initiative in 2016. Supportive infrastructure developed by the initiative includes various FHIR-related tooling and a systematic evaluation framework. More than 10 EHR-integrated digital innovations have been implemented to support preventive care, shared decision-making, chronic disease management, and acute clinical care. Initial evaluations of these innovations have demonstrated positive impact on user satisfaction, provider efficiency, and compliance with evidence-based guidelines. Return on investment has included improvements in care; over $35 million in external grant funding; commercial opportunities; and increased ability to adapt to a changing healthcare landscape.

Discussion: Key lessons learned include the value of investing in digital innovation initiatives leveraging FHIR; the importance of supportive infrastructure for accelerating innovation; and the critical role of user-centered design, implementation science, and evaluation.

Conclusion: EHR-integrated digital innovation initiatives can be key assets for enhancing the EHR user experience, improving patient care, and reducing provider burnout.
11. A Scoping Review and Content Analysis of Common Depressive Symptoms of Young People. J Sch Nurs 2021; 38:74-83. PMID: 33944636. DOI: 10.1177/10598405211012680.
Abstract
School nurses are the most accessible health care providers for many young people including adolescents and young adults. Early identification of depression results in improved outcomes, but little information is available comprehensively describing depressive symptoms specific to this population. The aim of this study was to develop a taxonomy of depressive symptoms that were manifested and described by young people based on a scoping review and content analysis. Twenty-five journal articles that included narrative descriptions of depressive symptoms in young people were included. A total of 60 depressive symptoms were identified and categorized into five dimensions: behavioral (n = 8), cognitive (n = 14), emotional (n = 15), interpersonal (n = 13), and somatic (n = 10). This comprehensive depression symptom taxonomy can help school nurses to identify young people who may experience depression and will support future research to better screen for depression.
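The five symptom dimensions above can be captured in a small mapping; only the dimension names and counts come from the abstract, and the individual symptom names are omitted.

```python
# The five-dimension taxonomy from the abstract, represented as counts.
# Individual symptom names are not listed in the abstract and are omitted here.
DEPRESSIVE_SYMPTOM_TAXONOMY = {
    "behavioral": 8,
    "cognitive": 14,
    "emotional": 15,
    "interpersonal": 13,
    "somatic": 10,
}

# The dimension counts sum to the 60 symptoms reported in the review.
total_symptoms = sum(DEPRESSIVE_SYMPTOM_TAXONOMY.values())
```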
12. Comparative Effectiveness of Carotid Endarterectomy vs Initial Medical Therapy in Patients With Asymptomatic Carotid Stenosis. JAMA Neurol 2021; 77:1110-1121. PMID: 32478802. PMCID: PMC7265126. DOI: 10.1001/jamaneurol.2020.1427.
Abstract
Importance: Carotid endarterectomy (CEA) among asymptomatic patients involves a trade-off between a higher short-term perioperative risk in exchange for a lower long-term risk of stroke. The clinical benefit observed in randomized clinical trials (RCTs) may not extend to real-world practice.

Objective: To examine whether early intervention (CEA) was superior to initial medical therapy in real-world practice in preventing fatal and nonfatal strokes among patients with asymptomatic carotid stenosis.

Design, Setting, and Participants: This comparative effectiveness study was conducted from August 28, 2018, to March 2, 2020, using the Corporate Data Warehouse, Suicide Data Repository, and other databases of the US Department of Veterans Affairs. Data analyzed were those of veterans of the US Armed Forces aged 65 years or older who received carotid imaging between January 1, 2005, and December 31, 2009. Patients without a carotid imaging report, those with carotid stenosis of less than 50% or hemodynamically insignificant stenosis, and those with a history of stroke or transient ischemic attack in the 6 months before index imaging were excluded. A cohort of patients who received initial medical therapy and a cohort of similar patients who received CEA were constructed and followed up for 5 years. The target trial method was used to compute weighted Kaplan-Meier curves and estimate the risk of fatal and nonfatal strokes in each cohort in the pragmatic sample across 5 years of follow-up. This analysis was repeated after restricting the sample to patients who met RCT inclusion criteria. Cumulative incidence functions for fatal and nonfatal strokes were estimated, accounting for nonstroke deaths as competing risks in both the pragmatic and RCT-like samples.

Exposures: Receipt of CEA vs initial medical therapy.

Main Outcomes and Measures: Fatal and nonfatal strokes.

Results: Of the total 5221 patients, 2712 (51.9%; mean [SD] age, 73.6 [6.0] years; 2678 men [98.8%]) received CEA and 2509 (48.1%; mean [SD] age, 73.6 [6.0] years; 2479 men [98.8%]) received initial medical therapy within 1 year after the index carotid imaging. The observed rate of stroke or death (perioperative complications) within 30 days in the CEA cohort was 2.5% (95% CI, 2.0%-3.1%). The 5-year risk of fatal and nonfatal strokes was lower among patients randomized to CEA compared with patients randomized to initial medical therapy (5.6% vs 7.8%; risk difference, -2.3%; 95% CI, -4.0% to -0.3%). In an analysis that incorporated the competing risk of death, the risk difference between the 2 cohorts was lower and not statistically significant (risk difference, -0.8%; 95% CI, -2.1% to 0.5%). Among patients who met RCT inclusion criteria, the 5-year risk of fatal and nonfatal strokes was 5.5% (95% CI, 4.5%-6.5%) among patients randomized to CEA and was 7.6% (95% CI, 5.7%-9.5%) among those randomized to initial medical therapy (risk difference, -2.1%; 95% CI, -4.4% to -0.2%). Accounting for competing risks resulted in a risk difference of -0.9% (95% CI, -2.9% to 0.7%) that was not statistically significant.

Conclusions and Relevance: This study found that the absolute reduction in the risk of fatal and nonfatal strokes associated with early CEA was less than half the risk difference in trials from 20 years ago and was no longer statistically significant when the competing risk of nonstroke deaths was accounted for in the analysis. Given the nonnegligible perioperative 30-day risks and the improvements in stroke prevention, medical therapy may be an acceptable therapeutic strategy.
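The competing-risk adjustment above can be sketched with a generic Aalen-Johansen-style cumulative incidence estimator, in which each stroke's contribution is discounted by the probability of still being event-free, so that non-stroke deaths reduce the estimated stroke risk. This is illustrative code, not the study's analysis.

```python
# Generic cumulative incidence under competing risks (Aalen-Johansen style).
# Events: 0 = censored, 1 = event of interest (e.g. stroke), 2 = competing
# event (e.g. non-stroke death). Illustrative only, not the study's code.
def cumulative_incidence(times, events, event_of_interest=1):
    """Return a list of (time, cumulative incidence) for the event of interest."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_interest = d_any = n_censored = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            if data[i][1] == event_of_interest:
                d_interest += 1
            if data[i][1] != 0:
                d_any += 1
            else:
                n_censored += 1
            i += 1
        cif += surv * d_interest / n_at_risk  # discount by event-free survival
        surv *= 1.0 - d_any / n_at_risk      # Kaplan-Meier update for any event
        out.append((t, cif))
        n_at_risk -= d_any + n_censored
    return out
```

With four patients (stroke at t=1, death at t=2, stroke at t=3, censored at t=4) the second stroke contributes only 0.5 * 1/2 = 0.25, because half the cohort has already had an event by then.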
13. Development of Electronic Health Record-Based Prediction Models for 30-Day Readmission Risk Among Patients Hospitalized for Acute Myocardial Infarction. JAMA Netw Open 2021; 4:e2035782. PMID: 33512518. PMCID: PMC7846941. DOI: 10.1001/jamanetworkopen.2020.35782.
Abstract
IMPORTANCE In the US, more than 600 000 adults will experience an acute myocardial infarction (AMI) each year, and up to 20% of the patients will be rehospitalized within 30 days. This study highlights the need for consideration of calibration in these risk models.
OBJECTIVE To compare multiple machine learning risk prediction models using an electronic health record (EHR)-derived data set standardized to a common data model.
DESIGN, SETTING, AND PARTICIPANTS This was a retrospective cohort study that developed risk prediction models for 30-day readmission among all inpatients discharged from Vanderbilt University Medical Center between January 1, 2007, and December 31, 2016, with a primary diagnosis of AMI who were not transferred from another facility. The model was externally validated at Dartmouth-Hitchcock Medical Center from April 2, 2011, to December 31, 2016. Data analysis occurred between January 4, 2019, and November 15, 2020.
EXPOSURES Acute myocardial infarction that required hospital admission.
MAIN OUTCOMES AND MEASURES The main outcome was 30-day hospital readmission. A total of 141 candidate variables were considered from administrative codes, medication orders, and laboratory tests. Multiple risk prediction models were developed using parametric models (elastic net, least absolute shrinkage and selection operator, and ridge regression) and nonparametric models (random forest and gradient boosting). The models were assessed using holdout data with area under the receiver operating characteristic curve (AUROC), percentage of calibration, and calibration curve belts.
RESULTS The final Vanderbilt University Medical Center cohort included 6163 unique patients, among whom the mean (SD) age was 67 (13) years, 4137 were male (67.1%), 1019 (16.5%) were Black or other race, and 933 (15.1%) were rehospitalized within 30 days. The final Dartmouth-Hitchcock Medical Center cohort included 4024 unique patients, with mean (SD) age of 68 (12) years; 2584 (64.2%) were male, 412 (10.2%) were rehospitalized within 30 days, and most of the cohort were non-Hispanic and White. The final test set AUROC ranged from 0.686 to 0.695 for the parametric models and from 0.686 to 0.704 for the nonparametric models. In the validation cohort, AUROC ranged from 0.558 to 0.655 for parametric models and from 0.606 to 0.608 for nonparametric models.
CONCLUSIONS AND RELEVANCE In this study, 5 machine learning models were developed and externally validated to predict 30-day readmission following AMI hospitalization. These models can be deployed within an EHR using routinely collected data.
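The calibration assessment mentioned above can be sketched as simple probability binning: group patients by predicted risk and compare each bin's mean prediction to its observed readmission rate. A well-calibrated model has those two numbers close in every bin. Generic code, not the study's implementation.

```python
# Generic reliability-diagram binning for probability calibration.
# Illustrative only; the study used calibration curve belts.
def calibration_bins(y_true, y_prob, n_bins=10):
    """Return [(mean predicted risk, observed rate, bin size)] per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    result = []
    for contents in bins:
        if contents:
            observed = sum(y for y, _ in contents) / len(contents)
            predicted = sum(p for _, p in contents) / len(contents)
            result.append((predicted, observed, len(contents)))
    return result
```

Large gaps between the predicted and observed columns indicate miscalibration even when the AUROC looks acceptable, which is why discrimination and calibration are reported separately.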
14. A Proposed Framework on Integrating Health Equity and Racial Justice into the Artificial Intelligence Development Lifecycle. J Health Care Poor Underserved 2021. DOI: 10.1353/hpu.2021.0065.
15.
Abstract
IMPORTANCE As part of the Choosing Wisely campaign, primary care, surgery, and neurology societies have identified carotid imaging ordered for screening, preoperative evaluation, and syncope as frequently low value.
OBJECTIVE To determine the changes in overall and indication-specific rates of carotid imaging following Choosing Wisely recommendations.
DESIGN, SETTING, AND PARTICIPANTS This serial cross-sectional study compared annual rates of carotid imaging before Choosing Wisely recommendations (ie, 2007 to 2012) and after (ie, 2013 to 2016) among adults receiving care in the Veterans Health Administration (VHA) national health system. Data analysis was performed from April 10, 2019, to November 27, 2019.
EXPOSURES Release of the Choosing Wisely recommendations.
MAIN OUTCOMES AND MEASURES Annual rates of overall imaging, imaging ordered for stroke workup, and imaging ordered for low-value indications (ie, screening owing to carotid bruit, preoperative evaluation, and syncope). Indications were identified using a text lexicon algorithm based on electronic health record review of a stratified random sample of 1000 free-text imaging orders. The subsequent performance of carotid procedures within 6 months after carotid imaging was assessed.
RESULTS Between 2007 and 2016, 809 071 carotid imaging examinations were identified (mean [SD] age of patients undergoing imaging, 69 [10] years; 776 632 [96%] men), of which 201 467 images (24.9%) were ordered for low-value indications (67 064 [8.2%] for carotid bruit, 25 032 [3.1%] for preoperative evaluation, and 109 400 [13.5%] for syncope), 257 369 (31.8%) for stroke workup, and 350 235 (43.3%) for other indications. Imaging for carotid bruits declined across the study period, while there was no significant change in imaging for syncope or preoperative evaluation. Compared with the 6 years before, during the 4 years following Choosing Wisely recommendations, there was no change in the trend for syncope, a small decline in preoperative imaging (post-Choosing Wisely trend, -0.1 [95% CI, -0.1 to <-0.1] images per 10 000 veterans), and a continued but less steep decline in imaging for carotid bruits (post-Choosing Wisely trend, -0.3 [95% CI, -0.3 to -0.2] images per 10 000 veterans). During the study period, 17 689 carotid procedures were identified, of which 3232 (18.3%) were preceded by carotid imaging ordered for low-value indications.
CONCLUSIONS AND RELEVANCE These findings suggest that Choosing Wisely recommendations were not associated with a meaningful change in low-value carotid imaging in a national integrated health system. To reduce low-value testing and utilization cascades, interventions targeting ordering clinicians are needed to augment the impact of public awareness campaigns.
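The pre/post trend comparison above follows an interrupted-time-series logic: fit a slope to annual imaging rates before and after the 2013 recommendations and compare the two. A minimal least-squares sketch with made-up rates, not study data:

```python
# Minimal segmented-trend sketch: ordinary least-squares slope per segment.
# The rates below are invented for illustration, not the study's data.
def slope(xs, ys):
    """OLS slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

years = [2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016]
rates = [9.0, 8.6, 8.2, 7.8, 7.4, 7.0, 6.9, 6.8, 6.7, 6.6]  # images per 10 000 (made up)

pre_trend = slope(years[:6], rates[:6])   # 2007-2012, before Choosing Wisely
post_trend = slope(years[6:], rates[6:])  # 2013-2016, after Choosing Wisely
```

In this toy series the decline flattens from -0.4 to -0.1 images per 10 000 per year, the same "continued but less steep decline" pattern the study reports for carotid-bruit imaging.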
16. Using Natural Language Processing to improve EHR Structured Data-based Surgical Site Infection Surveillance. AMIA Annu Symp Proc 2020; 2019:794-803. PMID: 32308875. PMCID: PMC7153106.
Abstract
Surgical site infection (SSI) surveillance in healthcare systems is labor intensive and plagued by underreporting, as current methodology relies heavily on manual chart review. The rapid adoption of electronic health records (EHRs) has the potential to allow the secondary use of EHR data for quality surveillance programs. This study aims to investigate the effectiveness of integrating natural language processing (NLP) outputs with structured EHR data to build machine learning models for SSI identification using real-world clinical data. We examined a set of models using structured data with and without NLP document-level, mention-level, and keyword features. The top-performing model was based on a Random Forest classifier enhanced with NLP document-level features, achieving 0.58 sensitivity, 0.97 specificity, 0.54 PPV, 0.98 NPV, and a 0.52 F0.5 score. We further interrogated the feature contributions, analyzed the errors, and discussed future directions.
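The F0.5 score reported above is the F-beta measure with beta = 0.5, which favors precision over recall, a reasonable choice when false-positive SSI flags are costly to review manually. A generic implementation:

```python
# Generic F-beta from confusion-matrix counts. beta < 1 emphasizes precision;
# beta = 0.5 gives the F0.5 score reported in the abstract.
def f_beta(tp, fp, fn, beta=0.5):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```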
17. Determination of Marital Status of Patients from Structured and Unstructured Electronic Healthcare Data. AMIA Annu Symp Proc 2020; 2019:267-274. PMID: 32308819. PMCID: PMC7153091.
Abstract
Social determinants of health, including marital status, are increasingly identified as key drivers of health care utilization. This paper describes a robust method to determine the marital status of patients using structured and unstructured electronic healthcare data from a single academic institution in the United States. We developed and validated a natural language processing (NLP) pipeline for the ascertainment of marital status from clinical notes and compared its performance against two baseline methods: a machine learning n-gram model, and structured data obtained from the electronic health record. Overall, our NLP engine had excellent performance on both document-level (F1 0.97) and patient-level (F1 0.95) classification, and was superior to the baseline machine learning n-gram model. We also observed good correlation between the marital status obtained from our NLP engine and the baseline structured electronic healthcare data (κ = 0.6).
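The κ statistic above is Cohen's kappa, which corrects the raw agreement between the NLP-derived and structured-data marital status for the agreement expected by chance. A generic sketch, not the authors' code:

```python
# Generic Cohen's kappa between two label sequences (e.g. NLP output vs
# structured EHR field). Illustrative only.
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement; 1.0 = perfect, 0.0 = chance level."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Expected agreement if both raters labeled independently at their marginals.
    expected = sum(counts_a[k] * counts_b.get(k, 0) for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)
```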
18. Impact of Different Electronic Cohort Definitions to Identify Patients With Atrial Fibrillation From the Electronic Medical Record. J Am Heart Assoc 2020; 9:e014527. PMID: 32098599. PMCID: PMC7335556. DOI: 10.1161/jaha.119.014527.
Abstract
Background Electronic medical records (EMRs) allow identification of disease‐specific patient populations, but varying electronic cohort definitions could result in different populations. We compared the characteristics of an electronic medical record–derived atrial fibrillation (AF) patient population using 5 different electronic cohort definitions. Methods and Results Adult patients with at least 1 AF billing code from January 1, 2010, to December 31, 2017, were included. Based on different electronic cohort definitions, we trained 5 different logistic regression models using a labeled training data set (n=786). Each model yielded a predicted probability; patients were classified as having AF if the probability was higher than a specified cut point. Test characteristics were calculated for each model. These models were then applied to the full cohort and resulting characteristics were compared. In the training set, the comprehensive model (including demographics, billing codes, and natural language processing results) performed best, with an area under the curve of 0.89, sensitivity of 0.90, and specificity of 0.87. Among a candidate population (n=22 000), the proportion of patients identified as having AF varied from 61% in the model using diagnosis or procedure International Classification of Diseases (ICD) billing codes to 83% in the model using natural language processing of clinical notes. Among identified AF patients, the proportion of patients with a CHA2DS2‐VASc score ≥2 varied from 69% to 85%; oral anticoagulant treatment rates varied from 50% to 66% depending on the model. Conclusions Different electronic cohort definitions result in substantially different AF study samples. This difference threatens the quality and reproducibility of electronic medical record–based research and quality initiatives.
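The electronic-cohort approach described above, fitting a logistic regression on labeled charts and classifying a patient as having AF when the predicted probability exceeds a cut point, can be sketched as follows. The two features and all counts are invented stand-ins, not the study's variables:

```python
from sklearn.linear_model import LogisticRegression

# Illustrative feature rows: [AF billing-code count, NLP AF-mention count].
# Labels: 1 = chart review confirmed atrial fibrillation (toy data, not the study cohort).
X = [[0, 0], [0, 1], [1, 0], [0, 0], [2, 4], [3, 6], [1, 5], [4, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

model = LogisticRegression().fit(X, y)
prob = model.predict_proba(X)[:, 1]      # predicted probability of AF per patient

cut_point = 0.5                          # threshold chosen on a labeled training set
pred = [int(p > cut_point) for p in prob]
```

Each of the study's 5 electronic cohort definitions would correspond to a different feature set (demographics, billing codes, NLP results) feeding the same probability-thresholding step.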
|
19
|
Use of Computerized Provider Order Entry Events for Postoperative Complication Surveillance. JAMA Surg 2020; 154:311-318. [PMID: 30586132] [DOI: 10.1001/jamasurg.2018.4874]
Abstract
Importance Conventional approaches for tracking postoperative adverse events require manual medical record review, thus limiting the scalability of such efforts. Objective To determine if a surveillance system using computerized provider order entry (CPOE) events for selected medications as well as laboratory, microbiologic, and radiologic orders can decrease the manual medical record review burden for surveillance of postoperative complications. Design, Setting, and Participants This cohort study reviewed the medical records of 21 775 patients who underwent surgical procedures at a university-based tertiary referral center (University of Utah, Salt Lake City) from July 1, 2007, to August 31, 2017. Patients were included if their case was selected for review by a surgical clinical reviewer as part of the National Surgical Quality Improvement Program. Patients were excluded if they had incomplete follow-up data. Main Outcomes and Measures Thirty-day postoperative occurrences of superficial surgical site infection, deep surgical site infection, organ space surgical site infection, urinary tract infection, pneumonia, sepsis, septic shock, deep vein thrombosis requiring therapy, and pulmonary embolism, as defined by the National Surgical Quality Improvement Program. A logistic regression model was developed for each postoperative complication using CPOE features as predictors on a development set, and performance was measured on a holdout internal validation set. The models were internally validated using bootstrapping with 10 000 replications to determine the sensitivity, specificity, positive predictive value, and negative predictive value of the CPOE-based surveillance system. Results The study included 21 775 patients who underwent surgical procedures. Among these patients, 11 855 (54.4%) were women and 9920 (45.6%) were men, with a mean (SD) age of 51.7 (16.8) years.
Overall, the prevalence of postoperative complications was low, ranging from 0.2% (pulmonary embolism) to 2.6% (superficial surgical site infection). Use of CPOE events to detect patients who experienced at least 1 complication had a sensitivity of 74.8% (95% CI, 71.1%-78.4%), specificity of 86.8% (95% CI, 85.5%-88.3%), positive predictive value of 33.8% (95% CI, 31.2%-36.4%), negative predictive value of 97.5% (95% CI, 97.1%-97.8%), and area under the curve of 0.808 (95% CI, 0.791-0.824). The negative predictive value for individual complications ranged from 98.7% to 100%. Use of CPOE events to screen for adverse events was estimated to diminish the burden of manual medical record review by 55.4% to 90.3%. A CPOE-based surveillance system performed well for both inpatient and outpatient procedures. Conclusions and Relevance A CPOE-based surveillance of postoperative complications has high negative predictive value, which demonstrates that this approach can augment the currently used, resource-intensive manual medical record review process.
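The screening statistics above all derive from a 2x2 confusion matrix of flagged versus true complications. A minimal sketch with illustrative counts (not the study's data):

```python
def surveillance_metrics(tp, fp, fn, tn):
    """Compute screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # fraction of true complications caught
        "specificity": tn / (tn + fp),   # fraction of uncomplicated cases cleared
        "ppv": tp / (tp + fp),           # fraction of flagged records truly positive
        "npv": tn / (tn + fn),           # fraction of cleared records truly negative
    }

# Illustrative counts only: a low-prevalence screen where a high NPV
# lets reviewers safely skip most records the system clears.
m = surveillance_metrics(tp=75, fp=150, fn=25, tn=1750)
```

Note how low prevalence drags the PPV down even with good specificity, which mirrors the paper's 33.8% PPV alongside a 97.5% NPV.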
|
20
|
|
21
|
Association of Marital Status with Health Care Utilization after Complex Surgical Procedures. J Am Coll Surg 2019. [DOI: 10.1016/j.jamcollsurg.2019.08.822]
|
22
|
Bayesian Networks for Detection of Postoperative Health Care-Associated Infections Using Electronic Health Care Record Data. J Am Coll Surg 2019. [DOI: 10.1016/j.jamcollsurg.2019.08.823]
|
23
|
Interactive NLP in Clinical Care: Identifying Incidental Findings in Radiology Reports. Appl Clin Inform 2019; 10:655-669. [PMID: 31486057] [DOI: 10.1055/s-0039-1695791]
Abstract
BACKGROUND Despite advances in natural language processing (NLP), extracting information from clinical text is expensive. Interactive tools that are capable of easing the construction, review, and revision of NLP models can reduce this cost and improve the utility of clinical reports for clinical and secondary use. OBJECTIVES We present the design and implementation of an interactive NLP tool for identifying incidental findings in radiology reports, along with a user study evaluating the performance and usability of the tool. METHODS Expert reviewers provided gold standard annotations for 130 patient encounters (694 reports) at sentence, section, and report levels. We performed a user study with 15 physicians to evaluate the accuracy and usability of our tool. Participants reviewed encounters split into intervention (with predictions) and control conditions (no predictions). We measured changes in model performance, the time spent, and the number of user actions needed. The System Usability Scale (SUS) and an open-ended questionnaire were used to assess usability. RESULTS Starting from bootstrapped models trained on 6 patient encounters, we observed an average increase in F1 score from 0.31 to 0.75 for reports, from 0.32 to 0.68 for sections, and from 0.22 to 0.60 for sentences on a held-out test data set, over an hour-long study session. We found that the tool helped significantly reduce the time spent in reviewing encounters (134.30 vs. 148.44 seconds in intervention and control, respectively), while maintaining the overall quality of labels as measured against the gold standard. The tool was well received by the study participants, with a very good overall SUS score of 78.67. CONCLUSION The user study demonstrated successful use of the tool by physicians for identifying incidental findings. These results support the viability of adopting interactive NLP tools in clinical care settings for a wider range of clinical applications.
|
24
|
Recent Advances in Using Natural Language Processing to Address Public Health Research Questions Using Social Media and Consumer-Generated Data. Yearb Med Inform 2019; 28:208-217. [PMID: 31419834] [PMCID: PMC6697505] [DOI: 10.1055/s-0039-1677918]
Abstract
OBJECTIVE We present a narrative review of recent work on the utilisation of Natural Language Processing (NLP) for the analysis of social media (including online health communities) specifically for public health applications. METHODS We conducted a literature review of NLP research that utilised social media or online consumer-generated text for public health applications, focussing on the years 2016 to 2018. Papers were identified in several ways, including PubMed searches and the inspection of recent conference proceedings from the Association of Computational Linguistics (ACL), the Conference on Human Factors in Computing Systems (CHI), and the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM). Popular data sources included Twitter, Reddit, various online health communities, and Facebook. RESULTS In the recent past, communicable diseases (e.g., influenza, dengue) have been the focus of much social media-based NLP health research. However, mental health and substance use and abuse (including the use of tobacco, alcohol, marijuana, and opioids) have been the subject of an increasing volume of research in the 2016 - 2018 period. Associated with this trend, the use of lexicon-based methods remains popular given the availability of psychologically validated lexical resources suitable for mental health and substance abuse research. Finally, we found that in the period under review "modern" machine learning methods (i.e. deep neural-network-based methods), while increasing in popularity, remain less widely used than "classical" machine learning methods.
|
25
|
Natural Language Processing Accurately Identifies Dysphagia Indications for Esophagogastroduodenoscopy Procedures in a Large US Integrated Healthcare System: Implications for Classifying Overuse and Quality Measurement. AMIA Jt Summits Transl Sci Proc 2019; 2019:665-671. [PMID: 31259022] [PMCID: PMC6568132]
Abstract
Recent evidence suggests almost half of repeat esophagogastroduodenoscopy procedures (EGDs) are overused; this prior research relied on administrative data that are often inaccurate. Our primary objective was to determine and compare the accuracy of natural language processing and administrative data against manual chart review for identifying dysphagia indications for EGD procedures within the national VA healthcare system. From 396,856 EGD notes identified from 2008-2014, we classified 119,920 as "index" procedures in 2010-2012. We compared the performance of our NLP system against ICD codes for correctly identifying dysphagia indications in the index EGD procedures and in repeat EGD procedures. We used linked pathology data to describe esophageal biopsies performed during these EGDs. The NLP system performed significantly better and identified significantly more index and repeat EGD procedures with dysphagia indications than ICD codes, which has critical implications for determining the appropriateness of EGD procedures.
|
26
|
Moonstone: a novel natural language processing system for inferring social risk from clinical narratives. J Biomed Semantics 2019; 10:6. [PMID: 30975223] [PMCID: PMC6458709] [DOI: 10.1186/s13326-019-0198-0]
Abstract
Background Social risk factors are important dimensions of health and are linked to access to care, quality of life, health outcomes and life expectancy. However, in the Electronic Health Record, data related to many social risk factors are primarily recorded in free-text clinical notes, rather than as more readily computable structured data, and hence cannot currently be easily incorporated into automated assessments of health. In this paper, we present Moonstone, a new, highly configurable rule-based clinical natural language processing system designed to automatically extract information that requires inferencing from clinical notes. Our initial use case for the tool is focused on the automatic extraction of social risk factor information — in this case, housing situation, living alone, and social support — from clinical notes. Nursing notes, social work notes, emergency room physician notes, primary care notes, hospital admission notes, and discharge summaries, all derived from the Veterans Health Administration, were used for algorithm development and evaluation. Results An evaluation of Moonstone demonstrated that the system is highly accurate in extracting and classifying the three variables of interest (housing situation, living alone, and social support). The system achieved positive predictive value (i.e. precision) scores ranging from 0.66 (homeless/marginally housed) to 0.98 (lives at home/not homeless), accuracy scores ranging from 0.63 (lives in facility) to 0.95 (lives alone), and sensitivity (i.e. recall) scores ranging from 0.75 (lives in facility) to 0.97 (lives alone). Conclusions The Moonstone system is — to the best of our knowledge — the first freely available, open source natural language processing system designed to extract social risk factors from clinical text with good (lives in facility) to excellent (lives alone) performance. 
Although developed with the social risk factor identification task in mind, Moonstone provides a powerful tool to address a range of clinical natural language processing tasks, especially those tasks that require nuanced linguistic processing in conjunction with inference capabilities.
|
27
|
Documentation of ENDS Use in the Veterans Affairs Electronic Health Record (2008-2014). Am J Prev Med 2019; 56:474-475. [PMID: 30777165] [DOI: 10.1016/j.amepre.2018.10.019]
|
28
|
Abstract
New scientific knowledge and innovation are often slow to disseminate. In other cases, providers rush into adopting what appears to be a clinically relevant innovation, based on a single clinical trial. In reality, adopting innovations without appropriate translation and repeated testing of practical application is problematic. In this article we provide examples of clinical innovations (for example, tight glucose control in critically ill patients) that were adopted inappropriately and that caused what we term a malfunction. To address the issue of malfunctions, we review various examples and suggest frameworks for the diffusion of knowledge leading to the adoption of useful innovations. The resulting model is termed an integrated road map for coordinating knowledge transformation and innovation adoption. We make recommendations for the targeted development of practice change procedures, practice change assessment, structured descriptions of tested interventions, intelligent knowledge management technologies, and policy support for knowledge transformation, including further standardization to facilitate sharing among institutions.
|
29
|
NLPReViz: an interactive tool for natural language processing on clinical text. J Am Med Inform Assoc 2019; 25:81-87. [PMID: 29016825] [DOI: 10.1093/jamia/ocx070]
Abstract
The gap between domain experts and natural language processing expertise is a barrier to extracting understanding from clinical text. We describe a prototype tool for interactive review and revision of natural language processing models of binary concepts extracted from clinical notes. We evaluated our prototype in a user study involving 9 physicians, who used our tool to build and revise models for 2 colonoscopy quality variables. We report changes in performance relative to the quantity of feedback. Using initial training sets as small as 10 documents, expert review led to final F1 scores for the "appendiceal-orifice" variable between 0.78 and 0.91 (with improvements ranging from 13.26% to 29.90%). F1 for "biopsy" ranged between 0.88 and 0.94 (-1.52% to 11.74% improvements). The average System Usability Scale score was 70.56. Subjective feedback also suggests possible design improvements.
|
30
|
Determining Onset for Familial Breast and Colorectal Cancer from Family History Comments in the Electronic Health Record. AMIA Jt Summits Transl Sci Proc 2019; 2019:173-181. [PMID: 31258969] [PMCID: PMC6568127]
Abstract
Background. Family health history (FHH) can be used to identify individuals at elevated risk for familial cancers. Risk criteria for common cancers rely on age of onset, which is documented inconsistently as structured and unstructured data in electronic health records (EHRs). Objective. To investigate a natural language processing (NLP) approach to extract age of onset and age of death from free-text EHR fields. Methods. Using 474,651 FHH entries from 89,814 patients, we investigated two methods - frequent patterns (baseline) and NLP classifier. Results. For age of onset, the NLP classifier outperformed the baseline in precision (96% vs. 83%; 95% CI [94, 97] and [80, 86]) with equivalent recall (both 93%; 95% CI [91, 95]). When applied to the full dataset, the NLP approach increased the percentage of FHH entries for which cancer risk criteria could be applied from 10% to 15%. Conclusion. NLP combined with structured data may improve the computation of familial cancer risk criteria for various use cases.
|
31
|
Abstract
IMPORTANCE To improve patient safety, health care systems need reliable methods to detect adverse events in large patient populations. Events are often described in clinical notes, rather than structured data, which make them difficult to identify on a large scale. OBJECTIVE To develop and compare 2 natural language processing methods, a rules-based approach and a machine learning (ML) approach, for identifying bleeding events in clinical notes. DESIGN, SETTING, AND PARTICIPANTS This diagnostic study used deidentified notes from the Medical Information Mart for Intensive Care, which spans 2001 to 2012. A training set of 990 notes and a test set of 660 notes were randomly selected. Physicians classified each note as present or absent for a clinically relevant bleeding event during the hospitalization. A bleeding dictionary was developed for the rules-based approach; bleeding mentions were then aggregated to arrive at a classification for each note. Three ML models (support vector machine, extra trees, and convolutional neural network) were developed and trained using the 990-note training set. Another instance of each ML model was also trained on a sample of 450 notes, with equal numbers of bleeding-present and bleeding-absent notes. The notes were represented using term frequency-inverse document frequency vectors and global vectors for word representation. MAIN OUTCOMES AND MEASURES The main outcomes were accuracy, sensitivity, specificity, positive predictive value, and negative predictive value for each model. Following training, the models were tested on the test set and sensitivities were compared using a McNemar test. RESULTS The 990-note training set represented 769 patients (296 [38.5%] female; mean [SD] age, 67.42 [14.7] years). The 660-note test set represented 527 patients (211 [40.0%] female; mean [SD] age, 67.86 [14.7] years). Bleeding was present in 146 notes (22.1%). 
The extra trees down-sampled model and rules-based approaches were similarly sensitive (93.8% vs 91.1%; difference, 2.7%; 95% CI, -3.8% to 7.9%; P = .44). The positive predictive value for the extra trees model, however, was 48.6%. The rules-based model had the best performance overall, with 84.6% specificity, 62.7% positive predictive value, and 97.1% negative predictive value. CONCLUSIONS AND RELEVANCE Bleeding is a common complication in health care, and these results demonstrate an automated and scalable detection method. The rules-based natural language processing approach, compared with ML, had the best performance in identifying bleeding, with high sensitivity and negative predictive value.
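The ML arm described above, term frequency-inverse document frequency (TF-IDF) note vectors fed to tree-based and neural classifiers, can be sketched with scikit-learn for the extra trees case. The toy notes below are invented stand-ins, not MIMIC data:

```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Invented stand-ins for bleeding-present / bleeding-absent notes (not MIMIC data).
notes = [
    "large volume hematemesis overnight, two units prbc transfused",
    "melena noted, hemoglobin trending down, gi consulted",
    "brisk bleeding from surgical site requiring pressure dressing",
    "no acute events overnight, tolerating diet, ambulating",
    "afebrile, wound clean dry and intact, pain controlled",
    "stable, no evidence of bleeding, plan discharge tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = clinically relevant bleeding present

# TF-IDF vectors feeding an extra-trees classifier, mirroring one ML arm of the study.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    ExtraTreesClassifier(n_estimators=100, random_state=0),
)
model.fit(notes, labels)
pred = model.predict(["ongoing melena with falling hemoglobin"])
```

The down-sampled variant in the study corresponds to fitting the same pipeline on a class-balanced subset of the training notes.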
|
32
|
Detecting Evidence of Intra-abdominal Surgical Site Infections from Radiology Reports Using Natural Language Processing. AMIA Annu Symp Proc 2018; 2017:515-524. [PMID: 29854116] [PMCID: PMC5977582]
Abstract
Free-text reports in electronic health records (EHRs) contain medically significant information - signs, symptoms, findings, diagnoses - recorded by clinicians during patient encounters. These reports contain rich clinical information which can be leveraged for surveillance of disease and occurrence of adverse events. In order to gain meaningful knowledge from these text reports to support surveillance efforts, information must first be converted into a structured, computable format. Traditional methods rely on manual review of charts, which can be costly and inefficient. Natural language processing (NLP) methods offer an efficient, alternative approach to extracting the information and can achieve a similar level of accuracy. We developed an NLP system to automatically identify mentions of surgical site infections in radiology reports and classify reports containing evidence of surgical site infections leveraging these mentions. We evaluated our system using a reference standard of reports annotated by domain experts, administrative data generated for each patient encounter, and a machine learning-based approach.
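The mention-to-document aggregation step described above can be illustrated with a deliberately simplified rule-based sketch. The term list and negation cues here are invented for illustration and are far cruder than the system the paper evaluates:

```python
import re

# Hypothetical SSI terms and negation cues (illustrative only).
SSI_TERMS = re.compile(r"\b(abscess|fluid collection|phlegmon|wound infection)\b", re.I)
NEGATION = re.compile(r"\b(no|without|negative for|resolved)\b[^.]*$", re.I)

def ssi_mentions(report):
    """Return SSI-term mentions not preceded by a negation cue in the same sentence."""
    mentions = []
    for sentence in report.split("."):
        for m in SSI_TERMS.finditer(sentence):
            if not NEGATION.search(sentence[: m.start()]):
                mentions.append(m.group(0))
    return mentions

def classify_report(report):
    """Document-level label aggregated from the mention-level hits."""
    return "evidence of SSI" if ssi_mentions(report) else "no evidence"
```

The key idea is the two-stage design: extract individual mentions with context handling first, then aggregate them into a report-level classification.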
|
33
|
A Framework for Leveraging "Big Data" to Advance Epidemiology and Improve Quality: Design of the VA Colonoscopy Collaborative. EGEMS (Wash DC) 2018; 6:4. [PMID: 29881762] [PMCID: PMC5983017] [DOI: 10.5334/egems.198]
Abstract
OBJECTIVE To describe a framework for leveraging big data for research and quality improvement purposes and demonstrate implementation of the framework for design of the Department of Veterans Affairs (VA) Colonoscopy Collaborative. METHODS We propose that research utilizing large-scale electronic health records (EHRs) can be approached in a 4 step framework: 1) Identify data sources required to answer research question; 2) Determine whether variables are available as structured or free-text data; 3) Utilize a rigorous approach to refine variables and assess data quality; 4) Create the analytic dataset and perform analyses. We describe implementation of the framework as part of the VA Colonoscopy Collaborative, which aims to leverage big data to 1) prospectively measure and report colonoscopy quality and 2) develop and validate a risk prediction model for colorectal cancer (CRC) and high-risk polyps. RESULTS Examples of implementation of the 4 step framework are provided. To date, we have identified 2,337,171 Veterans who have undergone colonoscopy between 1999 and 2014. Median age was 62 years, and 4.6 percent (n = 106,860) were female. We estimated that 2.6 percent (n = 60,517) had CRC diagnosed at baseline. An additional 1 percent (n = 24,483) had a new ICD-9 code-based diagnosis of CRC on follow up. CONCLUSION We hope our framework may contribute to the dialogue on best practices to ensure high quality epidemiologic and quality improvement work. As a result of implementation of the framework, the VA Colonoscopy Collaborative holds great promise for 1) quantifying and providing novel understandings of colonoscopy outcomes, and 2) building a robust approach for nationwide VA colonoscopy quality reporting.
|
34
|
Abstract 11: Development and Comparison of Two Natural Language Processing Methods for Identifying Bleeding Events in Clinical Text. Circ Cardiovasc Qual Outcomes 2018. [DOI: 10.1161/circoutcomes.11.suppl_1.11]
Abstract
Background:
Learning healthcare systems need techniques that can accurately and automatically identify health outcomes in large populations. Outcomes are often described in clinical narration in the electronic medical record.
Objective:
To develop and compare two natural language processing (NLP) approaches, rules-based (RB) and machine-learning (ML), for identifying bleeding events in clinical notes.
Methods:
We used de-identified notes from the Medical Information Mart for Intensive Care. We randomly selected 990 notes for a training set and 660 notes for a test set. Physicians classified each note as present or absent for a clinically relevant bleeding event during the hospitalization. We developed a dictionary of target and modifier words for the RB approach. In RB, the computer “reads” the text and tags bleeding targets as present or absent based on the modifier words; the mentions are aggregated to arrive at a classification for the note. For the ML approach, each note was represented as a high-dimensional vector where each dimension corresponds to the frequency of a certain word. Similar notes (e.g. bleeding present notes) have similar vectors; the computer learns these patterns to predict the class for an unseen note. One RB and three ML models (support vector machine (SVM), extra trees (ET), convolutional neural network (CNN)) were trained using the full 990-note training set. Another instance of each ML model was also trained on a down-sampled (DS) set of 450 notes, with equal positive and negative notes. We ran the trained models on the 660-note test set and compared classification performance using McNemar’s test.
Results:
The 660-note test set represented 527 unique patients, 40% female. Bleeding events were present in 21% of the notes. The ET-DS model was the most sensitive, followed by the RB approach (93.8% versus 91.1%, p=0.44). The positive predictive value (PPV) for the ET-DS model, however, was below 50%. The RB approach had the best overall performance, with 84.6% specificity, 62.7% positive predictive value, and 97.1% negative predictive value (NPV) for identifying clinically relevant bleeding.
Discussion:
An RB NLP approach, compared to ML, has the best overall performance in independently identifying bleeding events among critically ill patients. The current models have high NPV, so they could be used to reduce the chart review burden.
|
35
|
Understanding patient satisfaction with received healthcare services: A natural language processing approach. AMIA Annu Symp Proc 2017; 2016:524-533. [PMID: 28269848] [PMCID: PMC5333198]
Abstract
Important information is encoded in free-text patient comments. We determine the most common topics in patient comments, design automatic topic classifiers, identify comments' sentiment, and find new topics in negative comments. Our annotation scheme consisted of 28 topics, with positive and negative sentiment. Within those 28 topics, the seven most frequent accounted for 63% of annotations. For automated topic classification, we developed vocabulary-based and Naive Bayes classifiers. For sentiment analysis, another Naive Bayes classifier was used. Finally, we used topic modeling to search for unexpected topics within negative comments. The seven most common topics were appointment access, appointment wait, empathy, explanation, friendliness, practice environment, and overall experience. The best F-measures from our classifier were 0.52 (NB), 0.57 (NB), 0.36 (Vocab), 0.74 (NB), 0.40 (NB), and 0.44 (Vocab), respectively. F-scores ranged from 0.16 to 0.74. The sentiment classification F-score was 0.84. Negative comment topic modeling revealed complaints about appointment access, appointment wait, and time spent with physician.
|
36
|
Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2. J Biomed Semantics 2016; 7:43. [PMID: 27370271] [PMCID: PMC4930590] [DOI: 10.1186/s13326-016-0084-y]
Abstract
BACKGROUND The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referenced as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking to web sources with lay descriptions of annotated short forms or by substituting short forms with a more simplified, lay term. METHODS In this study, we evaluate 1) accuracy of participating systems' normalizing short forms compared to a majority sense baseline approach, 2) performance of participants' systems for short forms with variable majority sense distributions, and 3) report the accuracy of participating systems' normalizing shared normalized concepts between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms. RESULTS The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72 %. A majority sense baseline approach achieved the second best performance. The performance of participating systems for normalizing short forms with two or more senses with low ambiguity (majority sense greater than 80 %) ranged from 52 to 78 % accuracy, with two or more senses with moderate ambiguity (majority sense between 50 and 80 %) ranged from 23 to 57 % accuracy, and with two or more senses with high ambiguity (majority sense less than 50 %) ranged from 2 to 45 % accuracy. With respect to the ShARe test set, 69 % of short form annotations contained common concept unique identifiers with the Consumer Health Vocabulary. For these 2594 possible annotations, the performance of participating systems ranged from 50 to 75 % accuracy. 
CONCLUSION Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
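The majority sense baseline that the participating systems were compared against is simple to sketch: every occurrence of a short form is mapped to its most frequent sense in the training annotations, ignoring context entirely. Below is a minimal Python illustration of that idea; the short forms and concept unique identifiers are invented for the example and are not drawn from the ShARe data.

```python
from collections import Counter, defaultdict

def train_majority_baseline(annotations):
    """Map each short form to its most frequent sense (CUI) in training data.

    `annotations` is an iterable of (short_form, cui) pairs.
    """
    counts = defaultdict(Counter)
    for short_form, cui in annotations:
        counts[short_form.lower()][cui] += 1
    return {sf: c.most_common(1)[0][0] for sf, c in counts.items()}

def normalize(baseline, short_form, default="CUI-less"):
    """Assign the majority sense, ignoring the surrounding context."""
    return baseline.get(short_form.lower(), default)

# Toy training data: "RA" is usually one sense, occasionally another.
train = [("RA", "C0003873"), ("RA", "C0003873"), ("RA", "C0018792"),
         ("pt", "C0030705")]
baseline = train_majority_baseline(train)
```

Because highly ambiguous short forms (majority sense below 50 %) are exactly where this baseline fails, context-aware disambiguation is needed to beat it, which matches the accuracy drop across ambiguity bands reported above.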
Collapse
|
37
|
Knowledge Author: facilitating user-driven, domain content development to support clinical information extraction. J Biomed Semantics 2016; 7:42. [PMID: 27338146 PMCID: PMC4919842 DOI: 10.1186/s13326-016-0086-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2015] [Accepted: 06/01/2016] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Clinical Natural Language Processing (NLP) systems require a semantic schema comprised of domain-specific concepts, their lexical variants, and associated modifiers to accurately extract information from clinical texts. An NLP system leverages this schema to structure concepts and extract meaning from the free texts. In the clinical domain, creating a semantic schema typically requires input from both a domain expert, such as a clinician, and an NLP expert who will represent clinical concepts created from the clinician's domain expertise in a computable format usable by an NLP system. The goal of this work is to develop a web-based tool, Knowledge Author, that bridges the gap between the clinical domain expert and NLP system development by facilitating the development of domain content represented in a semantic schema for extracting information from clinical free-text. RESULTS Knowledge Author is a web-based recommendation system that supports users in developing domain content necessary for clinical NLP applications. Knowledge Author's schematic model leverages a set of semantic types derived from the Secondary Use Clinical Element Models and the Common Type System to allow the user to quickly create and modify domain-related concepts. Features such as collaborative development and domain content suggestions, provided through the mapping of concepts to the Unified Medical Language System Metathesaurus database, further support the domain content creation process. Two proof of concept studies were performed to evaluate the system's performance. The first study evaluated Knowledge Author's flexibility to create a broad range of concepts. A dataset of 115 concepts was created, of which 87 (76 %) could be created using Knowledge Author.
The second study evaluated the effectiveness of Knowledge Author's output in an NLP system by extracting concepts and associated modifiers representing a clinical element, carotid stenosis, from 34 clinical free-text radiology reports using Knowledge Author and an NLP system, pyConText. Knowledge Author's domain content produced high recall for concepts (targeted findings: 86 %) and varied recall for modifiers (certainty: 91 %, sidedness: 80 %, neurovascular anatomy: 46 %). CONCLUSION Knowledge Author can support clinical domain content development for information extraction by enabling semantic schema creation by domain experts.
Collapse
|
38
|
Extracting a stroke phenotype risk factor from Veteran Health Administration clinical reports: an information content analysis. J Biomed Semantics 2016; 7:26. [PMID: 27175226 PMCID: PMC4863379 DOI: 10.1186/s13326-016-0065-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2015] [Accepted: 04/19/2016] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the United States, 795,000 people suffer strokes each year; 10-15 % of these strokes can be attributed to stenosis caused by plaque in the carotid artery, a major stroke phenotype risk factor. Studies comparing treatments for the management of asymptomatic carotid stenosis are challenging for at least two reasons: 1) administrative billing codes (i.e., Current Procedural Terminology (CPT) codes) that identify carotid images do not denote which neurovascular arteries are affected and 2) the majority of the image reports are negative for carotid stenosis. Studies that rely on manual chart abstraction can be labor-intensive, expensive, and time-consuming. Natural Language Processing (NLP) can expedite the process of manual chart abstraction by automatically filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings; thus, potentially reducing effort, costs, and time. METHODS In this pilot study, we conducted an information content analysis of carotid stenosis mentions in terms of their report location (Sections), report formats (structures) and linguistic descriptions (expressions) from Veteran Health Administration free-text reports. We assessed the ability of an NLP algorithm, pyConText, to discern reports with significant carotid stenosis findings from reports with no/insignificant carotid stenosis findings given these three document composition factors for two report types: radiology (RAD) and text integration utility (TIU) notes. RESULTS We observed that most carotid mentions are recorded in prose using categorical expressions, within the Findings and Impression sections for RAD reports and within neither of these designated sections for TIU notes. For RAD reports, pyConText performed with high sensitivity (88 %), specificity (84 %), and negative predictive value (95 %) and reasonable positive predictive value (70 %).
For TIU notes, pyConText performed with high specificity (87 %) and negative predictive value (92 %), reasonable sensitivity (73 %), and moderate positive predictive value (58 %). pyConText performed with the highest sensitivity processing the full report rather than the Findings or Impressions independently. CONCLUSION We conclude that pyConText can reduce chart review efforts by filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant carotid stenosis findings from the Veteran Health Administration electronic health record, and hence has utility for expediting a comparative effectiveness study of treatment strategies for stroke prevention.
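The filter-and-flag behavior described here, classifying a whole report by whether any stenosis mention is significant, can be sketched in a few lines of Python. This is a hedged illustration only: the categorical terms and the 50 % numeric cut-off are assumptions made for the example, not the study's exact definition, and pyConText itself additionally handles negation, uncertainty, and section context.

```python
import re

# Illustrative categorical expressions treated as significant (an assumption).
SIGNIFICANT_CATEGORIES = {"moderate", "severe", "critical"}

def significant_stenosis(mention):
    """Decide whether a single stenosis mention is significant.

    Uses an assumed 50 % numeric cut-off plus a toy categorical term list.
    """
    m = mention.lower()
    pct = re.search(r"(\d{1,3})\s*%", m)
    if pct and int(pct.group(1)) >= 50:
        return True
    return any(term in m for term in SIGNIFICANT_CATEGORIES)

def flag_report(mentions):
    """Flag a report for review if any mention is significant;
    otherwise it can be filtered out of the chart-review queue."""
    return any(significant_stenosis(m) for m in mentions)
```

Filtering the (majority) negative reports this way is what yields the reduction in chart-review effort the authors describe.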
Collapse
|
39
|
Towards a Generalizable Time Expression Model for Temporal Reasoning in Clinical Notes. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2015; 2015:1252-1259. [PMID: 26958265 PMCID: PMC4765564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Accurate temporal identification and normalization is imperative for many biomedical and clinical tasks such as generating timelines and identifying phenotypes. A major natural language processing challenge is developing and evaluating a generalizable temporal modeling approach that performs well across corpora and institutions. Our long-term goal is to create such a model. We initiate our work on reaching this goal by focusing on temporal expression (TIMEX3) identification. We present a systematic approach to 1) generalize existing solutions for automated TIMEX3 span detection, and 2) assess similarities and differences by various instantiations of TIMEX3 models applied on separate clinical corpora. When evaluated on the 2012 i2b2 and the 2015 Clinical TempEval challenge corpora, our conclusion is that our approach is successful - we achieve competitive results for automated classification, and we identify similarities and differences in TIMEX3 modeling that will be informative in the development of a simplified, general temporal model.
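TIMEX3 span detection of the kind being generalized here is often bootstrapped with rule-based patterns before any statistical modeling. The fragment below is a toy sketch under that assumption; the handful of regular expressions is purely illustrative and far smaller than the pattern grammars real TIMEX3 systems use.

```python
import re

# A few illustrative TIMEX3-style patterns; real systems use far richer grammars.
TIMEX_PATTERNS = [
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",              # dates like 3/14/2012
    r"\b\d{4}-\d{2}-\d{2}\b",                    # ISO dates
    r"\b(yesterday|today|tomorrow)\b",           # deictic expressions
    r"\b\d+\s+(day|week|month|year)s?\s+ago\b",  # relative offsets
    r"\bpostoperative day\s+\d+\b",              # clinical conventions
]
TIMEX_RE = re.compile("|".join(TIMEX_PATTERNS), re.IGNORECASE)

def find_timex_spans(text):
    """Return (start, end, matched_text) for each candidate time expression."""
    return [(m.start(), m.end(), m.group(0)) for m in TIMEX_RE.finditer(text)]

spans = find_timex_spans("Seen 2 weeks ago; follow-up on 3/14/2012.")
```

Comparing which pattern families fire on the i2b2 versus Clinical TempEval corpora is one concrete way the similarities and differences mentioned above can be surfaced.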
Collapse
|
40
|
Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. J Am Med Inform Assoc 2015; 22:143-54. [PMID: 25147248 PMCID: PMC4433360 DOI: 10.1136/amiajnl-2013-002544] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2013] [Revised: 07/16/2014] [Accepted: 07/21/2014] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE The ShARe/CLEF eHealth 2013 Evaluation Lab Task 1 was organized to evaluate the state of the art on the clinical text in (i) disorder mention identification/recognition based on Unified Medical Language System (UMLS) definition (Task 1a) and (ii) disorder mention normalization to an ontology (Task 1b). Such a community evaluation has not been previously executed. Task 1a included a total of 22 system submissions, and Task 1b included 17. Most of the systems employed a combination of rules and machine learners. MATERIALS AND METHODS We used a subset of the Shared Annotated Resources (ShARe) corpus of annotated clinical text--199 clinical notes for training and 99 for testing (roughly 180 K words in total). We provided the community with the annotated gold standard training documents to build systems to identify and normalize disorder mentions. The systems were tested on a held-out gold standard test set to measure their performance. RESULTS For Task 1a, the best-performing system achieved an F1 score of 0.75 (0.80 precision; 0.71 recall). For Task 1b, another system performed best with an accuracy of 0.59. DISCUSSION Most of the participating systems used a hybrid approach by supplementing machine-learning algorithms with features generated by rules and gazetteers created from the training data and from external resources. CONCLUSIONS The task of disorder normalization is more challenging than that of identification. The ShARe corpus is available to the community as a reference standard for future studies.
Collapse
|
41
|
Cue-based assertion classification for Swedish clinical text--developing a lexicon for pyConTextSwe. Artif Intell Med 2014; 61:137-44. [PMID: 24556644 PMCID: PMC4104142 DOI: 10.1016/j.artmed.2014.01.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2013] [Revised: 12/19/2013] [Accepted: 01/10/2014] [Indexed: 11/17/2022]
Abstract
OBJECTIVE The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish. METHODS AND MATERIAL We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe's performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. On a separate evaluation set, we calculated the system's final performance. RESULTS Following integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after an error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system's final F-scores on an evaluation set were 81% (overall). For the individual assertion classes, F-score results were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively. CONCLUSIONS We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which could form a valuable resource for similar studies, and which is publicly available.
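A cue-based assertion classifier of this kind can be sketched compactly. The cue sets below are toy English stand-ins, not the pyConTextSwe lexicon, and the scope model (only cues to the left of the target count) is deliberately simplified relative to pyConTextNLP's bidirectional scope rules.

```python
# Toy cue lexicon; the real pyConTextSwe lexicon holds hundreds of Swedish cues.
DEFINITE_NEGATION = {"no", "denies", "without"}
PROBABLE_NEGATION = {"unlikely", "doubt"}
PROBABLE_EXISTENCE = {"possible", "suspected", "probable"}

def assert_status(sentence, target):
    """Classify a target disorder mention into one of four assertion classes."""
    tokens = sentence.lower().split()
    if target.lower() not in tokens:
        raise ValueError("target not found in sentence")
    # Simplification: only cues preceding the target are in scope.
    scope = tokens[:tokens.index(target.lower())]
    if any(c in scope for c in DEFINITE_NEGATION):
        return "definite negated existence"
    if any(c in scope for c in PROBABLE_NEGATION):
        return "probable negated existence"
    if any(c in scope for c in PROBABLE_EXISTENCE):
        return "probable existence"
    return "definite existence"
```

The lexicon is the pivotal component: as the paper shows, adding and refining cues is what moved the overall F-score, with the rarer negated-uncertain classes remaining hardest.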
Collapse
|
42
|
Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text. J Biomed Inform 2014; 50:162-72. [PMID: 24859155 PMCID: PMC5627768 DOI: 10.1016/j.jbi.2014.05.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/08/2013] [Revised: 05/03/2014] [Accepted: 05/06/2014] [Indexed: 11/26/2022]
Abstract
The Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor method requires removal of 18 types of protected health information (PHI) from clinical documents to be considered “de-identified” prior to use for research purposes. Human review of PHI elements from a large corpus of clinical documents can be tedious and error-prone. Indeed, multiple annotators may be required to consistently redact information that represents each PHI class. Automated de-identification has the potential to improve annotation quality and reduce annotation time. One such approach is machine-assisted annotation, which combines de-identification system outputs, used as pre-annotations, with an interactive annotation interface so that annotators “curate” PHI annotations rather than annotating from “scratch” on raw clinical documents. In order to assess whether machine-assisted annotation improves the reliability and accuracy of the reference standard and reduces annotation effort, we conducted an annotation experiment. In this annotation study, we assessed the generalizability of the VA Consortium for Healthcare Informatics Research (CHIR) annotation schema and guidelines applied to a corpus of publicly available clinical documents called MTSamples. Specifically, our goals were to (1) characterize a heterogeneous corpus of clinical documents manually annotated for risk-ranked PHI and other annotation types (clinical eponyms and person relations), (2) evaluate how well annotators apply the CHIR schema to the heterogeneous corpus, (3) compare whether machine-assisted annotation (experiment) improves annotation quality and reduces annotation time compared to manual annotation (control), and (4) assess the change in quality of reference standard coverage with each added annotator’s annotations.
Collapse
|
44
|
Using natural language processing to improve efficiency of manual chart abstraction in research: the case of breast cancer recurrence. Am J Epidemiol 2014; 179:749-58. [PMID: 24488511 DOI: 10.1093/aje/kwt441] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
The increasing availability of electronic health records (EHRs) creates opportunities for automated extraction of information from clinical text. We hypothesized that natural language processing (NLP) could substantially reduce the burden of manual abstraction in studies examining outcomes, like cancer recurrence, that are documented in unstructured clinical text, such as progress notes, radiology reports, and pathology reports. We developed an NLP-based system using open-source software to process electronic clinical notes from 1995 to 2012 for women with early-stage incident breast cancers to identify whether and when recurrences were diagnosed. We developed and evaluated the system using clinical notes from 1,472 patients receiving EHR-documented care in an integrated health care system in the Pacific Northwest. A separate study provided the patient-level reference standard for recurrence status and date. The NLP-based system correctly identified 92% of recurrences and estimated diagnosis dates within 30 days for 88% of these. Specificity was 96%. The NLP-based system overlooked 5 of 65 recurrences, 4 because electronic documents were unavailable. The NLP-based system identified 5 other recurrences incorrectly classified as nonrecurrent in the reference standard. If used in similar cohorts, NLP could reduce by 90% the number of EHR charts abstracted to identify confirmed breast cancer recurrence cases at a rate comparable to traditional abstraction.
Collapse
|
45
|
Semantic annotation of clinical events for generating a problem list. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:1032-41. [PMID: 24551392 PMCID: PMC3900128] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
We present a pilot study of an annotation schema representing problems and their attributes, along with their relationship to temporal modifiers. We evaluated the ability of humans to annotate clinical reports using the schema and assessed the contribution of semantic annotations in determining the status of a problem mention as active, inactive, proposed, resolved, negated, or other. Our hypothesis is that the schema captures semantic information useful for generating an accurate problem list. Clinical named entities such as reference events, time points, time durations, aspectual phase, and ordering words, along with their relationships, including modifications and ordering relations, can be annotated by humans with low to moderate recall. Once identified, most attributes can be annotated with low to moderate agreement. Some attributes - Experiencer, Existence, and Certainty - are more informative than others - Intermittency and Generalized/Conditional - for predicting a problem mention's status. A support vector machine outperformed Naïve Bayes and Decision Tree classifiers for predicting a problem's status.
Collapse
|
46
|
Identifying synonymy between SNOMED clinical terms of varying length using distributional analysis of electronic health records. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013; 2013:600-609. [PMID: 24551362 PMCID: PMC3900203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
Medical terminologies and ontologies are important tools for natural language processing of health record narratives. To account for the variability of language use, synonyms need to be stored in a semantic resource as textual instantiations of a concept. Developing such resources manually is, however, prohibitively expensive and likely to result in low coverage. To facilitate and expedite the process of lexical resource development, distributional analysis of large corpora provides a powerful data-driven means of (semi-)automatically identifying semantic relations, including synonymy, between terms. In this paper, we demonstrate how distributional analysis of a large corpus of electronic health records - the MIMIC-II database - can be employed to extract synonyms of SNOMED CT preferred terms. A distinctive feature of our method is its ability to identify synonymous relations between terms of varying length.
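The core of such distributional analysis, representing each term by the words it co-occurs with and comparing those vectors with cosine similarity, can be sketched as below. The mini-corpus is invented for the example; work over a resource like MIMIC-II operates on millions of tokens and uses far more scalable vector models.

```python
import math
from collections import Counter

def context_vector(corpus_tokens, term, window=2):
    """Count words co-occurring with `term` within +/- `window` tokens."""
    vec = Counter()
    for i, tok in enumerate(corpus_tokens):
        if tok == term:
            lo, hi = max(0, i - window), min(len(corpus_tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vec[corpus_tokens[j]] += 1
    return vec

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus where "fever" and "pyrexia" occur in identical contexts.
tokens = ("pt with fever and cough . pt with pyrexia and cough . "
          "ordered chest xray today").split()
sim_syn = cosine(context_vector(tokens, "fever"), context_vector(tokens, "pyrexia"))
sim_unrel = cosine(context_vector(tokens, "fever"), context_vector(tokens, "xray"))
```

Terms with high context-vector similarity become synonym candidates for a SNOMED CT preferred term; note that nothing in this comparison depends on the two terms having the same length, which is the flexibility the paper emphasizes.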
Collapse
|
47
|
Formative evaluation of ontology learning methods for entity discovery by using existing ontologies as reference standards. Methods Inf Med 2013; 52:308-16. [PMID: 23666409 DOI: 10.3414/me12-01-0029] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2012] [Accepted: 02/02/2013] [Indexed: 11/09/2022]
Abstract
OBJECTIVE To develop a two-step method for formative evaluation of statistical Ontology Learning (OL) algorithms that leverages existing biomedical ontologies as reference standards. METHODS In the first step, optimum parameters are established. A 'gap list' of entities is generated by finding the set of entities present in a later version of the ontology that are not present in an earlier version. A named entity recognition system is used to identify entities in a corpus of biomedical documents that are present in the 'gap list', generating a reference standard. The output of the algorithm (new entity candidates), produced by statistical methods, is subsequently compared against this reference standard. An OL method that performs perfectly will be able to learn all of the terms in this reference standard. Using evaluation metrics and precision-recall curves for different thresholds and parameters, we compute the optimum parameters for each method. In the second step, human judges with expertise in ontology development evaluate each candidate suggested by the algorithm configured with the optimum parameters previously established. These judgments are used to compute two performance metrics developed from our previous work: Entity Suggestion Rate (ESR) and Entity Acceptance Rate (EAR). RESULTS Using this method, we evaluated two statistical OL methods in two medical domains. For the pathology domain, we obtained 49% ESR, 28% EAR with the Lin method and 52% ESR, 39% EAR with the Church method. For the radiology domain, we obtained 87% ESR, 9% EAR with the Lin method and 96% ESR, 16% EAR with the Church method. CONCLUSION This method is sufficiently general and flexible to permit comparison of any OL method for a specific corpus and ontology of interest.
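The 'gap list' construction in the first step is essentially a set difference between two ontology versions, intersected with what the named entity recognition step finds in the corpus. A minimal sketch, with invented entity names:

```python
def gap_list(earlier_version, later_version):
    """Entities present in the later ontology version but not the earlier one."""
    return set(later_version) - set(earlier_version)

def reference_standard(gap, corpus_entities):
    """Gap entities actually found in the corpus by the NER step; an OL method
    that performs perfectly would learn exactly this set."""
    return gap & set(corpus_entities)

# Toy ontology versions and corpus; real runs use full biomedical ontologies.
v1 = {"neoplasm", "carcinoma"}
v2 = {"neoplasm", "carcinoma", "adenocarcinoma", "sarcoma"}
corpus = ["adenocarcinoma", "biopsy", "neoplasm"]
gap = gap_list(v1, v2)
ref = reference_standard(gap, corpus)
```

Candidates emitted by a statistical OL method are then scored against `ref` to sweep thresholds and fix the optimum parameters before the human-judgment step.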
Collapse
|
48
|
Using chief complaints for syndromic surveillance: a review of chief complaint based classifiers in North America. J Biomed Inform 2013; 46:734-43. [PMID: 23602781 DOI: 10.1016/j.jbi.2013.04.003] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2012] [Revised: 08/30/2012] [Accepted: 04/03/2013] [Indexed: 11/27/2022]
Abstract
A major goal of Natural Language Processing in the public health informatics domain is the automatic extraction and encoding of data stored in free text patient records. This extracted data can then be utilized by computerized systems to perform syndromic surveillance. In particular, the chief complaint--a short string that describes a patient's symptoms--has come to be a vital resource for syndromic surveillance in the North American context due to its near ubiquity. This paper reviews fifteen systems in North America--at the city, county, state and federal level--that use chief complaints for syndromic surveillance.
Collapse
|
49
|
#wheezing: A Content Analysis of Asthma-Related Tweets. Online J Public Health Inform 2013. [PMCID: PMC3692912] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Objective We present a Content Analysis project using Natural Language Processing to aid in Twitter-based syndromic surveillance of Asthma. Introduction Recently, a growing number of studies have made use of Twitter to track the spread of infectious disease. These investigations show that there are reliable spikes in traffic related to keywords associated with the spread of infectious diseases like Influenza [1], as well as other Syndromes [2]. However, little research has been done using Social Media to monitor chronic conditions like Asthma, which do not spread from sufferer to sufferer. We therefore test the feasibility of using Twitter for Asthma surveillance, using techniques from NLP and machine learning to achieve a deeper understanding of what users Tweet about Asthma, rather than relying only on keyword search. Methods We retrieved a large volume of Tweets from the Twitter API. Search terms included “asthma,” and several misspellings of that word; terms for common medical devices associated with Asthma such as “inhaler” and “nebulizer”; and names of prescription drugs used to treat the condition, including “albuterol” and “Singulair.” A randomly sampled subset of these Tweets (N=3511) was annotated for content, based on an annotation scheme that coded for the following elements: the Experiencer of Asthma symptoms (Self, Family, Friend, Named Other, Unidentified, and All-Non-Self, which was the union of these last four categories); aspects of the type of information being conveyed by each Tweet (Medication, Triggers, Physical Activity, Contacting of a Medical Practitioner, Allergies, Questions, Suggestions, Information, News, Spam); as well as Negative Sentiment, Future temporality, and Non-English content. Further details on the annotation scheme used can be found at http://idiom.ucsd.edu/∼ggilling/annotation.pdf. Inter-annotator agreement on a subset of the Tweets (N=403) fell in an acceptable range for all categories (Cohen’s Kappa >0.6). 
Once annotation was complete, the Tweets’ texts were stemmed and converted into vectors of unigram and bigram counts. These were then stripped of sparse terms (all those words appearing in fewer than 1 in 200 Tweets), which left multi-dimensional vectors consisting of the counts of the remaining words in all Tweets. Statistical machine-learning classifiers including K-nearest neighbors, Naive Bayes and Support Vector Machines were then trained on the unigram and bigram models. Results SVM with 10-fold cross-validation achieved the greatest prediction accuracy with the unigram model, as shown in Table 1. Categories that showed the greatest reduction in classification error using the unigram model were Non-English, Self, All-Non-Self, Medication, Symptoms and Spam. The majority of these categories showed very high Precision, as well as fairly high Recall for the unigram model. Unexpectedly, the bigram model fared far worse than the unigram model, which suggests that individual words in these Tweets were more reliably predictive of content than pairs of words, which occurred less frequently. Conclusions Text-classification increases the utility of Twitter as a data-source for studying chronic conditions such as Asthma. Using these methods, we can automatically reject Tweets that are non-English or Spam. We can also determine who is experiencing symptoms: the Twitter user or another individual. Fairly simple models are able to predict with good certainty whether a user is talking about their Symptoms, their Medication, or Triggers for their Asthma, as well as whether they are expressing Negative sentiment about their condition. We demonstrate that Social Media such as Twitter is a promising means by which to conduct surveillance for chronic conditions such as Asthma.
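The unigram feature construction described here, counting words and stripping terms that appear in too few documents, can be sketched as follows. The toy Tweets and the document-frequency threshold are invented for the example, stemming is skipped, and classifier training (SVM, etc.) is omitted.

```python
from collections import Counter

def build_vocab(tweets, min_doc_freq):
    """Keep unigrams appearing in at least `min_doc_freq` tweets (the paper
    strips terms seen in fewer than 1 in 200 Tweets)."""
    doc_freq = Counter()
    for text in tweets:
        for term in set(text.lower().split()):
            doc_freq[term] += 1
    return sorted(t for t, n in doc_freq.items() if n >= min_doc_freq)

def vectorize(text, vocab):
    """Represent a tweet as counts of the retained vocabulary terms."""
    counts = Counter(text.lower().split())
    return [counts[t] for t in vocab]

tweets = ["my asthma is bad today", "forgot my inhaler today",
          "asthma attack today", "new inhaler works"]
vocab = build_vocab(tweets, min_doc_freq=2)
vec = vectorize("asthma inhaler today today", vocab)
```

The resulting count vectors are what the K-nearest neighbors, Naive Bayes, and SVM classifiers mentioned above are trained on, one binary model per annotation category.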
Collapse
|
50
|
Natural Language Processing to identify pneumonia from radiology reports. Pharmacoepidemiol Drug Saf 2013; 22:834-41. [PMID: 23554109 DOI: 10.1002/pds.3418] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2012] [Revised: 01/11/2013] [Accepted: 01/14/2013] [Indexed: 11/09/2022]
Abstract
PURPOSE This study aimed to develop Natural Language Processing (NLP) approaches to supplement manual outcome validation, specifically to validate pneumonia cases from chest radiograph reports. METHODS We trained one NLP system, ONYX, using radiograph reports from children and adults that were previously manually reviewed. We then assessed its validity on a test set of 5000 reports. We aimed to substantially decrease manual review, not replace it entirely, and so, we classified reports as follows: (1) consistent with pneumonia; (2) inconsistent with pneumonia; or (3) requiring manual review because of complex features. We developed processes tailored either to optimize accuracy or to minimize manual review. Using logistic regression, we jointly modeled sensitivity and specificity of ONYX in relation to patient age, comorbidity, and care setting. We estimated positive and negative predictive value (PPV and NPV) assuming pneumonia prevalence in the source data. RESULTS Tailored for accuracy, ONYX identified 25% of reports as requiring manual review (34% of true pneumonias and 18% of non-pneumonias). For the remainder, ONYX's sensitivity was 92% (95% CI 90-93%), specificity 87% (86-88%), PPV 74% (72-76%), and NPV 96% (96-97%). Tailored to minimize manual review, ONYX classified 12% as needing manual review. For the remainder, ONYX had sensitivity 75% (72-77%), specificity 95% (94-96%), PPV 86% (83-88%), and NPV 91% (90-91%). CONCLUSIONS For pneumonia validation, ONYX can replace almost 90% of manual review while maintaining low to moderate misclassification rates. It can be tailored for different outcomes and study needs and thus warrants exploration in other settings.
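The three-way routing ONYX performs (consistent, inconsistent, or manual review for complex cases) can be illustrated by thresholding a single probability score; that is an assumption made purely for this sketch, since the abstract does not specify ONYX's decision logic this way, and the thresholds shown are invented.

```python
def route_report(p_pneumonia, low=0.2, high=0.8):
    """Route a chest radiograph report based on a model's pneumonia probability.

    Thresholds are illustrative; widening or narrowing the (low, high) band is
    the kind of tuning that trades accuracy against manual-review volume.
    """
    if p_pneumonia >= high:
        return "consistent with pneumonia"
    if p_pneumonia <= low:
        return "inconsistent with pneumonia"
    return "manual review"
```

A wide middle band sends more reports to humans but keeps automated calls accurate (the accuracy-optimized configuration); a narrow band minimizes review at the cost of more misclassification, mirroring the two tailored processes reported above.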
Collapse
|