Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Denny JC, Peterson JF, Choma NN, Xu H, Miller RA, Bastarache L, Peterson NB. Extracting timing and status descriptors for colonoscopy testing from electronic medical records. J Am Med Inform Assoc 2010;17:383-8. [PMID: 20595304 DOI: 10.1136/jamia.2010.004804] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 11/04/2022] Open

Cited by Other Article(s)

Eguia H, Sánchez-Bocanegra CL, Vinciarelli F, Alvarez-Lopez F, Saigí-Rubió F. Clinical Decision Support and Natural Language Processing in Medicine: Systematic Literature Review. J Med Internet Res 2024;26:e55315. [PMID: 39348889 PMCID: PMC11474138 DOI: 10.2196/55315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 04/20/2024] [Accepted: 07/24/2024] [Indexed: 10/02/2024] Open

Abstract

BACKGROUND

Ensuring access to accurate and verified information is essential for effective patient treatment and diagnosis. Although health workers rely on the internet for clinical data, there is a need for a more streamlined approach.

OBJECTIVE

This systematic review aims to assess the current state of artificial intelligence (AI) and natural language processing (NLP) techniques in health care to identify their potential use in electronic health records and automated information searches.

METHODS

A search was conducted in the PubMed, Embase, ScienceDirect, Scopus, and Web of Science online databases for articles published between January 2000 and April 2023. The only inclusion criteria were (1) original research articles and studies on the application of AI-based medical clinical decision support using NLP techniques and (2) publications in English. A Critical Appraisal Skills Programme tool was used to assess the quality of the studies.

RESULTS

The search yielded 707 articles, from which 26 studies were included (24 original articles and 2 systematic reviews). Of the evaluated articles, 21 (81%) explained the use of NLP as a source of data collection, 18 (69%) used electronic health records as a data source, and a further 8 (31%) were based on clinical data. Only 5 (19%) of the articles showed the use of combined strategies for NLP to obtain clinical data. In total, 16 (62%) articles presented stand-alone data review algorithms. Other studies (n=9, 35%) showed that the clinical decision support system alternative was also a way of displaying the information obtained for immediate clinical use.

CONCLUSIONS

The use of NLP engines can effectively improve clinical decision systems' accuracy, while biphasic tools combining AI algorithms and human criteria may optimize clinical diagnosis and treatment flows.

TRIAL REGISTRATION

PROSPERO CRD42022373386; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=373386.

Fu S, Wang L, He H, Wen A, Zong N, Kumari A, Liu F, Zhou S, Zhang R, Li C, Wang Y, St Sauver J, Liu H, Sohn S. A taxonomy for advancing systematic error analysis in multi-site electronic health record-based clinical concept extraction. J Am Med Inform Assoc 2024;31:1493-1502. [PMID: 38742455 PMCID: PMC11187420 DOI: 10.1093/jamia/ocae101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 03/26/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024] Open

Abstract

BACKGROUND

Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves a manual review of error types, such as contextual and linguistic factors contributing to their occurrence, and the identification of underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Due to the high heterogeneity of electronic health record (EHR) settings across different institutions, challenges may arise when attempting to standardize and reproduce the error analysis process.

OBJECTIVES

This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks.

MATERIALS AND METHODS

We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several different open-source annotation tools, including MAE, Brat, and MedTator.

RESULTS

The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties, including model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerged from community feedback, including the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies.

CONCLUSION

The proposed taxonomy can facilitate the acceleration and standardization of the error analysis process in multi-site settings, thus improving the provenance, interpretability, and portability of NLP models. Future researchers could explore the potential direction of developing automated or semi-automated methods to assist in the classification and standardization of error analysis.

Affiliation(s)

Sunyang Fu: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Liwei Wang: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Huan He: Department of Biomedical Informatics & Data Science, Yale University, New Haven, CT 06520, United States
Andrew Wen: Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Nansu Zong: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States
Anamika Kumari: Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
Feifan Liu: Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, Boston, MA 01655, United States
Sicheng Zhou: Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
Rui Zhang: Division of Computational Health Sciences, University of Minnesota, Minneapolis, MN 55455, United States
Chenyu Li: Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
Yanshan Wang: Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, United States
Jennifer St Sauver: Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55902, United States
Hongfang Liu: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States; Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
Sunghwan Sohn: Department of AI and Informatics, Mayo Clinic, Rochester, MN 55902, United States

Sivarajkumar S, Kelley M, Samolyk-Mazzanti A, Visweswaran S, Wang Y. An Empirical Evaluation of Prompting Strategies for Large Language Models in Zero-Shot Clinical Natural Language Processing: Algorithm Development and Validation Study. JMIR Med Inform 2024;12:e55318. [PMID: 38587879 PMCID: PMC11036183 DOI: 10.2196/55318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 02/20/2024] [Accepted: 02/24/2024] [Indexed: 04/09/2024] Open

Abstract

BACKGROUND

Large language models (LLMs) have shown remarkable capabilities in natural language processing (NLP), especially in domains where labeled data are scarce or expensive, such as the clinical domain. However, to unlock the clinical knowledge hidden in these LLMs, we need to design effective prompts that can guide them to perform specific clinical NLP tasks without any task-specific training data. This is known as in-context learning, which is an art and science that requires understanding the strengths and weaknesses of different LLMs and prompt engineering approaches.

OBJECTIVE

The objective of this study is to assess the effectiveness of various prompt engineering techniques, including 2 newly introduced types (heuristic and ensemble prompts), for zero-shot and few-shot clinical information extraction using pretrained language models.

METHODS

This comprehensive experimental study evaluated different prompt types (simple prefix, simple cloze, chain of thought, anticipatory, heuristic, and ensemble) across 5 clinical NLP tasks: clinical sense disambiguation, biomedical evidence extraction, coreference resolution, medication status extraction, and medication attribute extraction. The performance of these prompts was assessed using 3 state-of-the-art language models: GPT-3.5 (OpenAI), Gemini (Google), and LLaMA-2 (Meta). The study contrasted zero-shot with few-shot prompting and explored the effectiveness of ensemble approaches.
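
As a rough illustration of how such prompt types differ in form, the following Python sketch builds three of them for a medication status task and combines several answers by majority vote. The template wording, the names `PROMPT_TEMPLATES`, `build_prompt`, and `majority_vote`, and the use of majority voting as the ensemble rule are all assumptions made for illustration; the abstract does not give the study's actual prompts or ensemble procedure.

```python
from collections import Counter

# Hypothetical templates: the wording is illustrative and not taken from the study.
PROMPT_TEMPLATES = {
    "simple_prefix": (
        "Identify the status of the medication {target} in the note.\n"
        "Note: {note}\nAnswer:"
    ),
    "simple_cloze": "Note: {note}\nThe status of the medication {target} is ____.",
    "chain_of_thought": (
        "Note: {note}\nQuestion: What is the status of {target}?\n"
        "Let's think step by step."
    ),
}


def build_prompt(prompt_type: str, note: str, target: str) -> str:
    """Fill the chosen template with the clinical note and the target term."""
    return PROMPT_TEMPLATES[prompt_type].format(note=note, target=target)


def majority_vote(answers: list[str]) -> str:
    """Assumed ensemble rule: return the most frequent answer across prompts.

    Ties go to the answer that appeared first in the list.
    """
    if not answers:
        raise ValueError("no answers to combine")
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    note = "Patient stopped aspirin last week due to GI upset."
    for name in PROMPT_TEMPLATES:
        print(f"--- {name} ---")
        print(build_prompt(name, note, "aspirin"))
    # Answers that three different prompts might have produced (made-up values).
    print(majority_vote(["discontinued", "active", "discontinued"]))
```

The model call itself is left out on purpose, since each of the evaluated models is reached through a different interface.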

RESULTS

The study revealed that task-specific prompt tailoring is vital for the high performance of LLMs for zero-shot clinical NLP. In clinical sense disambiguation, GPT-3.5 achieved an accuracy of 0.96 with heuristic prompts and 0.94 in biomedical evidence extraction. Heuristic prompts, alongside chain of thought prompts, were highly effective across tasks. Few-shot prompting improved performance in complex scenarios, and ensemble approaches capitalized on multiple prompt strengths. GPT-3.5 consistently outperformed Gemini and LLaMA-2 across tasks and prompt types.

CONCLUSIONS

This study provides a rigorous evaluation of prompt engineering methodologies and introduces innovative techniques for clinical information extraction, demonstrating the potential of in-context learning in the clinical domain. These findings offer clear guidelines for future prompt-based clinical NLP research, facilitating engagement by non-NLP experts in clinical NLP advancements. To the best of our knowledge, this is one of the first works on the empirical evaluation of different prompt engineering approaches for clinical NLP in this era of generative artificial intelligence, and we hope that it will inspire and inform future research in this area.

van de Burgt BWM, Wasylewicz ATM, Dullemond B, Grouls RJE, Egberts TCG, Bouwman A, Korsten EMM. Combining text mining with clinical decision support in clinical practice: a scoping review. J Am Med Inform Assoc 2022;30:588-603. [PMID: 36512578 PMCID: PMC9933076 DOI: 10.1093/jamia/ocac240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Revised: 10/17/2022] [Accepted: 12/01/2022] [Indexed: 12/15/2022] Open

Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022;6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022] Open

Vadyala SR, Sherer EA. Natural Language Processing Accurately Categorizes Indications, Findings and Pathology Reports from Multicenter Colonoscopy: Qualitative focus study (Preprint). JMIR Cancer 2021. [DOI: 10.2196/32973] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open

Zirikly A, Desmet B, Newman-Griffis D, Marfeo EE, McDonough C, Goldman H, Chan L. Viewpoint: An Information Extraction Framework for Disability Determination Using a Mental Functioning Use-Case (Preprint). JMIR Med Inform 2021;10:e32245. [PMID: 35302510 PMCID: PMC8976250 DOI: 10.2196/32245] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Revised: 10/08/2021] [Accepted: 01/16/2022] [Indexed: 01/08/2023] Open

Abstract

Natural language processing (NLP) in health care enables transformation of complex narrative information into high value products such as clinical decision support and adverse event monitoring in real time via the electronic health record (EHR). However, information technologies for mental health have consistently lagged because of the complexity of measuring and modeling mental health and illness. The use of NLP to support management of mental health conditions is a viable topic that has not been explored in depth. This paper provides a framework for the advanced application of NLP methods to identify, extract, and organize information on mental health and functioning to inform the decision-making process applied to assessing mental health. We present a use-case related to work disability, guided by the disability determination process of the US Social Security Administration (SSA). From this perspective, the following questions must be addressed about each problem that leads to a disability benefits claim: When did the problem occur and how long has it existed? How severe is it? Does it affect the person’s ability to work? and What is the source of the evidence about the problem? Our framework includes 4 dimensions of medical information that are central to assessing disability: temporal sequence and duration, severity, context, and information source. We describe key aspects of each dimension and promising approaches for application in mental functioning. For example, to address temporality, a complete functional timeline must be created with all relevant aspects of functioning such as intermittence, persistence, and recurrence. Severity of mental health symptoms can be successfully identified and extracted on a 4-level ordinal scale from absent to severe. Some NLP work has been reported on the extraction of context for specific cases of wheelchair use in clinical settings. We discuss the links between the task of information source assessment and work on source attribution, coreference resolution, event extraction, and rule-based methods. Gaps were identified in NLP applications that directly applied to the framework and in existing relevant annotated data sets. We highlighted NLP methods with the potential for advanced application in the field of mental functioning. Findings of this work will inform the development of instruments for supporting SSA adjudicators in their disability determination process. The 4 dimensions of medical information may have relevance for a broad array of individuals and organizations responsible for assessing mental health function and ability. Further, our framework with 4 specific dimensions presents significant opportunity for the application of NLP in the realm of mental health and functioning beyond the SSA setting, and it may support the development of robust tools and methods for decision-making related to clinical care, program implementation, and other outcomes.

Zhao J, Grabowska ME, Kerchberger VE, Smith JC, Eken HN, Feng Q, Peterson JF, Trent Rosenbloom S, Johnson KB, Wei WQ. ConceptWAS: A high-throughput method for early identification of COVID-19 presenting symptoms and characteristics from clinical notes. J Biomed Inform 2021;117:103748. [PMID: 33774203 PMCID: PMC7992296 DOI: 10.1016/j.jbi.2021.103748] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/15/2020] [Revised: 01/28/2021] [Accepted: 03/07/2021] [Indexed: 01/08/2023]

Abstract

OBJECTIVE

Identifying symptoms and characteristics highly specific to coronavirus disease 2019 (COVID-19) would improve the clinical and public health response to this pandemic challenge. Here, we describe a high-throughput approach - Concept-Wide Association Study (ConceptWAS) - that systematically scans a disease's clinical manifestations from clinical notes. We used this method to identify symptoms specific to COVID-19 early in the course of the pandemic.

METHODS

We created a natural language processing pipeline to extract concepts from clinical notes in a local ER corresponding to the PCR testing date for patients who had a COVID-19 test and evaluated these concepts as predictors for developing COVID-19. We identified predictors from Firth's logistic regression adjusted by age, gender, and race. We also performed ConceptWAS using cumulative data every two weeks to identify the timeline for recognition of early COVID-19-specific symptoms.

RESULTS

We processed 87,753 notes from 19,692 patients subjected to COVID-19 PCR testing between March 8, 2020, and May 27, 2020 (1,483 COVID-19-positive). We found 68 concepts significantly associated with a positive COVID-19 test. We identified symptoms associated with increasing risk of COVID-19, including "anosmia" (odds ratio [OR] = 4.97, 95% confidence interval [CI] = 3.21-7.50), "fever" (OR = 1.43, 95% CI = 1.28-1.59), "cough with fever" (OR = 2.29, 95% CI = 1.75-2.96), and "ageusia" (OR = 5.18, 95% CI = 3.02-8.58). Using ConceptWAS, we were able to detect loss of smell and loss of taste three weeks prior to their inclusion as symptoms of the disease by the Centers for Disease Control and Prevention (CDC).

CONCLUSION

ConceptWAS, a high-throughput approach for exploring specific symptoms and characteristics of a disease like COVID-19, shows promise for enabling EHR-powered early identification of disease manifestations.

Zhao J, Grabowska ME, Kerchberger VE, Smith JC, Eken HN, Feng Q, Peterson JF, Rosenbloom ST, Johnson KB, Wei WQ. ConceptWAS: a high-throughput method for early identification of COVID-19 presenting symptoms. medRxiv 2020:2020.11.06.20227165. [PMID: 33200151 PMCID: PMC7668764 DOI: 10.1101/2020.11.06.20227165] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/15/2023]

Fu S, Chen D, He H, Liu S, Moon S, Peterson KJ, Shen F, Wang L, Wang Y, Wen A, Zhao Y, Sohn S, Liu H. Clinical concept extraction: A methodology review. J Biomed Inform 2020;109:103526. [PMID: 32768446 PMCID: PMC7746475 DOI: 10.1016/j.jbi.2020.103526] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 03/02/2020] [Revised: 07/30/2020] [Accepted: 08/02/2020] [Indexed: 01/11/2023]

Baxter SL, Klie AR, Radha Saseendrakumar B, Ye GY, Hogarth M. Text Processing for Detection of Fungal Ocular Involvement in Critical Care Patients: Cross-Sectional Study. J Med Internet Res 2020;22:e18855. [PMID: 32795984 PMCID: PMC7455861 DOI: 10.2196/18855] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/23/2020] [Revised: 04/21/2020] [Accepted: 06/13/2020] [Indexed: 11/13/2022] Open

Abstract

Background

Fungal ocular involvement can develop in patients with fungal bloodstream infections and can be vision-threatening. Ocular involvement has become less common in the current era of improved antifungal therapies. Retrospectively determining the prevalence of fungal ocular involvement is important for informing clinical guidelines, such as the need for routine ophthalmologic consultations. However, manual retrospective record review to detect cases is time-consuming.

Objective

This study aimed to determine the prevalence of fungal ocular involvement in a critical care database using both structured and unstructured electronic health record (EHR) data.

Methods

We queried microbiology data from 46,467 critical care patients over 12 years (2000-2012) from the Medical Information Mart for Intensive Care III (MIMIC-III) to identify 265 patients with culture-proven fungemia. For each fungemic patient, demographic data, fungal species present in blood culture, and risk factors for fungemia (eg, presence of indwelling catheters, recent major surgery, diabetes, immunosuppressed status) were ascertained. All structured diagnosis codes and free-text narrative notes associated with each patient’s hospitalization were also extracted. Screening for fungal endophthalmitis was performed using two approaches: (1) by querying a wide array of eye- and vision-related diagnosis codes, and (2) by utilizing a custom regular expression pipeline to identify and collate relevant text matches pertaining to fungal ocular involvement. Both approaches were validated using manual record review. The main outcome measure was the documentation of any fungal ocular involvement.
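
A regular expression screen of this kind can be sketched in a few lines of Python. This is a minimal illustration only: the search terms, the `(patient_id, note_id, text)` tuple layout, and the function name `collate_matches` are assumptions, since the abstract does not list the study's actual terms or data structures.

```python
import re
from collections import defaultdict

# Hypothetical search terms; the study's real term list is not given in the abstract.
OCULAR_PATTERN = re.compile(
    r"\b(endophthalmitis|chorioretinitis|vitritis|fungal\s+keratitis)\b",
    re.IGNORECASE,
)


def collate_matches(notes, window=40):
    """Group regex hits by patient, keeping a text snippet around each hit.

    notes: iterable of (patient_id, note_id, text) tuples (assumed layout).
    Returns {patient_id: [(note_id, snippet), ...]} for patients with any hit.
    """
    hits = defaultdict(list)
    for patient_id, note_id, text in notes:
        for match in OCULAR_PATTERN.finditer(text):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            hits[patient_id].append((note_id, text[start:end]))
    return dict(hits)


if __name__ == "__main__":
    sample_notes = [
        ("p1", "n1", "Ophthalmology consult: no evidence of endophthalmitis."),
        ("p2", "n2", "No eye complaints documented."),
    ]
    for patient, snippets in collate_matches(sample_notes).items():
        print(patient, snippets)
```

In the sample, the note for `p1` matches even though the finding is negated, which shows why a keyword screen like this only narrows the set of notes and the matches still need manual review.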

Results

In total, 265 patients had culture-proven fungemia, with Candida albicans (n=114, 43%) and Candida glabrata (n=74, 28%) being the most common fungal species in blood culture. In-hospital mortality was 46% (n=121). In total, 7 patients were identified as having eye- or vision-related diagnosis codes, none of whom had fungal endophthalmitis based on record review. There were 26,830 free-text narrative notes associated with these 265 patients. A regular expression pipeline based on relevant terms yielded possible matches in 683 notes from 108 patients. Subsequent manual record review again demonstrated that no patients had fungal ocular involvement. Therefore, the prevalence of fungal ocular involvement in this cohort was 0%.

Conclusions

MIMIC-III contained no cases of ocular involvement among fungemic patients, consistent with prior studies reporting low rates of ocular involvement in fungemia. This study demonstrates an application of natural language processing to expedite the review of narrative notes. This approach is highly relevant for ophthalmology, where diagnoses are often based on physical examination findings that are documented within clinical notes.

Liang H, Yang L, Tao L, Shi L, Yang W, Bai J, Zheng D, Wang N, Ji J. Data mining-based model and risk prediction of colorectal cancer by using secondary health data: A systematic review. Chin J Cancer Res 2020;32:242-251. [PMID: 32410801 PMCID: PMC7219096 DOI: 10.21147/j.issn.1000-9604.2020.02.11] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/19/2020] [Accepted: 04/01/2020] [Indexed: 01/08/2023] Open

A frame semantic overview of NLP-based information extraction for cancer-related EHR notes. J Biomed Inform 2019;100:103301. [PMID: 31589927 DOI: 10.1016/j.jbi.2019.103301] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 04/02/2019] [Revised: 09/04/2019] [Accepted: 10/03/2019] [Indexed: 02/07/2023]

Abstract

OBJECTIVE

There is a lot of information about cancer in Electronic Health Record (EHR) notes that can be useful for biomedical research provided natural language processing (NLP) methods are available to extract and structure this information. In this paper, we present a scoping review of existing clinical NLP literature for cancer.

METHODS

We identified studies describing an NLP method to extract specific cancer-related information from EHR sources from PubMed, Google Scholar, ACL Anthology, and existing reviews. Two exclusion criteria were used in this study. We excluded articles where the extraction techniques used were too broad to be represented as frames (e.g., document classification) and also where very low-level extraction methods were used (e.g. simply identifying clinical concepts). 78 articles were included in the final review. We organized this information according to frame semantic principles to help identify common areas of overlap and potential gaps.

RESULTS

Frames were created from the reviewed articles pertaining to cancer information such as cancer diagnosis, tumor description, cancer procedure, breast cancer diagnosis, prostate cancer diagnosis and pain in prostate cancer patients. These frames included both a definition as well as specific frame elements (i.e. extractable attributes). We found that cancer diagnosis was the most common frame among the reviewed papers (36 out of 78), with recent work focusing on extracting information related to treatment and breast cancer diagnosis.

CONCLUSION

The list of common frames described in this paper identifies important cancer-related information extracted by existing NLP techniques and serves as a useful resource for future researchers requiring cancer information extracted from EHR notes. We also argue, due to the heavy duplication of cancer NLP systems, that a general purpose resource of annotated cancer frames and corresponding NLP tools would be valuable.

Cho M, Kim JH, Hong KS, Kim JS, Kong HJ, Kim S. Identification of cecum time-location in a colonoscopy video by deep learning analysis of colonoscope movement. PeerJ 2019;7:e7256. [PMID: 31392088 PMCID: PMC6673422 DOI: 10.7717/peerj.7256] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/08/2018] [Accepted: 06/05/2019] [Indexed: 12/11/2022] Open

Abstract

Background

Cecal intubation time is an important component of quality colonoscopy. The cecum is the turning point that separates the insertion and withdrawal phases of the colonoscope. For this reason, obtaining information about the location of the cecum during the endoscopic procedure is very useful. It is also necessary to detect the direction of the colonoscope's movement and the time-location of the cecum.

Methods

To analyze the direction of the scope's movement, the Horn-Schunck algorithm was used to compute the change in pixel motion between consecutive frames. Images processed with the Horn-Schunck algorithm were used to train and test a convolutional neural network, which classified each frame as an insertion, withdrawal, or stop movement. Based on the scope's movement, a graph was drawn with a value of +1 for insertion, -1 for withdrawal, and 0 for stop. We regarded the turning point as a cecum candidate point when the total graph area sum in a certain section was the lowest.
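
The +1/-1/0 movement graph can be illustrated with a short Python sketch. Note the substitution: the abstract does not fully specify the sectional area criterion used to pick the cecum candidate, so this sketch uses a simpler stand-in, the peak of the cumulative movement score, as the candidate turning point. The label strings and the function name `candidate_turning_point` are assumptions for illustration.

```python
from itertools import accumulate

# Scores follow the abstract: +1 insertion, -1 withdrawal, 0 stop.
MOVEMENT_SCORE = {"insertion": 1, "withdrawal": -1, "stop": 0}


def candidate_turning_point(labels):
    """Return (frame_index, cumulative_scores) for a per-frame label sequence.

    Simplified stand-in for the paper's criterion: the candidate is the first
    frame where the running sum of movement scores reaches its maximum, that
    is, where net insertion is greatest before withdrawal dominates.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    scores = [MOVEMENT_SCORE[label] for label in labels]
    cumulative = list(accumulate(scores))
    return cumulative.index(max(cumulative)), cumulative


if __name__ == "__main__":
    frames = ["insertion"] * 3 + ["stop"] + ["withdrawal"] * 3
    index, curve = candidate_turning_point(frames)
    print(index, curve)
```

In practice the per-frame labels would come from the classifier described above, and some smoothing would likely be needed before looking for a turning point, because single misclassified frames shift the running sum.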

Results

A total of 328,927 frame images were obtained from 112 patients. The overall accuracy, obtained from 5-fold cross-validation, was 95.6%. When the value of "t" was 30 s, the accuracy of cecum discovery was 96.7%. To increase visibility, the movement of the scope was added to the summary report of the colonoscopy video. Insertion, withdrawal, and stop movements were each mapped to a color and displayed at various scales. As the scale increased, the distinction between the insertion phase and the withdrawal phase became clearer.

Conclusion

Information obtained in this study can be utilized as metadata for proficiency assessment. Since insertion and withdrawal are technically different movements, data on the scope's movement and phase can be quantified and used to express patterns unique to the colonoscopist and to assess proficiency. We also hope that the findings of this study can contribute to the informatics field of medical records so that medical charts can be transmitted graphically and effectively in the field of colonoscopy.

Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019;7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 226] [Impact Index Per Article: 45.2] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023] Open

Abstract

BACKGROUND

Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset.

OBJECTIVE

The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives.

METHODS

Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles.

RESULTS

Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes.

CONCLUSIONS

Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.

Carlson J, Laryea J. Electronic Health Record-Based Registries: Clinical Research Using Registries in Colon and Rectal Surgery. Clin Colon Rectal Surg 2019;32:82-90. [PMID: 30647550 DOI: 10.1055/s-0038-1673358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]

Adekkanattu P, Sholle ET, DeFerio J, Pathak J, Johnson SB, Campion TR. Ascertaining Depression Severity by Extracting Patient Health Questionnaire-9 (PHQ-9) Scores from Clinical Notes. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018;2018:147-156. [PMID: 30815052 PMCID: PMC6371338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]

Smith JC, Chen Q, Denny JC, Roden DM, Johnson KB, Miller RA. Evaluation of a Novel System to Enhance Clinicians' Recognition of Preadmission Adverse Drug Reactions. Appl Clin Inform 2018;9:313-325. [PMID: 29742757 DOI: 10.1055/s-0038-1646963] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/17/2022] Open

Abstract

BACKGROUND

Often unrecognized by providers, adverse drug reactions (ADRs) diminish patients' quality of life, cause preventable admissions and emergency department visits, and increase health care costs.

OBJECTIVE

This article evaluates whether an automated system, the Adverse Drug Effect Recognizer (ADER), could assist clinicians in detecting and addressing inpatients' ongoing preadmission ADRs.

METHODS

ADER uses natural language processing to extract patients' medications, findings, and past diagnoses from admission notes. It compares excerpted information to a database of known medication adverse effects and promptly warns clinicians about potential ongoing ADRs and potential confounders via alerts placed in patients' electronic health records (EHRs). A 3-month intervention trial evaluated ADER's impact on antihypertensive medication ordering behaviors. At the time of patient admission, ADER warned providers on the Internal Medicine wards of Vanderbilt University Hospital about potential ongoing preadmission antihypertensive medication ADRs. A retrospective control group, comprising similar physicians from a period prior to the intervention, received no alerts. The evaluation compared ordering behaviors for each group to determine if preadmission medications changed during hospitalization or at discharge. The study also analyzed intervention group participants' survey responses and user comments.

RESULTS

ADER identified potential preadmission ADRs for 30% of both groups. Compared with controls, intervention providers more often withheld or discontinued suspected ADR-causing medications during the inpatient stay (p < 0.001). Intervention providers who responded to alert-related surveys held or discontinued suspected ADR-causing medications more often at discharge (p < 0.001).

CONCLUSION

Results indicate that ADER helped physicians recognize ADRs and reduced ordering of suspected ADR-causing medications. In hospitals using EHRs, ADER-like systems could improve clinicians' recognition and elimination of ongoing ADRs.

Robinson JR, Wei WQ, Roden DM, Denny JC. Defining Phenotypes from Clinical Data to Drive Genomic Research. Annu Rev Biomed Data Sci 2018;1:69-92. [PMID: 34109303 DOI: 10.1146/annurev-biodatasci-080917-013335] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 12/16/2022]

Geraci J, Wilansky P, de Luca V, Roy A, Kennedy JL, Strauss J. Applying deep neural networks to unstructured text notes in electronic medical records for phenotyping youth depression. EVIDENCE-BASED MENTAL HEALTH 2017;20:83-87. [PMID: 28739578 PMCID: PMC5566092 DOI: 10.1136/eb-2017-102688] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 04/10/2017] [Revised: 06/12/2017] [Accepted: 06/21/2017] [Indexed: 01/11/2023]

Seol JW, Yi W, Choi J, Lee KS. Causality patterns and machine learning for the extraction of problem-action relations in discharge summaries. Int J Med Inform 2016;98:1-12. [PMID: 28034407 DOI: 10.1016/j.ijmedinf.2016.10.021] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/10/2015] [Revised: 08/19/2016] [Accepted: 10/29/2016] [Indexed: 10/20/2022]

Arsoniadis EG, Melton GB. Leveraging the electronic health record for research and quality improvement: Current strengths and future challenges. SEMINARS IN COLON AND RECTAL SURGERY 2016. [DOI: 10.1053/j.scrs.2016.01.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/11/2022]

Text Mining for Precision Medicine: Bringing Structure to EHRs and Biomedical Literature to Understand Genes and Health. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016;939:139-166. [PMID: 27807747 DOI: 10.1007/978-981-10-1503-8_7] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Indexed: 01/30/2023]

Yang H, Garibaldi JM. A hybrid model for automatic identification of risk factors for heart disease. J Biomed Inform 2015;58 Suppl:S171-S182. [PMID: 26375492 PMCID: PMC4989091 DOI: 10.1016/j.jbi.2015.09.006] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 02/02/2015] [Revised: 09/03/2015] [Accepted: 09/04/2015] [Indexed: 11/23/2022]

Mo H, Thompson WK, Rasmussen LV, Pacheco JA, Jiang G, Kiefer R, Zhu Q, Xu J, Montague E, Carrell DS, Lingren T, Mentch FD, Ni Y, Wehbe FH, Peissig PL, Tromp G, Larson EB, Chute CG, Pathak J, Denny JC, Speltz P, Kho AN, Jarvik GP, Bejan CA, Williams MS, Borthwick K, Kitchner TE, Roden DM, Harris PA. Desiderata for computable representations of electronic health records-driven phenotype algorithms. J Am Med Inform Assoc 2015;22:1220-30. [PMID: 26342218 PMCID: PMC4639716 DOI: 10.1093/jamia/ocv112] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Received: 02/09/2015] [Accepted: 06/24/2015] [Indexed: 11/12/2022] Open

Abstract

BACKGROUND

Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM).

METHODS

A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms.

RESULTS

We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility.

CONCLUSION

A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages.

Affiliation(s)

Huan Mo Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
William K Thompson Center for Biomedical Research Informatics, NorthShore University HealthSystem, Evanston, IL, USA
Luke V Rasmussen Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Jennifer A Pacheco Center for Genetic Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Guoqian Jiang Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Richard Kiefer Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
Qian Zhu Department of Information Systems, University of Maryland, Baltimore County, Baltimore, MD, USA
Jie Xu Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Enid Montague Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
David S Carrell Group Health Research Institute, Seattle, WA, USA
Todd Lingren Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
Frank D Mentch Center for Applied Genomics, the Children's Hospital of Philadelphia, Philadelphia, PA, USA
Yizhao Ni Division of Biomedical Informatics, Cincinnati Children's Hospital, Cincinnati, OH, USA
Firas H Wehbe Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Peggy L Peissig Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
Gerard Tromp Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Faculty of Medicine and Health Sciences, University of Stellenbosch, Cape Town, South Africa
Eric B Larson Group Health Research Institute, Seattle, WA, USA
Christopher G Chute Division of General Internal Medicine, Johns Hopkins University, Baltimore, MD, USA
Jyotishman Pathak Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, MN, USA
Joshua C Denny Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA Department of Medicine, Vanderbilt University, Nashville, TN, USA
Peter Speltz Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
Abel N Kho Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
Gail P Jarvik Department of Medicine (Medical Genetics), University of Washington, Seattle, WA, USA Department of Genome Sciences, University of Washington, Seattle, WA, USA
Cosmin A Bejan Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
Marc S Williams Department of Genome Sciences, University of Washington, Seattle, WA, USA
Kenneth Borthwick The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
Terrie E Kitchner Marshfield Clinic Research Foundation, Marshfield Clinic, Marshfield, WI, USA
Dan M Roden Department of Medicine, Vanderbilt University, Nashville, TN, USA Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
Paul A Harris Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA

Wasfy JH, Singal G, O'Brien C, Blumenthal DM, Kennedy KF, Strom JB, Spertus JA, Mauri L, Normand SLT, Yeh RW. Enhancing the Prediction of 30-Day Readmission After Percutaneous Coronary Intervention Using Data Extracted by Querying of the Electronic Health Record. Circ Cardiovasc Qual Outcomes 2015;8:477-85. [PMID: 26286871 DOI: 10.1161/circoutcomes.115.001855] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 11/28/2014] [Accepted: 06/22/2015] [Indexed: 01/24/2023]

Abstract

BACKGROUND

Early readmission after percutaneous coronary intervention is an important quality metric, but prediction models from registry data have only moderate discrimination. We aimed to improve ability to predict 30-day readmission after percutaneous coronary intervention from a previously validated registry-based model.

METHODS AND RESULTS

We matched readmitted to non-readmitted patients in a 1:2 ratio by risk of readmission, and extracted unstructured and unconventional structured data from the electronic medical record, including need for medical interpretation, albumin level, medical nonadherence, previous number of emergency department visits, atrial fibrillation/flutter, syncope/presyncope, end-stage liver disease, malignancy, and anxiety. We assessed differences in rates of these conditions between cases/controls, and estimated their independent association with 30-day readmission using logistic regression conditional on matched groups. Among 9288 percutaneous coronary interventions, we matched 888 readmitted with 1776 non-readmitted patients. In univariate analysis, cases and controls were significantly different with respect to interpreter (7.9% for cases and 5.3% for controls; P=0.009), emergency department visits (1.12 for cases and 0.77 for controls; P<0.001), homelessness (3.2% for cases and 1.6% for controls; P=0.007), anticoagulation (33.9% for cases and 22.1% for controls; P<0.001), atrial fibrillation/flutter (32.7% for cases and 28.9% for controls; P=0.045), presyncope/syncope (27.8% for cases and 21.3% for controls; P<0.001), and anxiety (69.4% for cases and 62.4% for controls; P<0.001). Anticoagulation, emergency department visits, and anxiety were independently associated with readmission.

CONCLUSIONS

Patient characteristics derived from review of the electronic health record can be used to refine risk prediction for hospital readmission after percutaneous coronary intervention.

Affiliation(s)

Jason H Wasfy, Gaurav Singal, Cashel O'Brien, Daniel M Blumenthal, Kevin F Kennedy, Jordan B Strom, John A Spertus, Laura Mauri, Sharon-Lise T Normand, Robert W Yeh: From the Cardiology Division, Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston (J.H.W., C.O'B., D.M.B., R.W.Y.), Cardiovascular Division, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (L.M.), Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston (G.S.), Cardiovascular Division, Department of Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA (J.B.S.), Saint Luke's Mid America Heart Institute/UMKC, Kansas City, MO (K.F.K., J.A.S.); and Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA (S.-L.T.N.).

Kim MY, Xu Y, Zaiane OR, Goebel R. Recognition of Patient-Related Named Entities in Noisy Tele-Health Texts. ACM T INTEL SYST TEC 2015. [DOI: 10.1145/2651444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]

Abstract

We explore methods for effectively extracting information from clinical narratives that are captured in a public health consulting phone service called HealthLink. Our research investigates the application of state-of-the-art natural language processing and machine learning to clinical narratives to extract information of interest. The currently available data consist of dialogues constructed by nurses while consulting patients by phone. Since the data are interviews transcribed by nurses during phone conversations, they include a significant volume and variety of noise. When we extract the patient-related information from the noisy data, we have to remove or correct at least two kinds of noise: explicit noise, which includes spelling errors, unfinished sentences, omission of sentence delimiters, and variants of terms, and implicit noise, which includes non-patient information and patient's untrustworthy information. To filter explicit noise, we propose our own biomedical term detection/normalization method: it resolves misspelling, term variations, and arbitrary abbreviation of terms by nurses. In detecting temporal terms, temperature, and other types of named entities (which show patients’ personal information such as age and sex), we propose a bootstrapping-based pattern learning process to detect a variety of arbitrary variations of named entities. To address implicit noise, we propose a dependency path-based filtering method. The result of our denoising is the extraction of normalized patient information, and we visualize the named entities by constructing a graph that shows the relations between named entities. The objective of this knowledge discovery task is to identify associations between biomedical terms and to clearly expose the trends of patients’ symptoms and concern; the experimental results show that we achieve reasonable performance with our noise reduction methods.

Predicting Non-Adherence with Outpatient Colonoscopy Using a Novel Electronic Tool that Measures Prior Non-Adherence. J Gen Intern Med 2015;30:724-31. [PMID: 25586869 PMCID: PMC4441666 DOI: 10.1007/s11606-014-3165-6] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 12/31/2013] [Revised: 07/09/2014] [Accepted: 12/08/2014] [Indexed: 10/25/2022]

Abstract

BACKGROUND

Accurately predicting the risk of no-show for a scheduled colonoscopy can help target interventions to improve compliance with colonoscopy, and thereby reduce the disease burden of colorectal cancer and enhance the utilization of resources within endoscopy units.

OBJECTIVES

We aimed to utilize information available in an electronic medical record (EMR) and endoscopy scheduling system to create a predictive model for no-show risk, and to simultaneously evaluate the role for natural language processing (NLP) in developing such a model.

DESIGN

This was a retrospective observational study using discovery and validation phases to design a colonoscopy non-adherence prediction model. An NLP-derived variable called the Non-Adherence Ratio ("NAR") was developed, validated, and included in the model.

PARTICIPANTS

Patients scheduled for outpatient colonoscopy at an Academic Medical Center (AMC) that is part of a multi-hospital health system, 2009 to 2011, were included in the study.

MAIN MEASURES

Odds ratios for non-adherence were calculated for all variables in the discovery cohort, and an Area Under the Receiver Operating Curve (AUC) was calculated for the final non-adherence prediction model.

KEY RESULTS

The non-adherence model included six variables: 1) gender; 2) history of psychiatric illness; 3) NAR; 4) wait time in months; 5) number of prior missed endoscopies; and 6) education level. The model achieved discrimination in the validation cohort (AUC = 70.2 %). At a threshold non-adherence score of 0.46, the model's sensitivity and specificity were 33 % and 92 %, respectively. Removing the NAR from the model significantly reduced its predictive power (AUC = 64.3 %, difference = 5.9 %, p < 0.001).

CONCLUSIONS

A six-variable model using readily available clinical and demographic information demonstrated accuracy for predicting colonoscopy non-adherence. The NAR, a novel variable developed using NLP technology, significantly strengthened this model's predictive power.

Imler TD, Morea J, Kahi C, Sherer EA, Cardwell J, Johnson CS, Xu H, Ahnen D, Antaki F, Ashley C, Baffy G, Cho I, Dominitz J, Hou J, Korsten M, Nagar A, Promrat K, Robertson D, Saini S, Shergill A, Smalley W, Imperiale TF. Multi-center colonoscopy quality measurement utilizing natural language processing. Am J Gastroenterol 2015;110:543-52. [PMID: 25756240 DOI: 10.1038/ajg.2015.51] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 10/02/2014] [Accepted: 01/02/2015] [Indexed: 12/11/2022]

Abstract

BACKGROUND

An accurate system for tracking of colonoscopy quality and surveillance intervals could improve the effectiveness and cost-effectiveness of colorectal cancer (CRC) screening and surveillance. The purpose of this study was to create and test such a system across multiple institutions utilizing natural language processing (NLP).

METHODS

From 42,569 colonoscopies with pathology records from 13 centers, we randomly sampled 750 paired reports. We trained (n=250) and tested (n=500) an NLP-based program with 19 measurements that encompass colonoscopy quality measures and surveillance interval determination, using blinded, paired, annotated expert manual review as the reference standard. The remaining 41,819 nonannotated documents were processed through the NLP system without manual review to assess performance consistency. The primary outcome was system accuracy across the 19 measures.

RESULTS

A total of 176 (23.5%) documents with 252 (1.8%) discrepant content points resulted from paired annotation. Error rate within the 500 test documents was 31.2% for NLP and 25.4% for the paired annotators (P=0.001). At the content point level within the test set, the error rate was 3.5% for NLP and 1.9% for the paired annotators (P=0.04). When eight vaguely worded documents were removed, 125 of 492 (25.4%) were incorrect by NLP and 104 of 492 (21.1%) by the initial annotator (P=0.07). Rates of pathologic findings calculated from NLP were similar to those calculated by annotation for the majority of measurements. Test set accuracy was 99.6% for CRC, 95% for advanced adenoma, 94.6% for nonadvanced adenoma, 99.8% for advanced sessile serrated polyps, 99.2% for nonadvanced sessile serrated polyps, 96.8% for large hyperplastic polyps, and 96.0% for small hyperplastic polyps. Lesion location showed high accuracy (87.0-99.8%). Accuracy for number of adenomas was 92%.

CONCLUSIONS

NLP can accurately report adenoma detection rate and the components for determining guideline-adherent colonoscopy surveillance intervals across multiple sites that utilize different methods for reporting colonoscopy findings.

Affiliation(s)

Timothy D Imler 1] Division of Gastroenterology and Hepatology, Indiana University School of Medicine, Indianapolis, Indiana, USA [2] Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA [3] Department of Biomedical Informatics, Regenstrief Institute, LLC, Indianapolis, Indiana, USA
Justin Morea 1] Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA [2] Department of Biomedical Informatics, Regenstrief Institute, LLC, Indianapolis, Indiana, USA
Charles Kahi 1] Division of Gastroenterology and Hepatology, Indiana University School of Medicine, Indianapolis, Indiana, USA [2] Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA [3] Center of Innovation, Health Services Research and Development, Richard L, Roudebush VA Medical Center, Indianapolis, Indiana, USA
Eric A Sherer
Jon Cardwell Center of Innovation, Health Services Research and Development, Richard L. Roudebush VA Medical Center, Indianapolis, Indiana, USA
Cynthia S Johnson Department of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, USA
Huiping Xu Department of Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana, USA
Dennis Ahnen Division of Gastroenterology, University of Colorado, Denver, Colorado, USA
Fadi Antaki Division of Gastroenterology, Wayne State University, Detroit, Michigan, USA
Christopher Ashley Division of Gastroenterology, Albany Medical College, Albany, New York, USA
Gyorgy Baffy Department of Medicine, VA Boston Healthcare System, Boston, Massachusetts, USA
Ilseung Cho Division of Gastroenterology, New York University School of Medicine, New York, New York, USA
Jason Dominitz Division of Gastroenterology, University of Washington School of Medicine, Seattle, Washington, USA
Jason Hou Division of Gastroenterology and Hepatology, Baylor College of Medicine, Houston, Texas, USA
Mark Korsten Division of Gastroenterology, Icahn School of Medicine at Mount Sinai, Bronx, New York, USA
Anil Nagar Division of Digestive Diseases, Yale School of Medicine, New Haven, Connecticut, USA
Kittichai Promrat Division of Gastroenterology, Brown Medical School, Providence, Rhode Island, USA
Douglas Robertson Division of Gastroenterology, The Dartmouth Institute, Lebanon, New Hampshire, USA
Sameer Saini Division of Gastroenterology, University of Michigan, Ann Arbor, Michigan, USA
Amandeep Shergill Division of Gastroenterology, University of California at San Francisco, San Francisco, California, USA
Walter Smalley Division of Gastroenterology, Vanderbilt University, Nashville, Tennessee, USA
Thomas F Imperiale [1] Division of Gastroenterology and Hepatology, Indiana University School of Medicine, Indianapolis, Indiana, USA [2] Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA [3] Center of Innovation, Health Services Research and Development, Richard L. Roudebush VA Medical Center, Indianapolis, Indiana, USA [4] Health Services Research, Regenstrief Institute, Indianapolis, Indiana, USA

Bowton E, Field JR, Wang S, Schildcrout JS, Van Driest SL, Delaney JT, Cowan J, Weeke P, Mosley JD, Wells QS, Karnes JH, Shaffer C, Peterson JF, Denny JC, Roden DM, Pulley JM. Biobanks and electronic medical records: enabling cost-effective research. Sci Transl Med 2014;6:234cm3. [PMID: 24786321 DOI: 10.1126/scitranslmed.3008604] [Citation(s) in RCA: 106] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023]

Hou JK, Imler TD, Imperiale TF. Current and future applications of natural language processing in the field of digestive diseases. Clin Gastroenterol Hepatol 2014;12:1257-61. [PMID: 24858706 DOI: 10.1016/j.cgh.2014.05.013] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/15/2014] [Accepted: 05/15/2014] [Indexed: 02/07/2023]

Heintzman J, Bailey SR, Hoopes MJ, Le T, Gold R, O'Malley JP, Cowburn S, Marino M, Krist A, DeVoe JE. Agreement of Medicaid claims and electronic health records for assessing preventive care quality among adults. J Am Med Inform Assoc 2014;21:720-4. [PMID: 24508767 PMCID: PMC4078280 DOI: 10.1136/amiajnl-2013-002333] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/13/2013] [Revised: 12/23/2013] [Accepted: 01/20/2014] [Indexed: 11/03/2022] Open

Turina M, Kiran RP. Electronic medical records in colorectal surgery. Clin Colon Rectal Surg 2014;26:17-22. [PMID: 24436643 DOI: 10.1055/s-0033-1333629] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]

Text mining of cancer-related information: review of current status and future directions. Int J Med Inform 2014;83:605-23. [PMID: 25008281 DOI: 10.1016/j.ijmedinf.2014.06.009] [Citation(s) in RCA: 112] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2013] [Revised: 06/12/2014] [Accepted: 06/14/2014] [Indexed: 12/21/2022]

Abstract

PURPOSE

This paper reviews the research literature on text mining (TM) with the aim to find out (1) which cancer domains have been the subject of TM efforts, (2) which knowledge resources can support TM of cancer-related information and (3) to what extent systems that rely on knowledge and computational methods can convert text data into useful clinical information. These questions were used to determine the current state of the art in this particular strand of TM and suggest future directions in TM development to support cancer research.

METHODS

A review of the research on TM of cancer-related information was carried out. A literature search was conducted on the Medline database as well as IEEE Xplore and ACM digital libraries to address the interdisciplinary nature of such research. The search results were supplemented with the literature identified through Google Scholar.

RESULTS

A range of studies have proven the feasibility of TM for extracting structured information from clinical narratives such as those found in pathology or radiology reports. In this article, we provide a critical overview of the current state of the art for TM related to cancer. The review highlighted a strong bias towards symbolic methods, e.g. named entity recognition (NER) based on dictionary lookup and information extraction (IE) relying on pattern matching. The F-measure of NER ranges between 80% and 90%, while that of IE for simple tasks is in the high 90s. To further improve the performance, TM approaches need to deal effectively with idiosyncrasies of the clinical sublanguage such as non-standard abbreviations as well as a high degree of spelling and grammatical errors. This requires a shift from rule-based methods to machine learning following the success of similar trends in biological applications of TM. Machine learning approaches require large training datasets, but clinical narratives are not readily available for TM research due to privacy and confidentiality concerns. This issue remains the main bottleneck for progress in this area. In addition, there is a need for a comprehensive cancer ontology that would enable semantic representation of textual information found in narrative reports.

Rosenbloom ST, Harris P, Pulley J, Basford M, Grant J, DuBuisson A, Rothman RL. The Mid-South clinical Data Research Network. J Am Med Inform Assoc 2014;21:627-32. [PMID: 24821742 PMCID: PMC4078290 DOI: 10.1136/amiajnl-2014-002745] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022] Open

Wei WQ, Feng Q, Weeke P, Bush W, Waitara MS, Iwuchukwu OF, Roden DM, Wilke RA, Stein CM, Denny JC. Creation and Validation of an EMR-based Algorithm for Identifying Major Adverse Cardiac Events while on Statins. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2014;2014:112-9. [PMID: 25717410 PMCID: PMC4333709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]

McPeek Hinz ER, Bastarache L, Denny JC. A natural language processing algorithm to define a venous thromboembolism phenotype. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2013;2013:975-983. [PMID: 24551388 PMCID: PMC3900229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]

Nikfarjam A, Emadzadeh E, Gonzalez G. Towards generating a patient's timeline: extracting temporal relationships from clinical notes. J Biomed Inform 2013;46 Suppl:S40-S47. [PMID: 24212118 DOI: 10.1016/j.jbi.2013.11.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2013] [Revised: 10/31/2013] [Accepted: 11/01/2013] [Indexed: 10/26/2022]

Sun W, Rumshisky A, Uzuner O. Temporal reasoning over clinical text: the state of the art. J Am Med Inform Assoc 2013;20:814-9. [PMID: 23676245 PMCID: PMC3756277 DOI: 10.1136/amiajnl-2013-001760] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2013] [Revised: 04/17/2013] [Accepted: 04/20/2013] [Indexed: 11/03/2022] Open

Wei WQ, Cronin RM, Xu H, Lasko TA, Bastarache L, Denny JC. Development and evaluation of an ensemble resource linking medications to their indications. J Am Med Inform Assoc 2013;20:954-61. [PMID: 23576672 PMCID: PMC3756263 DOI: 10.1136/amiajnl-2012-001431] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2012] [Revised: 02/25/2013] [Accepted: 03/18/2013] [Indexed: 11/09/2022] Open

Sun W, Rumshisky A, Uzuner O. Annotating temporal information in clinical narratives. J Biomed Inform 2013;46 Suppl:S5-S12. [PMID: 23872518 DOI: 10.1016/j.jbi.2013.07.004] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/21/2013] [Revised: 07/10/2013] [Accepted: 07/10/2013] [Indexed: 11/26/2022]

Natural language processing accurately categorizes findings from colonoscopy and pathology reports. Clin Gastroenterol Hepatol 2013;11:689-94. [PMID: 23313839 PMCID: PMC4026927 DOI: 10.1016/j.cgh.2012.11.035] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/08/2012] [Revised: 11/07/2012] [Accepted: 11/27/2012] [Indexed: 02/07/2023]

Tang B, Wu Y, Jiang M, Chen Y, Denny JC, Xu H. A hybrid system for temporal information extraction from clinical text. J Am Med Inform Assoc 2013;20:828-35. [PMID: 23571849 DOI: 10.1136/amiajnl-2013-001635] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open

Xu Y, Wang Y, Liu T, Tsujii J, Chang EIC. An end-to-end system to identify temporal relation in discharge summaries: 2012 i2b2 challenge. J Am Med Inform Assoc 2013;20:849-58. [PMID: 23467472 DOI: 10.1136/amiajnl-2012-001607] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open

Abstract

OBJECTIVE

To create an end-to-end system to identify temporal relation in discharge summaries for the 2012 i2b2 challenge. The challenge includes event extraction, timex extraction, and temporal relation identification.

DESIGN

An end-to-end temporal relation system was developed. It includes three subsystems: an event extraction system (conditional random fields (CRF) named entity extraction and their corresponding attribute classifiers), a temporal extraction system (CRF named entity extraction, their corresponding attribute classifiers, and context-free grammar based normalization system), and a temporal relation system (10 multi-support vector machine (SVM) classifiers and a Markov logic networks inference system) using labeled sequential pattern mining, syntactic structures based on parse trees, and results from a coordination classifier. Micro-averaged precision (P), recall (R), averaged P&R (P&R), and F measure (F) were used to evaluate results.

RESULTS

For event extraction, the system achieved 0.9415 (P), 0.8930 (R), 0.9166 (P&R), and 0.9166 (F). The accuracies of their type, polarity, and modality were 0.8574, 0.8585, and 0.8560, respectively. For timex extraction, the system achieved 0.8818, 0.9489, 0.9141, and 0.9141, respectively. The accuracies of their type, value, and modifier were 0.8929, 0.7170, and 0.8907, respectively. For temporal relation, the system achieved 0.6589, 0.7129, 0.6767, and 0.6849, respectively. For end-to-end temporal relation, it achieved 0.5904, 0.5944, 0.5921, and 0.5924, respectively. With the F measure used for evaluation, we were ranked first out of 14 competing teams (event extraction), first out of 14 teams (timex extraction), third out of 12 teams (temporal relation), and second out of seven teams (end-to-end temporal relation).

CONCLUSIONS

The system achieved encouraging results, demonstrating the feasibility of the tasks defined by the i2b2 organizers. The experiment result demonstrates that both global and local information is useful in the 2012 challenge.
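
As a reader's aid (not part of the abstract), the F measure reported above is conventionally the harmonic mean of precision and recall. The following is a minimal sketch, assuming the balanced F1 form, since the abstract does not state the exact formula used by the challenge organizers. The function name `f_measure` is illustrative and does not come from the paper. It recomputes the event and timex F values from the reported precision and recall.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall exactly as reported in the abstract.
event_f = f_measure(0.9415, 0.8930)
timex_f = f_measure(0.8818, 0.9489)

print(f"event F = {event_f:.4f}")
print(f"timex F = {timex_f:.4f}")
```

Because the inputs are already rounded to four decimals, a value recomputed this way can differ from a published figure in the last digit.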

Denny JC. Chapter 13: Mining electronic health records in the genomics era. PLoS Comput Biol 2012;8:e1002823. [PMID: 23300414 PMCID: PMC3531280 DOI: 10.1371/journal.pcbi.1002823] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022] Open

Abstract

The combination of improved genomic analysis methods, decreasing genotyping costs, and increasing computing resources has led to an explosion of clinical genomic knowledge in the last decade. Similarly, healthcare systems are increasingly adopting robust electronic health record (EHR) systems that not only can improve health care, but also contain a vast repository of disease and treatment data that could be mined for genomic research. Indeed, institutions are creating EHR-linked DNA biobanks to enable genomic and pharmacogenomic research, using EHR data for phenotypic information. However, EHRs are designed primarily for clinical care, not research, so reuse of clinical EHR data for research purposes can be challenging. Difficulties in use of EHR data include: data availability, missing data, incorrect data, and vast quantities of unstructured narrative text data. Structured information includes billing codes, most laboratory reports, and other variables such as physiologic measurements and demographic information. Significant information, however, remains locked within EHR narrative text documents, including clinical notes and certain categories of test results, such as pathology and radiology reports. For relatively rare observations, combinations of simple free-text searches and billing codes may prove adequate when followed by manual chart review. However, to extract the large cohorts necessary for genome-wide association studies, natural language processing methods to process narrative text data may be needed. Combinations of structured and unstructured textual data can be mined to generate high-validity collections of cases and controls for a given condition. Once high-quality cases and controls are identified, EHR-derived cases can be used for genomic discovery and validation.
Since EHR data includes a broad sampling of clinically-relevant phenotypic information, it may enable multiple genomic investigations upon a single set of genotyped individuals. This chapter reviews several examples of phenotype extraction and their application to genetic research, demonstrating a viable future for genomic discovery using EHR-linked data.

Xu Y, Tsujii J, Chang EIC. Named entity recognition of follow-up and time information in 20,000 radiology reports. J Am Med Inform Assoc 2012;19:792-9. [PMID: 22771530 DOI: 10.1136/amiajnl-2012-000812] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/03/2022] Open

Reeves RM, Ong FR, Matheny ME, Denny JC, Aronsky D, Gobbel GT, Montella D, Speroff T, Brown SH. Detecting temporal expressions in medical narratives. Int J Med Inform 2012;82:118-27. [PMID: 22595284 DOI: 10.1016/j.ijmedinf.2012.04.006] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2012] [Revised: 03/30/2012] [Accepted: 04/12/2012] [Indexed: 12/27/2022]

Abstract

BACKGROUND

Clinical practice and epidemiological information aggregation require knowing when, how long, and in what sequence medically relevant events occur. The Temporal Awareness and Reasoning Systems for Question Interpretation (TARSQI) Toolkit (TTK) is a complete, open source software package for the temporal ordering of events within narrative text documents. TTK was developed on newspaper articles. We extended TTK to support medical notes using veterans' affairs (VA) clinical notes and compared it to TTK.

METHODS

We used a development set consisting of 200 VA clinical notes to modify and append rules to TTK's time tagger, creating Med-TTK. We then evaluated the performances of TTK and Med-TTK on an independent random selection of 100 clinical notes. Evaluation tasks were to identify and classify time-referring expressions as one of four temporal classes (DATE, TIME, DURATION, and SET). The reference standard for this test set was generated by dual human manual review with disagreements resolved by a third reviewer. Outcome measures included recall and precision for each class, and inter-rater agreement scores.

RESULTS

There were 3146 temporal expressions in the reference standard. TTK identified 1595 temporal expressions. Recall was 0.15 (95% confidence interval [CI] 0.12-0.15) and precision was 0.27 (95% CI 0.25-0.29) for TTK. Med-TTK identified 3174 expressions. Recall was 0.86 (95% CI 0.84-0.87) and precision was 0.85 (95% CI 0.84-0.86) for Med-TTK.

CONCLUSION

The algorithms for identifying and classifying temporal expressions in medical narratives developed within Med-TTK significantly improved performance compared to TTK. Natural language processing applications such as Med-TTK provide a foundation for meaningful longitudinal mapping of patient history events among electronic health records. The tool can be accessed at the following site: http://code.google.com/p/med-ttk/.
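
As a reader's aid (not part of the abstract), the Med-TTK precision and recall reported above can be related to the raw counts given in the results. This is a rough sketch under one loud assumption: the abstract does not report the number of true positives, so it is back-calculated here from the rounded recall. The helper `precision_recall` and the variable names are illustrative only.

```python
from typing import Tuple


def precision_recall(true_positives: int, predicted: int, reference: int) -> Tuple[float, float]:
    """Return (precision, recall) computed from raw counts."""
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / reference if reference else 0.0
    return precision, recall


REFERENCE = 3146      # temporal expressions in the reference standard (from the abstract)
MED_TTK_FOUND = 3174  # expressions identified by Med-TTK (from the abstract)

# ASSUMPTION: the true-positive count is not reported in the abstract; it is
# back-calculated from the rounded recall of 0.86 and is only approximate.
approx_tp = round(0.86 * REFERENCE)

precision, recall = precision_recall(approx_tp, MED_TTK_FOUND, REFERENCE)
print(f"approx. true positives = {approx_tp}")
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Under that assumption the implied precision rounds to the reported 0.85. The sketch shows only how the two measures share a numerator and does not reproduce the study's evaluation procedure.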

Kahn MG, Weng C. Clinical research informatics: a conceptual perspective. J Am Med Inform Assoc 2012;19:e36-42. [PMID: 22523344 PMCID: PMC3392857 DOI: 10.1136/amiajnl-2012-000968] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022] Open

Carroll RJ, Thompson WK, Eyler AE, Mandelin AM, Cai T, Zink RM, Pacheco JA, Boomershine CS, Lasko TA, Xu H, Karlson EW, Perez RG, Gainer VS, Murphy SN, Ruderman EM, Pope RM, Plenge RM, Kho AN, Liao KP, Denny JC. Portability of an algorithm to identify rheumatoid arthritis in electronic health records. J Am Med Inform Assoc 2012;19:e162-9. [PMID: 22374935 DOI: 10.1136/amiajnl-2011-000583] [Citation(s) in RCA: 164] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open

Denny JC, Choma NN, Peterson JF, Miller RA, Bastarache L, Li M, Peterson NB. Natural language processing improves identification of colorectal cancer testing in the electronic medical record. Med Decis Making 2012;32:188-197. [PMID: 21393557 PMCID: PMC9616628 DOI: 10.1177/0272989x11400418] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/16/2023]