1
Frei J, Kramer F. German Medical Named Entity Recognition Model and Data Set Creation Using Machine Translation and Word Alignment: Algorithm Development and Validation. JMIR Form Res 2023; 7:e39077. [PMID: 36853741 PMCID: PMC10015355 DOI: 10.2196/39077] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/27/2022] [Revised: 09/11/2022] [Accepted: 11/03/2022] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND Data mining in the field of medical data analysis often needs to rely solely on the processing of unstructured data to retrieve relevant data. For German natural language processing, few open medical neural named entity recognition (NER) models have been published before this work. A major issue can be attributed to the lack of German training data. OBJECTIVE We developed a synthetic data set and a novel German medical NER model for public access to demonstrate the feasibility of our approach. In order to bypass legal restrictions due to potential data leaks through model analysis, we did not make use of internal, proprietary data sets, which is a frequent veto factor for data set publication. METHODS The underlying German data set was retrieved by translation and word alignment of a public English data set. The data set served as a foundation for model training and evaluation. For demonstration purposes, our NER model follows a simple network architecture that is designed for low computational requirements. RESULTS The obtained data set consisted of 8599 sentences including 30,233 annotations. The model achieved a class frequency-averaged F1 score of 0.82 on the test set after training across 7 different NER types. Artifacts in the synthesized data set with regard to translation and alignment induced by the proposed method were exposed. The annotation performance was evaluated on an external data set and measured in comparison with an existing baseline model that has been trained on a dedicated German data set in a traditional fashion. We discussed the drop in annotation performance on an external data set for our simple NER model. Our model is publicly available. CONCLUSIONS We demonstrated the feasibility of obtaining a data set and training a German medical NER model by the exclusive use of public training data through our suggested method. 
The discussion on the limitations of our approach includes ways to further mitigate remaining problems in future work.
Affiliation(s)
- Johann Frei
- IT Infrastructure for Translational Medical Research, University of Augsburg, Augsburg, Germany
- Frank Kramer
- IT Infrastructure for Translational Medical Research, University of Augsburg, Augsburg, Germany
2
Richter-Pechanski P, Geis NA, Kiriakou C, Schwab DM, Dieterich C. Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models. Digit Health 2021; 7:20552076211057662. [PMID: 34868618 PMCID: PMC8637713 DOI: 10.1177/20552076211057662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 10/15/2021] [Indexed: 11/17/2022] Open
Abstract
Objective A vast amount of medical data is still stored in unstructured text documents. We present an automated method of information extraction from German unstructured clinical routine data from the cardiology domain enabling their usage in state-of-the-art data-driven deep learning projects. Methods We evaluated pre-trained language models to extract a set of 12 cardiovascular concepts in German discharge letters. We compared three bidirectional encoder representations from transformers pre-trained on different corpora and fine-tuned them on the task of cardiovascular concept extraction using 204 discharge letters manually annotated by cardiologists at the University Hospital Heidelberg. We compared our results with traditional machine learning methods based on a long short-term memory network and a conditional random field. Results Our best performing model, based on publicly available German pre-trained bidirectional encoder representations from the transformer model, achieved a token-wise micro-average F1-score of 86% and outperformed the baseline by at least 6%. Moreover, this approach achieved the best trade-off between precision (positive predictive value) and recall (sensitivity). Conclusion Our results show the applicability of state-of-the-art deep learning methods using pre-trained language models for the task of cardiovascular concept extraction using limited training data. This minimizes annotation efforts, which are currently the bottleneck of any application of data-driven deep learning projects in the clinical domain for German and many other European languages.
Affiliation(s)
- Phillip Richter-Pechanski
- Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg, Germany; Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany; German Center for Cardiovascular Research (DZHK) - Partner Site Heidelberg/Mannheim, Mannheim, Germany; Informatics for Life, Heidelberg, Germany
- Nicolas A Geis
- Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany; Informatics for Life, Heidelberg, Germany
- Christina Kiriakou
- Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
- Dominic M Schwab
- Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany
- Christoph Dieterich
- Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg, Germany; Department of Internal Medicine III, University Hospital Heidelberg, Heidelberg, Germany; German Center for Cardiovascular Research (DZHK) - Partner Site Heidelberg/Mannheim, Mannheim, Germany; Informatics for Life, Heidelberg, Germany
3
Kittner M, Lamping M, Rieke DT, Götze J, Bajwa B, Jelas I, Rüter G, Hautow H, Sänger M, Habibi M, Zettwitz M, de Bortoli T, Ostermann L, Ševa J, Starlinger J, Kohlbacher O, Malek NP, Keilholz U, Leser U. Annotation and initial evaluation of a large annotated German oncological corpus. JAMIA Open 2021; 4:ooab025. [PMID: 33898938 PMCID: PMC8054032 DOI: 10.1093/jamiaopen/ooab025] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/11/2021] [Revised: 03/08/2021] [Accepted: 03/18/2021] [Indexed: 11/15/2022] Open
Abstract
OBJECTIVE We present the Berlin-Tübingen-Oncology corpus (BRONCO), a large and freely available corpus of shuffled sentences from German oncological discharge summaries annotated with diagnosis, treatments, medications, and further attributes including negation and speculation. The aim of BRONCO is to foster reproducible and openly available research on Information Extraction from German medical texts. MATERIALS AND METHODS BRONCO consists of 200 manually deidentified discharge summaries of cancer patients. Annotation followed a structured and quality-controlled process involving 2 groups of medical experts to ensure consistency, comprehensiveness, and high quality of annotations. We present results of several state-of-the-art techniques for different IE tasks as baselines for subsequent research. RESULTS The annotated corpus consists of 11 434 sentences and 89 942 tokens, annotated with 11 124 annotations for medical entities and 3118 annotations of related attributes. We publish 75% of the corpus as a set of shuffled sentences, and keep 25% as held-out data set for unbiased evaluation of future IE tools. On this held-out dataset, our baselines reach depending on the specific entity types F1-scores of 0.72-0.90 for named entity recognition, 0.10-0.68 for entity normalization, 0.55 for negation detection, and 0.33 for speculation detection. DISCUSSION Medical corpus annotation is a complex and time-consuming task. This makes sharing of such resources even more important. CONCLUSION To our knowledge, BRONCO is the first sizable and freely available German medical corpus. Our baseline results show that more research efforts are necessary to lift the quality of information extraction in German medical texts to the level already possible for English.
Affiliation(s)
- Madeleine Kittner
- Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
- Mario Lamping
- Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Damian T Rieke
- Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Julian Götze
- Innere Medizin I, Universitätsklinikum Tübingen, Tübingen, Germany
- Bariya Bajwa
- Innere Medizin I, Universitätsklinikum Tübingen, Tübingen, Germany
- Ivan Jelas
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Gina Rüter
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Hanjo Hautow
- Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
- Mario Sänger
- Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
- Maryam Habibi
- Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
- Marit Zettwitz
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Till de Bortoli
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Leonie Ostermann
- Innere Medizin I, Universitätsklinikum Tübingen, Tübingen, Germany
- Jurica Ševa
- Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
- Johannes Starlinger
- Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
- Oliver Kohlbacher
- Institut für Translationale Bioinformatik, Universitätsklinikum Tübingen, Tübingen, Germany
- Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany
- Department of Computer Science, University of Tübingen, Tübingen, Germany
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Nisar P Malek
- Innere Medizin I, Universitätsklinikum Tübingen, Tübingen, Germany
- Ulrich Keilholz
- Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Ulf Leser
- Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
4
König M, Sander A, Demuth I, Diekmann D, Steinhagen-Thiessen E. Knowledge-based best of breed approach for automated detection of clinical events based on German free text digital hospital discharge letters. PLoS One 2019; 14:e0224916. [PMID: 31774830 PMCID: PMC6881027 DOI: 10.1371/journal.pone.0224916] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/16/2019] [Accepted: 10/24/2019] [Indexed: 12/26/2022] Open
Abstract
Objectives The secondary use of medical data contained in electronic medical records, such as hospital discharge letters, is a valuable resource for the improvement of clinical care (e.g. in terms of medication safety) or for research purposes. However, the automated processing and analysis of medical free text still poses a huge challenge to available natural language processing (NLP) systems. The aim of this study was to implement a knowledge-based best of breed approach, combining a terminology server with integrated ontology, an NLP pipeline and a rules engine. Methods We tested the performance of this approach in a use case. The clinical event of interest was the particular drug-disease interaction “proton-pump inhibitor [PPI] use and osteoporosis”. Cases were to be identified based on free text digital discharge letters as source of information. Automated detection was validated against a gold standard. Results Precision of recognition of osteoporosis was 94.19%, and recall was 97.45%. PPIs were detected with 100% precision and 97.97% recall. The F-score for the detection of the given drug-disease interaction was 96.13%. Conclusion We could show that our approach of combining an NLP pipeline, a terminology server, and a rules engine for the purpose of automated detection of clinical events such as drug-disease interactions from free text digital hospital discharge letters was effective. There is huge potential for the implementation in clinical and research contexts, as this approach enables analyses of very high numbers of medical free text documents within a short time period.
Affiliation(s)
- Maximilian König
- Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Lipid Clinic at Interdisciplinary Metabolism Center, Berlin, Germany
- Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Department of Nephrology and Internal Intensive Care Medicine Berlin, Germany
- André Sander
- ID Information und Dokumentation im Gesundheitswesen GmbH, Berlin, Germany
- Ilja Demuth
- Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Lipid Clinic at Interdisciplinary Metabolism Center, Berlin, Germany
- Charité - Universitätsmedizin Berlin, BCRT—Berlin Institute of Health Center for Regenerative Therapies, Berlin, Germany
- Daniel Diekmann
- ID Information und Dokumentation im Gesundheitswesen GmbH, Berlin, Germany
- Elisabeth Steinhagen-Thiessen
- Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Lipid Clinic at Interdisciplinary Metabolism Center, Berlin, Germany
5
Yehia E, Boshnak H, AbdelGaber S, Abdo A, Elzanfaly DS. Ontology-based clinical information extraction from physician's free-text notes. J Biomed Inform 2019; 98:103276. [PMID: 31473365 DOI: 10.1016/j.jbi.2019.103276] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/26/2019] [Revised: 08/17/2019] [Accepted: 08/28/2019] [Indexed: 11/18/2022]
Abstract
Documenting clinical notes in electronic health records might affect physicians' workflow. In this paper, an ontology-based clinical information extraction system, OB-CIE, has been developed. The OB-CIE system provides a method for extracting clinical concepts from physicians' free-text notes and converts the unstructured clinical notes to structured information to be accessed in electronic health records. The OB-CIE system can help physicians to document visit notes without changing their workflow. For recognizing named entities of clinical concepts, ontology concepts have been used to construct a dictionary of semantic categories; then an exact dictionary matching method has been used to match noun phrases to their semantic categories. A rule-based approach has been used to classify clinical sentences into their predefined categories. The system evaluation achieved an F-measure of 94.90% and 97.80% for concept classification and sentence classification, respectively. The results showed that the OB-CIE system performed well on extracting clinical concepts compared with data mining techniques. The system can be used in another field by adapting its ontology and extraction rule set.
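The exact dictionary matching step described in this abstract can be illustrated with a minimal sketch. It is not the OB-CIE implementation: the dictionary entries, the function name `match_concepts`, and the `max_len` parameter are hypothetical, and greedy longest-match over token n-grams stands in for the noun-phrase chunking the authors describe.

```python
import re

# Hypothetical mini-dictionary mapping phrases to semantic categories.
DICTIONARY = {
    "chest pain": "Symptom",
    "hypertension": "Disease",
    "aspirin": "Medication",
}


def match_concepts(text, dictionary=DICTIONARY, max_len=3):
    """Greedy longest-first exact matching of token n-grams against the dictionary."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                matches.append((phrase, dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return matches


note = "Patient reports chest pain; history of hypertension, takes aspirin."
print(match_concepts(note))
```

Adapting such a matcher to another field would, as the abstract suggests, mainly mean swapping the dictionary derived from the ontology.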
Affiliation(s)
- Engy Yehia
- Information Systems Department, Faculty of Computers and Information, Helwan University, Helwan, Cairo, Egypt; Business Information Systems Department, Faculty of Commerce and Business Administration, Helwan University, Helwan, Cairo, Egypt.
- Hussein Boshnak
- General Surgery Department, Faculty of Medicine, Ain Shams University, Cairo, Egypt
- Sayed AbdelGaber
- Information Systems Department, Faculty of Computers and Information, Helwan University, Helwan, Cairo, Egypt
- Amany Abdo
- Information Systems Department, Faculty of Computers and Information, Helwan University, Helwan, Cairo, Egypt
- Doaa S Elzanfaly
- Information Systems Department, Faculty of Computers and Information, Helwan University, Helwan, Cairo, Egypt
6
Exploration of Artificial Intelligence Use with ARIES in Multiple Myeloma Research. J Clin Med 2019; 8:jcm8070999. [PMID: 31324026 PMCID: PMC6678083 DOI: 10.3390/jcm8070999] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/15/2019] [Revised: 07/03/2019] [Accepted: 07/05/2019] [Indexed: 12/31/2022] Open
Abstract
Background: Natural language processing (NLP) is a powerful tool supporting the generation of Real-World Evidence (RWE). There is no NLP system that enables the extensive querying of parameters specific to multiple myeloma (MM) out of unstructured medical reports. We therefore created an MM-specific ontology to accelerate the information extraction (IE) out of unstructured text. Methods: Our MM ontology consists of extensive MM-specific and hierarchically structured attributes and values. We implemented “A Rule-based Information Extraction System” (ARIES) that uses this ontology. We evaluated ARIES on 200 randomly selected medical reports of patients diagnosed with MM. Results: Our system achieved a high F1-Score of 0.92 on the evaluation dataset with a precision of 0.87 and recall of 0.98. Conclusions: Our rule-based IE system enables the comprehensive querying of medical reports. The IE accelerates the extraction of data and enables clinicians to generate RWE on hematological issues faster. RWE helps clinicians to make decisions in an evidence-based manner. Our tool accelerates the integration of research evidence into everyday clinical practice.
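The reported F1-score follows from the stated precision and recall, since F1 is their harmonic mean. A short check, with `f1_score` as an illustrative helper name rather than anything taken from the paper:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
print(round(f1_score(0.87, 0.98), 2))  # 0.92
```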
7
Sander A, Wauer R. Integrating terminologies into standard SQL: a new approach for research on routine data. J Biomed Semantics 2019; 10:7. [PMID: 31014403 PMCID: PMC6480592 DOI: 10.1186/s13326-019-0199-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/23/2018] [Accepted: 03/26/2019] [Indexed: 11/30/2022] Open
Abstract
Background Most electronic medical records still contain large amounts of free-text data. Semantic evaluation of such data requires the data to be encoded with sufficient classifications or transformed into a knowledge-based database. Methods We present an approach that allows databases accessible via SQL (Structured Query Language) to be searched directly through semantic queries without the need for further transformations. To this end, we developed I) an extension to SQL named Ontology-SQL (O-SQL) that allows the use of semantic expressions, II) a framework that uses a standard terminology server to annotate database tables containing free text and III) a parser that rewrites O-SQL to SQL, so that such queries can be passed to the database server. Results I) We compared several semantic queries published to date and were able to reproduce them in a reduced, highly condensed form. II) The quality of the annotation process was measured against manual annotation, and we found a sensitivity of 97.62% and a specificity of 100.00%. III) Different semantic queries were analyzed, and measured with F-scores between 0.91 and 0.98. Conclusions We showed that systematic analysis of free-text-containing medical records is possible with standard tools. The seamless connection of ontologies and standard technologies from the database field represents an important constituent of unstructured data analysis. The developed technology can be readily applied to relationally organized data and supports the increasingly important field of translational research. Electronic supplementary material The online version of this article (10.1186/s13326-019-0199-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- André Sander
- ID GmbH & Co. KGaA, Platz vor dem Neuen Tor 2, 10115, Berlin, Germany.
- Roland Wauer
- Klinik für Neonatologie, Charité-Universitätsmedizin Berlin, 10098, Berlin, Germany
8
Dietrich G, Krebs J, Fette G, Ertl M, Kaspar M, Störk S, Puppe F. Ad Hoc Information Extraction for Clinical Data Warehouses. Methods Inf Med 2018; 57:e22-e29. [PMID: 29801178 PMCID: PMC6193399 DOI: 10.3414/me17-02-0010] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]
Abstract
Background:
Clinical Data Warehouses (CDW) reuse electronic health records (EHR) to make their data retrievable for research purposes or patient recruitment for clinical trials. However, much information is hidden in unstructured data like discharge letters. These can be preprocessed and converted to structured data via information extraction (IE), which is unfortunately a laborious task and therefore usually not available for most of the text data in a CDW.
Objectives:
The goal of our work is to provide an ad hoc IE service that allows users to query text data ad hoc in a manner similar to querying structured data in a CDW. While search engines just return text snippets, our system also returns frequencies (e.g. how many patients exist with “heart failure” including textual synonyms or how many patients have an LVEF < 45) based on the content of discharge letters or textual reports for special investigations like heart echo. Three subtasks are addressed: (1) to recognize and to exclude negations and their scopes, (2) to extract concepts, i.e. Boolean values, and (3) to extract numerical values.
Methods:
We implemented an extended version of the NegEx-algorithm for German texts that detects negations and determines their scope. Furthermore, our document oriented CDW PaDaWaN was extended with query functions, e.g. context sensitive queries and regex queries, and an extraction mode for computing the frequencies for Boolean and numerical values.
Results:
Evaluations in chest X-ray reports and in discharge letters showed high F1-scores for the three subtasks: Detection of negated concepts in chest X-ray reports with an F1-score of 0.99 and in discharge letters with 0.97; of Boolean values in chest X-ray reports about 0.99, and of numerical values in chest X-ray reports and discharge letters also around 0.99 with the exception of the concept age.
Discussion:
The advantages of an ad hoc IE over a standard IE are the low development effort (just entering the concept with its variants), the promptness of the results and the adaptability by the user to his or her particular question. Disadvantages are usually lower accuracy and confidence.
This ad hoc information extraction approach is novel and exceeds existing systems: Roogle [1] extracts predefined concepts from texts at preprocessing and makes them retrievable at runtime. Dr. Warehouse [2] applies negation detection and indexes the produced subtexts which include affirmed findings. Our approach combines negation detection and the extraction of concepts. However, the extraction takes place not during preprocessing but at runtime. That provides an ad hoc, dynamic, interactive and adjustable information extraction of arbitrary concepts and even their values on the fly at runtime.
Conclusions:
We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW with high recall and precision based on a pipeline that detects and removes negations and their scope in clinical texts.
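The negation-handling idea behind the first subtask can be sketched in a few lines. This is a heavily simplified illustration in the spirit of NegEx, not the authors' extended algorithm: the trigger and terminator word lists, the function names, and the example sentences are assumptions made here, and only single-token concepts are matched.

```python
import re

# Illustrative assumptions, not the word lists used by the authors.
NEGATION_TRIGGERS = {"kein", "keine", "keinen", "ohne", "nicht"}
SCOPE_TERMINATORS = {"aber", "jedoch", "sondern"}


def is_affirmed(concept: str, sentence: str) -> bool:
    """Return True if the single-token concept occurs outside a negation scope."""
    target = concept.lower()
    in_scope = False
    for token in re.findall(r"\w+", sentence.lower()):
        if token in NEGATION_TRIGGERS:
            in_scope = True    # scope opens at a trigger word
        elif token in SCOPE_TERMINATORS:
            in_scope = False   # scope closes at a terminator word
        elif token == target and not in_scope:
            return True
    return False


def count_affirmed(concept: str, sentences: list[str]) -> int:
    """Number of sentences that mention the concept without negating it."""
    return sum(is_affirmed(concept, s) for s in sentences)


reports = [
    "Kein Pleuraerguss, aber Kardiomegalie.",
    "Pleuraerguss rechts.",
]
print(count_affirmed("Pleuraerguss", reports))   # 1
print(count_affirmed("Kardiomegalie", reports))  # 1
```

Because each sentence is processed on its own, a negation scope ends at the sentence boundary at the latest; counting affirmed mentions per document then gives the kind of frequency the abstract describes.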
Affiliation(s)
- Georg Dietrich
- Computer Science, University of Wuerzburg, Wuerzburg, Germany
- Correspondence to: Georg Dietrich, University of Wuerzburg, Computer Science, Am Hubland, 97070 Wuerzburg, Germany
- Jonathan Krebs
- Computer Science, University of Wuerzburg, Wuerzburg, Germany
- Georg Fette
- Computer Science, University of Wuerzburg, Wuerzburg, Germany
- Comprehensive Heart Failure Center (CHFC), University Hospital of Wuerzburg, Wuerzburg, Germany
- Maximilian Ertl
- Service Center Medical Informatics, University Hospital of Wuerzburg, Wuerzburg, Germany
- Mathias Kaspar
- Comprehensive Heart Failure Center (CHFC), University Hospital of Wuerzburg, Wuerzburg, Germany
- Stefan Störk
- Comprehensive Heart Failure Center (CHFC), University Hospital of Wuerzburg, Wuerzburg, Germany
- Frank Puppe
- Computer Science, University of Wuerzburg, Wuerzburg, Germany
9
Fusion architectures for automatic subject indexing under concept drift. INTERNATIONAL JOURNAL ON DIGITAL LIBRARIES 2018. [DOI: 10.1007/s00799-018-0240-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]
10
Eisman AS, Weiner RB, Chen ES, Stey PC, Wadhera RK, Kithcart AP, Sarkar IN. An Automated System for Categorizing Transthoracic Echocardiography Indications According to the Echocardiography Appropriate Use Criteria. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2017:670-678. [PMID: 29854132 PMCID: PMC5977700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The Echocardiography Appropriate Use Criteria (EAUC) are a set of indications for transthoracic echocardiography (TTE) developed to guide physician decision making around ordering of TTE. In this study, an automated rule-based method for processing "indications" listed within TTE reports and classifying them into one of the major EAUC categories was developed and validated against a clinician-annotated reference standard. The system performed at a level comparable to trained physicians, allowing for the automated classification of more than 30,000 TTE indications from a public database in less than ten minutes. The most common indication for TTE was Valvular assessment, closely followed by General. Hypertension/Heart Failure/Cardiomyopathy, Acute, and Cardiac Structure assessment each contributed more than ten percent within this patient population. These results suggest potential for automated approaches for tracking appropriate use of TTE, as well as for guiding the development of systems for prospectively identifying when TTE use is recommended.
Affiliation(s)
- Aaron S Eisman
- Center for Biomedical Informatics, Brown University, Providence, RI
- Elizabeth S Chen
- Center for Biomedical Informatics, Brown University, Providence, RI
- Paul C Stey
- Center for Biomedical Informatics, Brown University, Providence, RI
11
Deléger L, Campillos L, Ligozat AL, Névéol A. Design of an extensive information representation scheme for clinical narratives. J Biomed Semantics 2017; 8:37. [PMID: 28893314 PMCID: PMC5594525 DOI: 10.1186/s13326-017-0135-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/10/2016] [Accepted: 07/26/2017] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Knowledge representation frameworks are essential to the understanding of complex biomedical processes, and to the analysis of biomedical texts that describe them. Combined with natural language processing (NLP), they have the potential to contribute to retrospective studies by unlocking important phenotyping information contained in the narrative content of electronic health records (EHRs). This work aims to develop an extensive information representation scheme for clinical information contained in EHR narratives, and to support secondary use of EHR narrative data to answer clinical questions. METHODS We review recent work that proposed information representation schemes and applied them to the analysis of clinical narratives. We then propose a unifying scheme that supports the extraction of information to address a large variety of clinical questions. RESULTS We devised a new information representation scheme for clinical narratives that comprises 13 entities, 11 attributes and 37 relations. The associated annotation guidelines can be used to consistently apply the scheme to clinical narratives and are available at https://cabernet.limsi.fr/annotation_guide_for_the_merlot_french_clinical_corpus-Sept2016.pdf. CONCLUSION The information scheme includes many elements of the major schemes described in the clinical natural language processing literature, as well as a uniquely detailed set of relations.
Affiliation(s)
- Louise Deléger
- French National Institute for Agricultural Research (INRA), Domaine de Vilvert, Jouy en Josas, Paris, 78352, France; LIMSI, CNRS, Université Paris - Saclay, Rue John von Neumann, Orsay, 91405, France
- Leonardo Campillos
- LIMSI, CNRS, Université Paris - Saclay, Rue John von Neumann, Orsay, 91405, France
- Anne-Laure Ligozat
- LIMSI, CNRS, Université Paris - Saclay, Rue John von Neumann, Orsay, 91405, France; ENSIIE, 1 square de la résistance, Évry Cedex, 91025, France
- Aurélie Névéol
- LIMSI, CNRS, Université Paris - Saclay, Rue John von Neumann, Orsay, 91405, France
12
Kaspar M, Ertl M, Fette G, Dietrich G, Toepfer M, Angermann C, Störk S, Puppe F. Data Linkage from Clinical to Study Databases via an R Data Warehouse User Interface. Experiences from a Large Clinical Follow-up Study. Methods Inf Med 2016; 55:381-6. [PMID: 27405886 DOI: 10.3414/me15-02-0015] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/15/2015] [Accepted: 06/15/2016] [Indexed: 11/09/2022]
Abstract
BACKGROUND Data that needs to be documented for clinical studies has often already been acquired and documented in clinical routine. Usually this data is manually transferred to Case Report Forms (CRF) and/or directly into an electronic data capture (EDC) system. OBJECTIVES To enhance the documentation process of a large clinical follow-up study targeting patients admitted for acutely decompensated heart failure by accessing the data created during routine and study visits from a hospital information system (HIS) and by transferring it via a data warehouse (DWH) into the study's EDC system. METHODS This project is based on the clinical DWH developed at the University of Würzburg. The DWH was extended by several new data domains including data created by the study team itself. An R user interface was developed for the DWH that allows users to access its source data in all its detail, to transform data as comprehensively as possible in R into study-specific variables, and to create data and catalog tables. RESULTS A data flow was established that starts with labeling patients as study patients within the HIS and proceeds with updating the DWH with this label and further data domains at a daily rate. Several study-specific variables were defined using the implemented R user interface of the DWH. This system was then used to export these variables as data tables ready for import into our EDC system. The data tables were then used to initialize the first 296 patients within the EDC system by pseudonym, visit and data values. Afterwards, these records were filled with clinical data on heart failure, vital parameters and time spent on selected wards. CONCLUSIONS This solution focuses on the comprehensive access and transformation of data for a DWH-EDC system linkage. Using this system in a large clinical study has demonstrated the feasibility of this approach for a study with a complex visit schedule.
Affiliation(s)
- Mathias Kaspar
- Dr. Mathias Kaspar, Comprehensive Heart Failure Center / DZHI, University Hospital of Würzburg, Straubmühlweg 2a, Haus A9, 97078 Würzburg, Germany
13
Erratum to: Fine-grained information extraction from German transthoracic echocardiography reports. BMC Med Inform Decis Mak 2015; 15:103. [PMID: 26634244 PMCID: PMC4669642 DOI: 10.1186/s12911-015-0226-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2015] [Accepted: 11/27/2015] [Indexed: 11/28/2022] Open