Reference Citation Analysis

For: Toepfer M, Corovic H, Fette G, Klügl P, Störk S, Puppe F. Fine-grained information extraction from German transthoracic echocardiography reports. BMC Med Inform Decis Mak 2015;15:91. [PMID: 26563260 DOI: 10.1186/s12911-015-0215-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 03/13/2015] [Accepted: 11/03/2015] [Indexed: 11/30/2022]

Cited by Other Article(s)

Frei J, Kramer F. German Medical Named Entity Recognition Model and Data Set Creation Using Machine Translation and Word Alignment: Algorithm Development and Validation. JMIR Form Res 2023;7:e39077. [PMID: 36853741 PMCID: PMC10015355 DOI: 10.2196/39077] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/27/2022] [Revised: 09/11/2022] [Accepted: 11/03/2022] [Indexed: 11/06/2022]

Abstract

BACKGROUND

Data mining in the field of medical data analysis often needs to rely solely on the processing of unstructured data to retrieve relevant data. For German natural language processing, few open medical neural named entity recognition (NER) models have been published before this work. A major issue can be attributed to the lack of German training data.

OBJECTIVE

We developed a synthetic data set and a novel German medical NER model for public access to demonstrate the feasibility of our approach. In order to bypass legal restrictions due to potential data leaks through model analysis, we did not make use of internal, proprietary data sets, which is a frequent veto factor for data set publication.

METHODS

The underlying German data set was retrieved by translation and word alignment of a public English data set. The data set served as a foundation for model training and evaluation. For demonstration purposes, our NER model follows a simple network architecture that is designed for low computational requirements.
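The projection step described above can be illustrated with a minimal sketch. It assumes token-level labels on the English side and a word alignment given as (source index, target index) pairs; the function name `project_labels`, the example sentence, the label names, and the alignment are all hypothetical and are not taken from the article.

```python
from typing import List, Tuple


def project_labels(
    src_labels: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[str]:
    """Copy each non-"O" source label to the target token it is aligned with."""
    tgt_labels = ["O"] * tgt_len
    for src_idx, tgt_idx in alignment:
        if src_labels[src_idx] != "O":
            tgt_labels[tgt_idx] = src_labels[src_idx]
    return tgt_labels


# Hypothetical English sentence with token-level annotations.
src_tokens = ["The", "patient", "takes", "aspirin", "daily"]
src_labels = ["O", "O", "O", "Drug", "Frequency"]

# Hypothetical machine translation and word alignment (source index, target index).
tgt_tokens = ["Der", "Patient", "nimmt", "täglich", "Aspirin", "ein"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 3)]

tgt_labels = project_labels(src_labels, alignment, len(tgt_tokens))
for token, label in zip(tgt_tokens, tgt_labels):
    print(f"{token}\t{label}")
```

Unaligned target tokens (here "ein") simply stay unlabeled, which is one way alignment gaps can surface as artifacts in a synthesized data set.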

RESULTS

The obtained data set consisted of 8599 sentences including 30,233 annotations. The model achieved a class frequency-averaged F₁ score of 0.82 on the test set after training across 7 different NER types. Artifacts in the synthesized data set with regard to translation and alignment induced by the proposed method were exposed. The annotation performance was evaluated on an external data set and measured in comparison with an existing baseline model that has been trained on a dedicated German data set in a traditional fashion. We discussed the drop in annotation performance on an external data set for our simple NER model. Our model is publicly available.
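A class frequency-averaged F₁ score is commonly computed by weighting each class's F₁ by the number of test instances (support) of that class. The sketch below assumes this reading; the class names, scores, and supports are made-up illustration values, not results from the article.

```python
from typing import Dict, Tuple


def frequency_weighted_f1(per_class: Dict[str, Tuple[float, int]]) -> float:
    """Average per-class F1 scores, weighting each class by its support."""
    total_support = sum(support for _, support in per_class.values())
    if total_support == 0:
        raise ValueError("at least one class must have non-zero support")
    weighted_sum = sum(f1 * support for f1, support in per_class.values())
    return weighted_sum / total_support


# Hypothetical (F1, support) pairs per entity class.
example_scores = {
    "Drug": (0.90, 100),
    "Strength": (0.80, 50),
    "Duration": (0.60, 50),
}

overall = frequency_weighted_f1(example_scores)
print(f"frequency-weighted F1: {overall:.2f}")
```

With this weighting, frequent classes dominate the overall score, so a weak result on a rare class lowers the average only slightly.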

CONCLUSIONS

We demonstrated the feasibility of obtaining a data set and training a German medical NER model by the exclusive use of public training data through our suggested method. The discussion on the limitations of our approach includes ways to further mitigate remaining problems in future work.

Richter-Pechanski P, Geis NA, Kiriakou C, Schwab DM, Dieterich C. Automatic extraction of 12 cardiovascular concepts from German discharge letters using pre-trained language models. Digit Health 2021;7:20552076211057662. [PMID: 34868618 PMCID: PMC8637713 DOI: 10.1177/20552076211057662] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2021] [Accepted: 10/15/2021] [Indexed: 11/17/2022]

Kittner M, Lamping M, Rieke DT, Götze J, Bajwa B, Jelas I, Rüter G, Hautow H, Sänger M, Habibi M, Zettwitz M, de Bortoli T, Ostermann L, Ševa J, Starlinger J, Kohlbacher O, Malek NP, Keilholz U, Leser U. Annotation and initial evaluation of a large annotated German oncological corpus. JAMIA Open 2021;4:ooab025. [PMID: 33898938 PMCID: PMC8054032 DOI: 10.1093/jamiaopen/ooab025] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 01/11/2021] [Revised: 03/08/2021] [Accepted: 03/18/2021] [Indexed: 11/15/2022]

Affiliation(s)

Madeleine Kittner: Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
Mario Lamping: Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
Damian T Rieke: Department of Hematology, Oncology and Cancer Immunology, Campus Benjamin Franklin, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
Julian Götze: Innere Medizin I, Universitätsklinikum Tübingen, Tübingen, Germany
Bariya Bajwa: Innere Medizin I, Universitätsklinikum Tübingen, Tübingen, Germany
Ivan Jelas: Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
Gina Rüter: Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
Hanjo Hautow: Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
Mario Sänger: Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
Maryam Habibi: Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
Marit Zettwitz: Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
Till de Bortoli: Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
Leonie Ostermann: Innere Medizin I, Universitätsklinikum Tübingen, Tübingen, Germany
Jurica Ševa: Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
Johannes Starlinger: Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany
Oliver Kohlbacher: Institut für Translationale Bioinformatik, Universitätsklinikum Tübingen, Tübingen, Germany; Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany; Department of Computer Science, University of Tübingen, Tübingen, Germany; Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany
Nisar P Malek: Innere Medizin I, Universitätsklinikum Tübingen, Tübingen, Germany
Ulrich Keilholz: Charité Comprehensive Cancer Center, Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
Ulf Leser: Knowledge Management for Bioinformatics, Humboldt Universität zu Berlin, Berlin, Germany

König M, Sander A, Demuth I, Diekmann D, Steinhagen-Thiessen E. Knowledge-based best of breed approach for automated detection of clinical events based on German free text digital hospital discharge letters. PLoS One 2019;14:e0224916. [PMID: 31774830 PMCID: PMC6881027 DOI: 10.1371/journal.pone.0224916] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/16/2019] [Accepted: 10/24/2019] [Indexed: 12/26/2022]

Yehia E, Boshnak H, AbdelGaber S, Abdo A, Elzanfaly DS. Ontology-based clinical information extraction from physician's free-text notes. J Biomed Inform 2019;98:103276. [PMID: 31473365 DOI: 10.1016/j.jbi.2019.103276] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/26/2019] [Revised: 08/17/2019] [Accepted: 08/28/2019] [Indexed: 11/18/2022]

Exploration of Artificial Intelligence Use with ARIES in Multiple Myeloma Research. J Clin Med 2019;8:jcm8070999. [PMID: 31324026 PMCID: PMC6678083 DOI: 10.3390/jcm8070999] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/15/2019] [Revised: 07/03/2019] [Accepted: 07/05/2019] [Indexed: 12/31/2022]

Sander A, Wauer R. Integrating terminologies into standard SQL: a new approach for research on routine data. J Biomed Semantics 2019;10:7. [PMID: 31014403 PMCID: PMC6480592 DOI: 10.1186/s13326-019-0199-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/23/2018] [Accepted: 03/26/2019] [Indexed: 11/30/2022]

Dietrich G, Krebs J, Fette G, Ertl M, Kaspar M, Störk S, Puppe F. Ad Hoc Information Extraction for Clinical Data Warehouses. Methods Inf Med 2018;57:e22-e29. [PMID: 29801178 PMCID: PMC6193399 DOI: 10.3414/me17-02-0010] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/27/2022]

Abstract

Background: Clinical Data Warehouses (CDW) reuse electronic health records (EHR) to make their data retrievable for research purposes or patient recruitment for clinical trials. However, much information is hidden in unstructured data such as discharge letters. These texts can be preprocessed and converted to structured data via information extraction (IE), which is unfortunately a laborious task and therefore usually not available for most of the text data in a CDW.

Objectives: The goal of our work is to provide an ad hoc IE service that allows users to query text data ad hoc in a manner similar to querying structured data in a CDW. While search engines just return text snippets, our system also returns frequencies (e.g., how many patients exist with “heart failure” including textual synonyms, or how many patients have an LVEF < 45) based on the content of discharge letters or textual reports for special investigations like heart echo. Three subtasks are addressed: (1) to recognize and exclude negations and their scopes, (2) to extract concepts, i.e., Boolean values, and (3) to extract numerical values.
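The numerical-value subtask (e.g., counting patients with an LVEF < 45) can be illustrated with a minimal regex-based sketch. The pattern, the function name `extract_lvef`, and the example report texts are hypothetical illustrations; they are not the query syntax or implementation of the system described in the abstract.

```python
import re
from typing import Optional

# Hypothetical pattern: "LVEF" or "EF", an optional connector, a number, a percent sign.
LVEF_PATTERN = re.compile(
    r"\b(?:LVEF|EF)\b\s*(?:[:=]|von|beträgt)?\s*(\d{1,3}(?:[.,]\d+)?)\s*%",
    re.IGNORECASE,
)


def extract_lvef(text: str) -> Optional[float]:
    """Return the first LVEF percentage found in the text, or None."""
    match = LVEF_PATTERN.search(text)
    if match is None:
        return None
    # German reports often use a decimal comma; normalize it before parsing.
    return float(match.group(1).replace(",", "."))


# Made-up report snippets keyed by a made-up patient identifier.
reports = {
    "p1": "LVEF: 40 %, keine Klappenvitien.",
    "p2": "Gute systolische Funktion, EF 60%.",
    "p3": "LVEF von 35,5 % bei Zustand nach Infarkt.",
    "p4": "Kein Perikarderguss.",
}

values = {pid: extract_lvef(text) for pid, text in reports.items()}
patients_below_45 = sorted(
    pid for pid, value in values.items() if value is not None and value < 45
)
print(len(patients_below_45), patients_below_45)
```

Returning `None` for reports without a value keeps "not mentioned" distinct from any numeric result, which matters when the counts are used as frequencies.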

Methods: We implemented an extended version of the NegEx algorithm for German texts that detects negations and determines their scope. Furthermore, our document-oriented CDW PaDaWaN was extended with query functions, e.g., context-sensitive queries and regex queries, and an extraction mode for computing the frequencies for Boolean and numerical values.
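The general idea behind NegEx-style negation detection can be sketched as follows: a concept counts as negated if a negation trigger precedes it within a limited window and no scope-terminating token lies in between. This is a heavily simplified illustration, not the extended algorithm from the abstract; the trigger list, the terminator list, the window size, and the function name `is_negated` are assumptions, and only single-token concepts are handled.

```python
import re

# Hypothetical, deliberately small lists; a real system would use curated lexicons.
NEGATION_TRIGGERS = {"kein", "keine", "keinen", "ohne", "nicht"}
SCOPE_TERMINATORS = {"aber", "jedoch", "sondern", ".", ";"}


def is_negated(sentence: str, concept: str, window: int = 5) -> bool:
    """Check whether a single-token concept lies in the scope of a preceding negation."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    concept = concept.lower()
    if concept not in tokens:
        return False
    idx = tokens.index(concept)
    # Walk backwards from the concept; a terminator ends the negation scope.
    for token in reversed(tokens[max(0, idx - window):idx]):
        if token in SCOPE_TERMINATORS:
            return False
        if token in NEGATION_TRIGGERS:
            return True
    return False


sentence = "Kein Pleuraerguss, jedoch Kardiomegalie."
print(is_negated(sentence, "Pleuraerguss"))
print(is_negated(sentence, "Kardiomegalie"))
```

In the example sentence, "jedoch" ends the scope of "Kein", so only the first concept is treated as negated.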

Results: Evaluations on chest X-ray reports and discharge letters showed high F1-scores for the three subtasks: detection of negated concepts reached an F1-score of 0.99 in chest X-ray reports and 0.97 in discharge letters; extraction of Boolean values in chest X-ray reports reached about 0.99; and extraction of numerical values in chest X-ray reports and discharge letters also reached around 0.99, with the exception of the concept age.

Discussion: The advantages of an ad hoc IE over a standard IE are the low development effort (just entering the concept with its variants), the promptness of the results and the adaptability by the user to his or her particular question. The disadvantages are usually lower accuracy and confidence.

This ad hoc information extraction approach is novel and exceeds existing systems: Roogle [1] extracts predefined concepts from texts at preprocessing and makes them retrievable at runtime. Dr. Warehouse [2] applies negation detection and indexes the produced subtexts which include affirmed findings. Our approach combines negation detection and the extraction of concepts. The extraction, however, takes place at runtime rather than during preprocessing. That provides an ad hoc, dynamic, interactive and adjustable information extraction of arbitrary concepts and even their values on the fly.

Conclusions: We developed an ad hoc information extraction query feature for Boolean and numerical values within a CDW with high recall and precision based on a pipeline that detects and removes negations and their scope in clinical texts.

Fusion architectures for automatic subject indexing under concept drift. International Journal on Digital Libraries 2018. [DOI: 10.1007/s00799-018-0240-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]

Eisman AS, Weiner RB, Chen ES, Stey PC, Wadhera RK, Kithcart AP, Sarkar IN. An Automated System for Categorizing Transthoracic Echocardiography Indications According to the Echocardiography Appropriate Use Criteria. AMIA ... Annual Symposium Proceedings. AMIA Symposium 2018;2017:670-678. [PMID: 29854132 PMCID: PMC5977700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]

Deléger L, Campillos L, Ligozat AL, Névéol A. Design of an extensive information representation scheme for clinical narratives. J Biomed Semantics 2017;8:37. [PMID: 28893314 PMCID: PMC5594525 DOI: 10.1186/s13326-017-0135-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/10/2016] [Accepted: 07/26/2017] [Indexed: 12/26/2022]

Kaspar M, Ertl M, Fette G, Dietrich G, Toepfer M, Angermann C, Störk S, Puppe F. Data Linkage from Clinical to Study Databases via an R Data Warehouse User Interface. Experiences from a Large Clinical Follow-up Study. Methods Inf Med 2016;55:381-6. [PMID: 27405886 DOI: 10.3414/me15-02-0015] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/15/2015] [Accepted: 06/15/2016] [Indexed: 11/09/2022]

Abstract

BACKGROUND

Data that needs to be documented for clinical studies has often been acquired and documented in clinical routine. Usually this data is manually transferred to Case Report Forms (CRF) and/or directly into an electronic data capture (EDC) system.

OBJECTIVES

To enhance the documentation process of a large clinical follow-up study targeting patients admitted for acutely decompensated heart failure by accessing the data created during routine and study visits from a hospital information system (HIS) and by transferring it via a data warehouse (DWH) into the study's EDC system.

METHODS

This project is based on the clinical DWH developed at the University of Würzburg. The DWH was extended by several new data domains, including data created by the study team itself. An R user interface was developed for the DWH that allows users to access its source data in all its detail, to transform data as comprehensively as possible in R into study-specific variables, and to support the creation of data and catalog tables.

RESULTS

A data flow was established that starts with labeling patients as study patients within the HIS and proceeds with updating the DWH with this label and further data domains at a daily rate. Several study-specific variables were defined using the implemented R user interface of the DWH. This system was then used to export these variables as data tables ready for import into our EDC system. The data tables were then used to initialize the first 296 patients within the EDC system by pseudonym, visit and data values. Afterwards, these records were filled with clinical data on heart failure, vital parameters and time spent on selected wards.

CONCLUSIONS

This solution focuses on the comprehensive access and transformation of data for a DWH-EDC system linkage. Using this system in a large clinical study has demonstrated the feasibility of this approach for a study with a complex visit schedule.

Erratum to: Fine-grained information extraction from German transthoracic echocardiography reports. BMC Med Inform Decis Mak 2015;15:103. [PMID: 26634244 PMCID: PMC4669642 DOI: 10.1186/s12911-015-0226-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2015] [Accepted: 11/27/2015] [Indexed: 11/28/2022]