Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For:	Percha B. Modern Clinical Text Mining: A Guide and Review. Annu Rev Biomed Data Sci 2021;4:165-187. [PMID: 34465177 DOI: 10.1146/annurev-biodatasci-030421-030931] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Number

Cited by Other Article(s)

Seinen TM, Kors JA, van Mulligen EM, Rijnbeek PR. Annotation-preserving machine translation of English corpora to validate Dutch clinical concept extraction tools. J Am Med Inform Assoc 2024;31:1725-1734. [PMID: 38934643 PMCID: PMC11258409 DOI: 10.1093/jamia/ocae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2024] [Revised: 05/24/2024] [Accepted: 06/10/2024] [Indexed: 06/28/2024] Open

Abstract

OBJECTIVE

To explore the feasibility of validating Dutch concept extraction tools using annotated corpora translated from English, focusing on preserving annotations during translation and addressing the scarcity of non-English annotated clinical corpora.

MATERIALS AND METHODS

Three annotated corpora were standardized and translated from English to Dutch using 2 machine translation services, Google Translate and OpenAI GPT-4, with annotations preserved through a proposed method of embedding annotations in the text before translation. The performance of 2 concept extraction tools, MedSpaCy and MedCAT, was assessed across the corpora in both Dutch and English.

RESULTS

The translation process effectively generated Dutch annotated corpora and the concept extraction tools performed similarly in both English and Dutch. Although there were some differences in how annotations were preserved across translations, these did not affect extraction accuracy. Supervised MedCAT models consistently outperformed unsupervised models, whereas MedSpaCy demonstrated high recall but lower precision.

DISCUSSION

Our validation of Dutch concept extraction tools on corpora translated from English was successful, highlighting the efficacy of our annotation preservation method and the potential for efficiently creating multilingual corpora. Further improvements and comparisons of annotation preservation techniques and strategies for corpus synthesis could lead to more efficient development of multilingual corpora and accurate non-English concept extraction tools.

CONCLUSION

This study has demonstrated that translated English corpora can be used to validate non-English concept extraction tools. The annotation preservation method used during translation proved effective, and future research can apply this corpus translation method to additional languages and clinical settings.

Collapse

Gu S, Lee EW, Zhang W, Simpson RL, Hertzberg VS, Ho JC. Evaluating Natural Language Processing Packages for Predicting Hospital-Acquired Pressure Injuries From Clinical Notes. Comput Inform Nurs 2024;42:184-192. [PMID: 37607706 PMCID: PMC10884344 DOI: 10.1097/cin.0000000000001053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 08/24/2023]

Han L, Gladkoff S, Erofeev G, Sorokina I, Galiano B, Nenadic G. Neural machine translation of clinical text: an empirical investigation into multilingual pre-trained language models and transfer-learning. Front Digit Health 2024;6:1211564. [PMID: 38468693 PMCID: PMC10926203 DOI: 10.3389/fdgth.2024.1211564] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2023] [Accepted: 01/12/2024] [Indexed: 03/13/2024] Open

Abstract

Clinical text and documents contain very rich information and knowledge in healthcare, and their processing using state-of-the-art language technology becomes very important for building intelligent systems for supporting healthcare and social good. This processing includes creating language understanding models and translating resources into other natural languages to share domain-specific cross-lingual knowledge. In this work, we conduct investigations on clinical text machine translation by examining multilingual neural network models using deep learning such as Transformer based structures. Furthermore, to address the language resource imbalance issue, we also carry out experiments using a transfer learning methodology based on massive multilingual pre-trained language models (MMPLMs). The experimental results on three sub-tasks including (1) clinical case (CC), (2) clinical terminology (CT), and (3) ontological concept (OC) show that our models achieved top-level performances in the ClinSpEn-2022 shared task on English-Spanish clinical domain data. Furthermore, our expert-based human evaluations demonstrate that the small-sized pre-trained language model (PLM) outperformed the other two extra-large language models by a large margin in the clinical domain fine-tuning, which finding was never reported in the field. Finally, the transfer learning method works well in our experimental setting using the WMT21fb model to accommodate a new language space Spanish that was not seen at the pre-training stage within WMT21fb itself, which deserves more exploitation for clinical knowledge transformation, e.g. to investigate into more languages. These research findings can shed some light on domain-specific machine translation development, especially in clinical and healthcare fields. Further research projects can be carried out based on our work to improve healthcare text analytics and knowledge transformation. Our data is openly available for research purposes at: https://github.com/HECTA-UoM/ClinicalNMT.

Collapse

Chien A, Tang H, Jagessar B, Chang KW, Peng N, Nael K, Salamon N. AI-Assisted Summarization of Radiologic Reports: Evaluating GPT3davinci, BARTcnn, LongT5booksum, LEDbooksum, LEDlegal, and LEDclinical. AJNR Am J Neuroradiol 2024;45:244-248. [PMID: 38238092 DOI: 10.3174/ajnr.a8102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Accepted: 11/09/2023] [Indexed: 02/09/2024]

Abstract

BACKGROUND AND PURPOSE

The review of clinical reports is an essential part of monitoring disease progression. Synthesizing multiple imaging reports is also important for clinical decisions. It is critical to aggregate information quickly and accurately. Machine learning natural language processing (NLP) models hold promise to address an unmet need for report summarization.

MATERIALS AND METHODS

We evaluated NLP methods to summarize longitudinal aneurysm reports. A total of 137 clinical reports and 100 PubMed case reports were used in this study. Models were 1) compared against expert-generated summary using longitudinal imaging notes collected in our institute and 2) compared using publicly accessible PubMed case reports. Five AI models were used to summarize the clinical reports, and a sixth model, the online GPT3davinci NLP large language model (LLM), was added for the summarization of PubMed case reports. We assessed the summary quality through comparison with expert summaries using quantitative metrics and quality reviews by experts.

RESULTS

In clinical summarization, BARTcnn had the best performance (BERTscore = 0.8371), followed by LongT5Booksum and LEDlegal. In the analysis using PubMed case reports, GPT3davinci demonstrated the best performance, followed by models BARTcnn and then LEDbooksum (BERTscore = 0.894, 0.872, and 0.867, respectively).

CONCLUSIONS

AI NLP summarization models demonstrated great potential in summarizing longitudinal aneurysm reports, though none yet reached the level of quality for clinical usage. We found the online GPT LLM outperformed the others; however, the BARTcnn model is potentially more useful because it can be implemented on-site. Future work to improve summarization, address other types of neuroimaging reports, and develop structured reports may allow NLP models to ease clinical workflow.

Collapse

Tsai CH, Liu KH, Cheng DC. Remote Diagnosis on Upper Respiratory Tract Infections Based on a Neural Network with Few Symptom Words-A Feasibility Study. Diagnostics (Basel) 2024;14:329. [PMID: 38337845 PMCID: PMC10855815 DOI: 10.3390/diagnostics14030329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2023] [Revised: 01/23/2024] [Accepted: 01/30/2024] [Indexed: 02/12/2024] Open

Hacking C, Verbeek H, Hamers JPH, Aarts S. Comparing text mining and manual coding methods: Analysing interview data on quality of care in long-term care for older adults. PLoS One 2023;18:e0292578. [PMID: 37939098 PMCID: PMC10631650 DOI: 10.1371/journal.pone.0292578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2023] [Accepted: 09/24/2023] [Indexed: 11/10/2023] Open

Szekér S, Fogarassy G, Vathy-Fogarassy Á. A general text mining method to extract echocardiography measurement results from echocardiography documents. Artif Intell Med 2023;143:102584. [PMID: 37673570 DOI: 10.1016/j.artmed.2023.102584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2022] [Revised: 03/08/2023] [Accepted: 05/16/2023] [Indexed: 09/08/2023]

Abstract

BACKGROUND

In everyday medical practice, the results of cardiac ultrasound examinations are generally recorded in unstructured text, from which extracting relevant information is an important and challenging task. This paper presents a generally applicable language and corpus-independent text mining method for extracting and structuring numerical measurement results and their descriptions from echocardiography reports.

METHOD

The developed method is based on generally applicable text mining preprocessing activities, it automatically identifies and standardizes the descriptions of the cardiac ultrasound measures, and it stores the extracted and standardized measurement descriptions with their measurement results in a structured form for later usage. The method does not contain any regular expression-based search and does not rely on information about the structure of the document.

RESULTS

The method has been tested on a document set containing more than 20,000 echocardiographic reports by examining the efficiency of extracting 12 echocardiography parameters considered important by experts. The method extracted and structured the echocardiography parameters under the study with good sensitivity (lowest value: 0.775, highest value: 1.0, average: 0.904) and excellent specificity (for all cases 1.0). The F1 score ranged between 0.873 and 1.0, and its average value was 0.948.

CONCLUSION

The presented case study has shown that the proposed method can extract measurement results from echocardiography documents with high confidence without performing a direct search or having detailed information about the data recording habits. Furthermore, it effectively handles spelling errors, abbreviations and the highly varied terminology used in descriptions. As it does not rely on any information related to the structure or the language of the documents or data recording habits, it can be applied for processing any free-text written medical texts.

Collapse

Frei J, Kramer F. German Medical Named Entity Recognition Model and Data Set Creation Using Machine Translation and Word Alignment: Algorithm Development and Validation. JMIR Form Res 2023;7:e39077. [PMID: 36853741 PMCID: PMC10015355 DOI: 10.2196/39077] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2022] [Revised: 09/11/2022] [Accepted: 11/03/2022] [Indexed: 11/06/2022] Open

Abstract

BACKGROUND

Data mining in the field of medical data analysis often needs to rely solely on the processing of unstructured data to retrieve relevant data. For German natural language processing, few open medical neural named entity recognition (NER) models have been published before this work. A major issue can be attributed to the lack of German training data.

OBJECTIVE

We developed a synthetic data set and a novel German medical NER model for public access to demonstrate the feasibility of our approach. In order to bypass legal restrictions due to potential data leaks through model analysis, we did not make use of internal, proprietary data sets, which is a frequent veto factor for data set publication.

METHODS

The underlying German data set was retrieved by translation and word alignment of a public English data set. The data set served as a foundation for model training and evaluation. For demonstration purposes, our NER model follows a simple network architecture that is designed for low computational requirements.

RESULTS

The obtained data set consisted of 8599 sentences including 30,233 annotations. The model achieved a class frequency-averaged F₁ score of 0.82 on the test set after training across 7 different NER types. Artifacts in the synthesized data set with regard to translation and alignment induced by the proposed method were exposed. The annotation performance was evaluated on an external data set and measured in comparison with an existing baseline model that has been trained on a dedicated German data set in a traditional fashion. We discussed the drop in annotation performance on an external data set for our simple NER model. Our model is publicly available.

CONCLUSIONS

We demonstrated the feasibility of obtaining a data set and training a German medical NER model by the exclusive use of public training data through our suggested method. The discussion on the limitations of our approach includes ways to further mitigate remaining problems in future work.

Collapse

López B, Raya O, Baykova E, Saez M, Rigau D, Cunill R, Mayoral S, Carrion C, Serrano D, Castells X. APPRAISE-RS: Automated, updated, participatory, and personalized treatment recommender systems based on GRADE methodology. Heliyon 2023;9:e13074. [PMID: 36798764 PMCID: PMC9925880 DOI: 10.1016/j.heliyon.2023.e13074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 01/04/2023] [Accepted: 01/16/2023] [Indexed: 01/26/2023] Open

van Laar SA, Kapiteijn E, Gombert-Handoko KB, Guchelaar HJ, Zwaveling J. Application of Electronic Health Record Text Mining: Real-World Tolerability, Safety, and Efficacy of Adjuvant Melanoma Treatments. Cancers (Basel) 2022;14:5426. [PMID: 36358844 PMCID: PMC9657798 DOI: 10.3390/cancers14215426] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2022] [Revised: 10/31/2022] [Accepted: 11/02/2022] [Indexed: 08/13/2023] Open

Abstract

Introduction: Nivolumab (N), pembrolizumab (P), and dabrafenib plus trametinib (D + T) have been registered as adjuvant treatments for resected stage III and IV melanoma since 2018. Electronic health records (EHRs) are a real-world data source that can be used to review treatments in clinical practice. In this study, we applied EHR text-mining software to evaluate the real-world tolerability, safety, and efficacy of adjuvant melanoma treatments. Methods: Adult melanoma patients receiving adjuvant treatment between January 2019 and October 2021 at the Leiden University Medical Center, the Netherlands, were included. CTcue text-mining software (v3.1.0, CTcue B.V., Amsterdam, The Netherlands) was used to construct rule-based queries and perform context analysis for patient inclusion and data collection from structured and unstructured EHR data. Results: In total, 122 patients were included: 54 patients treated with nivolumab, 48 with pembrolizumab, and 20 with D + T. Significantly more patients discontinued treatment due to toxicity on D + T (N: 16%, P: 6%, D + T: 40%), and X2 (6, n = 122) = 14.6 and p = 0.024. Immune checkpoint inhibitors (ICIs) mainly showed immune-related treatment-limiting adverse events (AEs), and chronic thyroid-related AE occurred frequently (hyperthyroidism: N: 15%, P: 13%, hypothyroidism: N: 20%, P: 19%). Treatment-limiting toxicity from D + T was primarily a combination of reversible AEs, including pyrexia and fatigue. The 1-year recurrence-free survival was 70.3% after nivolumab, 72.4% after pembrolizumab, and 83.0% after D + T. Conclusions: Text-mining EHR is a valuable method to collect real-world data to evaluate adjuvant melanoma treatments. ICIs were better tolerated than D + T, in line with RCT results. For BRAF+ patients, physicians must weigh the higher risk of reversible treatment-limiting AEs of D + T against the risk of long-term immune-related AEs.

Collapse

Frei J, Soto-Rey I, Kramer F. DrNote: An open medical annotation service. PLOS DIGITAL HEALTH 2022;1:e0000086. [PMID: 36812581 PMCID: PMC9931362 DOI: 10.1371/journal.pdig.0000086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/21/2021] [Accepted: 07/12/2022] [Indexed: 11/19/2022]

Zhao Y, Ren B, Yu W, Zhang H, Zhao D, Lv J, Xie Z, Jiang K, Shang L, Yao H, Xu Y, Zhao G. Construction of an Assisted Model Based on Natural Language Processing for Automatic Early Diagnosis of Autoimmune Encephalitis. Neurol Ther 2022;11:1117-1134. [PMID: 35543808 PMCID: PMC9338198 DOI: 10.1007/s40120-022-00355-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/09/2022] [Accepted: 04/07/2022] [Indexed: 11/25/2022] Open

Abstract

Introduction

Early diagnosis and etiological treatment can effectively improve the prognosis of patients with autoimmune encephalitis (AE). However, anti-neuronal antibody tests which provide the definitive diagnosis require time and are not always abnormal. By using natural language processing (NLP) technology, our study proposes an assisted diagnostic method for early clinical diagnosis of AE and compares its sensitivity with that of previously established criteria.

Methods

Our model is based on the text classification model trained by the history of present illness (HPI) in electronic medical records (EMRs) that present a definite pathological diagnosis of AE or infectious encephalitis (IE). The definitive diagnosis of IE was based on the results of traditional etiological examinations. The definitive diagnosis of AE was based on the results of neuronal antibodies, and the diagnostic criteria of definite autoimmune limbic encephalitis proposed by Graus et al. used as the reference standard for antibody-negative AE. First, we automatically recognized and extracted symptoms for all HPI texts in EMRs by training a dataset of 552 cases. Second, four text classification models trained by a dataset of 199 cases were established for differential diagnosis of AE and IE based on a post-structuring text dataset of every HPI, which was completed using symptoms in English language after the process of normalization of synonyms. The optimal model was identified by evaluating and comparing the performance of the four models. Finally, combined with three typical symptoms and the results of standard paraclinical tests such as cerebrospinal fluid (CSF), magnetic resonance imaging (MRI), or electroencephalogram (EEG) proposed from Graus criteria, an assisted early diagnostic model for AE was established on the basis of the text classification model with the best performance.

Results

The comparison results for the four models applied to the independent testing dataset showed the naïve Bayesian classifier with bag of words achieved the best performance, with an area under the receiver operating characteristic curve of 0.85, accuracy of 84.5% (95% confidence interval [CI] 74.0–92.0%), sensitivity of 86.7% (95% CI 69.3–96.2%), and specificity of 82.9% (95% CI 67.9–92.8%), respectively. Compared with the diagnostic criteria proposed previously, the early diagnostic sensitivity for possible AE using the assisted diagnostic model based on the independent testing dataset was improved from 73.3% (95% CI 54.1–87.7%) to 86.7% (95% CI 69.3–96.2%).

Conclusions

The assisted diagnostic model could effectively increase the early diagnostic sensitivity for AE compared to previous diagnostic criteria, assist physicians in establishing the diagnosis of AE automatically after inputting the HPI and the results of standard paraclinical tests according to their narrative habits for describing symptoms, avoiding misdiagnosis and allowing for prompt initiation of specific treatment.

Supplementary Information

The online version contains supplementary material available at 10.1007/s40120-022-00355-7.

Collapse

Percha B, Pisapati K, Gao C, Schmidt H. Natural language inference for curation of structured clinical registries from unstructured text. J Am Med Inform Assoc 2021;29:97-108. [PMID: 34791282 DOI: 10.1093/jamia/ocab243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2021] [Revised: 09/26/2021] [Accepted: 10/25/2021] [Indexed: 11/12/2022] Open