Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kersloot MG, van Putten FJP, Abu-Hanna A, Cornet R, Arts DL. Natural language processing algorithms for mapping clinical text fragments onto ontology concepts: a systematic review and recommendations for future studies. J Biomed Semantics 2020;11:14. [PMID: 33198814 PMCID: PMC7670625 DOI: 10.1186/s13326-020-00231-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 07/17/2020] [Accepted: 11/03/2020] [Indexed: 12/23/2022] Open



Cited by Other Article(s)

Seinen TM, Kors JA, van Mulligen EM, Rijnbeek PR. Annotation-preserving machine translation of English corpora to validate Dutch clinical concept extraction tools. J Am Med Inform Assoc 2024:ocae159. [PMID: 38934643 DOI: 10.1093/jamia/ocae159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 05/24/2024] [Accepted: 06/10/2024] [Indexed: 06/28/2024] Open

Abstract

OBJECTIVE

To explore the feasibility of validating Dutch concept extraction tools using annotated corpora translated from English, focusing on preserving annotations during translation and addressing the scarcity of non-English annotated clinical corpora.

MATERIALS AND METHODS

Three annotated corpora were standardized and translated from English to Dutch using 2 machine translation services, Google Translate and OpenAI GPT-4, with annotations preserved through a proposed method of embedding annotations in the text before translation. The performance of 2 concept extraction tools, MedSpaCy and MedCAT, was assessed across the corpora in both Dutch and English.
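
The idea of embedding annotations in the text before translation can be illustrated with a minimal Python sketch. The abstract does not describe the actual markup, so the `[[id|span]]` marker syntax, the function names, and the hand-written Dutch sentence standing in for machine translation output are all assumptions made for illustration only.

```python
import re

# Hypothetical inline marker: [[annotation_id|annotated text]]
MARKER = re.compile(r"\[\[(\d+)\|(.*?)\]\]")


def embed_annotations(text, annotations):
    """Wrap each annotated span in an inline marker before translation.

    `annotations` is a list of (start, end, annotation_id) tuples with
    non-overlapping character offsets into `text`.
    """
    pieces = []
    cursor = 0
    for start, end, ann_id in sorted(annotations):
        pieces.append(text[cursor:start])
        pieces.append(f"[[{ann_id}|{text[start:end]}]]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def extract_annotations(marked_text):
    """Strip markers from (translated) text and recompute span offsets."""
    pieces = []
    recovered = []
    cursor = 0   # read position in marked_text
    length = 0   # length of the plain text emitted so far
    for match in MARKER.finditer(marked_text):
        before = marked_text[cursor:match.start()]
        pieces.append(before)
        length += len(before)
        span = match.group(2)
        recovered.append((length, length + len(span), int(match.group(1))))
        pieces.append(span)
        length += len(span)
        cursor = match.end()
    pieces.append(marked_text[cursor:])
    return "".join(pieces), recovered


english = "Patient reports chest pain and fever."
start = english.index("chest pain")
marked = embed_annotations(english, [(start, start + len("chest pain"), 1)])

# Hand-written stand-in for the output of a translation service.
translated = "Patiënt meldt [[1|pijn op de borst]] en koorts."
dutch, dutch_annotations = extract_annotations(translated)
```

The sketch only works if the translation service leaves the markers intact, which is one reason annotations may be preserved differently across translations.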

RESULTS

The translation process effectively generated Dutch annotated corpora and the concept extraction tools performed similarly in both English and Dutch. Although there were some differences in how annotations were preserved across translations, these did not affect extraction accuracy. Supervised MedCAT models consistently outperformed unsupervised models, whereas MedSpaCy demonstrated high recall but lower precision.

DISCUSSION

Our validation of Dutch concept extraction tools on corpora translated from English was successful, highlighting the efficacy of our annotation preservation method and the potential for efficiently creating multilingual corpora. Further improvements and comparisons of annotation preservation techniques and strategies for corpus synthesis could lead to more efficient development of multilingual corpora and accurate non-English concept extraction tools.

CONCLUSION

This study has demonstrated that translated English corpora can be used to validate non-English concept extraction tools. The annotation preservation method used during translation proved effective, and future research can apply this corpus translation method to additional languages and clinical settings.


Moreno AC, Bitterman DS. Toward Clinical-Grade Evaluation of Large Language Models. Int J Radiat Oncol Biol Phys 2024;118:916-920. [PMID: 38401979 PMCID: PMC11221761 DOI: 10.1016/j.ijrobp.2023.11.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Accepted: 11/05/2023] [Indexed: 02/26/2024]

Irrera O, Marchesin S, Silvello G. MetaTron: advancing biomedical annotation empowering relation annotation and collaboration. BMC Bioinformatics 2024;25:112. [PMID: 38486137 PMCID: PMC10941452 DOI: 10.1186/s12859-024-05730-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 03/04/2024] [Indexed: 03/17/2024] Open

Abstract

BACKGROUND

The constant growth of biomedical data is accompanied by the need for new methodologies to effectively and efficiently extract machine-readable knowledge for training and testing purposes. A crucial aspect in this regard is creating large, often manually or semi-manually, annotated corpora vital for developing effective and efficient methods for tasks like relation extraction, topic recognition, and entity linking. However, manual annotation is expensive and time-consuming especially if not assisted by interactive, intuitive, and collaborative computer-aided tools. To support healthcare experts in the annotation process and foster annotated corpora creation, we present MetaTron. MetaTron is an open-source and free-to-use web-based annotation tool to annotate biomedical data interactively and collaboratively; it supports both mention-level and document-level annotations also integrating automatic built-in predictions. Moreover, MetaTron enables relation annotation with the support of ontologies, functionalities often overlooked by off-the-shelf annotation tools.

RESULTS

We conducted a qualitative analysis to compare MetaTron with a set of manual annotation tools, including TeamTat, INCEpTION, LightTag, MedTAG, and brat, on three sets of criteria: technical, data, and functional. A quantitative evaluation allowed us to assess MetaTron's performance in terms of the time and number of clicks needed to annotate a set of documents. The results indicated that MetaTron fulfills almost all the selected criteria and achieves the best performance.

CONCLUSIONS

MetaTron stands out as one of the few annotation tools targeting the biomedical domain that supports the annotation of relations and is fully customizable with documents in several formats (PDF included), as well as abstracts retrieved from PubMed, Semantic Scholar, and OpenAIRE. To meet any user need, we released MetaTron both as an online instance and as a locally deployable Docker image.


Simoulin A, Thiebaut N, Neuberger K, Ibnouhsein I, Brunel N, Viné R, Bousquet N, Latapy J, Reix N, Molière S, Lodi M, Mathelin C. From free-text electronic health records to structured cohorts: Onconum, an innovative methodology for real-world data mining in breast cancer. Comput Methods Programs Biomed 2023;240:107693. [PMID: 37453367 DOI: 10.1016/j.cmpb.2023.107693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 05/25/2023] [Accepted: 06/23/2023] [Indexed: 07/18/2023]

Abstract

PURPOSE

A considerable amount of valuable information is present in electronic health records (EHRs); however, it remains inaccessible because it is embedded in unstructured narrative documents that cannot be easily analyzed. We wanted to develop and evaluate a methodology able to extract and structure information from electronic health records in breast cancer.

METHODS

We developed a software platform called Onconum (ClinicalTrials.gov Identifier: NCT02810093), which uses a hybrid method relying on machine learning approaches and rule-based lexical methods. It is based on natural language processing techniques that allow a targeted analysis of free-text medical data related to breast cancer, independently of any pre-existing dictionary, in a French context (available in N files). We then evaluated it on a validation cohort called Senometry.

FINDINGS

The Senometry cohort included 9,599 patients with breast cancer (both invasive and in situ), treated between 2000 and 2017 in the breast cancer unit of Strasbourg University Hospitals. Extraction rates ranged from 45% to 100%, depending on the type of parameter. Precision of the extracted information was 68%-94% compared to a structured cohort and 89%-98% compared to manually structured databases, and the method retrieved more rare occurrences than another database search engine (+17%).

INTERPRETATION

This innovative method can accurately structure relevant medical information embedded in EHRs in the context of breast cancer. Missing data handling is the main limitation of this method; however, multiple sources can be incorporated to mitigate this limitation. Nevertheless, this methodology needs neither pre-existing dictionaries nor manually annotated corpora. It can therefore be easily implemented in non-English-speaking countries and in diseases other than breast cancer, and it allows prospective inclusion of new patients.


Fraile Navarro D, Ijaz K, Rezazadegan D, Rahimi-Ardabili H, Dras M, Coiera E, Berkovsky S. Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review. Int J Med Inform 2023;177:105122. [PMID: 37295138 DOI: 10.1016/j.ijmedinf.2023.105122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 04/14/2023] [Accepted: 06/03/2023] [Indexed: 06/12/2023]

Abstract

BACKGROUND

Natural Language Processing (NLP) applications have developed over the past years in various fields, including clinical free text, where they are used for named entity recognition and relation extraction. However, developments over the last few years have been so rapid that no current overview exists. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments.

METHODS

We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association for Computational Linguistics (ACL), and Association for Computing Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks in unstructured clinical text (e.g., discharge summaries).

RESULTS

We included 94 studies in the review, 30 of which were published in the last three years. Machine learning methods were used in 68 studies, rule-based in 5 studies, and both in 22 studies. 63 studies focused on Named Entity Recognition, 13 on Relation Extraction, and 18 performed both. The most frequently extracted entities were "problem", "test" and "treatment". 72 studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies clearly defined a clinical or information task to be addressed by the system, and just three studies reported its use outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 shared an available software tool.

DISCUSSION

Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This raises questions about the generalizability of findings and their translation into practice, and highlights the need for robust clinical evaluation.


Ismail A, Al-Zoubi T, El Naqa I, Saeed H. The role of artificial intelligence in hastening time to recruitment in clinical trials. BJR Open 2023;5:20220023. [PMID: 37953865 PMCID: PMC10636341 DOI: 10.1259/bjro.20220023] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/28/2022] [Revised: 03/20/2023] [Accepted: 04/11/2023] [Indexed: 09/01/2023] Open

Eikelboom WS, Singleton EH, van den Berg E, de Boer C, Coesmans M, Goudzwaard JA, Vijverberg EGB, Pan M, Gouw C, Mol MO, Gillissen F, Fieldhouse JLP, Pijnenburg YAL, van der Flier WM, van Swieten JC, Ossenkoppele R, Kors JA, Papma JM. The reporting of neuropsychiatric symptoms in electronic health records of individuals with Alzheimer's disease: a natural language processing study. Alzheimers Res Ther 2023;15:94. [PMID: 37173801 PMCID: PMC10176879 DOI: 10.1186/s13195-023-01240-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/20/2022] [Accepted: 05/05/2023] [Indexed: 05/15/2023]

Abstract

BACKGROUND

Neuropsychiatric symptoms (NPS) are prevalent in the early clinical stages of Alzheimer's disease (AD) according to proxy-based instruments. Little is known about which NPS clinicians report and whether their judgment aligns with proxy-based instruments. We used natural language processing (NLP) to classify NPS in electronic health records (EHRs) to estimate the reporting of NPS in symptomatic AD at the memory clinic according to clinicians. Next, we compared NPS as reported in EHRs and NPS reported by caregivers on the Neuropsychiatric Inventory (NPI).

METHODS

Two academic memory clinic cohorts were used: the Amsterdam UMC (n = 3001) and the Erasmus MC (n = 646). Patients included in these cohorts had MCI, AD dementia, or mixed AD/VaD dementia. Ten trained clinicians annotated 13 types of NPS in a randomly selected training set of n = 500 EHRs from the Amsterdam UMC cohort and in a test set of n = 250 EHRs from the Erasmus MC cohort. For each NPS, a generalized linear classifier was trained and internally and externally validated. Prevalence estimates of NPS were adjusted for the imperfect sensitivity and specificity of each classifier. Intra-individual comparisons of the NPS classified in EHRs and the NPS reported on the NPI were conducted in a subsample (59%).
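
Adjusting a prevalence estimate for imperfect sensitivity and specificity can be sketched with the Rogan-Gladen estimator, a standard correction for misclassification. The abstract does not say which correction the authors used, so the choice of estimator, the function name, and the example values below are assumptions for illustration, not the study's method or data.

```python
def adjust_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen correction of an apparent (classifier-based) prevalence.

    adjusted = (apparent + specificity - 1) / (sensitivity + specificity - 1)

    The result is returned unclipped: a value outside [0, 1] signals that
    no valid prevalence estimate can be obtained from this classifier.
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0:
        raise ValueError("classifier is no better than chance")
    return (apparent + specificity - 1.0) / denominator


# Hypothetical example values, not taken from the study.
adjusted = adjust_prevalence(apparent=0.50, sensitivity=0.90, specificity=0.80)

# With low specificity the corrected value can fall below zero,
# i.e. the estimate is not valid.
invalid = adjust_prevalence(apparent=0.20, sensitivity=0.90, specificity=0.70)
```

Under this correction, a classifier with low specificity can produce an out-of-range value, which is one way a prevalence estimate can fail to be valid.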

RESULTS

Internal validation performance of the classifiers was excellent (AUC range: 0.81-0.91), but external validation performance decreased (AUC range: 0.51-0.93). NPS were prevalent in EHRs from the Amsterdam UMC, especially apathy (adjusted prevalence = 69.4%), anxiety (adjusted prevalence = 53.7%), aberrant motor behavior (adjusted prevalence = 47.5%), irritability (adjusted prevalence = 42.6%), and depression (adjusted prevalence = 38.5%). The ranking of NPS was similar for EHRs from the Erasmus MC, although not all classifiers obtained valid prevalence estimates due to low specificity. In both cohorts, there was minimal agreement between NPS classified in the EHRs and NPS reported on the NPI (all kappa coefficients < 0.28), with substantially more reports of NPS in EHRs than on NPI assessments.

CONCLUSIONS

NLP classifiers performed well in detecting a wide range of NPS in EHRs of patients with symptomatic AD visiting the memory clinic and showed that clinicians frequently reported NPS in these EHRs. Clinicians generally reported more NPS in EHRs than caregivers reported on the NPI.


Affiliation(s)

Willem S Eikelboom Department of Neurology and Alzheimer Center Erasmus MC, Erasmus MC University Medical Center, PO Box 2040, 3000 CA, Rotterdam, the Netherlands.
Ellen H Singleton Department of Neurology, Alzheimer Center Amsterdam, Amsterdam University Medical Centers, Amsterdam, the Netherlands
Esther van den Berg Department of Neurology and Alzheimer Center Erasmus MC, Erasmus MC University Medical Center, PO Box 2040, 3000 CA, Rotterdam, the Netherlands
Casper de Boer Department of Neurology, Alzheimer Center Amsterdam, Amsterdam University Medical Centers, Amsterdam, the Netherlands
Michiel Coesmans Department of Psychiatry, Erasmus MC University Medical Center, Rotterdam, the Netherlands
Jeannette A Goudzwaard Department of Internal Medicine, Section of Geriatrics, Erasmus MC University Medical Center, Rotterdam, the Netherlands
Everard G B Vijverberg Department of Neurology, Alzheimer Center Amsterdam, Amsterdam University Medical Centers, Amsterdam, the Netherlands
Michel Pan Department of Neurology and Alzheimer Center Erasmus MC, Erasmus MC University Medical Center, PO Box 2040, 3000 CA, Rotterdam, the Netherlands
Cornalijn Gouw Department of Psychiatry, Erasmus MC University Medical Center, Rotterdam, the Netherlands
Merel O Mol Department of Neurology and Alzheimer Center Erasmus MC, Erasmus MC University Medical Center, PO Box 2040, 3000 CA, Rotterdam, the Netherlands
Freek Gillissen Department of Neurology, Alzheimer Center Amsterdam, Amsterdam University Medical Centers, Amsterdam, the Netherlands
Jay L P Fieldhouse Department of Neurology, Alzheimer Center Amsterdam, Amsterdam University Medical Centers, Amsterdam, the Netherlands
Yolande A L Pijnenburg Department of Neurology, Alzheimer Center Amsterdam, Amsterdam University Medical Centers, Amsterdam, the Netherlands
Wiesje M van der Flier Department of Neurology, Alzheimer Center Amsterdam, Amsterdam University Medical Centers, Amsterdam, the Netherlands
John C van Swieten Department of Neurology and Alzheimer Center Erasmus MC, Erasmus MC University Medical Center, PO Box 2040, 3000 CA, Rotterdam, the Netherlands
Rik Ossenkoppele Department of Neurology, Alzheimer Center Amsterdam, Amsterdam University Medical Centers, Amsterdam, the Netherlands Clinical Memory Research Unit, Lund University, Malmö, Sweden
Jan A Kors Department of Medical Informatics, Erasmus MC University Medical Center, Rotterdam, the Netherlands
Janne M Papma Department of Neurology and Alzheimer Center Erasmus MC, Erasmus MC University Medical Center, PO Box 2040, 3000 CA, Rotterdam, the Netherlands


Dong H, Suárez-Paniagua V, Zhang H, Wang M, Casey A, Davidson E, Chen J, Alex B, Whiteley W, Wu H. Ontology-driven and weakly supervised rare disease identification from clinical notes. BMC Med Inform Decis Mak 2023;23:86. [PMID: 37147628 PMCID: PMC10162001 DOI: 10.1186/s12911-023-02181-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 04/21/2023] [Indexed: 05/07/2023] Open

Abstract

BACKGROUND

Computational text phenotyping is the practice of identifying patients with certain disorders and traits from clinical notes. Rare diseases are challenging to identify because few cases are available for machine learning and data annotation requires domain experts.

METHODS

We propose a method using ontologies and weak supervision, with recent pre-trained contextual representations from Bi-directional Transformers (e.g. BERT). The ontology-driven framework includes two steps: (i) Text-to-UMLS, extracting phenotypes by contextually linking mentions to concepts in Unified Medical Language System (UMLS), with a Named Entity Recognition and Linking (NER+L) tool, SemEHR, and weak supervision with customised rules and contextual mention representation; (ii) UMLS-to-ORDO, matching UMLS concepts to rare diseases in Orphanet Rare Disease Ontology (ORDO). The weakly supervised approach is proposed to learn a phenotype confirmation model to improve Text-to-UMLS linking, without annotated data from domain experts. We evaluated the approach on three clinical datasets, MIMIC-III discharge summaries, MIMIC-III radiology reports, and NHS Tayside brain imaging reports from two institutions in the US and the UK, with annotations.

RESULTS

The improvements in the precision were pronounced (by over 30% to 50% absolute score for Text-to-UMLS linking), with almost no loss of recall compared to the existing NER+L tool, SemEHR. Results on radiology reports from MIMIC-III and NHS Tayside were consistent with the discharge summaries. The overall pipeline processing clinical notes can extract rare disease cases, mostly uncaptured in structured data (manually assigned ICD codes).

CONCLUSION

The study provides empirical evidence for the task by applying a weakly supervised NLP pipeline on clinical notes. The proposed weakly supervised deep learning approach requires no human annotation except for validation and testing, by leveraging ontologies, NER+L tools, and contextual representations. The study also demonstrates that Natural Language Processing (NLP) can complement traditional ICD-based approaches to better estimate rare diseases in clinical notes. We discuss the usefulness and limitations of the weak supervision approach and propose directions for future studies.


Graeßner M, Jungwirth B, Frank E, Schaller SJ, Kochs E, Ulm K, Blobner M, Ulm B, Podtschaske AH, Kagerbauer SM. Enabling personalized perioperative risk prediction by using a machine-learning model based on preoperative data. Sci Rep 2023;13:7128. [PMID: 37130884 PMCID: PMC10153050 DOI: 10.1038/s41598-023-33981-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/21/2022] [Accepted: 04/21/2023] [Indexed: 05/04/2023] Open

Affiliation(s)

Martin Graeßner Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, Albert-Einstein-Allee 23, 89081, Ulm, Germany
Bettina Jungwirth Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, Albert-Einstein-Allee 23, 89081, Ulm, Germany
Elke Frank Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, Albert-Einstein-Allee 23, 89081, Ulm, Germany Commercial department, Klinikum rechts der isar, Technical University of Munich, Munich, Germany
Stefan Josef Schaller Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany Department of Anaesthesiology and Operative Intensive Care Medicine (CVK, CCM), Charité-Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Berlin, Germany
Eberhard Kochs Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
Kurt Ulm Department of Medical Statistics and Epidemiology, School of Medicine, Technical University of Munich, Munich, Germany
Manfred Blobner Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, Albert-Einstein-Allee 23, 89081, Ulm, Germany
Bernhard Ulm Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, Albert-Einstein-Allee 23, 89081, Ulm, Germany
Armin Horst Podtschaske Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany
Simone Maria Kagerbauer Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, Technical University of Munich, Munich, Germany. Department of Anaesthesiology and Intensive Care Medicine, School of Medicine, University Hospital Ulm, Albert-Einstein-Allee 23, 89081, Ulm, Germany.


Pethani F, Dunn AG. Natural language processing for clinical notes in dentistry: A systematic review. J Biomed Inform 2023;138:104282. [PMID: 36623780 DOI: 10.1016/j.jbi.2023.104282] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/29/2022] [Revised: 12/01/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023]

Zhao Y, Howard R, Amorrortu RP, Stewart SC, Wang X, Calip GS, Rollison DE. Assessing the Contribution of Scanned Outside Documents to the Completeness of Real-World Data Abstraction. JCO Clin Cancer Inform 2023;7:e2200118. [PMID: 36791386 DOI: 10.1200/cci.22.00118] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/17/2023] Open

Adverse drug event detection using natural language processing: A scoping review of supervised learning methods. PLoS One 2023;18:e0279842. [PMID: 36595517 PMCID: PMC9810201 DOI: 10.1371/journal.pone.0279842] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 03/22/2022] [Accepted: 12/15/2022] [Indexed: 01/04/2023] Open

Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2022. [DOI: 10.12688/wellcomeopenres.16867.5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

Abstract

BACKGROUND

While clinical medicine has seen an explosion of Natural Language Processing (NLP) analyses of electronic health records, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms.

METHODS

From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency–Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake).

RESULTS

In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, 26 terms appeared with a frequency of 0.08 or greater, while in 2021 27 terms met this criterion. The LDA algorithm defined two groups. The first included terms related to things the population would receive (e.g., 'insurance'), while the second included terms about the health system (e.g., 'capacity'). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that 'universal health coverage' appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text. These keywords identified some underlying characteristics of the political party (e.g., political spectrum such as left-wing). The Rake algorithm delivered phrases, in which we found 'universal health coverage' in 2016 and 2021.

CONCLUSION

The NLP analysis could be used to inform on the underlying priorities in each government plan. NLP analysis could also be included in research of health policies and politics during general elections and provide informative summaries for the general population.
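
As a rough illustration of the first of the five algorithms, TF-IDF scoring can be sketched in a few lines of Python. The abstract does not specify the weighting variant, tokenization, or preprocessing, so the plain term-frequency times log inverse-document-frequency formula, the whitespace tokenizer, and the toy documents below are assumptions; only the 0.08 cut-off is taken from the text.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    score = (term count / document length) * ln(N / document frequency),
    one common TF-IDF variant among several.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy documents, invented for illustration.
plans = ["universal health coverage", "health system capacity"]
scores = tf_idf(plans)

# Keep terms at or above a score threshold, as in the 0.08 cut-off above.
frequent = [{t for t, s in doc.items() if s >= 0.08} for doc in scores]
```

A term that appears in every document, such as "health" here, scores zero under this variant, which is why TF-IDF highlights terms that distinguish one plan from the others.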

Pavel A, Saarimäki LA, Möbus L, Federico A, Serra A, Greco D. The potential of a data centred approach & knowledge graph data representation in chemical safety and drug design. Comput Struct Biotechnol J 2022;20:4837-4849. [PMID: 36147662 PMCID: PMC9464643 DOI: 10.1016/j.csbj.2022.08.061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 08/26/2022] [Accepted: 08/26/2022] [Indexed: 11/20/2022] Open

Karhade AV, Oosterhoff JHF, Groot OQ, Agaronnik N, Ehresman J, Bongers MER, Jaarsma RL, Poonnoose SI, Sciubba DM, Tobert DG, Doornberg JN, Schwab JH. Can We Geographically Validate a Natural Language Processing Algorithm for Automated Detection of Incidental Durotomy Across Three Independent Cohorts From Two Continents? Clin Orthop Relat Res 2022;480:1766-1775. [PMID: 35412473 PMCID: PMC9384904 DOI: 10.1097/corr.0000000000002200] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/24/2021] [Accepted: 03/11/2022] [Indexed: 01/31/2023]

Abstract

BACKGROUND

Incidental durotomy is an intraoperative complication in spine surgery that can lead to postoperative complications, increased length of stay, and higher healthcare costs. Natural language processing (NLP) is an artificial intelligence method that assists in understanding free-text notes and may be useful in the automated surveillance of adverse events in orthopaedic surgery. A previously developed NLP algorithm is highly accurate in the detection of incidental durotomy on internal validation and on external validation in an independent cohort from the same country. External validation in a cohort with linguistic differences, referred to as geographical validation, is required to assess the transportability of the developed algorithm. Ideally, the performance of a prediction model, here the NLP algorithm, is constant across geographic regions to ensure reproducibility and model validity.

QUESTION/PURPOSE

Can we geographically validate an NLP algorithm for the automated detection of incidental durotomy across three independent cohorts from two continents?

METHODS

Patients 18 years or older undergoing a primary procedure of (thoraco)lumbar spine surgery were included. In Massachusetts, between January 2000 and June 2018, 1000 patients were included from two academic and three community medical centers. In Maryland, between July 2016 and November 2018, 1279 patients were included from one academic center, and in Australia, between January 2010 and December 2019, 944 patients were included from one academic center. The authors retrospectively studied the free-text operative notes of included patients for the primary outcome that was defined as intraoperative durotomy. Incidental durotomy occurred in 9% (93 of 1000), 8% (108 of 1279), and 6% (58 of 944) of the patients, respectively, in the Massachusetts, Maryland, and Australia cohorts. No missing reports were observed. Three datasets (Massachusetts, Australian, and combined Massachusetts and Australian) were divided into training and holdout test sets in an 80:20 ratio. An extreme gradient boosting (an efficient and flexible tree-based algorithm) NLP algorithm was individually trained on each training set, and the performance of the three NLP algorithms (respectively American, Australian, and combined) was assessed by discrimination via area under the receiver operating characteristic curves (AUC-ROC; this measures the model's ability to distinguish patients who obtained the outcomes from those who did not), calibration metrics (which plot the predicted and the observed probabilities) and Brier score (a composite of discrimination and calibration). In addition, the sensitivity (true positives, recall), specificity (true negatives), positive predictive value (also known as precision), negative predictive value, F1-score (composite of precision and recall), positive likelihood ratio, and negative likelihood ratio were calculated.
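
The classification metrics listed above all follow from the four cells of a confusion matrix, and the Brier score from predicted probabilities. The following Python sketch shows the standard definitions; the function names and the example counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic accuracy metrics from confusion counts."""
    sensitivity = tp / (tp + fn)   # recall: share of true cases detected
    specificity = tn / (tn + fp)   # share of non-cases correctly ruled out
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # Likelihood ratios; LR+ is unbounded when there are no false positives.
    lr_positive = (
        sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    )
    lr_negative = (1 - sensitivity) / specificity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "lr_positive": lr_positive,
        "lr_negative": lr_negative,
    }


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


# Hypothetical holdout results, invented for illustration.
metrics = diagnostic_metrics(tp=8, fp=2, tn=88, fn=2)
brier = brier_score([0.9, 0.1], [1, 0])
```

Discrimination (AUC-ROC) and calibration plots need the full set of predicted probabilities rather than a single confusion matrix, so they are left out of this sketch.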

RESULTS

The combined NLP algorithm (trained on the combined Massachusetts and Australian data) achieved excellent performance on independent testing data from Australia (AUC-ROC 0.97 [95% CI 0.87 to 0.99]), Massachusetts (AUC-ROC 0.99 [95% CI 0.80 to 0.99]), and Maryland (AUC-ROC 0.95 [95% CI 0.93 to 0.97]). The NLP algorithm developed on the Massachusetts cohort had excellent performance in the Maryland cohort (AUC-ROC 0.97 [95% CI 0.95 to 0.99]) but worse performance in the Australian cohort (AUC-ROC 0.74 [95% CI 0.70 to 0.77]).

CONCLUSION

We demonstrated the clinical utility and reproducibility of an NLP algorithm for detection of incidental durotomy: the algorithm trained on combined datasets retained excellent performance in individual countries relative to algorithms developed in the same country alone. Further multi-institutional, international collaborations can facilitate the creation of universal NLP algorithms that improve the quality and safety of orthopaedic surgery globally. The combined NLP algorithm has been incorporated into a freely accessible web application that can be found at https://sorg-apps.shinyapps.io/nlp_incidental_durotomy/. Clinicians and researchers can use the tool to apply the model to spine registries or in quality and safety departments, automating detection of incidental durotomy and helping to optimize prevention efforts.

LEVEL OF EVIDENCE

Level III, diagnostic study.

Zhu VJ, Lenert LA, Barth KS, Simpson KN, Li H, Kopscik M, Brady KT. Automatically identifying opioid use disorder in non-cancer patients on chronic opioid therapy. Health Informatics J 2022;28:14604582221107808. [PMID: 35726687 PMCID: PMC10826411 DOI: 10.1177/14604582221107808] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]

Clinical Text Data Categorization and Feature Extraction Using Medical-Fissure Algorithm and Neg-Seq Algorithm. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2022:5759521. [PMID: 35295284 PMCID: PMC8920702 DOI: 10.1155/2022/5759521] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/07/2022] [Revised: 01/22/2022] [Accepted: 01/27/2022] [Indexed: 12/19/2022]

Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2022. [DOI: 10.12688/wellcomeopenres.16867.4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

Abstract

BACKGROUND

While clinical medicine has exploited electronic health records for Natural Language Processing (NLP) analyses, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms.

METHODS

From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency–Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake).

RESULTS

In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, 22 terms appeared with a frequency of 0.05 or greater, while in 2021, 27 terms met this criterion. The LDA algorithm defined two groups. The first included terms related to things the population would receive (e.g., ‘insurance’), while the second included terms about the health system (e.g., ‘capacity’). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that ‘universal health coverage’ appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text. These keywords identified some underlying characteristics of the political party (e.g., political spectrum such as left-wing). The Rake algorithm delivered phrases, in which we found ‘universal health coverage’ in 2016 and 2021.

CONCLUSION

The NLP analysis could be used to inform on the underlying priorities in each government plan. NLP analysis could also be included in research of health policies and politics during general elections and provide informative summaries for the general population.
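To make the first of the five keyword-extraction methods concrete, here is a minimal TF-IDF sketch. It is not the study's pipeline: the abstract does not say which TF-IDF variant or tokenisation was used, so the weighting below (relative term frequency times the natural log of inverse document frequency), the function name `tf_idf`, and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term in each tokenised document.

    Assumed variant: tf = count / document length, idf = ln(N / df),
    where df is the number of documents containing the term.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Two invented "plans", already tokenised; real text would need
# tokenisation and stop-word handling first.
plans = [
    ["health", "insurance", "health"],
    ["health", "capacity"],
]
scores = tf_idf(plans)
top_term_second_plan = max(scores[1], key=scores[1].get)
```

A term that occurs in every document ("health" here) receives a score of zero under this variant, which is why TF-IDF tends to surface the terms that distinguish one plan from the others.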

Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.16867.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.16867.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

Abstract

BACKGROUND

While clinical medicine has exploited electronic health records for Natural Language Processing (NLP) analyses, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms.

METHODS

From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency–Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake).

RESULTS

In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, nine terms appeared with a frequency of 0.10 or greater, while in 2021, 43 terms met this criterion. The LDA algorithm defined two groups. The first included terms related to things the population would receive (e.g., ‘insurance’), while the second included terms about the health system (e.g., ‘capacity’). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that ‘universal health coverage’ appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text. These keywords identified some underlying characteristics of the political party (e.g., political spectrum such as left-wing). The Rake algorithm delivered phrases, in which we found ‘universal health coverage’ in 2016 and 2021.

CONCLUSION

The NLP analysis could be used to inform on the underlying priorities in each government plan. NLP analysis could also be included in research of health policies and politics during general elections and provide informative summaries for the general population.

de Oliveira JM, da Costa CA, Antunes RS. Data structuring of electronic health records: a systematic review. HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00607-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/10/2023]

Olthof AW, Shouche P, Fennema EM, IJpma FFA, Koolstra RHC, Stirler VMA, van Ooijen PMA, Cornelissen LJ. Machine learning based natural language processing of radiology reports in orthopaedic trauma. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021;208:106304. [PMID: 34333208 DOI: 10.1016/j.cmpb.2021.106304] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/22/2020] [Accepted: 07/18/2021] [Indexed: 06/13/2023]

Vashishth S, Newman-Griffis D, Joshi R, Dutt R, Rosé CP. Improving broad-coverage medical entity linking with semantic type prediction and large-scale datasets. J Biomed Inform 2021;121:103880. [PMID: 34390853 PMCID: PMC8952339 DOI: 10.1016/j.jbi.2021.103880] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/30/2021] [Revised: 07/31/2021] [Accepted: 07/31/2021] [Indexed: 10/28/2022]

Abstract

OBJECTIVES

Biomedical natural language processing tools are increasingly being applied for broad-coverage information extraction: extracting medical information of all types in a scientific document or a clinical note. In such broad-coverage settings, linking mentions of medical concepts to standardized vocabularies requires choosing the best candidate concepts from large inventories covering dozens of types. This study presents a novel semantic type prediction module for biomedical NLP pipelines and two automatically-constructed, large-scale datasets with broad coverage of semantic types.

METHODS

We experiment with five off-the-shelf biomedical NLP toolkits on four benchmark datasets for medical information extraction from scientific literature and clinical notes. All toolkits adopt a staged approach of mention detection followed by two stages of medical entity linking: (1) generating a list of candidate concepts, and (2) picking the best concept among them. We introduce a semantic type prediction module to alleviate the problem of overgeneration of candidate concepts by filtering out irrelevant candidate concepts based on the predicted semantic type of a mention. We present MedType, a fully modular semantic type prediction model which we integrate into the existing NLP toolkits. To address the dearth of broad-coverage training data for medical information extraction, we further present WikiMed and PubMedDS, two large-scale datasets for medical entity linking.
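The filtering step described above can be sketched as follows. This is a hypothetical illustration, not MedType's actual interface: the `Candidate` class, the `filter_by_semantic_type` function, the placeholder concept IDs, and the fallback rule (keep all candidates if the filter would remove every one) are assumptions, and the predicted type is supplied by hand rather than by a trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    concept_id: str      # placeholder IDs, not real vocabulary codes
    name: str
    semantic_type: str


def filter_by_semantic_type(candidates, predicted_types):
    """Keep only candidates whose semantic type matches a predicted type.

    Assumed fallback: if nothing matches, return the unfiltered list so a
    wrong type prediction cannot leave the linker with no candidates.
    """
    kept = [c for c in candidates if c.semantic_type in predicted_types]
    return kept if kept else list(candidates)


# Candidate generation for the mention "cold" typically over-generates.
candidates = [
    Candidate("ID-1", "Common cold", "Disease or Syndrome"),
    Candidate("ID-2", "Cold temperature", "Natural Phenomenon or Process"),
    Candidate("ID-3", "Chronic obstructive lung disease", "Disease or Syndrome"),
]

# Suppose a type predictor labels the mention, in context, as a disease.
filtered = filter_by_semantic_type(candidates, {"Disease or Syndrome"})
filtered_names = [c.name for c in filtered]
```

The downstream linker then has to choose among two candidates instead of three; on real inventories with dozens of types the reduction can be far larger.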

RESULTS

Semantic type filtering improves medical entity linking performance across all toolkits and datasets, often by several percentage points of F-1. Further, pretraining MedType on our novel datasets achieves state-of-the-art performance for semantic type prediction in biomedical text.

CONCLUSIONS

Semantic type prediction is a key part of building accurate NLP pipelines for broad-coverage information extraction from biomedical text. We make our source code and novel datasets publicly available to foster reproducible research.

[Standardized diagnosis of pancreatic head carcinoma]. DER PATHOLOGE 2021;42:453-463. [PMID: 34357472 DOI: 10.1007/s00292-021-00971-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 07/07/2021] [Indexed: 10/20/2022]

Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2021. [DOI: 10.12688/wellcomeopenres.16867.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

Carriere J, Shafi H, Brehon K, Pohar Manhas K, Churchill K, Ho C, Tavakoli M. Case Report: Utilizing AI and NLP to Assist with Healthcare and Rehabilitation During the COVID-19 Pandemic. Front Artif Intell 2021;4:613637. [PMID: 33733232 PMCID: PMC7907599 DOI: 10.3389/frai.2021.613637] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/02/2020] [Accepted: 01/08/2021] [Indexed: 01/16/2023] Open