1
Chomutare T, Lamproudis A, Budrionis A, Svenning TO, Hind LI, Ngo PD, Mikalsen KØ, Dalianis H. Improving Quality of ICD-10 (International Statistical Classification of Diseases, Tenth Revision) Coding Using AI: Protocol for a Crossover Randomized Controlled Trial. JMIR Res Protoc 2024; 13:e54593. [PMID: 38470476 DOI: 10.2196/54593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2023] [Revised: 01/12/2024] [Accepted: 01/16/2024] [Indexed: 03/13/2024]
Abstract
BACKGROUND Computer-assisted clinical coding (CAC) tools are designed to help clinical coders assign standardized codes, such as the ICD-10 (International Statistical Classification of Diseases, Tenth Revision), to clinical texts, such as discharge summaries. Maintaining the integrity of these standardized codes is important both for the functioning of health systems and for ensuring that data used for secondary purposes are of high quality. Clinical coding is an error-prone and cumbersome task, and the complexity of modern classification systems such as the ICD-11 (International Classification of Diseases, Eleventh Revision) presents significant barriers to implementation. To date, there have been only a few user studies; therefore, our understanding of the role CAC systems can play in reducing the burden of coding and improving the overall quality of coding is still limited. OBJECTIVE The objective of the user study is to generate both qualitative and quantitative data for measuring the usefulness of a CAC system, Easy-ICD, that was developed for recommending ICD-10 codes. Specifically, our goal is to assess whether our tool can reduce the burden on clinical coders and also improve coding quality. METHODS The user study is based on a crossover randomized controlled trial study design, where we measure the performance of clinical coders when they use our CAC tool versus when they do not. Performance is measured by the time it takes them to assign codes to both simple and complex clinical texts as well as by the coding quality, that is, the accuracy of code assignment. RESULTS We expect the study to provide us with a measurement of the effectiveness of the CAC system compared to manual coding processes, both in terms of time use and coding quality. Positive outcomes from this study will imply that CAC tools hold the potential to reduce the burden on health care staff and will have major implications for the adoption of artificial intelligence-based CAC innovations to improve coding practice. Results are expected to be published in summer 2024. CONCLUSIONS The planned user study promises a greater understanding of the impact CAC systems might have on clinical coding in real-life settings, especially with regard to coding time and quality. Further, the study may add new insights into how to meaningfully exploit current clinical text mining capabilities, with a view to reducing the burden on clinical coders, thus lowering the barriers and paving a more sustainable path to the adoption of modern coding systems, such as the new ICD-11. TRIAL REGISTRATION clinicaltrials.gov NCT06286865; https://clinicaltrials.gov/study/NCT06286865. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) DERR1-10.2196/54593.
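The abstract names a crossover randomized design but does not spell out how coders are allocated. A minimal sketch of a standard two-period AB/BA allocation, assuming "A" means coding with the CAC tool and "B" means manual coding; the labels, the function name `assign_crossover` and the seeding are hypothetical, not the protocol's procedure:

```python
import random
from typing import Dict, List


def assign_crossover(coders: List[str], seed: int = 0) -> Dict[str, List[str]]:
    """Randomly allocate coders to the two sequences of an AB/BA crossover.

    Hypothetical labels: 'A' = coding with the CAC tool, 'B' = manual coding.
    Half of the shuffled coders get A then B, the rest B then A (with an odd
    number of coders, the BA sequence receives the extra person).
    """
    rng = random.Random(seed)
    shuffled = coders[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    plan: Dict[str, List[str]] = {}
    for i, coder in enumerate(shuffled):
        plan[coder] = ["A", "B"] if i < half else ["B", "A"]
    return plan
```

Because every coder works under both conditions, each acts as their own control, and balancing the two orders lets a period or learning effect be separated from the effect of the tool.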
Affiliation(s)
- Taridzo Chomutare
- Health Data Analytics, Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Computer Science, UiT The Arctic University of Norway, Tromsø, Norway
- Andrius Budrionis
- Health Data Analytics, Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway
- Lill Irene Hind
- Clinic for Surgery, Oncology and Women Health, University Hospital of North Norway, Tromsø, Norway
- Phuong Dinh Ngo
- Health Data Analytics, Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway
- Karl Øyvind Mikalsen
- Department of Physics and Technology, UiT The Arctic University of Norway, Tromsø, Norway
- The Norwegian Centre for Clinical Artificial Intelligence, University Hospital of North Norway, Tromsø, Norway
- Hercules Dalianis
- Health Data Analytics, Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
2
Lamproudis A, Mora S, Svenning TO, Torsvik T, Chomutare T, Ngo PD, Dalianis H. De-identifying Norwegian Clinical Text using Resources from Swedish and Danish. AMIA Annu Symp Proc 2024; 2023:456-464. [PMID: 38222432 PMCID: PMC10785939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
The lack of relevant annotated datasets represents one key limitation in the application of Natural Language Processing techniques to a broad range of tasks, among them Protected Health Information (PHI) identification in Norwegian clinical text. In this work, the possibility of exploiting resources from Swedish, a language very closely related to Norwegian, is explored. The Swedish dataset is annotated with PHI information. Different processing and text augmentation techniques are evaluated, along with their impact on the final performance of the model. The augmentation techniques, such as injection and generation of both Norwegian and Scandinavian Named Entities into the Swedish training corpus, were shown to increase performance in the de-identification task for both Danish and Norwegian text. This trend was also confirmed by evaluating model performance on a sample of Norwegian gastrosurgical clinical text.
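The entity-injection augmentation mentioned above can be pictured as swapping the surface form of annotated PHI spans while keeping their labels. A minimal sketch, assuming BIO-tagged tokens, a hypothetical label `First_Name` and a hypothetical name pool; the paper's actual label set and sampling procedure may differ:

```python
import random
from typing import List, Tuple

# A sentence is a list of (token, BIO label) pairs; label names are hypothetical.
Sentence = List[Tuple[str, str]]


def inject_entities(sentence: Sentence, entity: str, pool: List[str],
                    rng: random.Random) -> Sentence:
    """Replace every span labelled `entity` with a surface form drawn from `pool`.

    The PHI annotation is preserved; only the words change, e.g. a Swedish
    first name is swapped for a (hypothetical) Norwegian one.
    """
    out: Sentence = []
    i = 0
    while i < len(sentence):
        token, label = sentence[i]
        if label == f"B-{entity}":
            # Skip the original span: the B- token plus any following I- tokens.
            j = i + 1
            while j < len(sentence) and sentence[j][1] == f"I-{entity}":
                j += 1
            words = rng.choice(pool).split()
            out.append((words[0], f"B-{entity}"))
            out.extend((w, f"I-{entity}") for w in words[1:])
            i = j
        else:
            out.append((token, label))
            i += 1
    return out
```

Applying this to each training sentence with several random draws yields extra Swedish sentences containing Norwegian-looking entities, which is the kind of corpus expansion the abstract describes.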
Affiliation(s)
- Sara Mora
- Department of Informatics, Bioengineering, Robotics and System engineering (DIBRIS), University of Genoa, Genoa, Italy
- Taridzo Chomutare
- Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Computer Science, UiT - The Arctic University of Norway, Tromsø, Norway
- Phuong Dinh Ngo
- Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Physics and Technology, UiT - The Arctic University of Norway, Tromsø, Norway
- Hercules Dalianis
- Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Computer and Systems Science (DSV), Stockholm University, Kista, Sweden
3
Lamproudis A, Svenning TO, Torsvik T, Chomutare T, Budrionis A, Dinh Ngo P, Vakili T, Dalianis H. Using a Large Open Clinical Corpus for Improved ICD-10 Diagnosis Coding. AMIA Annu Symp Proc 2024; 2023:465-473. [PMID: 38222373 PMCID: PMC10785868] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
With the recent advances in natural language processing and deep learning, the development of tools that can assist medical coders in ICD-10 diagnosis coding and increase their efficiency in coding discharge summaries is significantly more viable than before. The datasets used to train the underlying models are an important component of such tools. In this study, such datasets are presented, and it is shown that one of them can be used to develop a BERT-based language model that consistently performs well in assigning ICD-10 codes to discharge summaries written in Swedish. Most importantly, the model can be used in a coding support setup where a tool recommends potential codes to the coders. This narrows the range of codes to consider and, in turn, reduces the coder's workload. Moreover, the de-identified and pseudonymised dataset is open to use for academic users.
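The coding support setup described above amounts to ranking codes by model score and showing the coder only a short list. A minimal sketch, assuming the classifier already outputs one probability per ICD-10 code; the function names, the example codes and their scores, and the cut-off `k` are illustrative, not taken from the study:

```python
from typing import Dict, List, Tuple


def recommend_codes(scores: Dict[str, float], k: int = 3,
                    threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Return at most k (code, score) pairs, highest score first.

    Codes scoring below `threshold` are never suggested, so the coder only
    reviews a short list instead of the full ICD-10 catalogue.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(code, s) for code, s in ranked if s >= threshold][:k]


def hit_at_k(gold: str, scores: Dict[str, float], k: int) -> bool:
    """True if the correct code appears among the top-k suggestions."""
    return gold in {code for code, _ in recommend_codes(scores, k)}
```

A metric such as `hit_at_k`, averaged over discharge summaries, is one way to check how often the shortlist actually contains the code the coder needs.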
Affiliation(s)
- Taridzo Chomutare
- Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Computer Science, UiT - The Arctic University of Norway, Tromsø, Norway
- Andrius Budrionis
- Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Physics and Technology, UiT - The Arctic University of Norway, Tromsø, Norway
- Phuong Dinh Ngo
- Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Physics and Technology, UiT - The Arctic University of Norway, Tromsø, Norway
- Thomas Vakili
- Department of Computer and Systems Science (DSV), Stockholm University, Kista, Sweden
- Hercules Dalianis
- Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Computer and Systems Science (DSV), Stockholm University, Kista, Sweden
4
Valik JK, Ward L, Tanushi H, Johansson AF, Färnert A, Mogensen ML, Pickering BW, Herasevich V, Dalianis H, Henriksson A, Nauclér P. Predicting sepsis onset using a machine learned causal probabilistic network algorithm based on electronic health records data. Sci Rep 2023; 13:11760. [PMID: 37474597 PMCID: PMC10359402 DOI: 10.1038/s41598-023-38858-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Accepted: 07/16/2023] [Indexed: 07/22/2023]
Abstract
Sepsis is a leading cause of mortality, and early identification improves survival. With increasing digitalization of health care data, automated sepsis prediction models hold promise to aid in prompt recognition. Most previous studies have focused on the intensive care unit (ICU) setting. Yet only a small proportion of sepsis develops in the ICU, and there is an apparent clinical benefit in identifying patients earlier in the disease trajectory. In this cohort of 82,852 hospital admissions and 8038 sepsis episodes classified according to the Sepsis-3 criteria, we demonstrate that a machine learned score can predict sepsis onset within 48 h using sparse routine electronic health record data outside the ICU. Our score was based on a causal probabilistic network model, SepsisFinder, which has similarities with clinical reasoning. A prediction was generated hourly on all admissions, provided a new variable was registered. Compared to the National Early Warning Score (NEWS2), which is an established method to identify sepsis, SepsisFinder triggered earlier and had a higher area under the receiver operating characteristic curve (AUROC) (0.950 vs. 0.872), as well as a higher area under the precision-recall curve (APR) (0.189 vs. 0.149). A machine learning comparator based on a gradient-boosting decision tree model had similar AUROC (0.949) and higher APR (0.239) than SepsisFinder but triggered later than both NEWS2 and SepsisFinder. The precision of SepsisFinder increased if screening was restricted to the earlier admission period and to episodes with bloodstream infection. Furthermore, SepsisFinder signaled a median of 5.5 h prior to antibiotic administration. Identifying a high-risk population with this method could be used to tailor clinical interventions and improve patient care.
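The rule "a prediction was generated hourly on all admissions, provided a new variable was registered" can be read as an hourly scheduler gated on new data. A minimal sketch under that reading; the function name `prediction_times` and the choice of half-open hourly windows are assumptions, not SepsisFinder's implementation:

```python
from datetime import datetime, timedelta
from typing import List


def prediction_times(admitted: datetime, discharged: datetime,
                     observations: List[datetime]) -> List[datetime]:
    """Hourly ticks at which a new risk score would be generated.

    Assumed rule: at each full hour after admission, a score is produced only
    if at least one new observation was registered in the window
    (previous tick, current tick]; otherwise no new score is computed.
    """
    ticks: List[datetime] = []
    prev = admitted
    tick = admitted + timedelta(hours=1)
    while tick <= discharged:
        if any(prev < t <= tick for t in observations):
            ticks.append(tick)
        prev = tick
        tick += timedelta(hours=1)
    return ticks
```

Gating on new data keeps the score from being recomputed on unchanged inputs, which matters when routine ward data are as sparse as the abstract describes.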
Affiliation(s)
- John Karlsson Valik
- Division of Infectious Diseases, Department of Medicine, Karolinska Institutet, Solna, Stockholm, Sweden
- Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Logan Ward
- Treat Systems ApS, Aalborg, Denmark
- Department of Health Science and Technology, Center for Model-Based Medical Decision Support, Aalborg University, Aalborg, Denmark
- Hideyuki Tanushi
- Division of Infectious Diseases, Department of Medicine, Karolinska Institutet, Solna, Stockholm, Sweden
- Anders F Johansson
- Department of Clinical Microbiology and the Laboratory for Molecular Infection Medicine (MIMS), Umeå University, Umeå, Sweden
- Anna Färnert
- Division of Infectious Diseases, Department of Medicine, Karolinska Institutet, Solna, Stockholm, Sweden
- Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Brian W Pickering
- Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, MN, USA
- Vitaly Herasevich
- Department of Anesthesiology and Perioperative Medicine, Mayo Clinic, Rochester, MN, USA
- Hercules Dalianis
- Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
- Aron Henriksson
- Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
- Pontus Nauclér
- Division of Infectious Diseases, Department of Medicine, Karolinska Institutet, Solna, Stockholm, Sweden
- Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
5
Blanco A, Remmer S, Pérez A, Dalianis H, Casillas A. Implementation of specialised attention mechanisms: ICD-10 classification of Gastrointestinal discharge summaries in English, Spanish and Swedish. J Biomed Inform 2022; 130:104050. [DOI: 10.1016/j.jbi.2022.104050] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/27/2021] [Revised: 01/31/2022] [Accepted: 03/07/2022] [Indexed: 11/30/2022]
6
van der Werff SD, Thiman E, Tanushi H, Valik JK, Henriksson A, Ul Alam M, Dalianis H, Ternhag A, Nauclér P. The accuracy of fully automated algorithms for surveillance of healthcare-associated urinary tract infections in hospitalized patients. J Hosp Infect 2021; 110:139-147. [PMID: 33548370 DOI: 10.1016/j.jhin.2021.01.023] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/17/2020] [Revised: 01/27/2021] [Accepted: 01/27/2021] [Indexed: 01/06/2023]
Abstract
BACKGROUND Surveillance for healthcare-associated infections such as healthcare-associated urinary tract infections (HA-UTI) is important for directing resources and evaluating interventions. However, traditional surveillance methods are resource-intensive and subject to bias. AIM To develop and validate a fully automated surveillance algorithm for HA-UTI using electronic health record (EHR) data. METHODS Five algorithms were developed using EHR data from 2979 admissions at Karolinska University Hospital from 2010 to 2011: (1) positive urine culture (UCx); (2) positive UCx + UTI codes (International Statistical Classification of Diseases and Related Health Problems, 10th revision); (3) positive UCx + UTI-specific antibiotics; (4) positive UCx + fever and/or UTI symptoms; (5) algorithm 4 with negation for fever without UTI symptoms. Natural language processing (NLP) was used for processing free-text medical notes. The algorithms were validated in 1258 potential UTI episodes from January to March 2012, and the results were extrapolated to all UTI episodes within this period (N = 16,712). The reference standard for HA-UTIs was manual record review according to the European Centre for Disease Prevention and Control (and US Centers for Disease Control and Prevention) definitions by trained healthcare personnel. FINDINGS Of the 1258 UTI episodes, 163 fulfilled the ECDC HA-UTI definition, and the algorithms classified 391, 150, 189, 194, and 153 UTI episodes, respectively, as HA-UTI. Algorithms 1, 2, and 3 had insufficient performance. Algorithm 4 achieved better performance, and algorithm 5 performed best for surveillance purposes with sensitivity 0.667 (95% confidence interval: 0.594-0.733), specificity 0.997 (0.996-0.998), positive predictive value 0.719 (0.624-0.807) and negative predictive value 0.997 (0.996-0.997).
CONCLUSION A fully automated surveillance algorithm based on NLP to find UTI symptoms in free-text had acceptable performance to detect HA-UTI compared to manual record review. Algorithms based on administrative and microbiology data only were not sufficient.
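The four performance measures reported above are the standard ones derived from a 2x2 table of algorithm output against the reference standard. A minimal sketch; the counts used in the check are made up, because the study's estimates were extrapolated from the validation sample to all episodes and cannot be recomputed from the figures in the abstract:

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion table.

    tp/fn: reference-positive episodes the algorithm flagged / missed;
    fp/tn: reference-negative episodes the algorithm flagged / correctly ignored.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true HA-UTIs detected
        "specificity": tn / (tn + fp),  # share of non-HA-UTIs left unflagged
        "ppv": tp / (tp + fp),          # share of flags that are true HA-UTIs
        "npv": tn / (tn + fn),          # share of non-flags that are truly negative
    }
```

With a rare outcome, specificity and NPV stay close to 1 even for a mediocre algorithm, which is why sensitivity and PPV are the more discriminating figures in surveillance studies like this one.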
Affiliation(s)
- S D van der Werff
- Department of Medicine Solna, Division of Infectious Disease, Karolinska Institutet, Stockholm, Sweden.
- E Thiman
- Department of Medicine Solna, Division of Infectious Disease, Karolinska Institutet, Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- H Tanushi
- Department of Medicine Solna, Division of Infectious Disease, Karolinska Institutet, Stockholm, Sweden; Department of Data Processing & Analysis, Karolinska University Hospital, Stockholm, Sweden
- J K Valik
- Department of Medicine Solna, Division of Infectious Disease, Karolinska Institutet, Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- A Henriksson
- Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
- M Ul Alam
- Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
- H Dalianis
- Department of Computer and Systems Sciences, Stockholm University, Stockholm, Sweden
- A Ternhag
- Department of Medicine Solna, Division of Infectious Disease, Karolinska Institutet, Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- P Nauclér
- Department of Medicine Solna, Division of Infectious Disease, Karolinska Institutet, Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
7
Caccamisi A, Jørgensen L, Dalianis H, Rosenlund M. Natural language processing and machine learning to enable automatic extraction and classification of patients' smoking status from electronic medical records. Ups J Med Sci 2020; 125:316-324. [PMID: 32696698 PMCID: PMC7594865 DOI: 10.1080/03009734.2020.1792010] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/24/2022]
Abstract
BACKGROUND The electronic medical record (EMR) offers unique possibilities for clinical research, but some important patient attributes are not readily available because they are recorded as unstructured text. We applied text mining using machine learning to enable automatic classification of unstructured information on smoking status from Swedish EMR data. METHODS Data on patients' smoking status from EMRs were used to develop 32 different predictive models that were trained using Weka, varying sentence frequency, classifier type, tokenization, and attribute selection in a database of 85,000 classified sentences. The models were evaluated using F-score and accuracy based on out-of-sample test data including 8500 sentences. An error weight matrix, which assigns a weight to each type of misclassification and is applied to the models' confusion matrices, was used to select the best model. The best performing model was then compared to a rule-based method. RESULTS The best performing model was based on the Support Vector Machine (SVM) Sequential Minimal Optimization (SMO) classifier using a combination of unigrams and bigrams as tokens. Sentence frequency and attribute selection did not improve model performance. SMO achieved 98.14% accuracy and 0.981 F-score versus 79.32% and 0.756 for the rule-based model. CONCLUSION A model using machine-learning algorithms to automatically classify patients' smoking status was successfully developed. Such algorithms may enable automatic assessment of smoking status and other unstructured data directly from EMRs without manual classification of complete case notes.
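The model-selection step with an error weight matrix can be sketched as a weighted sum over each model's confusion matrix. In this sketch the three classes (current smoker, ex-smoker, non-smoker), the weights and both confusion matrices are invented for illustration; the abstract does not give the study's classes or weights:

```python
from typing import Dict, Sequence

Matrix = Sequence[Sequence[float]]


def weighted_error(confusion: Matrix, weights: Matrix) -> float:
    """Sum of confusion counts times misclassification weights.

    confusion[i][j] = number of sentences of true class i predicted as class j;
    weights[i][j]   = cost assigned to that error (0 on the diagonal).
    """
    return sum(c * w
               for c_row, w_row in zip(confusion, weights)
               for c, w in zip(c_row, w_row))


def select_best(models: Dict[str, Matrix], weights: Matrix) -> str:
    """Pick the model whose confusion matrix has the lowest weighted error."""
    return min(models, key=lambda name: weighted_error(models[name], weights))
```

With hypothetical weights of 3 for confusing smokers with non-smokers and 1 for other errors, a model with slightly higher accuracy can still lose if its mistakes are of the costlier kind, which is the point of weighting errors rather than counting them.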
Affiliation(s)
- Andrea Caccamisi
- Department of Learning, Informatics, Management and Ethics, Karolinska Institutet, Stockholm, Sweden
- Department of Computer and Systems Sciences (DSV), Stockholm University, Stockholm, Sweden
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Stockholm, Sweden
- Mats Rosenlund
- Department of Learning, Informatics, Management and Ethics, Karolinska Institutet, Stockholm, Sweden
- IQVIA Solutions Sweden AB, Solna, Sweden
- Contact: Mats Rosenlund, Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, Stockholm, SE-171 77, Sweden
8
Chomutare T, Yigzaw KY, Budrionis A, Makhlysheva A, Godtliebsen F, Dalianis H. De-Identifying Swedish EHR Text Using Public Resources in the General Domain. Stud Health Technol Inform 2020; 270:148-152. [PMID: 32570364 DOI: 10.3233/shti200140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Sensitive data is normally required to develop rule-based models or to train machine learning-based models for de-identifying electronic health record (EHR) clinical notes, and this presents important problems for patient privacy. In this study, we add non-sensitive public datasets to the EHR training data: (i) scientific medical text and (ii) Wikipedia word vectors. The data, all in Swedish, is used to train a deep learning model based on recurrent neural networks. Tests on pseudonymized Swedish EHR clinical notes showed that precision and recall improved from 55.62% and 80.02% with the base EHR embedding layer to 85.01% and 87.15% when Wikipedia word vectors were added. These results suggest that non-sensitive text from the general domain can be used to train robust models for de-identifying Swedish clinical text, which could be useful in cases where the data is both sensitive and in a low-resource language.
Affiliation(s)
- Fred Godtliebsen
- Norwegian Centre for E-health Research, Tromsø, Norway
- Faculty of Science & Technology, UiT - The Arctic University of Norway
- Hercules Dalianis
- Norwegian Centre for E-health Research, Tromsø, Norway
- Department of Computer and Systems Sciences, Stockholm University, Sweden
9
Valik JK, Ward L, Tanushi H, Müllersdorf K, Ternhag A, Aufwerber E, Färnert A, Johansson AF, Mogensen ML, Pickering B, Dalianis H, Henriksson A, Herasevich V, Nauclér P. Validation of automated sepsis surveillance based on the Sepsis-3 clinical criteria against physician record review in a general hospital population: observational study using electronic health records data. BMJ Qual Saf 2020; 29:735-745. [PMID: 32029574 PMCID: PMC7467502 DOI: 10.1136/bmjqs-2019-010123] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 07/23/2019] [Revised: 01/19/2020] [Accepted: 01/21/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND Surveillance of sepsis incidence is important for directing resources and evaluating quality-of-care interventions. The aim was to develop and validate a fully-automated Sepsis-3 based surveillance system in non-intensive care wards using electronic health record (EHR) data, and to demonstrate its utility by determining the burden of hospital-onset sepsis and variations between wards. METHODS A rule-based algorithm was developed using EHR data from a cohort of all adult patients admitted at an academic centre between July 2012 and December 2013. Time in intensive care units was censored. To validate algorithm performance, a stratified random sample of 1000 hospital admissions (674 with and 326 without suspected infection) was classified according to the Sepsis-3 clinical criteria (suspected infection defined as having any culture taken and at least two doses of antimicrobials administered, and an increase in Sequential Organ Failure Assessment (SOFA) score by ≥2 points) and the likelihood of infection by physician medical record review. RESULTS In total, 82 653 hospital admissions were included. The Sepsis-3 clinical criteria determined by physician review were met in 343 of 1000 episodes. Among them, 313 (91%) had possible, probable or definite infection. Based on this reference, the algorithm achieved sensitivity 0.887 (95% CI: 0.799 to 0.964), specificity 0.985 (95% CI: 0.978 to 0.991), positive predictive value 0.881 (95% CI: 0.833 to 0.926) and negative predictive value 0.986 (95% CI: 0.973 to 0.996). When applied to the total cohort, taking into account the sampling proportions of those with and without suspected infection, the algorithm identified 8599 (10.4%) sepsis episodes. The burden of hospital-onset sepsis (>48 hours after admission) and related in-hospital mortality varied between wards.
CONCLUSIONS A fully-automated Sepsis-3 based surveillance algorithm using EHR data performed well compared with physician medical record review in non-intensive care wards, and exposed variations in hospital-onset sepsis incidence between wards.
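The case definition in this abstract can be sketched as a few boolean rules. This is a deliberately simplified sketch, not the authors' algorithm: it ignores the time windows linking cultures, antimicrobial doses and organ dysfunction, ICU censoring and baseline SOFA estimation, and the `Admission` record and its `sofa_increase` field are hypothetical. It uses the Sepsis-3 threshold of an acute SOFA increase of at least 2 points and the >48 h cut-off for hospital onset:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class Admission:
    admitted: datetime
    cultures: List[datetime] = field(default_factory=list)
    antimicrobial_doses: List[datetime] = field(default_factory=list)
    sofa_increase: int = 0  # hypothetical: largest rise in SOFA from baseline


def suspected_infection(a: Admission) -> bool:
    # Any culture taken and at least two antimicrobial doses administered.
    return len(a.cultures) >= 1 and len(a.antimicrobial_doses) >= 2


def meets_sepsis3(a: Admission) -> bool:
    # Suspected infection plus an acute SOFA increase of at least 2 points.
    return suspected_infection(a) and a.sofa_increase >= 2


def hospital_onset(a: Admission, onset: datetime) -> bool:
    # Hospital-onset if sepsis starts more than 48 hours after admission.
    return onset - a.admitted > timedelta(hours=48)
```

A real surveillance system would also have to decide which culture and which doses belong to the same episode; keeping the rules as separate predicates makes each criterion easy to validate against record review on its own.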
Affiliation(s)
- John Karlsson Valik
- Division of Infectious Diseases, Department of Medicine, Solna (MedS), Karolinska Institutet, Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Logan Ward
- Treat Systems ApS, Aalborg, Denmark; Center for Model-based Medical Decision Support, Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Hideyuki Tanushi
- Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Kajsa Müllersdorf
- Division of Infectious Diseases, Department of Medicine, Solna (MedS), Karolinska Institutet, Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Anders Ternhag
- Division of Infectious Diseases, Department of Medicine, Solna (MedS), Karolinska Institutet, Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Ewa Aufwerber
- Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Anna Färnert
- Division of Infectious Diseases, Department of Medicine, Solna (MedS), Karolinska Institutet, Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
- Anders F Johansson
- Department of Clinical microbiology and the Laboratory for Molecular Infection Medicine (MIMS), Umeå University, Umeå, Sweden
- Brian Pickering
- Department of Anesthesiology and Perioperative medicine, Mayo Clinic, Rochester, Minnesota, USA
- Hercules Dalianis
- Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
- Aron Henriksson
- Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
- Vitaly Herasevich
- Department of Anesthesiology and Perioperative medicine, Mayo Clinic, Rochester, Minnesota, USA
- Pontus Nauclér
- Division of Infectious Diseases, Department of Medicine, Solna (MedS), Karolinska Institutet, Stockholm, Sweden; Department of Infectious Diseases, Karolinska University Hospital, Stockholm, Sweden
10
Abstract
This article describes the development and evaluation of a set of knowledge patterns that provide guidelines and design implications for developers of mental health portals. The knowledge patterns were based on three foundations: (1) knowledge integration of language technology approaches; (2) experiments with language technology applications; and (3) user studies of portal interaction. A mixed-methods approach was employed for the evaluation of the knowledge patterns: formative workshops with knowledge pattern experts and summative surveys with experts in specific domains. The formative evaluation improved the cohesion of the patterns. The results of the summative evaluation showed that the problems discussed in the patterns were relevant for the domain, and that the knowledge embedded in them was useful for solving those problems. Ten patterns out of thirteen achieved an average score above 4.0, which is a positive result that leads us to conclude that they can be used as guidelines for developing health portals.
11
Névéol A, Dalianis H, Velupillai S, Savova G, Zweigenbaum P. Clinical Natural Language Processing in languages other than English: opportunities and challenges. J Biomed Semantics 2018; 9:12. [PMID: 29602312 PMCID: PMC5877394 DOI: 10.1186/s13326-018-0179-8] [Citation(s) in RCA: 83] [Impact Index Per Article: 13.8] [Received: 05/22/2017] [Accepted: 02/14/2018] [Indexed: 01/22/2023]
Abstract
Background Natural language processing applied to clinical text or aimed at a clinical outcome has been thriving in recent years. This paper offers the first broad overview of clinical Natural Language Processing (NLP) for languages other than English. Recent studies are summarized to offer insights and outline opportunities in this area. Main Body We envision three groups of intended readers: (1) NLP researchers leveraging experience gained in other languages, (2) NLP researchers faced with establishing clinical text processing in a language other than English, and (3) clinical informatics researchers and practitioners looking for resources in their languages in order to apply NLP techniques and tools to clinical practice and/or investigation. We review work in clinical NLP in languages other than English. We classify these studies into three groups: (i) studies describing the development of new NLP systems or components de novo, (ii) studies describing the adaptation of NLP architectures developed for English to another language, and (iii) studies focusing on a particular clinical application. Conclusion We show the advantages and drawbacks of each method, and highlight the appropriate application context. Finally, we identify major challenges and opportunities that will affect the impact of NLP on clinical practice and public health studies in a context that encompasses English as well as other languages.
Affiliation(s)
- Aurélie Névéol
- LIMSI, CNRS, Université Paris Saclay, Rue John von Neumann, Paris, F-91405 Orsay, France
- Sumithra Velupillai
- School of Computer Science and Communication, KTH, Stockholm, Sweden; Institute of Psychiatry, Psychology and Neuroscience, King's College, London, UK
- Guergana Savova
- Children's Hospital Boston and Harvard Medical School, Boston, Massachusetts, USA
- Pierre Zweigenbaum
- LIMSI, CNRS, Université Paris Saclay, Rue John von Neumann, Paris, F-91405 Orsay, France
12
Ehrentraut C, Ekholm M, Tanushi H, Tiedemann J, Dalianis H. Detecting hospital-acquired infections: A document classification approach using support vector machines and gradient tree boosting. Health Informatics J 2018; 24:24-42. [PMID: 27496862 PMCID: PMC5802538 DOI: 10.1177/1460458216656471] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/15/2022]
Abstract
Hospital-acquired infections pose a significant risk to patient health, while their surveillance is an additional workload for hospital staff. Our overall aim is to build a surveillance system that reliably detects all patient records that potentially include hospital-acquired infections. This is to reduce the burden of having the hospital staff manually check patient records. This study focuses on the application of text classification using support vector machines and gradient tree boosting to the problem. Support vector machines and gradient tree boosting have never been applied to the problem of detecting hospital-acquired infections in Swedish patient records, and according to our experiments, they lead to encouraging results. The best result is yielded by gradient tree boosting, at 93.7 percent recall, 79.7 percent precision and 85.7 percent F1 score when using stemming. We can show that simple preprocessing techniques and parameter tuning can lead to high recall (which we aim for in screening patient records) with appropriate precision for this task.
13
Pérez A, Weegar R, Casillas A, Gojenola K, Oronoz M, Dalianis H. Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora. J Biomed Inform 2017; 71:16-30. [PMID: 28526460 DOI: 10.1016/j.jbi.2017.05.009] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 05/06/2016] [Revised: 05/04/2017] [Accepted: 05/11/2017] [Indexed: 11/29/2022]
Abstract
OBJECTIVE The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs), focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities. In our case, we utilized unsupervised methods to generate such representations. METHODS The significance of this work lies in its experimental design. The experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, CRF, Perceptron and SVM. The classifiers were enhanced by means of ensembles of semantic spaces and ensembles of Brown trees. In order to mitigate data sparsity without a significant increase in the dimension of the decision space, we propose the use of clustered approaches of the hierarchical Brown clustering represented by trees and vector quantization for each semantic space. RESULTS The results showed that the semi-supervised approaches significantly improved on standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of the entity recognition while keeping the dimension of the feature space two orders of magnitude lower than when directly using the semantic spaces. CONCLUSIONS The contributions of this study are: (a) a set of thorough experiments that enable comparisons regarding the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the problem of scarcity of available annotated data.
Affiliation(s)
- Alicia Pérez
- IXA Group, University of the Basque Country (UPV-EHU), Spain
- Rebecka Weegar
- Clinical Text Mining Group, Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
- Arantza Casillas
- IXA Group, University of the Basque Country (UPV-EHU), Spain
- Koldo Gojenola
- IXA Group, University of the Basque Country (UPV-EHU), Spain
- Maite Oronoz
- IXA Group, University of the Basque Country (UPV-EHU), Spain
- Hercules Dalianis
- Clinical Text Mining Group, Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
14
Henriksson A, Kvist M, Dalianis H. Detecting Protected Health Information in Heterogeneous Clinical Notes. Stud Health Technol Inform 2017; 245:393-397. [PMID: 29295123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
To enable secondary use of healthcare data in a privacy-preserving manner, there is a need for methods capable of automatically identifying protected health information (PHI) in clinical text. To that end, learning predictive models from labeled examples has emerged as a promising alternative to rule-based systems. However, little is known about differences in PHI prevalence across different types of clinical notes and how potential domain differences may affect the performance of predictive models trained on one particular type of note and applied to another. In this study, we analyze the performance of a predictive model trained on an existing PHI corpus of Swedish clinical notes and applied to a variety of clinical notes: written (i) in different clinical specialties, (ii) under different headings, and (iii) by persons in different professions. The results indicate that domain adaptation is needed for effective detection of PHI in heterogeneous clinical notes.
Affiliation(s)
- Aron Henriksson
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
- Maria Kvist
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
15
Henriksson A, Kvist M, Dalianis H. Prevalence Estimation of Protected Health Information in Swedish Clinical Text. Stud Health Technol Inform 2017; 235:216-220. [PMID: 28423786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Obscuring protected health information (PHI) in the clinical text of health records facilitates the secondary use of healthcare data in a privacy-preserving manner. Although automatic de-identification of clinical text using machine learning holds much promise, little is known about the relative prevalence of PHI in different types of clinical text and whether domain adaptation is needed when learning predictive models from one particular domain and applying them to another. In this study, we address these questions by training a predictive model and using it to estimate the prevalence of PHI in clinical text written (1) in different clinical specialties, (2) in different types of notes (i.e., under different headings), and (3) by persons in different professional roles. We find that the overall PHI density is 1.57%; however, substantial differences exist across domains.
Affiliation(s)
- Aron Henriksson
- Department of Computer and Systems Sciences, Stockholm University, Sweden
- Maria Kvist
- Department of Computer and Systems Sciences, Stockholm University, Sweden
- Hercules Dalianis
- Department of Computer and Systems Sciences, Stockholm University, Sweden
16
Erasmie U, Dalianis H, Ringertz H. A Computer-Based System for Measurements and Analyses in Radiology. Acta Radiol 2016. [DOI: 10.1177/028418519003100621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
A computer-based system for direct measurements on images and for analyses of data is presented. Distances, angles, and areas are measured on a backlit digitizer table. Calibration corrects for actual magnification. Data are analyzed and compared to normal values by a microcomputer. The system is precise and time-saving.
Affiliation(s)
- U. Erasmie
- Departments of Pediatric Radiology, Sachska Barnsjukhuset, and Diagnostic Radiology, Karolinska Sjukhuset, The Karolinska Institute, Stockholm, Sweden
- H. Dalianis
- Departments of Pediatric Radiology, Sachska Barnsjukhuset, and Diagnostic Radiology, Karolinska Sjukhuset, The Karolinska Institute, Stockholm, Sweden
- H. Ringertz
- Departments of Pediatric Radiology, Sachska Barnsjukhuset, and Diagnostic Radiology, Karolinska Sjukhuset, The Karolinska Institute, Stockholm, Sweden
17
Abstract
BACKGROUND Learning deep representations of clinical events based on their distributions in electronic health records has been shown to allow for subsequent training of higher-performing predictive models compared to the use of shallow, count-based representations. The predictive performance may be further improved by utilizing multiple representations of the same events, which can be obtained by, for instance, manipulating the representation learning procedure. The question remains, however, how best to make use of a set of diverse representations of clinical events (modeled in an ensemble of semantic spaces) for the purpose of predictive modeling. METHODS Three different ways of exploiting a set of ten distributed representations of four types of clinical events (diagnosis codes, drug codes, measurements, and words in clinical notes) are investigated in a series of experiments using ensembles of randomized trees. Here, the semantic space ensembles are obtained by varying the context window size in the representation learning procedure. The proposed method trains a forest wherein each tree is built from a bootstrap replicate of the training set whose entire original feature set is represented in a randomly selected set of semantic spaces, corresponding to the considered data types, of a given context window size. RESULTS The proposed method significantly outperforms concatenating the multiple representations of the bagged dataset; it also significantly outperforms representing, for each decision tree, only a subset of the features in a randomly selected set of semantic spaces. A follow-up analysis indicates that the proposed method exhibits less diversity while significantly improving average tree performance. It is also shown that the size of the semantic space ensemble has a significant impact on predictive performance and that performance tends to improve as the size increases.
CONCLUSIONS The strategy for utilizing a set of diverse distributed representations of clinical events when constructing ensembles of randomized trees has a significant impact on predictive performance. The most successful strategy - significantly outperforming the considered alternatives - involves randomly sampling distributed representations of the clinical events when building each decision tree in the forest.
Affiliation(s)
- Aron Henriksson
- Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, Kista, SE-16407, Sweden
- Jing Zhao
- Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, Kista, SE-16407, Sweden
- Hercules Dalianis
- Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, Kista, SE-16407, Sweden
- Henrik Boström
- Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 12, Kista, SE-16407, Sweden
18
Weegar R, Kvist M, Sundström K, Brunak S, Dalianis H. Finding Cervical Cancer Symptoms in Swedish Clinical Text using a Machine Learning Approach and NegEx. AMIA Annu Symp Proc 2015; 2015:1296-1305. [PMID: 26958270 PMCID: PMC4765575] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Detecting early symptoms of cervical cancer is crucial for early treatment and survival. Finding symptoms of cervical cancer in clinical text requires named entity recognition. In this paper, the Clinical Entity Finder, a machine-learning tool trained on annotated clinical text from a Swedish internal medicine emergency unit, is evaluated on cervical cancer records. The Clinical Entity Finder identifies entities of the types body part, finding and disorder, and is extended with negation detection using the rule-based tool NegEx to distinguish between negated and non-negated entities. To measure the performance of the tools in this new domain, two physicians annotated a set of clinical notes from the health records of cervical cancer patients. The inter-annotator agreement for finding, disorder and body part reached an average F-score of 0.677, and the Clinical Entity Finder extended with NegEx had an average F-score of 0.667.
Affiliation(s)
- Rebecka Weegar
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
- Maria Kvist
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden; Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, Stockholm, Sweden
- Karin Sundström
- Department of Laboratory Medicine (LABMED), Karolinska Institutet, Stockholm, Sweden
- Søren Brunak
- NNF Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
19
Henriksson A, Kvist M, Dalianis H, Duneld M. Identifying adverse drug event information in clinical notes with distributional semantic representations of context. J Biomed Inform 2015; 57:333-49. [PMID: 26291578 DOI: 10.1016/j.jbi.2015.08.013] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 05/05/2015] [Revised: 07/19/2015] [Accepted: 08/10/2015] [Indexed: 10/23/2022]
Abstract
For the purpose of post-marketing drug safety surveillance, which has traditionally relied on the voluntary reporting of individual cases of adverse drug events (ADEs), other sources of information are now being explored, including electronic health records (EHRs), which give us access to enormous amounts of longitudinal observations of the treatment of patients and their drug use. Adverse drug events, which can be encoded in EHRs with certain diagnosis codes, are, however, heavily underreported. It is therefore important to develop capabilities to process, by means of computational methods, the more unstructured EHR data in the form of clinical notes, where clinicians may describe and reason around suspected ADEs. In this study, we report on the creation of an annotated corpus of Swedish health records for the purpose of learning to identify information pertaining to ADEs present in clinical notes. To this end, three key tasks are tackled: recognizing relevant named entities (disorders, symptoms, drugs), labeling attributes of the recognized entities (negation, speculation, temporality), and identifying relationships between them (indication, adverse drug event). For each of the three tasks, leveraging models of distributional semantics (i.e., unsupervised methods that exploit co-occurrence information to model, typically in vector space, the meaning of words), and in particular combinations of such models, is shown to improve the predictive performance. The ability to make use of such unsupervised methods is critical when faced with large amounts of sparse and high-dimensional data, especially in domains where annotated resources are scarce.
Affiliation(s)
- Aron Henriksson
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
- Maria Kvist
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden; Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, Sweden
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
- Martin Duneld
- Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden
20
Abstract
OBJECTIVES We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis. METHODS We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications from the included papers. RESULTS Significant articles published within this time span were included and are discussed from the perspective of semantic analysis. Three key clinical NLP subtasks that enable such analysis were identified: 1) developing more efficient methods for corpus creation (annotation and de-identification), 2) generating building blocks for extracting meaning (morphological, syntactic, and semantic subtasks), and 3) leveraging NLP for clinical utility (NLP applications and infrastructure for clinical use cases). Finally, we reflect on the most recent developments and potential areas of future NLP development and applications. CONCLUSIONS There has been an increase in advances within key NLP subtasks that support semantic analysis. In many cases, the performance of NLP semantic analysis is close to human inter-annotator agreement. The creation and release of corpora annotated with complex semantic information models has greatly supported the development of new tools and approaches. Research on non-English languages is continuously growing. NLP methods have sometimes been successfully employed in real-world clinical tasks. However, there is still a gap between the development of advanced resources and their utilization in clinical settings. A plethora of new clinical use cases are emerging due to established health care initiatives and additional patient-generated sources through the extensive use of social media and other devices.
Affiliation(s)
- S Velupillai
- Sumithra Velupillai, Department of Computer and Systems Sciences, Stockholm University, Postbox 7003, 164 07 Kista, Sweden, Tel: +46 8 161 174, Fax: +46 8 703 9025, E-mail:
21
Velupillai S, Duneld M, Henriksson A, Kvist M, Skeppstedt M, Dalianis H. Louhi 2014: Special issue on health text mining and information analysis. BMC Med Inform Decis Mak 2015; 15 Suppl 2:S1. [PMID: 26099575 PMCID: PMC4474544 DOI: 10.1186/1472-6947-15-s2-s1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
22
Velupillai S, Skeppstedt M, Kvist M, Mowery D, Chapman BE, Dalianis H, Chapman WW. Cue-based assertion classification for Swedish clinical text--developing a lexicon for pyConTextSwe. Artif Intell Med 2014; 61:137-44. [PMID: 24556644 PMCID: PMC4104142 DOI: 10.1016/j.artmed.2014.01.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 06/01/2013] [Revised: 12/19/2013] [Accepted: 01/10/2014] [Indexed: 11/17/2022]
Abstract
OBJECTIVE The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish. METHODS AND MATERIAL We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe's performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. On a separate evaluation set, we calculated the system's final performance. RESULTS Following integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after an error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system's final F-scores on an evaluation set were 81% (overall). For the individual assertion classes, F-score results were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively. CONCLUSIONS We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which could form a valuable resource for similar studies, and which is publicly available.
Affiliation(s)
- Sumithra Velupillai
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden
- Maria Skeppstedt
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden
- Maria Kvist
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden; Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet, Widerström Building, Tomtebodavägen 18A, Solna, Sweden
- Danielle Mowery
- Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, BAUM 423, Pittsburgh, PA 15206-3701, United States
- Brian E Chapman
- Department of Radiology, University of Utah, 729 Arapeen Drive, Salt Lake City, UT 84108, United States
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden
- Wendy W Chapman
- Department of Biomedical Informatics, University of Utah, 26 South 2000 East, Room 5775 HSEB, Salt Lake City, UT 84112-5775, United States
23
Skeppstedt M, Kvist M, Nilsson GH, Dalianis H. Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: An annotation and machine learning study. J Biomed Inform 2014; 49:148-58. [DOI: 10.1016/j.jbi.2014.01.012] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 07/10/2013] [Revised: 01/17/2014] [Accepted: 01/23/2014] [Indexed: 10/25/2022]
24
Ahltorp M, Skeppstedt M, Dalianis H, Kvist M. Using text prediction for facilitating input and improving readability of clinical text. Stud Health Technol Inform 2013; 192:1149. [PMID: 23920923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Text prediction has the potential for facilitating and speeding up the documentation work within health care, making it possible for health personnel to allocate less time to documentation and more time to patient care. It also offers a way to produce clinical text with fewer misspellings and abbreviations, increasing readability. We have explored how text prediction can be used for input of clinical text, and how the specific challenges of text prediction in this domain can be addressed. A text prediction prototype was constructed using data from a medical journal and from medical terminologies. This prototype achieved keystroke savings of 26% when evaluated on texts mimicking authentic clinical text. The results are encouraging, indicating that there are feasible methods for text prediction in the clinical domain.
25
Allvin H, Carlsson E, Dalianis H, Danielsson-Ojala R, Daudaravičius V, Hassel M, Kokkinakis D, Lundgrén-Laine H, Nilsson GH, Nytrø Ø, Salanterä S, Skeppstedt M, Suominen H, Velupillai S. Characteristics of Finnish and Swedish intensive care nursing narratives: a comparative analysis to support the development of clinical language technologies. J Biomed Semantics 2011; 2 Suppl 3:S1. [PMID: 21992572 PMCID: PMC3194173 DOI: 10.1186/2041-1480-2-s3-s1] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
BACKGROUND Free text is helpful for entering information into electronic health records, but reusing it is a challenge. The need for language technology for processing Finnish and Swedish healthcare text is therefore evident; however, Finnish and Swedish are linguistically very dissimilar. In this paper we present a comparison of characteristics in Finnish and Swedish free-text nursing narratives from intensive care. This creates a framework for characterising and comparing clinical text and lays the groundwork for developing clinical language technologies. METHODS Our material included daily nursing narratives from one intensive care unit in Finland and one in Sweden. Inclusion criteria for patients were an inpatient period of at least five days and an age of at least 16 years. We performed a comparative analysis, covering both qualitative and quantitative aspects, as part of a collaborative effort between Finnish- and Swedish-speaking healthcare and language technology professionals. The qualitative analysis addressed the content and structure of three average-sized health records from each country. In the quantitative analysis, 514 Finnish and 379 Swedish health records were studied using various language technology tools. RESULTS Although the two languages are not closely related, nursing narratives in Finland and Sweden had many properties in common. Both made use of specialised jargon and their content was very similar. However, many of these characteristics posed challenges for developing language technology to support producing and using clinical documentation. CONCLUSIONS The way Finnish and Swedish intensive care nursing was documented was not country- or language-dependent; it shared a common context, principles and structural features, and even similar vocabulary elements. Technology solutions are therefore likely to be applicable to a wider range of natural languages, but they need linguistic tailoring.
AVAILABILITY The Finnish and Swedish data can be found at: http://www.dsv.su.se/hexanord/data/.
Affiliation(s)
- Helen Allvin
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Elin Carlsson
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Riitta Danielsson-Ojala
- Department of Nursing Science, University of Turku and Hospital District of Southwest Finland, FI-20014 University of Turku, Turku, Finland
- Vidas Daudaravičius
- Faculty of Informatics, Vytautas Magnus University, S. Daukanto g. 27 (301–309), LT-44249 Kaunas, Lithuania
- Martin Hassel
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Dimitrios Kokkinakis
- Department of Swedish, University of Gothenburg, Box 200, SE-405 30 Gothenburg, Sweden
- Heljä Lundgrén-Laine
- Department of Nursing Science, University of Turku and Hospital District of Southwest Finland, FI-20014 University of Turku, Turku, Finland
- Gunnar H Nilsson
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Øystein Nytrø
- Department of Computer and Information Science, Norwegian University of Science and Technology, Sem Sælands vei 7-9, NO-7491 Trondheim, Norway
- Sanna Salanterä
- Department of Nursing Science, University of Turku and Hospital District of Southwest Finland, FI-20014 University of Turku, Turku, Finland
- Maria Skeppstedt
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
- Hanna Suominen
- NICTA, Canberra Research Laboratory and Australian National University, College of Engineering and Computer Science, Locked Bag 8001, ACT-2601, Canberra, Australia
- Sumithra Velupillai
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
26
Velupillai S, Dalianis H, Kvist M. Factuality levels of diagnoses in Swedish clinical text. Stud Health Technol Inform 2011; 169:559-563. [PMID: 21893811] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Different levels of knowledge certainty, or factuality levels, are expressed in clinical health record documentation. This information is currently not fully exploited, as the subtleties expressed in natural language cannot easily be analyzed by machine. Extracting relevant information from knowledge-intensive resources such as electronic health records can help improve health care in general, for example by building automated information access systems. We present an annotation model of six factuality levels linked to diagnoses in Swedish clinical assessments from an emergency ward. Our main findings are that overall agreement is fairly high (intra-/inter-annotator: 0.7/0.58 F-measure, 0.73/0.6 Cohen's κ). These distinctions are important for knowledge models, since only approximately 50% of the diagnoses are affirmed with certainty. Moreover, our results indicate that there are patterns inherent in the diagnosis expressions themselves conveying factuality levels, showing that certainty is not only dependent on context cues.
Affiliation(s)
- Sumithra Velupillai
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, SE-164 40 Kista, Sweden
27
Dalianis H, Velupillai S. De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields. J Biomed Semantics 2010; 1:6. [PMID: 20618985 PMCID: PMC2895734 DOI: 10.1186/2041-1480-1-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 12/07/2009] [Accepted: 04/12/2010] [Indexed: 12/05/2022]
Abstract
Background In order to perform research on the information contained in Electronic Patient Records (EPRs), access to the data itself is needed. This is often very difficult due to confidentiality regulations. The data sets need to be fully de-identified before they can be distributed to researchers. De-identification is a difficult task where the definitions of annotation classes are not self-evident. Results We present work on the creation of two refined variants of a manually annotated Gold standard for de-identification, one created automatically and one created through discussions among the annotators. The data is a subset from the Stockholm EPR Corpus, a data set available within our research group. The two variants are used for the training and evaluation of an automatic system based on the Conditional Random Fields algorithm. Evaluating with four-fold cross-validation on sets of around 4,000 to 6,000 annotation instances, we obtained very promising results for both Gold Standards: an F-score of around 0.80 for a number of experiments, with higher results for certain annotation classes. Moreover, the system found 49 instances that were initially counted as false positives but were verified as true positives missed by the annotators. Conclusions Our intention is to make this Gold standard, The Stockholm EPR PHI Corpus, available to other research groups in the future. Although it is slightly more time-consuming to create, we believe the manual consensus gold standard is the most valuable for further research. We also propose a set of annotation classes to be used for similar de-identification tasks.
Affiliation(s)
- Hercules Dalianis
- Department of Computer and Systems Sciences (DSV), Stockholm University, Forum 100, 164 40 Kista, Sweden.
28
Velupillai S, Dalianis H, Hassel M, Nilsson GH. Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial. Int J Med Inform 2009; 78:e19-26. [PMID: 19482543 DOI: 10.1016/j.ijmedinf.2009.04.005] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 10/31/2008] [Revised: 03/02/2009] [Accepted: 04/09/2009] [Indexed: 11/26/2022]
Abstract
BACKGROUND Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive, since the free-text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI) annotation trial for EPRs written in Swedish. METHODS This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting of existing de-identification software written for American English to Swedish, with its evaluation in a preliminary automatic de-identification trial. Results are measured with precision, recall and F-measure. RESULTS This study reports fairly high Inter-Annotator Agreement (IAA) on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure for the highest pairwise agreement). For name tags, the average IAA was 0.80 F-measure (0.91 F-measure for the highest pairwise agreement). Directly porting de-identification software written for American English to Swedish was unfortunately non-trivial and yielded poor results. CONCLUSION Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions of identifiable information are needed, as well as further development of both the tag sets and the annotation guidelines, in order to obtain a reliable gold standard. Completely new de-identification software needs to be developed.
Affiliation(s)
- Sumithra Velupillai
- Department of Computer and Systems Sciences, Stockholm University/KTH, Kista, Sweden.
29
30
Erasmie U, Dalianis H, Ringertz H. A computer-based system for measurements and analyses in radiology. Acta Radiol 1990; 31:629-30. [PMID: 2278793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2022]
Abstract
A computer-based system for direct measurements on images and for analyses of data is presented. Distances, angles, and areas are measured on a backlighted digitizer table. Calibration corrects for actual magnification. Data are analyzed and compared to normal values by a microcomputer. The system is precise and time-saving.
Affiliation(s)
- U Erasmie
- Department of Pediatric Radiology, Sachsska Barnsjukhuset, Stockholm, Sweden.
31
Erasmie U, Dalianis H, Ringertz H. A Computer-Based System for Measurements and Analyses in Radiology. Acta Radiol 1990. [DOI: 10.3109/02841859009173113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
32
Erasmie U, Dalianis H, Ringertz H. A Computer-Based System for Measurements and Analyses in Radiology. Acta Radiol 1990. [DOI: 10.1080/02841859009173113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]