1
Oommen C, Howlett-Prieto Q, Carrithers MD, Hier DB. Inter-rater agreement for the annotation of neurologic signs and symptoms in electronic health records. Front Digit Health 2023; 5:1075771. [PMID: 37383943 PMCID: PMC10294690 DOI: 10.3389/fdgth.2023.1075771] [Received: 10/20/2022] [Accepted: 05/26/2023] [Indexed: 06/30/2023]
Abstract
The extraction of patient signs and symptoms recorded as free text in electronic health records is critical for precision medicine. Once extracted, signs and symptoms can be made computable by mapping to signs and symptoms in an ontology. Extracting signs and symptoms from free text is tedious and time-consuming. Prior studies have suggested that inter-rater agreement for clinical concept extraction is low. We have examined inter-rater agreement for annotating neurologic concepts in clinical notes from electronic health records. After training on the annotation process, the annotation tool, and the supporting neuro-ontology, three raters annotated 15 clinical notes in three rounds. Inter-rater agreement between the three annotators was high for text span and category label. A machine annotator based on a convolutional neural network had a high level of agreement with the human annotators but one that was lower than human inter-rater agreement. We conclude that high levels of agreement between human annotators are possible with appropriate training and annotation tools. Furthermore, more training examples combined with improvements in neural networks and natural language processing should make machine annotators capable of high throughput automated clinical concept extraction with high levels of agreement with human annotators.
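The abstract does not say which agreement statistic was used. As a minimal sketch, assuming agreement on category labels between a pair of raters is summarized with Cohen's kappa, the calculation could look like the following. The function name `cohen_kappa` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical category labels assigned to six text spans by two raters.
rater_1 = ["sign", "symptom", "sign", "none", "symptom", "sign"]
rater_2 = ["sign", "symptom", "none", "none", "symptom", "sign"]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

With three raters, one option is to average the pairwise values; a multi-rater statistic such as Fleiss' kappa is another.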
Affiliation(s)
- Chelsea Oommen
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Quentin Howlett-Prieto
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Michael D. Carrithers
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Daniel B. Hier
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
  - Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, United States
2
Hier DB, Yelugam R, Carrithers MD, Wunsch DC. The visualization of Orphadata neurology phenotypes. Front Digit Health 2023; 5:1064936. [PMID: 36778102 PMCID: PMC9911440 DOI: 10.3389/fdgth.2023.1064936] [Received: 10/08/2022] [Accepted: 01/10/2023] [Indexed: 01/28/2023]
Abstract
Disease phenotypes are characterized by signs (what a physician observes during the examination of a patient) and symptoms (the complaints of a patient to a physician). Large repositories of disease phenotypes are accessible through the Online Mendelian Inheritance in Man, Human Phenotype Ontology, and Orphadata initiatives. Many of the diseases in these datasets are neurologic. For each repository, the phenotype of neurologic disease is represented as a list of concepts of variable length where the concepts are selected from a restricted ontology. Visualizations of these concept lists are not provided. We address this limitation by using subsumption to reduce the number of descriptive features from 2,946 classes into thirty superclasses. Phenotype feature lists of variable lengths were converted into fixed-length vectors. Phenotype vectors were aggregated into matrices and visualized as heat maps that allowed side-by-side disease comparisons. Individual diseases (representing a row in the matrix) were visualized as word clouds. We illustrate the utility of this approach by visualizing the neuro-phenotypes of 32 dystonic diseases from Orphadata. Subsumption can collapse phenotype features into superclasses, phenotype lists can be vectorized, and phenotype vectors can be visualized as heat maps and word clouds.
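The subsumption and vectorization steps described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the toy hierarchy `PARENT`, the three superclasses, and the use of counts (rather than binary flags) in the vector are all hypothetical.

```python
# Hypothetical toy ontology: child class -> parent class.
PARENT = {
    "cervical dystonia": "dystonia",
    "dystonia": "hyperkinetic movement",
    "hyperkinetic movement": "movement disorder",
    "tremor": "hyperkinetic movement",
    "ataxic gait": "gait disorder",
    "gait disorder": "motor finding",
}
# Hypothetical superclasses; their order fixes the vector layout.
SUPERCLASSES = ["movement disorder", "motor finding", "sensory finding"]


def subsume(concept):
    """Walk up the hierarchy until a superclass (or a root) is reached."""
    seen = set()
    while concept not in SUPERCLASSES and concept in PARENT and concept not in seen:
        seen.add(concept)  # guards against accidental cycles
        concept = PARENT[concept]
    return concept


def vectorize(phenotype):
    """Turn a variable-length concept list into a fixed-length count vector."""
    vector = [0] * len(SUPERCLASSES)
    for concept in phenotype:
        top = subsume(concept)
        if top in SUPERCLASSES:
            vector[SUPERCLASSES.index(top)] += 1
    return vector


# Two hypothetical disease phenotypes of different lengths.
disease_a = ["cervical dystonia", "tremor", "ataxic gait"]
disease_b = ["tremor"]
matrix = [vectorize(disease_a), vectorize(disease_b)]
print(matrix)
```

Each row of `matrix` has the same length regardless of how many concepts the disease lists, which is what makes a side-by-side heat map possible.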
Affiliation(s)
- Daniel B Hier
  - Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Raghu Yelugam
  - Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States
- Michael D Carrithers
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Donald C Wunsch
  - National Institute of Diabetes and Digestive and Kidney Diseases, Liver Diseases Branch, Bethesda, MD, United States
3
Howlett-Prieto Q, Oommen C, Carrithers MD, Wunsch DC, Hier DB. Subtypes of relapsing-remitting multiple sclerosis identified by network analysis. Front Digit Health 2023; 4:1063264. [PMID: 36714613 PMCID: PMC9874946 DOI: 10.3389/fdgth.2022.1063264] [Received: 10/06/2022] [Accepted: 12/22/2022] [Indexed: 01/12/2023]
Abstract
We used network analysis to identify subtypes of relapsing-remitting multiple sclerosis subjects based on their cumulative signs and symptoms. The electronic medical records of 113 subjects with relapsing-remitting multiple sclerosis were reviewed, signs and symptoms were mapped to classes in a neuro-ontology, and classes were collapsed into sixteen superclasses by subsumption. After normalization and vectorization of the data, bipartite (subject-feature) and unipartite (subject-subject) network graphs were created using NetworkX and visualized in Gephi. Degree and weighted degree were calculated for each node. Graphs were partitioned into communities using the modularity score. Feature maps visualized differences in features by community. Network analysis of the unipartite graph yielded a higher modularity score (0.49) than the bipartite graph (0.25). The bipartite network was partitioned into five communities which were named fatigue, behavioral, hypertonia/weakness, abnormal gait/sphincter, and sensory, based on feature characteristics. The unipartite network was partitioned into five communities which were named fatigue, pain, cognitive, sensory, and gait/weakness/hypertonia based on features. Although we did not identify pure subtypes (e.g., pure motor, pure sensory, etc.) in this cohort of multiple sclerosis subjects, we demonstrated that network analysis could partition these subjects into different subtype communities. Larger datasets and additional partitioning algorithms are needed to confirm these findings and elucidate their significance. This study contributes to the literature investigating subtypes of multiple sclerosis by combining feature reduction by subsumption with network analysis.
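The study built its graphs with NetworkX and visualized them in Gephi. The sketch below uses neither; it only illustrates, in plain Python, how a modularity score for a given partition can be computed with the standard Newman formula on an unweighted graph. The toy subject-subject edges and the two communities are hypothetical.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities of
        (edges inside the community / m) - (sum of member degrees / (2 m)) ** 2
    where m is the total number of edges.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    membership = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )


# Hypothetical graph: two triangles of subjects joined by one bridging edge.
edges = [
    ("s1", "s2"), ("s2", "s3"), ("s1", "s3"),
    ("s4", "s5"), ("s5", "s6"), ("s4", "s6"),
    ("s3", "s4"),
]
communities = [{"s1", "s2", "s3"}, {"s4", "s5", "s6"}]
q = modularity(edges, communities)
print(round(q, 3))
```

A higher score means more edges fall inside communities than a random graph with the same degrees would produce, which is the sense in which the abstract compares 0.49 with 0.25.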
Affiliation(s)
- Quentin Howlett-Prieto
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Chelsea Oommen
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Michael D. Carrithers
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Donald C. Wunsch
  - Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, United States
- Daniel B. Hier
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
  - Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, United States
  - Correspondence: Daniel B. Hier
4
Original Research: Practice Variations in Documenting Neurologic Examinations in Non-Neuroscience ICUs. Am J Nurs 2023; 123:24-30. [PMID: 36546384 DOI: 10.1097/01.naj.0000905564.83124.2d] [Indexed: 12/24/2022]
Abstract
BACKGROUND In critical care units, the neurologic examination (neuro exam) is used to detect changes in neurologic function. Serial neuro exams are a hallmark of monitoring in neuroscience ICUs. But less is known about neuro exams that are performed in non-neuroscience ICUs. This knowledge gap likely contributes to the insufficient guidance on what constitutes an adequate neuro exam for patients admitted to a non-neuroscience ICU. PURPOSE The study purpose was to explore existing practices for documenting neuro exams in ICUs that don't routinely admit patients with a primary neurologic injury. METHODS A single-center, prospective, observational study examined documented neuro exams performed in medical, surgical, and cardiovascular ICUs. A comprehensive neuro exam assesses seven domains that can be divided into 20 components. In this study, each component was scored as present (documentation was found) or absent (documentation was not found); a domain was scored as present if one or more of its components had been documented. RESULTS There were 1,482 assessments documented on 120 patients over a one-week period. A majority of patients were male (56%), White (71%), non-Hispanic (77%), and over 60 years of age (50%). Overall, assessments of the domains of consciousness, injury severity, and cranial nerve function were documented 80% of the time or more. Assessments of the domains of pain, motor function, and sensory function were documented less than 60% of the time, and that of speech less than 5% of the time. Statistically significant differences in documentation were found between the medical, surgical, and cardiovascular ICUs for the domains of speech, cranial nerve function, and pain. There were no significant differences in documentation frequency between day and night shift nurses. Documentation practices were significantly different for RNs versus providers. 
CONCLUSIONS Our findings show that the frequency and specific components of neuro exam documentation vary significantly across nurses, providers, and ICUs. These findings are relevant for nurses and providers and may help to improve guidance for neurologic assessment of patients in non-neurologic ICUs. Further studies exploring variance in documentation practices and their implications for courses of treatment and patient outcomes are warranted.
5
Azizi S, Hier DB, Wunsch II DC. Enhanced neurologic concept recognition using a named entity recognition model based on transformers. Front Digit Health 2022; 4:1065581. [PMID: 36569804 PMCID: PMC9772022 DOI: 10.3389/fdgth.2022.1065581] [Received: 10/09/2022] [Accepted: 11/21/2022] [Indexed: 12/12/2022]
Abstract
Although deep learning has been applied to the recognition of diseases and drugs in electronic health records and the biomedical literature, relatively little study has been devoted to the utility of deep learning for the recognition of signs and symptoms. The recognition of signs and symptoms is critical to the success of deep phenotyping and precision medicine. We have developed a named entity recognition model that uses deep learning to identify text spans containing neurological signs and symptoms and then maps these text spans to the clinical concepts of a neuro-ontology. We compared a model based on convolutional neural networks to one based on bidirectional encoder representations from transformers. Models were evaluated for accuracy of text span identification on three text corpora: physician notes from an electronic health record, case histories from neurologic textbooks, and clinical synopses from an online database of genetic diseases. Both models performed best on the professionally written clinical synopses and worst on the physician-written clinical notes. Both models performed better when signs and symptoms were represented as shorter text spans. Consistent with prior studies that examined the recognition of diseases and drugs, the model based on bidirectional encoder representations from transformers outperformed the model based on convolutional neural networks for recognizing signs and symptoms. Recall for signs and symptoms ranged from 59.5% to 82.0% and precision ranged from 61.7% to 80.4%. With further advances in NLP, fully automated recognition of signs and symptoms in electronic health records and the medical literature should be feasible.
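Precision and recall for text span identification can be illustrated with a short sketch. It assumes exact-match scoring, where a predicted span counts only if its start and end offsets both equal those of a gold span; the abstract does not specify the matching rule, and the offsets below are hypothetical.

```python
def span_precision_recall(predicted, gold):
    """Precision and recall for (start, end) spans under exact matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical character offsets of annotated signs and symptoms in one note.
gold_spans = [(0, 8), (15, 27), (40, 52), (60, 66)]
# Hypothetical model output: two exact matches and one off-by-one boundary.
predicted_spans = [(0, 8), (15, 27), (41, 52)]

precision, recall = span_precision_recall(predicted_spans, gold_spans)
print(round(precision, 3), round(recall, 3))
```

Under exact matching, the off-by-one span counts as both a false positive and a missed gold span, which is one reason longer spans tend to score worse.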
Affiliation(s)
- Sima Azizi
  - Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States
- Daniel B. Hier
  - Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, United States
- Donald C. Wunsch II
  - Applied Computational Intelligence Laboratory, Department of Electrical & Computer Engineering, Missouri University of Science & Technology, Rolla, MO, United States
  - National Science Foundation, ECCS Division, Arlington, VA, United States
6
Rossander A, Lindsköld L, Ranerup A, Karlsson D. A State-of-the Art Review of SNOMED CT Terminology Binding and Recommendations for Practice and Research. Methods Inf Med 2021; 60:e76-e88. [PMID: 34583415 PMCID: PMC8714300 DOI: 10.1055/s-0041-1735167] [Received: 02/22/2021] [Accepted: 05/20/2021] [Indexed: 11/21/2022]
Abstract
BACKGROUND Unambiguous sharing of data requires information models and terminology in combination, but there is a lack of knowledge as to how they should be combined, leading to impaired interoperability. OBJECTIVES To facilitate creation of guidelines for SNOMED CT terminology binding we have performed a literature review to find existing recommendations and expose knowledge gaps. The primary audience is practitioners and researchers working with terminology binding. METHODS PubMed, Scopus, and Web of Science were searched for papers containing "terminology binding," "subset," "map," "information model" or "implement" and the term "SNOMED." RESULTS The search yielded 616 unique papers published from 2004 to 2020, from which 55 papers were selected and analyzed inductively. Topics described in the papers include problems related to input material, SNOMED CT, information models, and lack of appropriate tools as well as recommendations regarding competence. CONCLUSION Recommendations are given for practitioners and researchers. Many of the stated problems can be solved by better co-operation between domain experts and informaticians and better knowledge of SNOMED CT. Settings where these competences either work together or where staff with knowledge of both act as brokers are well equipped for terminology binding. Tooling is not thoroughly researched and might be a possible way to facilitate terminology binding.
Affiliation(s)
- Anna Rossander
  - Department of Applied Information Technology, University of Gothenburg, Göteborg, Sweden
- Lars Lindsköld
  - Department of Applied Information Technology, University of Gothenburg, Göteborg, Sweden
- Agneta Ranerup
  - Department of Applied Information Technology, University of Gothenburg, Göteborg, Sweden
- Daniel Karlsson
  - eHealth and Structured Information Unit, National Board of Health and Welfare, Stockholm, Sweden
7
Wunsch DC, Hier DB. Subsumption reduces dataset dimensionality without decreasing performance of a machine learning classifier. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2021; 2021:1618-1621. [PMID: 34891595 DOI: 10.1109/embc46164.2021.9629897] [Indexed: 06/14/2023]
Abstract
When features in a high-dimensional dataset are organized hierarchically, there is an inherent opportunity to reduce dimensionality. Since more specific concepts are subsumed by more general concepts, subsumption can be applied successively to reduce dimensionality. We tested whether subsumption could reduce the dimensionality of a disease dataset without impairing classification accuracy. We started with a dataset that had 168 neurological patients, 14 diagnoses, and 293 unique features. We applied subsumption repeatedly to create eight successively smaller datasets, ranging from 293 dimensions in the largest dataset to 11 dimensions in the smallest dataset. We tested an MLP classifier on all eight datasets. Precision, recall, accuracy, and validation declined only at the lowest dimensionality. Our preliminary results suggest that when features in a high-dimensional dataset are derived from a hierarchical ontology, subsumption is a viable strategy to reduce dimensionality. Clinical relevance: Datasets derived from electronic health records are often of high dimensionality. If features in the dataset are based on concepts from a hierarchical ontology, subsumption can reduce dimensionality.
8
Dhombres F, Charlet J. Knowledge Representation and Management: Interest in New Solutions for Ontology Curation. Yearb Med Inform 2021; 30:185-190. [PMID: 34479390 PMCID: PMC8416227 DOI: 10.1055/s-0041-1726508] [Indexed: 12/04/2022]
Abstract
Objective:
To select, present and summarize some of the best papers in the field of Knowledge Representation and Management (KRM) published in 2020.
Methods:
A comprehensive and standardized review of the medical informatics literature was performed to select the most interesting papers of KRM published in 2020, based on PubMed queries. This review was conducted according to the IMIA Yearbook guidelines.
Results:
Four best papers were selected among 1,175 publications. In contrast with the papers selected last year, the four best papers of 2020 demonstrated a significant focus on methods and tools for ontology curation and design. The usual KRM application domains (bioinformatics, machine learning, and electronic health records) were also represented.
Conclusion:
In 2020, ontology curation emerged as a significant topic of research interest. Bioinformatics, machine learning, and electronic health records remain significant research areas in the KRM community with various applications. Knowledge representations are key to advancing machine learning by providing context and to developing novel bioinformatics metrics. As in 2019, representations serve a great variety of applications across many medical domains, with actionable results and now with growing adherence to the open science initiative.
Affiliation(s)
- Ferdinand Dhombres
  - Sorbonne Université, INSERM, Univ Sorbonne Paris Nord, LIMICS, Paris, France
  - Sorbonne Université, Service de Médecine Fœtale, DMU Origyne, AP-HP, Hôpital Armand Trousseau, Paris, France
- Jean Charlet
  - Sorbonne Université, INSERM, Univ Sorbonne Paris Nord, LIMICS, Paris, France
  - AP-HP, DRCI, Paris, France
9
Ibrahim M, Gauch S, Salman O, Alqahtani M. An automated method to enrich consumer health vocabularies using GloVe word embeddings and an auxiliary lexical resource. PeerJ Comput Sci 2021; 7:e668. [PMID: 34458573 PMCID: PMC8371999 DOI: 10.7717/peerj-cs.668] [Received: 05/20/2021] [Accepted: 07/19/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Clear language makes communication easier between any two parties. A layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical terminology which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen medical terms to professional medical terms and vice versa. OBJECTIVE Many of the presented vocabularies are built manually or semi-automatically requiring large investments of time and human effort and consequently the slow growth of these vocabularies. In this paper, we present an automatic method to enrich laymen's vocabularies that has the benefit of being able to be applied to vocabularies in any domain. METHODS Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Embeddings (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies. Our approach further improves the consumer health vocabularies by incorporating synonyms and hyponyms from the WordNet ontology. The basic GloVe and our novel algorithms incorporating WordNet were evaluated using two laymen datasets from the National Library of Medicine (NLM), Open-Access Consumer Health Vocabulary (OAC CHV) and MedlinePlus Healthcare Vocabulary. RESULTS The results show that GloVe was able to find new laymen terms with an F-score of 48.44%. Furthermore, our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%. Furthermore, the enhanced GloVe showed a statistical significance over the two ground truth datasets with P < 0.001. CONCLUSIONS This paper presents an automatic approach to enrich consumer health vocabularies using the GloVe word embeddings and an auxiliary lexical source, WordNet. 
Our approach was evaluated using healthcare text downloaded from MedHelp.org, a healthcare social media platform, and two standard laymen vocabularies, OAC CHV and MedlinePlus. We used the WordNet ontology to expand the healthcare corpus by including synonyms, hyponyms, and hypernyms for each layman term occurrence in the corpus. Given a seed term selected from a concept in the ontology, we measured our algorithms' ability to automatically extract synonyms for those terms that appeared in the ground truth concept. We found that enhanced GloVe outperformed GloVe with a relative improvement of 25% in the F-score.
10
Ma H, Shen L, Sun H, Xu Z, Hou L, Wu S, Fang A, Li J, Qian Q. COVID term: a bilingual terminology for COVID-19. BMC Med Inform Decis Mak 2021; 21:231. [PMID: 34344385 PMCID: PMC8329642 DOI: 10.1186/s12911-021-01593-9] [Received: 04/15/2020] [Accepted: 07/10/2021] [Indexed: 01/08/2023]
Abstract
BACKGROUND Coronavirus disease (COVID-19), a pneumonia caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is highly contagious and still spreading globally, with more than one million confirmed cases and tens of thousands of deaths. Worldwide studies have been conducted to understand the mechanism, transmission, and clinical features of COVID-19. A cross-language terminology of COVID-19 is essential for improving knowledge sharing and the dissemination of scientific discoveries. METHODS We developed a bilingual terminology of COVID-19 named COVID Term that maps Chinese and English terms. The terminology was constructed as follows: (1) Classification schema design; (2) Concept representation model building; (3) Term source selection and term extraction; (4) Hierarchical structure construction; (5) Quality control; (6) Web service. We built open access for the terminology, providing search, browse, and download services. RESULTS The proposed COVID Term includes 10 categories: disease, anatomic site, clinical manifestation, demographic and socioeconomic characteristics, living organism, qualifiers, psychological assistance, medical equipment, instruments and materials, epidemic prevention and control, and diagnosis and treatment technique. In total, COVID Term covers 464 concepts with 724 Chinese terms and 887 English terms. All terms are openly available online (COVID Term URL: http://covidterm.imicams.ac.cn ). CONCLUSIONS COVID Term is a bilingual terminology focused on COVID-19, the epidemic pneumonia with a high risk of infection around the world. It will provide updated bilingual terms of the disease to help health providers and medical professionals retrieve and exchange information and knowledge in multiple languages.
COVID Term was released in machine-readable formats (e.g., XML and JSON), which should support information retrieval, machine translation, and the application of advanced intelligent techniques.
Affiliation(s)
- Hetong Ma
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China
- Liu Shen
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China
- Haixia Sun
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China
- Zidu Xu
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China
- Li Hou
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China
- Sizhu Wu
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China
- An Fang
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China
- Jiao Li
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China
- Qing Qian
  - Institute of Medical Information/Library, Chinese Academy of Medical Sciences/Peking Union Medical College, Beijing, China
11
Hier DB, Kopel J, Brint SU, Wunsch DC, Olbricht GR, Azizi S, Allen B. Evaluation of standard and semantically-augmented distance metrics for neurology patients. BMC Med Inform Decis Mak 2020; 20:203. [PMID: 32843023 PMCID: PMC7448345 DOI: 10.1186/s12911-020-01217-8] [Received: 03/27/2020] [Accepted: 08/12/2020] [Indexed: 12/23/2022]
Abstract
Background Patient distances can be calculated based on signs and symptoms derived from an ontological hierarchy. There is controversy as to whether patient distance metrics that consider the semantic similarity between concepts can outperform standard patient distance metrics that are agnostic to concept similarity. The choice of distance metric can dominate the performance of classification or clustering algorithms. Our objective was to determine if semantically augmented distance metrics would outperform standard metrics on machine learning tasks. Methods We converted the neurological findings from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated patient distances by four different metrics (cosine distance, a semantically augmented cosine distance, Jaccard distance, and a semantically augmented bipartite distance). Semantic augmentation for two of the metrics depended on concept similarities from a hierarchical neuro-ontology. For machine learning algorithms, we used the patient diagnosis as the ground truth label and patient findings as machine learning features. We assessed classification accuracy for four classifiers and cluster quality for two clustering algorithms for each of the distance metrics. Results Inter-patient distances were smaller when the distance metric was semantically augmented. Classification accuracy and cluster quality were not significantly different by distance metric. Conclusion Although semantic augmentation reduced inter-patient distances, we did not find improved classification accuracy or improved cluster quality with semantically augmented patient distance metrics when applied to a dataset of neurology patients. Further work is needed to assess the utility of semantically augmented patient distances.
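The two standard (not semantically augmented) metrics named in this abstract can be sketched for patients represented as sets of findings. This is a minimal illustration, assuming binary presence/absence of each concept; the patient finding sets are hypothetical, and the semantically augmented variants, which depend on concept similarities from the neuro-ontology, are not implemented here.

```python
import math


def jaccard_distance(a, b):
    """1 - |A and B| / |A or B| for two sets of concepts."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty finding lists are treated as identical
    return 1.0 - len(a & b) / len(union)


def cosine_distance(a, b):
    """Cosine distance between the binary indicator vectors of two concept sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 1.0  # no shared direction when either patient has no findings
    # For 0/1 vectors the dot product is the overlap and each norm is sqrt(size).
    return 1.0 - len(a & b) / math.sqrt(len(a) * len(b))


# Hypothetical neurological findings for two patients.
patient_1 = {"ataxia", "tremor", "dysarthria", "nystagmus"}
patient_2 = {"ataxia", "tremor", "hyperreflexia", "spasticity"}

d_jaccard = jaccard_distance(patient_1, patient_2)
d_cosine = cosine_distance(patient_1, patient_2)
print(round(d_jaccard, 3), round(d_cosine, 3))
```

Both metrics treat "hyperreflexia" and "spasticity" as entirely unrelated to "dysarthria" and "nystagmus"; a semantically augmented metric would instead give partial credit to concepts that sit close together in the hierarchy, which is why it yields smaller inter-patient distances.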
Affiliation(s)
- Daniel B Hier
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Jonathan Kopel
  - Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA
- Steven U Brint
  - Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, 60612, USA
- Donald C Wunsch
  - Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, 65401, USA
- Gayla R Olbricht
  - Department of Mathematics and Statistics, Missouri University of Science and Technology, Rolla, MO, 65401, USA
- Sima Azizi
  - Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, 65401, USA
- Blaine Allen
  - Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, 65401, USA