1
Abeysinghe R, Zheng F, Shi J, Lhatoo SD, Cui L. Leveraging logical definitions and lexical features to detect missing IS-A relations in biomedical terminologies. J Biomed Semantics 2024; 15:6. [PMID: 38693592 PMCID: PMC11062929 DOI: 10.1186/s13326-024-00309-y] [Received: 08/17/2023] [Accepted: 04/22/2024] [Indexed: 05/03/2024]
Abstract
Biomedical terminologies play a vital role in managing biomedical data. Missing IS-A relations in a biomedical terminology could be detrimental to its downstream usages. In this paper, we investigate an approach combining logical definitions and lexical features to discover missing IS-A relations in two biomedical terminologies: SNOMED CT and the National Cancer Institute (NCI) thesaurus. The method is applied to unrelated concept-pairs within non-lattice subgraphs: graph fragments within a terminology likely to contain various inconsistencies. Our approach first compares whether the logical definition of a concept is more general than that of the other concept. Then, we check whether the lexical features of the concept are contained in those of the other concept. If both constraints are satisfied, we suggest a potentially missing IS-A relation between the two concepts. The method identified 982 potential missing IS-A relations for SNOMED CT and 100 for NCI thesaurus. In order to assess the efficacy of our approach, a random sample of results belonging to the "Clinical Findings" and "Procedure" subhierarchies of SNOMED CT and results belonging to the "Drug, Food, Chemical or Biomedical Material" subhierarchy of the NCI thesaurus were evaluated by domain experts. The evaluation results revealed that 118 out of 150 suggestions are valid for SNOMED CT and 17 out of 20 are valid for NCI thesaurus.
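The two-constraint check described in this abstract can be illustrated with a deliberately simplified Python sketch. It is not the authors' implementation: it assumes a logical definition is just a set of stated parents plus a set of (attribute, value) pairs, approximates "more general" by strict set containment (the published method also has to cope with more specific attribute values), and uses the lowercase word set of a concept name as its lexical features. The `Concept` class, the helper names and the example concepts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical, simplified stand-in for a terminology concept."""
    name: str
    parents: frozenset = frozenset()     # stated IS-A parents
    attributes: frozenset = frozenset()  # (attribute, value) pairs


def definition_more_general(a: Concept, b: Concept) -> bool:
    """True if a's definition is strictly contained in b's (simplification)."""
    contained = a.parents <= b.parents and a.attributes <= b.attributes
    identical = a.parents == b.parents and a.attributes == b.attributes
    return contained and not identical


def lexical_features(c: Concept) -> set:
    """Lowercase word set of the concept name (simplified lexical features)."""
    return set(c.name.lower().split())


def suggest_missing_is_a(a: Concept, b: Concept) -> bool:
    """Suggest 'b IS-A a' only when both constraints are satisfied."""
    return definition_more_general(a, b) and lexical_features(a) <= lexical_features(b)


# Made-up example pair (not actual SNOMED CT definitions).
general = Concept("Infection of skin",
                  frozenset({"Infectious disease"}),
                  frozenset({("Finding site", "Skin structure")}))
specific = Concept("Bacterial infection of skin",
                   frozenset({"Infectious disease"}),
                   frozenset({("Finding site", "Skin structure"),
                              ("Causative agent", "Bacterium")}))

print(suggest_missing_is_a(general, specific))  # True
print(suggest_missing_is_a(specific, general))  # False
```

In the paper, candidate pairs are additionally restricted to unrelated concepts inside non-lattice subgraphs; this sketch only evaluates one given pair.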
Affiliation(s)
- Rashmie Abeysinghe
- Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Fengbo Zheng
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Jay Shi
- Intermountain Healthcare, Denver, CO, USA
- Samden D Lhatoo
- Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Licong Cui
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
2
Massen GM, Stone PW, Kwok HHY, Jenkins G, Allen RJ, Wain LV, Stewart I, Quint JK. Review of codelists used to define hypertension in electronic health records and development of a codelist for research. Open Heart 2024; 11:e002640. [PMID: 38626934 PMCID: PMC11029375 DOI: 10.1136/openhrt-2024-002640] [Received: 02/15/2024] [Accepted: 04/02/2024] [Indexed: 04/19/2024]
Abstract
BACKGROUND AND AIMS Hypertension is a leading risk factor for cardiovascular disease. Electronic health records (EHRs) are routinely collected throughout a person's care, recording all aspects of health status, including current and past conditions, prescriptions and test results. EHRs can be used for epidemiological research. However, there are nuances in the way conditions are recorded using clinical coding; it is important to understand the methods which have been applied to define exposures, covariates and outcomes to enable interpretation of study findings. This study aimed to identify codelists used to define hypertension in studies that use EHRs and generate recommended codelists to support reproducibility and consistency. ELIGIBILITY CRITERIA Studies included populations with hypertension defined within an EHR between January 2010 and August 2023 and were systematically identified using MEDLINE and Embase. A summary of the most frequently used sources and codes is described. Due to an absence of Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) codelists in the literature, a recommended SNOMED CT codelist was developed to aid consistency and standardisation of hypertension research using EHRs. FINDINGS 375 manuscripts met the study criteria and were eligible for inclusion, and 112 (29.9%) reported codelists. The International Classification of Diseases (ICD) was the most frequently used clinical terminology, 59 manuscripts provided ICD 9 codelists (53%) and 58 included ICD 10 codelists (52%). Informed by commonly used ICD and Read codes, usage recommendations were made. We derived SNOMED CT codelists informed by National Institute for Health and Care Excellence guidelines for hypertension management. It is recommended that these codelists be used to identify hypertension in EHRs using SNOMED CT codes. CONCLUSIONS Less than one-third of hypertension studies using EHRs included their codelists. 
Transparent methodology for codelist creation is essential for replication and will aid interpretation of study findings. We created SNOMED CT codelists to support and standardise hypertension definitions in EHR studies.
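The percentages in the findings can be reproduced with a short Python check. Working backwards from the numbers suggests (our inference; the abstract does not say so explicitly) that the ICD-9 and ICD-10 shares are computed over the 112 studies that reported codelists rather than over all 375, and that the two groups overlap.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


included_studies = 375
with_codelists = 112
icd9_codelists = 59
icd10_codelists = 58

print(percent(with_codelists, included_studies))  # 29.9
print(percent(icd9_codelists, with_codelists))    # 52.7
print(percent(icd10_codelists, with_codelists))   # 51.8
# 59 + 58 > 112, so at least this many studies reported both ICD-9 and ICD-10 codelists.
print(icd9_codelists + icd10_codelists - with_codelists)  # 5
```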
Affiliation(s)
- Philip W Stone
- School of Public Health, Imperial College London, London, UK
- Harley H Y Kwok
- School of Public Health, Imperial College London, London, UK
- Gisli Jenkins
- National Heart and Lung Institute, Imperial College London, London, UK
- Richard J Allen
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- NIHR Biomedical Research Centre, University of Leicester, Leicester, UK
- Louise V Wain
- Department of Population Health Sciences, University of Leicester, Leicester, UK
- NIHR Biomedical Research Centre, University of Leicester, Leicester, UK
- Iain Stewart
- National Heart and Lung Institute, Imperial College London, London, UK
3
Zisser M, Aran D. Transformer-based time-to-event prediction for chronic kidney disease deterioration. J Am Med Inform Assoc 2024; 31:980-990. [PMID: 38349850 PMCID: PMC10990547 DOI: 10.1093/jamia/ocae025] [Received: 06/19/2023] [Revised: 01/08/2024] [Accepted: 01/29/2024] [Indexed: 02/15/2024]
Abstract
OBJECTIVE Deep-learning techniques, particularly the Transformer model, have shown great potential for improving prediction performance on longitudinal health records. Previous methods focused on fixed-time risk prediction; however, time-to-event prediction is often more appropriate for clinical scenarios. Here, we present STRAFE, a generalizable survival analysis Transformer-based architecture for electronic health records. MATERIALS AND METHODS The input for STRAFE is a sequence of visits with SNOMED-CT codes in OMOP-CDM format. A Transformer-based architecture was developed to calculate probabilities of the occurrence of the event in each of 48 months. Performance was evaluated using a real-world claims dataset of over 130 000 individuals with stage 3 chronic kidney disease (CKD). RESULTS STRAFE showed improved mean absolute error (MAE) compared to other time-to-event algorithms in predicting the time to deterioration to stage 5 CKD. Additionally, STRAFE showed an improved area under the receiver operating characteristic curve compared to binary outcome algorithms. We show that STRAFE predictions can improve the positive predictive value of high-risk patients by 3-fold. Finally, we suggest a novel visualization approach to predictions on a per-patient basis. DISCUSSION Time-to-event predictions are the most appropriate approach for clinical predictions. Our deep-learning algorithm outperformed not only other time-to-event prediction algorithms but also fixed-time algorithms, possibly due to its ability to train on censored data. We demonstrated possible clinical usage by identifying the highest-risk patients. CONCLUSIONS The ability to accurately identify patients at high risk and prioritize their needs can result in improved health outcomes, reduced costs, and more efficient use of resources.
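A model that outputs one event probability per month can be turned into a survival curve and a point prediction, which is what an MAE on time-to-deterioration needs. The Python sketch below assumes the 48 outputs are unconditional monthly probabilities (not conditional hazards), uses the first month where cumulative risk reaches 50% as the predicted time, and computes MAE only over patients with an observed event. All three choices are illustrative assumptions; the abstract does not say how STRAFE derives a predicted time or handles censoring in its MAE.

```python
from itertools import accumulate
from typing import Optional


def survival_curve(monthly_probs: list[float]) -> list[float]:
    """S(t): probability the event has NOT occurred by the end of month t.

    Assumes monthly_probs[t - 1] is the unconditional probability that the
    event occurs in month t (not a conditional hazard)."""
    return [1.0 - cumulative for cumulative in accumulate(monthly_probs)]


def median_event_month(monthly_probs: list[float]) -> Optional[int]:
    """First month in which cumulative risk reaches 50%, or None if it
    never does within the prediction window."""
    for month, cumulative in enumerate(accumulate(monthly_probs), start=1):
        if cumulative >= 0.5:
            return month
    return None


def mean_absolute_error(predicted: list[float], observed: list[float]) -> float:
    """MAE over patients whose event was observed (censored ones excluded)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Toy 48-month output for one patient: 25% risk in each of months 1-3.
probs = [0.25, 0.25, 0.25] + [0.0] * 45
print(survival_curve(probs)[:4])             # [0.75, 0.5, 0.25, 0.25]
print(median_event_month(probs))             # 2
print(mean_absolute_error([2, 10], [4, 7]))  # 2.5
```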
Affiliation(s)
- Moshe Zisser
- Faculty of Data and Decision Sciences, Technion-Israel Institute of Technology, Haifa, 3200003, Israel
- Dvir Aran
- Faculty of Biology, Technion-Israel Institute of Technology, Haifa, 3200003, Israel
- The Taub Faculty of Computer Science, Technion-Israel Institute of Technology, Haifa, 3200003, Israel
4
Yap SHA, Philip S, Graveling AJ, Abraham P, Downs D. Creating a SNOMED CT reference set for common endocrine disorders based on routine clinic correspondence. Clin Endocrinol (Oxf) 2024; 100:343-349. [PMID: 37555365 DOI: 10.1111/cen.14951] [Received: 03/04/2023] [Revised: 06/13/2023] [Accepted: 07/13/2023] [Indexed: 08/10/2023]
Abstract
BACKGROUND Routine clinical coding of clinical outcomes in outpatient consultations still lags behind the coding of episodes of inpatient care. Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) offers an opportunity for standardised coding of key clinical information. Identifying the most commonly required SNOMED terms and grouping these into a reference set will aid future adoption in routine clinical care. OBJECTIVE To create a common endocrinology reference set to standardise the coding for outcomes of outpatient endocrine consultations, using a semi-automated extraction of information from existing clinical correspondence. METHODS Retrospective review of data from an adult tertiary outpatient endocrine clinic between 2018 and 2019. A total of 1870 patients from postcodes within two regional areas of NHS Grampian (Aberdeen City and Aberdeenshire) attended the clinic. Following consultation, an automated script extracted each problem statement, which was manually coded using the 'disorder' concepts from SNOMED CT (UK edition). RESULTS The review identified 298 relevant endocrine diagnoses, 99 findings and 142 procedures. There were a total of 88 (29.5%) commonly seen endocrine conditions (e.g., Graves' disease, anterior hypopituitarism and Addison's disease) and 210 (70.5%) less commonly seen endocrine conditions. Subsequently, consultant endocrinologists completed a survey regarding the common endocrine conditions; 28 conditions have 100% agreement, 25 have 90%-99% agreement, 31 have 50%-89% agreement and 4 have less than 50% agreement (which were excluded). CONCLUSION Automated text parsing of structured endocrine correspondence allowed the creation of a SNOMED CT reference set for common endocrine disorders. This will facilitate funding and planning of service provision in endocrinology by allowing more accurate characterisation of the patient cohorts needing specialist endocrine care.
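The agreement bands and proportions reported above can be checked with a few lines of Python. The banding function is a hypothetical reconstruction that uses the band edges 100%, 90%-99% and 50%-89% from the abstract and treats anything below the 50%-89% band as excluded; the counts are copied from the abstract, and the number retained after exclusion is derived arithmetic rather than a figure the authors report.

```python
def agreement_band(percent: float) -> str:
    """Assign a consultant-agreement percentage to one of the survey bands."""
    if percent >= 100:
        return "100%"
    if percent >= 90:
        return "90-99%"
    if percent >= 50:
        return "50-89%"
    return "below 50% (excluded)"


band_counts = {"100%": 28, "90-99%": 25, "50-89%": 31, "below 50% (excluded)": 4}

common = sum(band_counts.values())
retained = common - band_counts["below 50% (excluded)"]
print(common)                        # 88
print(retained)                      # 84
print(round(100 * common / 298, 1))  # 29.5
print(round(100 * 210 / 298, 1))     # 70.5
```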
Affiliation(s)
- Shao Hao Alan Yap
- JJR Macleod Centre for Diabetes & Endocrinology, Aberdeen Royal Infirmary, Aberdeen, UK
- Sam Philip
- JJR Macleod Centre for Diabetes & Endocrinology, Aberdeen Royal Infirmary, Aberdeen, UK
- Alex J Graveling
- JJR Macleod Centre for Diabetes & Endocrinology, Aberdeen Royal Infirmary, Aberdeen, UK
- Prakash Abraham
- JJR Macleod Centre for Diabetes & Endocrinology, Aberdeen Royal Infirmary, Aberdeen, UK
5
Lloyd L, Swan WI, Jent S, Vivanti A, Pertel DG. Worldwide Release of SNOMED CT Nutrition Care Process Terminology Problem List. J Acad Nutr Diet 2024; 124:531-534. [PMID: 38278351 DOI: 10.1016/j.jand.2024.01.008] [Received: 01/17/2024] [Accepted: 01/19/2024] [Indexed: 01/28/2024]
Affiliation(s)
- Lyn Lloyd
- Te Toka Tumai, Te Whatu Ora - Health New Zealand, Auckland, New Zealand
- Sandra Jent
- Bachelor in Nutrition and Dietetics, Bern University of Applied Sciences, Bern, Switzerland
- Angela Vivanti
- Department of Nutrition and Dietetics, Princess Alexandra Hospital, Brisbane, Queensland, Australia; School of Human Movement and Nutrition Studies, University of Queensland, Queensland, Australia
- Donna G Pertel
- Commission on Dietetic Registration, Academy of Nutrition and Dietetics, Chicago, Illinois
6
Nikiema JN, Liang J, Liang MQ, Dos Anjos D, Motulsky A. Improving the interoperability of drugs terminologies: Infusing local standardization with an international perspective. J Biomed Inform 2024; 151:104614. [PMID: 38395099 DOI: 10.1016/j.jbi.2024.104614] [Received: 08/15/2023] [Revised: 02/10/2024] [Accepted: 02/17/2024] [Indexed: 02/25/2024]
Abstract
OBJECTIVES The objective of this study is to describe how OCRx (Canadian Drug Ontology) has been built to address the dual need for local drug information integration in Canada and alignment with international standards requirements. METHODS This paper delves into (i) the implementation efforts to meet the Identification of Medicinal Product (IDMP) requirements in OCRx, alongside the ontology update strategy, (ii) the structure of the ontology itself, (iii) the alignment approach with several reference Knowledge Organization Systems, including SNOMED CT, RxNorm, and the list of "Code Identifiant de Spécialité" (CIS-Code), and (iv) the look-up services developed to facilitate its access and utilization. RESULTS Each OCRx release contains two distinct versions: the full and the up-to-date version. The full version encompasses all drugs with a DIN code sanctioned by Health Canada, while the up-to-date version is limited to drugs currently marketed in Canada. In the last release of OCRx, the full version comprises 162,400 classes; meanwhile, the up-to-date version consists of 36,909 classes. In terms of mappings with OCRx, substances in RxNorm and SNOMED CT fall below 40%, registering at 37% and 22% respectively. Meanwhile, mappings for CIS-Code achieve coverage of 61%. The strength mappings are notably low for RxNorm at 40% and for CIS-code at 28%. This affects the mapping of clinical drugs, which are predominantly alignable through post-coordinated expressions: 56% for RxNorm, 80% for SNOMED CT, and 35% for CIS-Code. The main support service of OCRx is a look-up service known as PaperRx that displays OCRx's entities based on description logic queries (DL-queries) performed through the classified structure of OCRx. The look-up services also contain a SPARQL endpoint, an OCRx OWL file downloader, and a RESTful API. DISCUSSION The OCRx ontology demonstrates a significant effort towards integrating Canadian drug information with international standards. 
However, there are areas for improvement. In the future, our focus will be on refining the structure of OCRx for better classification capability and improvement of dosage conversion. Additionally, we aim to harness OCRx in constructing an ontology-based annotator, setting our sights on its deployment in real-world data integration scenarios.
Affiliation(s)
- Jean Noël Nikiema
- Department of Management, Evaluation and Health Policy, School of Public Health, Université de Montréal, Canada; Centre de recherche en santé publique, Université de Montréal et CIUSSS du Centre-Sud-de-l'Île-de-Montréal, Canada; Laboratoire Transformation Numérique en Santé (LabTNS), Canada
- James Liang
- Centre de recherche en santé publique, Université de Montréal et CIUSSS du Centre-Sud-de-l'Île-de-Montréal, Canada; Laboratoire Transformation Numérique en Santé (LabTNS), Canada
- Man Qing Liang
- Department of Management, Evaluation and Health Policy, School of Public Health, Université de Montréal, Canada; Laboratoire Transformation Numérique en Santé (LabTNS), Canada; Research Center, Centre hospitalier de l'Université de Montréal (CRCHUM), Canada
- Davllyn Dos Anjos
- Department of Management, Evaluation and Health Policy, School of Public Health, Université de Montréal, Canada; Laboratoire Transformation Numérique en Santé (LabTNS), Canada; Research Center, Centre hospitalier de l'Université de Montréal (CRCHUM), Canada
- Aude Motulsky
- Department of Management, Evaluation and Health Policy, School of Public Health, Université de Montréal, Canada; Laboratoire Transformation Numérique en Santé (LabTNS), Canada; Research Center, Centre hospitalier de l'Université de Montréal (CRCHUM), Canada
7
Del-Pinto W, Schmidt RA, Gao Y, Alghamdi G, Osornio AL, Roy S. International Patient Summary Terminology. Stud Health Technol Inform 2024; 310:63-67. [PMID: 38269766 DOI: 10.3233/shti230928] [Indexed: 01/26/2024]
Abstract
SNOMED CT is a comprehensive medical ontology used in health care sectors across the world covering a wide range of concepts that support diversity at the point of healthcare. However, not all these concepts are needed for every use case; it is better to concentrate on those parts that apply to the particular application while preserving the meaning of relevant concepts. This paper considers the application of a novel subontology extraction method to create a new resource, called the IPS terminology, which functions as a standalone ontology with the same features as SNOMED CT, but is designed for cross-border patient care. The IPS terminology has been released for free use under an open license, with the intention of promoting interoperability of health information worldwide.
Affiliation(s)
- Ghadah Alghamdi
- Department of Computer Science, University of Manchester, UK
- Department of Computer Science, Dar Al-Hekma University, Kingdom of Saudi Arabia
8
Pengput A, Ceusters W. Setting the Scene to Link SNOMED CT to Realism-Based Ontologies. Stud Health Technol Inform 2024; 310:84-88. [PMID: 38269770 DOI: 10.3233/shti230932] [Indexed: 01/26/2024]
Abstract
In a proof of concept study, we assessed the feasibility of designing a first-order logic (FOL) framework capable of translating SNOMED CT's terminological view on patient data as referencing concepts, into the realism-based view of the Basic Formal Ontology and the Ontology for General Medical Science according to which patient data represent instances of types. Because within the subject domain of this study, SNOMED CT's terminological coverage was excellent, and its EL++ axioms can be automatically translated into FOL as well as the antecedent part of bridging axioms between SNOMED CT and realism-based ontologies, we conclude that this is an area of R&D that deserves further attention and that may lead to new ways of federating terminologies with ontologies.
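For readers unfamiliar with the EL++-to-FOL step mentioned above: the standard first-order translation of description logic axioms turns each concept into a unary predicate and each role into a binary predicate. A minimal worked example, using hypothetical concept and role names rather than actual SNOMED CT content:

```latex
\begin{align*}
\text{DL axiom:}\quad & \mathit{FractureOfFemur} \sqsubseteq \mathit{Fracture} \sqcap \exists\,\mathit{findingSite}.\mathit{FemurStructure}\\
\text{FOL translation:}\quad & \forall x\,\bigl(\mathit{FractureOfFemur}(x) \rightarrow \mathit{Fracture}(x) \land \exists y\,(\mathit{findingSite}(x,y) \land \mathit{FemurStructure}(y))\bigr)
\end{align*}
```

An equivalence axiom translates the same way with a biconditional in place of the implication. How the authors construct the bridging axioms to BFO and OGMS types is not described in the abstract and is not shown here.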
Affiliation(s)
- Anuwat Pengput
- Department of Biomedical Informatics, University at Buffalo, USA
- Werner Ceusters
- Department of Biomedical Informatics, University at Buffalo, USA
9
Kim S, Shin SY, Hwang JE, Park HA. Current Status of SNOMED CT National Extensions and Terminology Managements. Stud Health Technol Inform 2024; 310:1345-1346. [PMID: 38270036 DOI: 10.3233/shti231187] [Indexed: 01/26/2024]
Abstract
We reviewed and surveyed 15 SNOMED CT national member countries regarding SNOMED CT national extensions and terminology management. We found that national extensions were used for adding new content, developing reference sets, translating, and mapping to other classification systems, and that terminology management varies in composition and content depending on each member country's healthcare environment, eHealth strategy, and national release center infrastructure.
Affiliation(s)
- Seeun Kim
- Dept. of Digital Health, SAIHST, Sungkyunkwan University, Seoul 06351, Korea
- Soo-Yong Shin
- Dept. of Digital Health, SAIHST, Sungkyunkwan University, Seoul 06351, Korea
- Ji Eun Hwang
- Dept. of Digital Health, SAIHST, Sungkyunkwan University, Seoul 06351, Korea
- Hyeoun-Ae Park
- College of Nursing, Seoul National University, Seoul, Korea
10
He Z, Tian S, Erdengasileng A, Hanna K, Gong Y, Zhang Z, Luo X, Lustria MLA. Annotation and Information Extraction of Consumer-Friendly Health Articles for Enhancing Laboratory Test Reporting. AMIA Annu Symp Proc 2024; 2023:407-416. [PMID: 38222337 PMCID: PMC10785897] [Indexed: 01/16/2024]
Abstract
Viewing laboratory test results is patients' most frequent activity when accessing patient portals, but lab results can be very confusing for patients. Previous research has explored various ways to present lab results, but few studies have attempted to provide tailored information support based on an individual patient's medical context. In this study, we collected and annotated interpretations of textual lab results in 251 health articles about laboratory tests from AHealthyMe.com. Then we evaluated transformer-based language models, including BioBERT, ClinicalBERT, RoBERTa, and PubMedBERT, for recognizing key terms and their types. Using BioPortal's term search API, we mapped the annotated terms to concepts in major controlled terminologies. Results showed that PubMedBERT achieved the best F1 on both strict and lenient matching criteria. SNOMED CT had the best coverage of the terms, followed by LOINC and ICD-10-CM. This work lays the foundation for enhancing the presentation of lab results in patient portals by providing patients with contextualized interpretations of their lab results and individualized question prompts that they can, in turn, refer to during physician consults.
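The abstract does not define its strict and lenient matching criteria. A common convention, assumed here, is that a strict match requires identical span boundaries and entity type, while a lenient match only requires overlapping spans of the same type. The Python sketch below computes F1 under both conventions; the character offsets and the entity types `TEST` and `CONDITION` are hypothetical.

```python
Span = tuple[int, int, str]  # (start, end, entity_type); end is exclusive


def overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def f1_score(predicted: list[Span], gold: list[Span], lenient: bool = False) -> float:
    if lenient:
        # A prediction counts if it overlaps any gold span of the same type, and vice versa.
        tp_pred = sum(any(p[2] == g[2] and overlaps(p, g) for g in gold) for p in predicted)
        tp_gold = sum(any(p[2] == g[2] and overlaps(p, g) for p in predicted) for g in gold)
    else:
        # Strict: boundaries and type must match exactly.
        tp_pred = tp_gold = len(set(predicted) & set(gold))
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = [(0, 10, "TEST"), (15, 25, "CONDITION")]
predicted = [(0, 10, "TEST"), (14, 25, "CONDITION")]  # second span starts one character early
print(f1_score(predicted, gold))                # 0.5
print(f1_score(predicted, gold, lenient=True))  # 1.0
```

The gap between the two scores is exactly the boundary disagreement that lenient matching forgives.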
Affiliation(s)
- Zhe He
- School of Information, Florida State University
- Shubo Tian
- Department of Statistics, Florida State University
- Karim Hanna
- Department of Family Medicine, Morsani College of Medicine, University of South Florida
- Yang Gong
- School of Biomedical Informatics, University of Texas Health Science Center at Houston
- Zhan Zhang
- Seidenberg School of Computer Science and Information Systems, Pace University
- Xiao Luo
- Purdue School of Engineering & Technology, IUPUI
11
Chaturvedi J, Wang T, Velupillai S, Stewart R, Roberts A. Development of a Knowledge Graph Embeddings Model for Pain. AMIA Annu Symp Proc 2024; 2023:299-308. [PMID: 38222382 PMCID: PMC10785867] [Indexed: 01/16/2024]
Abstract
Pain is a complex concept that can interconnect with other concepts such as a disorder that might cause pain, a medication that might relieve pain, and so on. To fully understand the context of pain experienced by either an individual or across a population, we may need to examine all concepts related to pain and the relationships between them. This is especially useful when modeling pain that has been recorded in electronic health records. Knowledge graphs represent concepts and their relations by an interlinked network, enabling semantic and context-based reasoning in a computationally tractable form. These graphs can, however, be too large for efficient computation. Knowledge graph embeddings help to resolve this by representing the graphs in a low-dimensional vector space. These embeddings can then be used in various downstream tasks such as classification and link prediction. The various relations associated with pain which are required to construct such a knowledge graph can be obtained from external medical knowledge bases such as SNOMED CT, a hierarchical systematic nomenclature of medical terms. A knowledge graph built in this way could be further enriched with real-world examples of pain and its relations extracted from electronic health records. This paper describes the construction of such knowledge graph embedding models of pain concepts, extracted from the unstructured text of mental health electronic health records, combined with external knowledge created from relations described in SNOMED CT, and their evaluation on a subject-object link prediction task. The performance of the models was compared with other baseline models.
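The abstract does not name the embedding models that were trained. As one widely used example, TransE (swapped in here purely for illustration) scores a triple (head, relation, tail) by how close head + relation lands to tail; link prediction then ranks candidate tails by that score. The 2-D vectors below are hand-picked toy values, not learned embeddings, and the `is_a` relation vector is hypothetical.

```python
import math

Vector = list[float]


def transe_score(head: Vector, relation: Vector, tail: Vector) -> float:
    """TransE plausibility: negative Euclidean distance ||head + relation - tail||."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head: Vector, relation: Vector, candidates: dict[str, Vector]) -> list[str]:
    """Rank candidate tail entities from most to least plausible."""
    return sorted(candidates,
                  key=lambda name: transe_score(head, relation, candidates[name]),
                  reverse=True)


# Toy embeddings (hand-picked, not trained).
entities = {
    "Migraine": [1.0, 0.0],
    "Headache": [1.0, 1.0],
    "Back pain": [3.0, -2.0],
    "Paracetamol": [0.0, 5.0],
}
is_a = [0.0, 1.0]

candidates = {name: vec for name, vec in entities.items() if name != "Migraine"}
print(rank_tails(entities["Migraine"], is_a, candidates))
# ['Headache', 'Back pain', 'Paracetamol']
```

In practice the vectors would be learned from SNOMED CT relations and EHR-derived triples, and evaluation would report ranking metrics over many held-out triples.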
Affiliation(s)
- Jaya Chaturvedi
- Institute of Psychiatry, Psychology and Neurosciences, King's College London, London, United Kingdom
- Tao Wang
- Institute of Psychiatry, Psychology and Neurosciences, King's College London, London, United Kingdom
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neurosciences, King's College London, London, United Kingdom
- Robert Stewart
- Institute of Psychiatry, Psychology and Neurosciences, King's College London, London, United Kingdom
- South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Angus Roberts
- Institute of Psychiatry, Psychology and Neurosciences, King's College London, London, United Kingdom
- South London and Maudsley NHS Foundation Trust, London, United Kingdom
12
Zahra FA, Kate RJ. Obtaining clinical term embeddings from SNOMED CT ontology. J Biomed Inform 2024; 149:104560. [PMID: 38070816 DOI: 10.1016/j.jbi.2023.104560] [Received: 06/16/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 01/22/2024]
Abstract
Clinical term embeddings are traditionally obtained using corpus-based methods; however, these methods cannot incorporate knowledge about clinical terms that is already present in medical ontologies. On the other hand, graph-based methods can obtain embeddings of clinical concepts from ontologies, but they cannot obtain embeddings for clinical terms and words. In this paper, a novel method is presented to obtain embeddings for clinical terms and words from the SNOMED CT ontology. The method first obtains embeddings of clinical concepts from SNOMED CT using a graph-based method. Next, these concept embeddings are used as targets to train a deep learning model to map clinical terms to concept embeddings. The learned model then provides embeddings for clinical terms and words and can also map novel clinical terms to their embeddings. The embeddings obtained using the method outperformed corpus-based embeddings on the task of predicting clinical term similarity on five benchmark datasets. On the clinical term normalization task, using these embeddings simply as a means of computing similarity between clinical terms achieved accuracy competitive with methods trained specifically for this task. Both corpus-based and ontology-based embeddings share a limitation: they tend to learn similar embeddings for opposite or analogous terms. To counter this, we also introduce a method to automatically learn patterns that indicate when two clinical terms represent the same concept and when they represent different concepts. Supplementing the normalization process with these patterns improved performance. Although clinical term embeddings obtained from SNOMED CT incorporate ontological knowledge that is missed by corpus-based embeddings, they do not incorporate the linguistic knowledge needed for sentence-based tasks. Hence, combining ontology-based embeddings with corpus-based embeddings is an avenue for future work.
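Using embeddings "simply as a means of computing similarity" for normalization can be pictured as a nearest-neighbour search: embed the input term, then return the concept whose embedding has the highest cosine similarity. In this Python sketch the vectors are made-up 3-D values and the term vector is given directly; in the paper it would come from the trained term-to-concept model, which is not reproduced here.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def normalize(term_vector: list[float], concept_vectors: dict[str, list[float]]) -> str:
    """Map a term to the concept with the most similar embedding."""
    return max(concept_vectors, key=lambda c: cosine(term_vector, concept_vectors[c]))


# Toy concept embeddings (hypothetical values).
concepts = {
    "Myocardial infarction": [0.9, 0.1, 0.0],
    "Hypertensive disorder": [0.1, 0.9, 0.1],
    "Asthma": [0.0, 0.1, 0.9],
}
heart_attack = [0.8, 0.2, 0.1]  # assumed output of a term-embedding model
print(normalize(heart_attack, concepts))  # Myocardial infarction
```

This also shows why the limitation noted in the abstract matters: plain cosine similarity cannot tell a term from an opposite or sibling term whose vector happens to lie close by.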
Affiliation(s)
- Fuad Abu Zahra
- Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
- Rohit J Kate
- Department of Computer Science, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
13
Cazzaniga G, Eccher A, Munari E, Marletta S, Bonoldi E, Della Mea V, Cadei M, Sbaraglia M, Guerriero A, Dei Tos AP, Pagni F, L’Imperio V. Natural Language Processing to extract SNOMED-CT codes from pathological reports. Pathologica 2023; 115:318-324. [PMID: 38180139 PMCID: PMC10767798 DOI: 10.32074/1591-951x-952] [Received: 11/16/2023] [Accepted: 11/17/2023] [Indexed: 01/06/2024]
Abstract
Objective The use of standardized structured reports (SSR) and suitable terminologies like SNOMED-CT can enhance data retrieval and analysis, fostering large-scale studies and collaboration. However, the continued high prevalence of narrative reports in our laboratories calls for alternative, automated labeling approaches. In this project, natural language processing (NLP) methods were used to assign SNOMED-CT codes to structured and unstructured reports from an Italian Digital Pathology Department. Methods Two NLP-based automatic coding systems (support vector machine, SVM, and long short-term memory, LSTM) were trained and applied to a series of narrative reports. Results The 1163 cases were tested with both algorithms, showing good performance in terms of accuracy, precision, recall, and F1 score, with SVM performing slightly better than LSTM (0.84, 0.87, 0.83, 0.82 vs 0.83, 0.85, 0.83, 0.82, respectively). The integration of explainability methods allowed identification of important terms and groups of words, enabling fine-tuning that balances semantic meaning and model performance. Conclusions AI tools allow automatic SNOMED-CT labeling of pathology archives, providing a retrospective fix for the widespread lack of organization of narrative reports.
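The reported F1 (0.82) is lower than both precision (0.87) and recall (0.83), which is hard to reconcile with F1 as the harmonic mean of those two figures (that would give about 0.85, and a harmonic mean never falls below the smaller input). It is, however, expected when metrics are averaged per class, as in macro or weighted averaging. Which averaging the authors used is our assumption, not something the abstract states. The Python sketch below computes macro-averaged scores for single-label predictions; the codes `code_A` to `code_C` are placeholders, not real SNOMED-CT identifiers.

```python
from collections import Counter


def macro_scores(y_true: list[str], y_pred: list[str]) -> tuple[float, float, float]:
    """Macro-averaged precision, recall and F1 for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


y_true = ["code_A", "code_A", "code_B", "code_C"]
y_pred = ["code_A", "code_B", "code_B", "code_B"]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.4444 0.5 0.3889
# Macro F1 (0.3889) is below both macro precision and macro recall.
```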
Affiliation(s)
- Giorgio Cazzaniga
- Department of Medicine and Surgery, Pathology, IRCCS Fondazione San Gerardo dei Tintori, University of Milano-Bicocca, Italy
- Albino Eccher
- Section of Pathology, Department of Medical and Surgical Sciences for Children and Adults, University of Modena and Reggio Emilia, University Hospital of Modena, Modena, Italy
- Enrico Munari
- Department of Diagnostic and Public Health, Section of Pathology, University of Verona, Verona, Italy
- Stefano Marletta
- Department of Diagnostic and Public Health, Section of Pathology, University of Verona, Verona, Italy
- Emanuela Bonoldi
- Unit of Surgical Pathology and Cytogenetics, ASST Grande Ospedale Metropolitano Niguarda, Milan, Italy
- Vincenzo Della Mea
- Department of Mathematics, Computer Science and Physics, University of Udine, Udine, Italy
- Moris Cadei
- Pathology Unit, ASST Spedali Civili di Brescia, Brescia, Italy
- Marta Sbaraglia
- Surgical Pathology and Cytopathology Unit, Department of Medicine-DIMED, University of Padua School of Medicine, Padua, Italy
- Angela Guerriero
- Surgical Pathology and Cytopathology Unit, Department of Medicine-DIMED, University of Padua School of Medicine, Padua, Italy
- Angelo Paolo Dei Tos
- Surgical Pathology and Cytopathology Unit, Department of Medicine-DIMED, University of Padua School of Medicine, Padua, Italy
- Fabio Pagni
- Department of Medicine and Surgery, Pathology, IRCCS Fondazione San Gerardo dei Tintori, University of Milano-Bicocca, Italy
- Vincenzo L’Imperio
- Department of Medicine and Surgery, Pathology, IRCCS Fondazione San Gerardo dei Tintori, University of Milano-Bicocca, Italy
| |
Collapse
|
14
|
Noll R, Frischen LS, Boeker M, Storf H, Schaaf J. Machine translation of standardised medical terminology using natural language processing: A scoping review. N Biotechnol 2023; 77:120-129. [PMID: 37652265 DOI: 10.1016/j.nbt.2023.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 08/01/2023] [Accepted: 08/28/2023] [Indexed: 09/02/2023]
Abstract
Standardised medical terminologies are used to ensure accurate and consistent communication of information and to facilitate data exchange. Currently, many terminologies are only available in English, which hinders international research and automated processing of medical data. Natural language processing (NLP) and Machine Translation (MT) methods can be used to automatically translate these terms. This scoping review examines the research on automated translation of standardised medical terminology. A search was performed in PubMed and Web of Science, and results were screened for eligibility by title and abstract and then by full text. In addition to bibliographic data, the following data items were considered: 'terminology considered', 'terms considered', 'source language', 'target language', 'translation type', 'NLP technique', 'NLP system', 'machine translation system', 'data source' and 'translation quality'. The results showed that the most frequently translated terminology is SNOMED CT (39.1%), followed by MeSH (13%), ICD (13%) and UMLS (8.7%). The most common source language is English (55.9%), and the most common target language is German (41.2%). Translation methods are often based on Statistical Machine Translation (SMT) (41.7%) and, more recently, Neural Machine Translation (NMT) (30.6%), and different MT methods can also be combined. Commercial translators such as Google Translate (36.4%) and automatic validation methods such as BLEU (22.2%) are frequently used tools for translation and subsequent validation.
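BLEU, the automatic validation metric named above, scores a machine translation by its clipped n-gram overlap with one or more reference translations, multiplied by a brevity penalty. The sketch below is a minimal, unsmoothed sentence-level version with uniform weights up to 4-grams; established toolkits add their own tokenization, smoothing and corpus-level aggregation, so scores from this function will not match them exactly. The example terms are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence BLEU with uniform weights for 1..max_n-grams."""
    if not candidate or not references:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        total = sum(cand.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngram_counts(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
        if clipped == 0:
            return 0.0  # the geometric mean is zero without smoothing
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


candidate = "acute myocardial infarction of anterior wall".split()
reference = "acute myocardial infarction of the anterior wall".split()
print(round(sentence_bleu(candidate, [reference]), 3))
```

Because BLEU only counts surface n-gram overlap, a clinically correct synonym in the translation can still lower the score.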
Affiliation(s)
- Richard Noll
- Goethe University Frankfurt, University Hospital Frankfurt, Institute of Medical Informatics, Frankfurt, Germany.
- Lena S Frischen
- University Hospital Frankfurt, Goethe University, Executive Department for medical IT-Systems and digitalization, Frankfurt, Germany
- Martin Boeker
- Institute for Artificial Intelligence and Informatics in Medicine, Chair of Medical Informatics, Medical Center rechts der Isar, Technical University of Munich, Munich, Germany
- Holger Storf
- Goethe University Frankfurt, University Hospital Frankfurt, Institute of Medical Informatics, Frankfurt, Germany
- Jannik Schaaf
- Goethe University Frankfurt, University Hospital Frankfurt, Institute of Medical Informatics, Frankfurt, Germany
15
Newbury A, Liu H, Idnay B, Weng C. The suitability of UMLS and SNOMED-CT for encoding outcome concepts. J Am Med Inform Assoc 2023; 30:1895-1903. [PMID: 37615994 PMCID: PMC10654851 DOI: 10.1093/jamia/ocad161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 06/14/2023] [Accepted: 08/02/2023] [Indexed: 08/25/2023]
Abstract
OBJECTIVE Outcomes are important clinical study information. Despite progress in automated extraction of PICO (Population, Intervention, Comparison, and Outcome) entities from PubMed, rarely are these entities encoded by standard terminology to achieve semantic interoperability. This study aims to evaluate the suitability of the Unified Medical Language System (UMLS) and SNOMED-CT in encoding outcome concepts in randomized controlled trial (RCT) abstracts. MATERIALS AND METHODS We iteratively developed and validated an outcome annotation guideline and manually annotated clinically significant outcome entities in the Results and Conclusions sections of 500 randomly selected RCT abstracts on PubMed. The extracted outcomes were fully, partially, or not mapped to the UMLS via MetaMap based on established heuristics. Manual UMLS browser search was performed for select unmapped outcome entities to further differentiate between UMLS and MetaMap errors. RESULTS Only 44% of 2617 outcome concepts were fully covered in the UMLS, among which 67% were complex concepts that required the combination of 2 or more UMLS concepts to represent them. SNOMED-CT was present as a source in 61% of the fully mapped outcomes. DISCUSSION Domains such as Metabolism and Nutrition, and Infections and Infectious Diseases need expanded outcome concept coverage in the UMLS and MetaMap. Future work is warranted to similarly assess the terminology coverage for P, I, C entities. CONCLUSION Computational representation of clinical outcomes is important for clinical evidence extraction and appraisal and yet faces challenges from the inherent complexity and lack of coverage of these concepts in UMLS and SNOMED-CT, as demonstrated in this study.
Affiliation(s)
- Abigail Newbury
- Department of Biomedical Informatics, Columbia University, New York City, NY 10032, United States
- Hao Liu
- Department of Biomedical Informatics, Columbia University, New York City, NY 10032, United States
- Betina Idnay
- Department of Biomedical Informatics, Columbia University, New York City, NY 10032, United States
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York City, NY 10032, United States
16
Lokmic-Tomkins Z, Block LJ, Davies S, Reid L, Ronquillo CE, von Gerich H, Peltonen LM. Evaluating the representation of disaster hazards in SNOMED CT: gaps and opportunities. J Am Med Inform Assoc 2023; 30:1762-1772. [PMID: 37558235 PMCID: PMC10586035 DOI: 10.1093/jamia/ocad153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 07/20/2023] [Accepted: 07/21/2023] [Indexed: 08/11/2023]
Abstract
OBJECTIVE Climate change, an underlying risk driver of natural disasters, threatens environmental sustainability, planetary health, and the sustainable development goals. Incorporating disaster-related health impacts into electronic health records helps to comprehend their impact on populations, clinicians, and healthcare systems. This study aims to: (1) map the United Nations Office for Disaster Risk Reduction and International Science Council (UNDRR-ISC) Hazard Information Profiles to SNOMED CT International, a clinical terminology used by clinicians to manage patients and provide healthcare services; and (2) determine the extent of clinical terminologies available to capture disaster-related events. MATERIALS AND METHODS Concepts related to disasters were extracted from the UNDRR-ISC's Hazard Information Profiles and mapped to a health terminology using a procedural framework for standardized clinical terminology mapping. The mapping process involved evaluating candidate matches and creating a final list of matches to determine concept coverage. RESULTS A total of 226 disaster hazard concepts were identified as adversely impacting human health. Chemical and biological disaster hazard concepts had better representation in SNOMED CT than meteorological, hydrological, extraterrestrial, geohazards, environmental, technical, and societal hazard concepts. Heatwave, drought, and geographically unique disaster hazards were not found in SNOMED CT. CONCLUSION To enhance clinical reporting of disaster hazards and climate-sensitive health outcomes, the poorly represented and missing concepts must be included in SNOMED CT. Documenting the impacts of climate change on public health using standardized clinical terminology provides the necessary real-time data to capture climate-sensitive outcomes. These data are crucial for building climate-resilient healthcare systems, enhancing public health disaster responses and workflows, tracking individual health outcomes, supporting disaster risk reduction modeling, and aiding in disaster preparedness, response, and recovery efforts.
Affiliation(s)
- Zerina Lokmic-Tomkins
- School of Nursing and Midwifery, Faculty of Medicine, Nursing and Health Sciences, Monash University, Clayton, Melbourne, Victoria, Australia
- Lorraine J Block
- School of Nursing, University of British Columbia, Vancouver, British Columbia, Canada
- Shauna Davies
- Faculty of Nursing, University of Regina, Regina, Saskatchewan, Canada
- Lisa Reid
- College of Nursing and Health Sciences, Flinders University, Bedford Park, South Australia, Australia
- Hanna von Gerich
- Department of Nursing Science, University of Turku, Turku, Finland
- Turku University Hospital, Turku, Finland
- Laura-Maria Peltonen
- Department of Nursing Science, University of Turku, Turku, Finland
- Turku University Hospital, Turku, Finland
17
Mészáros Á, Kovács S, Héja T, Bagyura Z, Zemplényi A. Mapping Hungarian procedure codes to SNOMED CT. BMC Med Res Methodol 2023; 23:240. [PMID: 37853326 PMCID: PMC10585817 DOI: 10.1186/s12874-023-02036-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 09/19/2023] [Indexed: 10/20/2023]
Abstract
BACKGROUND Data harmonisation is essential in real-world data (RWD) research projects based on hospital information systems databases, as coding systems differ between countries. The Hungarian hospital information systems and the national claims database use internationally known diagnosis codes, but data on medical procedures are recorded using national codes. There is no simple or standard solution for mapping the national codes to a standard coding system. Our aim was to map the Hungarian procedure codes (OENO) to SNOMED CT as part of the European Health Data Evidence Network (EHDEN) project. METHODS We recruited 25 professionals from different specialties to manually map the procedure codes used between 2011 and 2021. A mapping protocol and training material were developed, results were regularly revised, and the challenges of mapping were recorded. For validation, approximately 7% of the codes were mapped by more than one person, from different specialties. RESULTS We mapped 4661 OENO codes to standard vocabularies, mostly SNOMED CT. We categorized the challenges into three main areas: semantic, matching, and methodological. Semantic challenges refer to the occasionally unclear meaning of the OENO codes; matching challenges refer to the different granularity and purpose of the OENO and SNOMED CT vocabularies; and methodological challenges describe issues related to the design of these two vocabularies. CONCLUSIONS The challenges and solutions presented here may help other researchers design their processes for mapping national codes to standard vocabularies and achieve greater consistency in mapping results. Moreover, we believe that our work will allow for better use of RWD collected in Hungary in international research collaborations.
Affiliation(s)
- Ágota Mészáros
- Department of Public Health, Semmelweis University, Budapest, Hungary.
- Sándor Kovács
- Faculty of Pharmacy, Center for Health Technology Assessment and Pharmacoeconomic Research, University of Pécs, Pécs, Hungary
- Tibor Héja
- Faculty of Pharmacy, Center for Health Technology Assessment and Pharmacoeconomic Research, University of Pécs, Pécs, Hungary
- Zsolt Bagyura
- Heart and Vascular Centre, Semmelweis University, Budapest, Hungary
- Antal Zemplényi
- Faculty of Pharmacy, Center for Health Technology Assessment and Pharmacoeconomic Research, University of Pécs, Pécs, Hungary
18
Block LJ, Lozada-Perezmitre E, Cho H, Davies S, Lee J, Lokmic-Tomkins Z, Peltonen LM, Pruinelli L, Reid L, Song J, Topaz M, von Gerich H, Vyas P. Representation of Environmental Concepts Associated with Health Impacts in Computer Standardized Clinical Terminologies. Yearb Med Inform 2023; 32:36-47. [PMID: 38147848 PMCID: PMC10751146 DOI: 10.1055/s-0043-1768746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2023]
Abstract
OBJECTIVE To evaluate the representation of environmental concepts associated with health impacts in standardized clinical terminologies. METHODS This study used a descriptive approach with methods informed by a procedural framework for standardized clinical terminology mapping. The United Nations Global Indicator Framework for the Sustainable Development Goals and Targets was used as the source document for concept extraction. The target terminologies were the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) and the International Classification for Nursing Practice (ICNP). Manual and automated mapping methods were utilized. The lists of candidate matches were reviewed and iterated until a final mapping match list was achieved. RESULTS A total of 119 concepts with 133 mapping matches were added to the final SNOMED CT list. Fifty-three (39.8%) were direct matches, 37 (27.8%) were narrower than matches, 35 (26.3%) were broader than matches, and 8 (6%) had no matches. A total of 26 concepts with 27 matches were added to the final ICNP list. Eight (29.6%) were direct matches, 4 (14.8%) were narrower than, 7 (25.9%) were broader than, and 8 (29.6%) had no matches. CONCLUSION Following this evaluation, both strengths and gaps were identified. Gaps in terminology representation included concepts related to cost expenditures, affordability, community engagement, water, air and sanitation. The inclusion of these concepts is necessary to advance the clinical reporting of these environmental and sustainability indicators. As environmental concepts encoded in standardized terminologies expand, additional insights into data and health conditions, research, education, and policy-level decision-making will be identified.
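The percentages above use the number of mapping matches (133 for SNOMED CT, 27 for ICNP) as the denominator, not the number of concepts (119 and 26). A short check using only the counts given in the abstract reproduces them, rounded to one decimal place:

```python
def shares(counts):
    """Return each category's share of the total, as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts of mapping matches per category, as reported in the abstract.
snomed_ct = {"direct": 53, "narrower than": 37, "broader than": 35, "no match": 8}
icnp = {"direct": 8, "narrower than": 4, "broader than": 7, "no match": 8}

print(shares(snomed_ct))
print(shares(icnp))
```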
Affiliation(s)
- Lorraine J. Block
- University of British Columbia, School of Nursing, Vancouver, British Columbia, Canada
- Hwayoung Cho
- University of Florida, Gainesville, Florida, United States
- Jisan Lee
- Department of Nursing, Gangneung-Wonju National University, Wonju, Republic of Korea
- Zerina Lokmic-Tomkins
- School of Nursing and Midwifery, Monash University, 10 Chancellors Walk, Clayton, Melbourne, Victoria 3800, Australia
- Lisa Reid
- Flinders University, Adelaide, South Australia, Australia
- Jiyoun Song
- University of Pennsylvania School of Nursing, Philadelphia, PA, USA
- Maxim Topaz
- Columbia University & VNS Health, New York, New York, United States
- Hanna von Gerich
- University of Turku, Department of Nursing Science, Turku University Hospital, Finland
- Pankaj Vyas
- University of Arizona, College of Nursing, Tucson, AZ, United States
19
Mayer CS, Huser V. Learning important common data elements from shared study data: The All of Us program analysis. PLoS One 2023; 18:e0283601. [PMID: 37418391 PMCID: PMC10328251 DOI: 10.1371/journal.pone.0283601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2021] [Accepted: 03/13/2023] [Indexed: 07/09/2023]
Abstract
There are many initiatives attempting to harmonize data collection across human clinical studies using common data elements (CDEs). The increased use of CDEs in large prior studies can guide researchers planning new studies. For that purpose, we analyzed the All of Us (AoU) program, an ongoing US study intending to enroll one million participants and serve as a platform for numerous observational analyses. AoU adopted the OMOP Common Data Model to standardize both research (Case Report Form [CRF]) and real-world (imported from Electronic Health Records [EHRs]) data. AoU standardized specific data elements and values by including CDEs from terminologies such as LOINC and SNOMED CT. For this study, we defined all elements from established terminologies as CDEs and all custom concepts created in the Participant Provided Information (PPI) terminology as unique data elements (UDEs). We found 1 033 research elements, 4 592 element-value combinations and 932 distinct values. Most elements were UDEs (869, 84.1%), while most CDEs were from LOINC (103 elements, 10.0%) or SNOMED CT (60, 5.8%). Of the LOINC CDEs, 87 (53.1% of 164 CDEs) originated from previous data collection initiatives, such as PhenX (17 CDEs) and PROMIS (15 CDEs). On a CRF level, The Basics (12 of 21 elements, 57.1%) and Lifestyle (10 of 14, 71.4%) were the only CRFs with multiple CDEs. On a value level, 61.7% of distinct values are from an established terminology. AoU demonstrates the use of the OMOP model for integrating research and routine healthcare data (64 elements in both contexts), which allows for monitoring lifestyle and health changes outside the research setting. The increased inclusion of CDEs in large studies (like AoU) is important in facilitating the use of existing tools and improving the ease of understanding and analyzing the data collected, which is more challenging when using study specific formats.
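The CDE/UDE split described above is a rule on each element's source vocabulary. A minimal sketch, assuming elements are available as (element, vocabulary_id) pairs in the style of the OMOP `concept` table and that the program's custom concepts carry the vocabulary_id `PPI`; the element names are hypothetical placeholders, not AoU data.

```python
from collections import Counter

# Assumption: custom AoU concepts carry vocabulary_id "PPI"; any other
# vocabulary (e.g. "LOINC", "SNOMED") counts as an established terminology.
CUSTOM_VOCABULARIES = {"PPI"}


def classify_element(vocabulary_id):
    """Label an element as a unique (UDE) or common (CDE) data element."""
    return "UDE" if vocabulary_id in CUSTOM_VOCABULARIES else "CDE"


def summarize(elements):
    """Count UDEs overall and CDEs per source vocabulary."""
    summary = Counter()
    for _name, vocabulary_id in elements:
        if classify_element(vocabulary_id) == "UDE":
            summary["UDE"] += 1
        else:
            summary[f"CDE:{vocabulary_id}"] += 1
    return summary


# Hypothetical elements, for illustration only.
elements = [
    ("element_1", "PPI"),
    ("element_2", "LOINC"),
    ("element_3", "LOINC"),
    ("element_4", "SNOMED"),
    ("element_5", "PPI"),
]
print(summarize(elements))
```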
Affiliation(s)
- Craig S. Mayer
- Lister Hill National Center for Biomedical Communication, National Library of Medicine, NIH, Bethesda, Maryland, United States of America
- Vojtech Huser
- Lister Hill National Center for Biomedical Communication, National Library of Medicine, NIH, Bethesda, Maryland, United States of America
20
Fu M, Yan Y, Olde Loohuis LM, Chang TS. Defining the distance between diseases using SNOMED CT embeddings. J Biomed Inform 2023; 139:104307. [PMID: 36738869 DOI: 10.1016/j.jbi.2023.104307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 12/10/2022] [Accepted: 01/29/2023] [Indexed: 02/05/2023]
Abstract
Characterizing disease relationships is essential to biomedical research to understand disease etiology and improve clinical decision-making. Measurements of distance between disease pairs enable valuable research tasks, such as subgrouping patients and identifying common time courses of disease onset. Distance metrics developed in prior work focused on smaller, targeted disease sets. Distance metrics covering all diseases have not yet been defined, which limits their application to the broader disease spectrum. Our current study defines disease distances for all disease pairs within the International Classification of Diseases, version 10 (ICD-10), the diagnostic classification system universally used in electronic health records. Our proposed distance is computed based on a biomedical ontology, SNOMED CT (Systematized Nomenclature of Medicine, Clinical Terms), which can also be viewed as a structured knowledge graph. We compared the knowledge graph-based metric to three other distance metrics based on the hierarchical structure of ICD, clinical comorbidity, and genetic correlation, to evaluate how each may capture similar or unique aspects of disease relationships. We show that our knowledge graph-based distance metric captures known phenotypic, clinical, and molecular characteristics at a finer granularity than the other three. With the continued growth in the use of electronic health records data for research, we believe that our distance metric will play an important role in subgrouping patients for precision health and in enabling individualized disease prevention and treatments.
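As a rough illustration of an embedding-based disease distance, the sketch below computes cosine distance between concept vectors for every pair of diseases. It assumes each ICD-10 code has already been mapped to a SNOMED CT concept with a learned embedding; the three-dimensional vectors and the `disease_X`/`disease_Y`/`disease_Z` labels are hypothetical, and the abstract does not state that the authors used cosine distance specifically.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_distances(embeddings):
    """Return {(code_a, code_b): distance} for every unordered pair of codes."""
    codes = sorted(embeddings)
    return {
        (a, b): cosine_distance(embeddings[a], embeddings[b])
        for i, a in enumerate(codes)
        for b in codes[i + 1:]
    }


# Hypothetical embeddings keyed by placeholder disease labels.
embeddings = {
    "disease_X": [1.0, 0.0, 0.0],
    "disease_Y": [0.9, 0.1, 0.0],
    "disease_Z": [0.0, 0.0, 1.0],
}
for pair, distance in pairwise_distances(embeddings).items():
    print(pair, round(distance, 3))
```

Sorting the codes and pairing each with the codes after it yields every unordered pair exactly once, which matters when the disease set covers all of ICD-10.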
Affiliation(s)
- Mingzhou Fu
- Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA; Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA, USA
- Yu Yan
- Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA, USA
- Loes M Olde Loohuis
- Center for Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
- Timothy S Chang
- Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA.
21
Roberts L. SNOMED CT: working smarter, not harder. Br J Gen Pract 2023; 73:77. [PMID: 36702604 PMCID: PMC9888579 DOI: 10.3399/bjgp23x731901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Affiliation(s)
- Luke Roberts
- Luke is a specialist in SNOMED CT implementation and application at Guy's and St Thomas' NHS Foundation Trust, London.
22
Güngör B, Deppenwiese N, Mang JM, Toddenroth D. Analysis of the Representation of Frequent Clinical Attributes in the Unified Medical Language System. Stud Health Technol Inform 2022; 299:217-222. [PMID: 36325866 DOI: 10.3233/shti220987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Mapping clinical attributes from hospital information systems to standardized terminologies may allow their scientific reuse for multicenter studies. The Unified Medical Language System (UMLS) defines synonyms in different terminologies, which could be valuable for achieving semantic interoperability between different sites. Here we aim to explore the potential relevance of UMLS concepts and associated semantic relations for widely used clinical terminologies in a German university hospital. To semi-automatically examine a sample of the 200 most frequent codes from Erlangen University Hospital for three relevant terminologies, we implemented a script that queries their UMLS representation and associated mappings via a programming interface. We found that 94% of frequent diagnostic codes were available in UMLS, and that most of these codes could be mapped to other terminologies such as SNOMED CT. We observed that all examined laboratory codes were represented in UMLS, and that various translations to other languages were available for these concepts. The classification most widely used in German hospitals for documenting clinical procedures was not originally represented in UMLS, but external mappings to SNOMED CT allowed UMLS entries to be identified for 90.5% of frequent codes. Future research could extend this investigation to other code sets and terminologies, or study the potential utility of available mappings for specific applications.
Affiliation(s)
- Baris Güngör
- Medical Informatics, University Erlangen-Nuremberg, Germany
- Noemi Deppenwiese
- Medical Center for Information and Communication Technology, University Hospital Erlangen, Germany
- Jonathan M Mang
- Medical Center for Information and Communication Technology, University Hospital Erlangen, Germany
23
Liu H, Carini S, Chen Z, Phillips Hey S, Sim I, Weng C. Ontology-based categorization of clinical studies by their conditions. J Biomed Inform 2022; 135:104235. [PMID: 36283581 DOI: 10.1016/j.jbi.2022.104235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2022] [Revised: 09/24/2022] [Accepted: 10/18/2022] [Indexed: 11/20/2022]
Abstract
OBJECTIVE The free-text Condition data field in ClinicalTrials.gov is not amenable to computational processes for retrieving, aggregating and visualizing clinical studies by condition categories. This paper contributes a method for automated ontology-based categorization of clinical studies by their conditions. MATERIALS AND METHODS Our method first maps text entries in ClinicalTrials.gov's Condition field to standard condition concepts in the OMOP Common Data Model by using SNOMED CT as a reference ontology and using Usagi for concept normalization, followed by hierarchical traversal of the SNOMED ontology for concept expansion, ontology-driven condition categorization, and visualization. We compared the accuracy of this method to that of the MeSH-based method. RESULTS We reviewed the 4,506 studies on Vivli.org categorized by our method. Condition terms of 4,501 (99.89%) studies were successfully mapped to SNOMED CT concepts, and with a minimum concept mapping score threshold, 4,428 (98.27%) studies were categorized into 31 predefined categories. When validated against manual categorization of a random sample of 300 studies, our method achieved an estimated categorization accuracy of 95.7%, while the MeSH-based method had an accuracy of 85.0%. CONCLUSION We showed that categorizing clinical studies by their Condition terms with reference to SNOMED CT achieved better accuracy and coverage than using MeSH terms. The proposed ontology-driven condition categorization was useful for creating accurate clinical study categorizations that enable clinical researchers to aggregate evidence from a large number of clinical studies.
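The concept expansion and categorization steps rely on walking the SNOMED CT IS-A hierarchy. A minimal sketch, assuming a small hypothetical parent map (concept names stand in for real SNOMED CT identifiers) and a hypothetical set of category root concepts: a mapped condition receives every category whose root is the concept itself or one of its ancestors, so a concept with several parents can fall into more than one category.

```python
from collections import deque

# Hypothetical IS-A edges: child -> parents (SNOMED CT is polyhierarchical).
PARENTS = {
    "viral pneumonia": ["pneumonia", "viral infection"],
    "pneumonia": ["lung disease"],
    "lung disease": ["respiratory disorder"],
    "viral infection": ["infectious disease"],
    "asthma": ["lung disease"],
}

# Hypothetical category roots -> category labels.
CATEGORY_ROOTS = {
    "respiratory disorder": "Respiratory",
    "infectious disease": "Infections",
}


def ancestors(concept):
    """Return all ancestors of a concept via breadth-first traversal of IS-A edges."""
    seen = set()
    queue = deque(PARENTS.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(PARENTS.get(parent, []))
    return seen


def categorize(concept):
    """Return the categories whose root is the concept itself or one of its ancestors."""
    lineage = ancestors(concept) | {concept}
    return {label for root, label in CATEGORY_ROOTS.items() if root in lineage}


print(sorted(categorize("viral pneumonia")))
print(sorted(categorize("asthma")))
```

The `seen` set keeps the traversal from revisiting shared ancestors, which are common in a polyhierarchy.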
Affiliation(s)
- Hao Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Simona Carini
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Zhehuan Chen
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Ida Sim
- Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA.
24
Ali M, Evans H, Whitney P, Minhas F, Snead DRJ. Using Systemised Nomenclature of Medicine (SNOMED) codes to select digital pathology whole slide images for long-term archiving. J Clin Pathol 2022; 76:349-352. [PMID: 36109157 PMCID: PMC10176345 DOI: 10.1136/jcp-2022-208483] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/13/2022] [Accepted: 08/20/2022] [Indexed: 11/04/2022]
Abstract
The archiving of whole slide images represents a hurdle to digital pathology implementation, largely because of the amount of data generated. The retention of glass slides is currently recommended for a minimum of 10 years, but it is up to individual departments to determine how digital images are archived and for how long. In a retrospective study, we examined the combination of Systemised Nomenclature of Medicine (SNOMED) codes allocated to cases reported between July 2011 and December 2015 and recalled more than 12 months after diagnosis, in comparison to non-recalled cases. Our results show that 0.2% of cases are recalled after 12 months, and SNOMED code combinations can be used to identify which cases are likely to be recalled and which are not. This approach could reduce the number of cases archived by 62% and still ensure all cases likely to be recalled remain in the archive.
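One simple way to turn the recall observation into an archiving rule is sketched below under stated assumptions: each case is represented by the set of SNOMED codes allocated to it, the code combinations seen among previously recalled cases define the "keep" set, and a case is archived only if its combination is in that set. The case identifiers and code labels are hypothetical, and this is not necessarily the selection procedure the authors used.

```python
def keep_combinations(recalled_cases):
    """Collect the code combinations observed among previously recalled cases."""
    return {frozenset(codes) for codes in recalled_cases.values()}


def select_for_archive(cases, keep):
    """Return IDs of cases whose code combination matches a recalled combination."""
    return {case_id for case_id, codes in cases.items() if frozenset(codes) in keep}


# Hypothetical history: code combinations of cases recalled after 12 months.
recalled_history = {
    "old_1": {"site_A", "morph_1"},
    "old_2": {"site_B", "morph_2"},
}
# Hypothetical new cases awaiting an archiving decision.
current_cases = {
    "new_1": {"site_A", "morph_1"},
    "new_2": {"site_C", "morph_3"},
    "new_3": {"site_B", "morph_2"},
    "new_4": {"site_A", "morph_3"},
}

keep = keep_combinations(recalled_history)
archive = select_for_archive(current_cases, keep)
reduction = 1 - len(archive) / len(current_cases)
print(sorted(archive), f"reduction: {reduction:.0%}")
```

Exact matching is strict: a practical rule also has to decide how to treat combinations never seen before, which is where the trade-off between archive size and the risk of discarding a case that is later recalled lies.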
Affiliation(s)
- Mahmoud Ali
- Histopathology Department, University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK
- Histopathology Department, Cambridge University Hospitals NHS Foundation Trust, Cambridge, Cambridgeshire, UK
- Harriet Evans
- Histopathology Department, University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK
- Peter Whitney
- Histopathology Department, University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK
- Fayyaz Minhas
- Department of Computer Science, University of Warwick, Coventry, West Midlands, UK
- David R J Snead
- Histopathology Department, University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK
- Warwick Medical School, University of Warwick, Coventry, UK
25
Neely B, Shahsahebi M, Marks CE, Power S, Kanter A, Howell C, Hyslop T, Plichta JK. Design and Evaluation of a Computational Phenotype to Identify Patients With Metastatic Breast Cancer Within the Electronic Health Record. JCO Clin Cancer Inform 2022; 6:e2200056. [PMID: 36179272 PMCID: PMC9848550 DOI: 10.1200/cci.22.00056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Revised: 07/15/2022] [Accepted: 08/30/2022] [Indexed: 01/21/2023]
Abstract
PURPOSE Outcomes for patients with metastatic breast cancer (MBC) are continually improving as more effective treatments become available. Granular data sets of this unique population are lacking, and the standard method for data collection relies largely on chart review. Therefore, using electronic health records (EHR) collected at a tertiary hospital system, we developed and evaluated a computational phenotype designed to identify all patients with MBC, and we compared the effectiveness of this algorithm against the gold standard, clinical chart review. METHODS A cohort of patients with breast cancer were identified according to International Classification of Diseases codes, the institutional tumor registry, and SNOMED codes. Chart review was performed to determine whether distant metastases had occurred. We developed a computational phenotype, on the basis of SNOMED concept IDs, which was applied to the EHR to identify patients with MBC. Contingency tables were used to aggregate and compare results. RESULTS A total of 1,741 patients with breast cancer were identified using data from International Classification of Diseases codes, the tumor registry, and/or SNOMED concept identifiers. Chart review of all patients classified each patient as having MBC (n = 416; 23.9%) versus not (n = 1,325; 75.9%). The final computational phenotype successfully classified 1,646 patients (95% accuracy; 82% sensitivity; 99% specificity). CONCLUSION Hospital systems with robust EHRs and reliable mapping to SNOMED have the ability to use standard codes to derive computational phenotypes. These algorithms perform reasonably well and have the added ability to be run at disparate health care facilities. Better tooling to navigate the polyhierarchical structure of SNOMED ontology could yield better-performing computational phenotypes.
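Accuracy, sensitivity and specificity are all read off a 2x2 contingency table of the phenotype's result against chart review. A minimal sketch, assuming per-patient boolean labels (True = metastatic) and using hypothetical counts rather than the study's:

```python
from dataclasses import dataclass


@dataclass
class Contingency:
    """2x2 table of phenotype result against chart review."""
    tp: int  # phenotype positive, chart review positive
    fp: int  # phenotype positive, chart review negative
    fn: int  # phenotype negative, chart review positive
    tn: int  # phenotype negative, chart review negative

    @property
    def total(self):
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self):
        return (self.tp + self.tn) / self.total

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)


def tabulate(phenotype, chart_review):
    """Build the table from per-patient booleans (True = metastatic)."""
    if len(phenotype) != len(chart_review):
        raise ValueError("both label lists must cover the same patients")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for predicted, actual in zip(phenotype, chart_review):
        # "t"/"f": prediction agrees with chart review; "p"/"n": predicted label.
        key = ("t" if predicted == actual else "f") + ("p" if predicted else "n")
        counts[key] += 1
    return Contingency(**counts)


# Hypothetical counts, for illustration only.
table = Contingency(tp=80, fp=10, fn=20, tn=890)
print(f"accuracy={table.accuracy():.3f} "
      f"sensitivity={table.sensitivity():.3f} "
      f"specificity={table.specificity():.3f}")
```

When most patients are negative, accuracy is dominated by specificity, which is why sensitivity is worth reporting separately.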
Affiliation(s)
- Mohammad Shahsahebi
- Duke Cancer Institute, Durham, NC
- Department of Family Medicine and Community Health, Duke University, Durham, NC
- Caitlin E. Marks
- Department of Surgery, Duke University Medical Center, Durham, NC
- Terry Hyslop
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC
- Jennifer K. Plichta
- Duke Cancer Institute, Durham, NC
- Department of Surgery, Duke University Medical Center, Durham, NC
26
Kunz S, Zgraggen C, Sariyar M. Mapping SNOMED CT Codes to Semi-Structured Texts via an NLP Pipeline. Stud Health Technol Inform 2022; 295:390-393. [PMID: 35773893 DOI: 10.3233/shti220747] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
In the project presented here, we used NLP tools to annotate German medical training documents with SNOMED CT codes. The following research question was addressed: Is it possible to automate the annotation of training documents with an NLP pipeline especially designed for this task but requiring translation into English? The goal of our stakeholder, an institution responsible for the continuing education of physicians, was to facilitate switching between different medical training programs by coding the same requirement with the same SNOMED CT code, even if the wording differs. We first describe how we chose the NLP tools and then outline the concrete steps for implementing our prototype: the NLP pipeline construction, the implementation, and the validation. We draw three important lessons from our results: (i) self-supervision is no free lunch and should be based on a sophisticated task, (ii) translation via DeepL can be too context-dependent for a particular use case, and (iii) ontology extraction can increase efficiency as well as accuracy.
27
Spotnitz M, Patterson J, Huser V, Weng C, Natarajan K. Harmonization of Measurement Codes for Concept-Oriented Lab Data Retrieval. Stud Health Technol Inform 2022; 290:12-16. [PMID: 35672961 DOI: 10.3233/shti220022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Measurement concepts are essential to observational healthcare research; however, a lack of concept harmonization limits the quality of research that can be done on multisite research networks. We developed five methods that used a combination of automated, semi-automated and manual approaches for generating measurement concept sets. We validated our concept sets by calculating their frequencies in cohorts from the Columbia University Irving Medical Center (CUIMC) database. For heart transplant patients, the preoperative frequencies of basic metabolic panel concept sets, which we generated by a semi-automated approach, were greater than 99%. We also made concept sets for lumbar puncture and coagulation panels, by automated and manual methods respectively.
Affiliation(s)
- Matthew Spotnitz
- Columbia University Medical Center Department of Biomedical Informatics
- Jason Patterson
- Columbia University Medical Center Department of Biomedical Informatics
- Chunhua Weng
- Columbia University Medical Center Department of Biomedical Informatics
- Karthik Natarajan
- Columbia University Medical Center Department of Biomedical Informatics
28
Kang H, Park HA. Mapping Korean National Health Insurance Reimbursement Claim Codes for Therapeutic and Surgical Procedures to SNOMED-CT to Facilitate Data Reuse. Stud Health Technol Inform 2022; 290:101-105. [PMID: 35672979 DOI: 10.3233/shti220040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
South Korea has a public, single-payer system for healthcare services based on fee-for-service payments. The National Health Insurance (NHI) reimbursement claim codes are used by all healthcare providers for reimbursement. This study mapped NHI reimbursement claim codes for therapeutic and surgical procedures to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) to facilitate semantic interoperability and data reuse for research. The source codes for mapping were 2,500 reimbursement claim codes for therapeutic and surgical procedures such as surgery, endoscopic procedures, and interventional radiology. The target terminology for mapping was the 'Procedure' hierarchy of the international edition of SNOMED-CT released in July 2019. We translated Korean terms into English, clarified their meaning, extracted characteristics of the source codes, and mapped them to pre-coordinated concepts. If a source concept could not be mapped to a pre-coordinated concept, we mapped it to a post-coordinated expression. The mapping results were validated internally through dual independent mapping and group discussion by trained terminologists, and by two physicians experienced in SNOMED-CT mapping. Of the 2,500 source codes, 1,298 (51.9%) were mapped to pre-coordinated concepts and 1,202 (48.1%) to post-coordinated expressions. The mapping of the NHI reimbursement claim codes for therapeutic and surgical procedures to SNOMED-CT is expected to support clinical research by facilitating the use of health insurance claim data.
Affiliation(s)
- Hannah Kang
- College of Nursing, Seoul National University, Seoul, South Korea
- Hyeoun-Ae Park
- College of Nursing, Seoul National University, Seoul, South Korea
29
Saadi A, Rogier A, Burgun A, Tsopra R. Design of an Ontology-Based Triage System for Patients with Chronic Pain. Stud Health Technol Inform 2022; 290:81-85. [PMID: 35672975 DOI: 10.3233/shti220036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
OBJECTIVE Long waiting times for a chronic pain consultation are a widespread health problem. This paper presents the design of an ontology used to assess patients referred to a consultation for chronic pain. METHODS We designed OntoDol, an ontology of the pain domain for patient triage based on priority degrees. Terms were extracted from clinical practice guidelines and mapped to SNOMED-CT concepts through the Python module Owlready2. Selected SNOMED-CT concepts and relationships, together with the TIME ontology, were implemented in the ontology using Protégé. Decision rules were implemented with SWRL. We evaluated OntoDol on 5 virtual cases. RESULTS OntoDol contains 762 classes, 92 object properties and 18 SWRL rules that assign patients to 4 priority categories. OntoDol was able to assert every case and classify each one in the correct priority category. CONCLUSION Further work will extend OntoDol to other diseases and assess it with real-world data from the hospital.
Affiliation(s)
- Alexandre Saadi
- INSERM, Université de Paris, Sorbonne Université, Centre de Recherche des Cordeliers, F-75006 Paris, France
- Department of Evaluation and Treatment of Pain, Hôpital Européen Georges-Pompidou, AP-HP, Paris, France
- INRIA, HeKA, Inria Paris, France
- Alice Rogier
- INSERM, Université de Paris, Sorbonne Université, Centre de Recherche des Cordeliers, F-75006 Paris, France
- INRIA, HeKA, Inria Paris, France
- Department of Medical Informatics, Hôpital Européen Georges-Pompidou, AP-HP, Paris, France
- Anita Burgun
- INSERM, Université de Paris, Sorbonne Université, Centre de Recherche des Cordeliers, F-75006 Paris, France
- INRIA, HeKA, Inria Paris, France
- Department of Medical Informatics, Hôpital Européen Georges-Pompidou, AP-HP, Paris, France
- Rosy Tsopra
- INSERM, Université de Paris, Sorbonne Université, Centre de Recherche des Cordeliers, F-75006 Paris, France
- INRIA, HeKA, Inria Paris, France
- Department of Medical Informatics, Hôpital Européen Georges-Pompidou, AP-HP, Paris, France
30
Schulz S, Boeker M, Prunotto A. Validation of Multiple Path Translation for SNOMED CT Localisation. Stud Health Technol Inform 2022; 294:961-962. [PMID: 35612259 DOI: 10.3233/shti220641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The MTP (multiple translation paths) approach supports human translators in clinical terminology localization. It exploits the results of web-based machine translation tools and generates, for a chosen target language, a scored output of translation candidates for each input terminology code. We present first results of a validation, using four SNOMED CT benchmarks and three translation engines. For German as target language, there was a significant advantage of MTP as a generator of plausible translation candidate lists, and a moderate advantage of the top-ranked MTP translation candidate over single best performing direct-translation approaches.
Affiliation(s)
- Martin Boeker
- Institute for AI in Healthcare, Technical University of Munich, Germany
- Andrea Prunotto
- Institute for AI in Healthcare, Technical University of Munich, Germany
31
Ohlsen T, Kruse V, Krupar R, Banach A, Ingenerf J, Drenkhahn C. Mapping of ICD-O Tuples to OncoTree Codes Using SNOMED CT Post-Coordination. Stud Health Technol Inform 2022; 294:307-311. [PMID: 35612082 DOI: 10.3233/shti220464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Around 500,000 oncological diseases are diagnosed in Germany every year and documented using the International Classification of Diseases for Oncology (ICD-O). In addition, another oncology classification, OncoTree, is often used to integrate new research findings in oncology. To bridge the two, a semi-automatic mapping of ICD-O tuples to OncoTree codes was developed. The implementation uses a FHIR terminology server, pre-coordinated or post-coordinated SNOMED CT expressions, and subsumption testing. Various validations were applied. The results were compared with reference data from scientific papers and manually evaluated by a senior pathologist, confirming the applicability of SNOMED CT in general, and of its post-coordinated expressions in particular, as a viable intermediate mapping step. With an agreement of 84.00% between the newly developed approach and the manual mapping, the results indicate that the approach has the potential to be used in everyday medical practice.
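The subsumption-testing step can be illustrated with a local stand-in. This minimal sketch assumes a small in-memory IS-A hierarchy instead of the FHIR terminology server used in the study; the concept labels and target codes are hypothetical placeholders, not real SNOMED CT concepts or OncoTree codes. For each input concept it returns the most specific target whose anchor concept subsumes it.

```python
from collections import deque

# Hypothetical toy IS-A hierarchy: child -> set of direct parents.
# Labels are placeholders, not real SNOMED CT concepts.
IS_A: dict[str, set[str]] = {
    "adenocarcinoma_of_lung": {"carcinoma_of_lung", "adenocarcinoma"},
    "carcinoma_of_lung": {"neoplasm_of_lung"},
    "adenocarcinoma": {"carcinoma"},
    "carcinoma": {"neoplasm"},
    "neoplasm_of_lung": {"neoplasm"},
}


def ancestors(concept: str) -> set[str]:
    """All transitive IS-A ancestors of `concept`, found breadth-first."""
    seen: set[str] = set()
    queue = deque(IS_A.get(concept, set()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(IS_A.get(parent, set()))
    return seen


def subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its ancestors."""
    return general == specific or general in ancestors(specific)


def map_to_targets(concept: str, targets: dict[str, str]) -> list[str]:
    """Target codes whose anchor concept subsumes `concept`, keeping only
    the most specific matching anchors."""
    hits = {code: anchor for code, anchor in targets.items() if subsumes(anchor, concept)}
    return sorted(
        code
        for code, anchor in hits.items()
        # Drop an anchor that strictly subsumes another matching anchor.
        if not any(other != anchor and subsumes(anchor, other) for other in hits.values())
    )


# Hypothetical target codes, each anchored to a concept in the toy hierarchy.
TARGETS = {"T_LUNG": "neoplasm_of_lung", "T_LUAD": "adenocarcinoma_of_lung", "T_NEO": "neoplasm"}
print(map_to_targets("adenocarcinoma_of_lung", TARGETS))  # ['T_LUAD']
print(map_to_targets("carcinoma_of_lung", TARGETS))  # ['T_LUNG']
```

A real implementation would delegate the subsumption check to the terminology server (for example through FHIR's `$subsumes` operation) and would need a description-logic classifier to handle post-coordinated expressions; the transitive-closure check here only covers concepts already placed in a stated hierarchy.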
Affiliation(s)
- Tessa Ohlsen
- Institute for Medical Informatics, University of Lübeck, Lübeck, Germany
- Valerie Kruse
- Clinic for Hematology and Oncology, UKSH, Lübeck, Germany
- Rosemarie Krupar
- Pathology of the Research Center Borstel, Leibniz Lung Center, Borstel, Germany
- Alexandra Banach
- Institute for Medical Informatics, University of Lübeck, Lübeck, Germany
- Josef Ingenerf
- Institute for Medical Informatics, University of Lübeck, Lübeck, Germany
- IT Center for Clinical Research, University of Lübeck, Lübeck, Germany
- Cora Drenkhahn
- IT Center for Clinical Research, University of Lübeck, Lübeck, Germany
32
Reinecke I, Kallfelz M, Sedlmayr M, Siebel J, Bathelt F. Evaluation and Challenges of Medical Procedure Data Harmonization to SNOMED-CT for Observational Research. Stud Health Technol Inform 2022; 294:405-406. [PMID: 35612106 DOI: 10.3233/shti220484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/15/2023]
Abstract
The relevance of health data research on real world data (RWD) is increasing. To prepare national RWD for international research, harmonization with standard terminologies is required. In this paper, we evaluate to what extent the German OPS vocabulary in OHDSI covers codes present in RWD and mappings to SNOMED-CT. The evaluation identified a mapping gap of 21.1% in the RWD set.
Affiliation(s)
- Ines Reinecke
- Carl Gustav Carus Faculty of Medicine, Center for Medical Informatics, Institute for Medical Informatics and Biometry, Technische Universität Dresden, Dresden, Germany
- Martin Sedlmayr
- Carl Gustav Carus Faculty of Medicine, Center for Medical Informatics, Institute for Medical Informatics and Biometry, Technische Universität Dresden, Dresden, Germany
- Joscha Siebel
- Carl Gustav Carus Faculty of Medicine, Center for Medical Informatics, Institute for Medical Informatics and Biometry, Technische Universität Dresden, Dresden, Germany
- Franziska Bathelt
- Carl Gustav Carus Faculty of Medicine, Center for Medical Informatics, Institute for Medical Informatics and Biometry, Technische Universität Dresden, Dresden, Germany
33
Vander Stichele R, Kalra D. Aggregations of Substance in Virtual Drug Models Based on ISO/CEN Standards for Identification of Medicinal Products (IDMP). Stud Health Technol Inform 2022; 294:377-381. [PMID: 35612100 DOI: 10.3233/shti220478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
In this study, the representation of chemical substances in IDMP is reviewed, with an exploration of the aggregation levels for substances used in the virtual drug data models of RxNorm, SNOMED-CT, ATC/INN, and the Belgian SAM database, for single-substance products and for combinations of substances. The active moiety and available solid-state forms are explored for diclofenac, amoxicillin, carbamazepine, and amlodipine, with regard to their representation in coding systems such as WHODrug, SMS, UNII, CAS, and SNOMED-CT. By counting the number of medicinal products in Belgium for amlodipine at each level of aggregation, concepts for a substance grouper and for two levels of medicinal-product groupers are illustrated. Recommendations are made for the further development of IDMP and its link to international drug classifications.
Affiliation(s)
- Dipak Kalra
- European Institute for Innovation through Health Data, Belgium
34
Abeysinghe R, Zheng F, Cui L. A Comparison of Exhaustive and Non-lattice-based Methods for Auditing Hierarchical Relations in Gene Ontology. AMIA Annu Symp Proc 2022; 2021:177-186. [PMID: 35308995 PMCID: PMC8861660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Uncovering and fixing errors in biomedical terminologies is essential so that they provide accurate knowledge to the downstream applications that rely on them. Non-lattice-based methods have been applied to identify various kinds of inconsistencies in different biomedical terminologies. In previous work, we introduced two inference-based approaches that were applied in an exhaustive manner to audit hierarchical relations in the Gene Ontology: (1) a lexical-based inference framework, and (2) a subsumption-based sub-term inference framework. However, it is unclear how these exhaustive approaches perform compared with their corresponding non-lattice-based approaches. Therefore, in this paper, we implement the non-lattice versions of these two exhaustive approaches and perform a comprehensive comparison between the non-lattice-based and exhaustive approaches for auditing the Gene Ontology. The domain expert evaluations performed for the two exhaustive approaches are leveraged to evaluate the non-lattice versions. The results indicate that the non-lattice versions achieve higher precision than their exhaustive counterparts, even though they miss some of the potential inconsistencies that the exhaustive approaches identify.
Affiliation(s)
- Rashmie Abeysinghe
- Department of Neurology, University of Texas Health Science Center at Houston, Houston, TX
- Fengbo Zheng
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX
- Licong Cui
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX
35
Ram A, Kronk CA, Eleazer JR, Goulet JL, Brandt CA, Wang KH. Transphobia, encoded: an examination of trans-specific terminology in SNOMED CT and ICD-10-CM. J Am Med Inform Assoc 2022; 29:404-410. [PMID: 34569604 PMCID: PMC8757305 DOI: 10.1093/jamia/ocab200] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/15/2021] [Revised: 08/02/2021] [Accepted: 09/02/2021] [Indexed: 11/13/2022]
Abstract
Transgender people experience harassment, denial of services, and physical assault during healthcare visits. Electronic health record (EHR) structure and language can exacerbate the harm they experience by using transphobic terminology, emphasizing binary genders, and pathologizing transness. Here, we investigate the ways in which SNOMED CT and ICD-10-CM record gender-related terminology and explore their shortcomings as they contribute to this EHR-mediated violence. We discuss how this "standardized" gender-related medical terminology pathologizes transness, fails to accommodate nonbinary patients, and uses derogatory and outmoded language. We conclude that there is no easy fix to the transphobia beleaguering healthcare, provide options to reduce harm to patients, and ultimately call for a critical examination of medicine's role in transphobia. We aim to demonstrate the ways in which the [mis]use and [mis]understanding of gender-specific terminology in healthcare settings has harmed and continues to harm trans people by grounding our discussion in our personal experiences.
Affiliation(s)
- A Ram
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Clair A Kronk
- Department of Biomedical Informatics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Jacob R Eleazer
- VA Connecticut Healthcare System, West Haven, Connecticut, USA
- Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut, USA
- Joseph L Goulet
- VA Connecticut Healthcare System, West Haven, Connecticut, USA
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Cynthia A Brandt
- VA Connecticut Healthcare System, West Haven, Connecticut, USA
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Karen H Wang
- Equity Research and Innovation Center, Yale School of Medicine, New Haven, Connecticut, USA
36
Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A. HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022; 23:23. [PMID: 34991460 PMCID: PMC8734250 DOI: 10.1186/s12859-021-04539-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are extensively used in many applications in biomedical text mining and genomics, respectively, which has encouraged the development of semantic measures libraries based on these ontologies. However, current state-of-the-art semantic measures libraries have performance and scalability drawbacks that derive from their ontology representations, which are based on relational databases or naive in-memory graphs. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure that integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for its real-time computation prevents both its practical use in any application and the use of any other path-based semantic similarity measure. RESULTS To bridge these two gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for approximating Dijkstra's algorithm on taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS We introduce a set of reproducible benchmarks showing that HESML outperforms the current state-of-the-art libraries by several orders of magnitude on the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the size of the common-ancestor subgraph, regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
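To illustrate why restricting the search to ancestors keeps path computations cheap in a taxonomy, here is a simplified ancestor-based path length: the minimum, over common ancestors c, of the IS-A edge counts from each concept up to c. This is an assumed illustration of the general idea only, not the AncSPL algorithm described in the paper, and the toy hierarchy is hypothetical.

```python
from collections import deque
from typing import Optional

# Hypothetical toy IS-A taxonomy: child -> direct parents (placeholder labels).
PARENTS: dict[str, list[str]] = {
    "viral_pneumonia": ["pneumonia", "viral_infection"],
    "bacterial_pneumonia": ["pneumonia", "bacterial_infection"],
    "pneumonia": ["lung_disease"],
    "viral_infection": ["infection"],
    "bacterial_infection": ["infection"],
    "lung_disease": ["disease"],
    "infection": ["disease"],
}


def upward_distances(concept: str) -> dict[str, int]:
    """Fewest IS-A edges from `concept` up to each ancestor, including itself (distance 0)."""
    dist = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in PARENTS.get(node, []):
            if parent not in dist:  # BFS reaches each node first along a shortest path
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist


def ancestor_path_length(a: str, b: str) -> Optional[int]:
    """Shortest up-then-down path between `a` and `b` through a common
    ancestor, or None when they share no ancestor."""
    da, db = upward_distances(a), upward_distances(b)
    common = da.keys() & db.keys()
    if not common:
        return None
    return min(da[c] + db[c] for c in common)


print(ancestor_path_length("viral_pneumonia", "bacterial_pneumonia"))  # 2
print(ancestor_path_length("pneumonia", "viral_infection"))  # 4
```

Because only each concept's ancestor set is explored, the cost depends on the size of that ancestor subgraph rather than on the whole ontology. The trade-off is that in a polyhierarchy the result can overestimate the exact shortest path: here `pneumonia` and `viral_infection` are 2 edges apart through their shared descendant `viral_pneumonia`, but the ancestor-based length is 4 (through `disease`).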
Affiliation(s)
- Juan J. Lastra-Díaz
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Alicia Lara-Clares
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
- Ana Garcia-Serrano
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
37
Stroganov O, Fedarovich A, Wong E, Skovpen Y, Pakhomova E, Grishagin I, Fedarovich D, Khasanova T, Merberg D, Szalma S, Bryant J. Mapping of UK Biobank clinical codes: Challenges and possible solutions. PLoS One 2022; 17:e0275816. [PMID: 36525430 PMCID: PMC9757572 DOI: 10.1371/journal.pone.0275816] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Accepted: 09/23/2022] [Indexed: 12/23/2022]
Abstract
OBJECTIVE The UK Biobank provides a rich collection of longitudinal clinical data coming from different healthcare providers and sources in England, Wales, and Scotland. Although extremely valuable and available to a wide research community, the heterogeneous dataset contains inconsistent medical terminology that is either aligned to several ontologies within the same category or unprocessed. To make these data useful to the research community, data cleaning, curation, and standardization are needed. Any data user who wants to integrate UK Biobank hospital inpatient data, self-reported data, and data from various registers with primary care (GP) data must invest significant effort in data reformatting, mapping to selected ontologies (such as SNOMED-CT), and harmonization. The integrated clinical data would provide a more comprehensive picture of a patient's medical history. MATERIALS AND METHODS We evaluated several approaches to map GP clinical Read codes to International Classification of Diseases (ICD) and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) terminologies. The results were compared, mapping inconsistencies were flagged, and a quality category was assigned to each mapping to evaluate overall mapping quality. RESULTS We propose a curation and data integration pipeline for harmonizing diagnoses. We also report challenges identified in mapping Read codes from UK Biobank GP tables to ICD and SNOMED CT. DISCUSSION AND CONCLUSION Some of the challenges, such as the lack of precise one-to-one mapping between ontologies or the need for an additional ontology to fully map terms, are general and reflect trade-offs to be made at different steps. Other challenges stem from automatic mapping and can be overcome by leveraging existing mappings, supplemented with automated and manual curation.
Affiliation(s)
- Oleg Stroganov
- Rancho BioSciences, LLC, San Diego, California, United States of America
- Alena Fedarovich
- Rancho BioSciences, LLC, San Diego, California, United States of America
- Emily Wong
- Takeda Development Center Americas, Inc., San Diego, California, United States of America
- Yulia Skovpen
- Rancho BioSciences, LLC, San Diego, California, United States of America
- Elena Pakhomova
- Rancho BioSciences, LLC, San Diego, California, United States of America
- Ivan Grishagin
- Rancho BioSciences, LLC, San Diego, California, United States of America
- Dzmitry Fedarovich
- Rancho BioSciences, LLC, San Diego, California, United States of America
- Tania Khasanova
- Rancho BioSciences, LLC, San Diego, California, United States of America
- David Merberg
- Takeda Development Center Americas, Inc., Cambridge, Massachusetts, United States of America
- Sándor Szalma
- Takeda Development Center Americas, Inc., San Diego, California, United States of America
- Julie Bryant
- Rancho BioSciences, LLC, San Diego, California, United States of America
38
Rossander A, Lindsköld L, Ranerup A, Karlsson D. A State-of-the Art Review of SNOMED CT Terminology Binding and Recommendations for Practice and Research. Methods Inf Med 2021; 60:e76-e88. [PMID: 34583415 PMCID: PMC8714300 DOI: 10.1055/s-0041-1735167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2021] [Accepted: 05/20/2021] [Indexed: 11/21/2022]
Abstract
BACKGROUND Unambiguous sharing of data requires information models and terminology in combination, but there is a lack of knowledge as to how they should be combined, leading to impaired interoperability. OBJECTIVES To facilitate creation of guidelines for SNOMED CT terminology binding we have performed a literature review to find existing recommendations and expose knowledge gaps. The primary audience is practitioners and researchers working with terminology binding. METHODS PubMed, Scopus, and Web of Science were searched for papers containing "terminology binding," "subset," "map," "information model" or "implement" and the term "SNOMED." RESULTS The search yielded 616 unique papers published from 2004 to 2020, from which 55 papers were selected and analyzed inductively. Topics described in the papers include problems related to input material, SNOMED CT, information models, and lack of appropriate tools as well as recommendations regarding competence. CONCLUSION Recommendations are given for practitioners and researchers. Many of the stated problems can be solved by better co-operation between domain experts and informaticians and better knowledge of SNOMED CT. Settings where these competences either work together or where staff with knowledge of both act as brokers are well equipped for terminology binding. Tooling is not thoroughly researched and might be a possible way to facilitate terminology binding.
Affiliation(s)
- Anna Rossander
- Department of Applied Information Technology, University of Gothenburg, Göteborg, Sweden
- Lars Lindsköld
- Department of Applied Information Technology, University of Gothenburg, Göteborg, Sweden
- Agneta Ranerup
- Department of Applied Information Technology, University of Gothenburg, Göteborg, Sweden
- Daniel Karlsson
- eHealth and Structured Information Unit, National Board of Health and Welfare, Stockholm, Sweden
39
Meredith J, Whitehead N, Dacey M. Utilising the FOXS Stack for FAIR Architected Data Access. Stud Health Technol Inform 2021; 287:134-138. [PMID: 34795097 DOI: 10.3233/shti210832] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
A FOXS stack assembles HL7 FHIR, openEHR, IHE XDS and SNOMED CT as an operational clinical data platform to build digital systems. This paper analyses its applicability for FAIR-enabled medical research based on a summary of key principles. It highlights the benefit of the blended approach to operational technology stacks for health systems, and a need for industry standard technologies to enable greater semantic coherence for primary/secondary data use.
40
Kraljevic Z, Searle T, Shek A, Roguski L, Noor K, Bean D, Mascio A, Zhu L, Folarin AA, Roberts A, Bendayan R, Richardson MP, Stewart R, Shah AD, Wong WK, Ibrahim Z, Teo JT, Dobson RJB. Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit. Artif Intell Med 2021; 117:102083. [PMID: 34127232 DOI: 10.1016/j.artmed.2021.102083] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 10/07/2020] [Revised: 03/24/2021] [Accepted: 04/28/2021] [Indexed: 11/30/2022]
Abstract
Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of information extraction (IE) technologies to enable clinical analysis. We present the open source Medical Concept Annotation Toolkit (MedCAT) that provides: (a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; (b) a feature-rich annotation interface for customizing and training IE models; and (c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1:0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ∼8.8B words from ∼17M clinical records and further fine-tuning with ∼6K clinician annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets and concept types indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.
Affiliation(s)
- Zeljko Kraljevic
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Thomas Searle
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- Anthony Shek
- Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Lukasz Roguski
- Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK
- Kawsar Noor
- Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK
- Daniel Bean
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK
- Aurelie Mascio
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- Leilei Zhu
- Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK
- Amos A Folarin
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- Angus Roberts
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- Rebecca Bendayan
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- Mark P Richardson
- Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Robert Stewart
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK
- Anoop D Shah
- Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK
- Wai Keong Wong
- Institute of Health Informatics, University College London, London, UK; NIHR BRC Clinical Research Informatics Unit, University College London Hospitals, NHS Foundation Trust, London, UK
- Zina Ibrahim
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- James T Teo
- Department of Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Department of Neurology, King's College Hospital NHS Foundation Trust, London, UK
- Richard J B Dobson
- Department of Biostatistics and Health Informatics, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK; Health Data Research UK London, University College London, London, UK; Institute of Health Informatics, University College London, London, UK; NIHR Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London, London, UK.
41
Austin RR, Lu SC, Geiger-Simpson E, Ringdahl D, Pruinelli L, Lindquist R, Koithan M, Monsen KA, Kreitzer MJ, Delaney CW. Evaluating Systemized Nomenclature of Medicine Clinical Terms Coverage of Complementary and Integrative Health Therapy Approaches Used Within Integrative Nursing, Health, and Medicine. Comput Inform Nurs 2021; 39:1000-1006. [PMID: 34074871 DOI: 10.1097/cin.0000000000000764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
The use of complementary and integrative health therapy strategies for a wide variety of health conditions is increasing and is rapidly becoming mainstream. However, little is known about whether and how complementary and integrative health therapies are represented in the EHR. Standardized terminologies provide an organizing structure for health information that enables EHR representation and supports shareable and comparable data, which may contribute to increased understanding of which therapies are being used for whom and for what purposes. Use of standardized terminologies is recommended for interoperable clinical data, to support shareable, comparable data that enable the use of complementary and integrative health therapies and research on their outcomes. In this study, complementary and integrative health therapy terms were extracted from multiple sources and organized using the National Center for Complementary and Integrative Health and former National Center for Complementary and Alternative Medicine classification structures. A total of 1209 complementary and integrative health therapy terms were extracted. After removing duplicates, the final term list was generated via expert consensus. The final list included 578 terms, and these terms were mapped to Systemized Nomenclature of Medicine Clinical Terms. Of the 578, approximately half (48.1%) were found within Systemized Nomenclature of Medicine Clinical Terms. Levels of specificity of terms differed between the National Center for Complementary and Integrative Health and National Center for Complementary and Alternative Medicine classification structures and Systemized Nomenclature of Medicine Clinical Terms. Future studies should focus on the terms not mapped to Systemized Nomenclature of Medicine Clinical Terms (51.9%), to formally submit terms for inclusion in Systemized Nomenclature of Medicine Clinical Terms, toward leveraging the data generated by use of these terms to determine associations among treatments and outcomes.
Affiliation(s)
- Robin R Austin
- Author Affiliations: School of Nursing (Dr Austin, Mr Lu, and Drs Geiger-Simpson, Ringdahl, Pruinelli, Lindquist, Monsen, and Delaney) and Earl E. Bakken Center for Spirituality and Healing (Drs Austin, Ringdahl, Lindquist, and Monsen), University of Minnesota, Minneapolis; and College of Nursing, University of Arizona, Tucson (Dr Koithan)
42
Abstract
Data integration is an increasing need in medical informatics projects such as the EU Precise4Q project, in which multidisciplinary, semantically and syntactically heterogeneous data from several institutions need to be integrated. In addition, data-sharing agreements often allow only virtual data integration, because data cannot leave the source repository. We propose a data harmonization infrastructure in which data are virtually integrated by sharing a semantically rich common data representation that allows homogeneous querying. This common data model integrates content from well-known biomedical ontologies such as SNOMED CT by using the BTL2 upper-level ontology, and is imported into a graph database. We successfully integrated three datasets and ran test queries demonstrating the feasibility of the approach.
Affiliation(s)
- Catalina Martinez-Costa
- University of Murcia, Murcia, Spain
- Biomedical Research Institute of Murcia (IMIB-Arrixaca), Murcia, Spain
- Francisco Abad-Navarro
- University of Murcia, Murcia, Spain
- Biomedical Research Institute of Murcia (IMIB-Arrixaca), Murcia, Spain
43
Haffer N, Thun S. Postcoordination of LOINC Codes in SNOMED CT. Stud Health Technol Inform 2021; 278:19-26. [PMID: 34042871 DOI: 10.3233/shti210045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
The objectives of this paper are to analyze the terminologies SNOMED CT and Logical Observation Identifiers Names and Codes (LOINC) and to provide a guideline for the translation of LOINC concepts to SNOMED CT. Verified research data sets were used for this study, so the experiment can be replicated with other research data. Fifty LOINC concepts for frequently performed laboratory services were translated to SNOMED CT. Information would be lost with pre-coordinated mapping, but the compositional grammar of SNOMED CT allows individual concepts to be linked into complex postcoordinated expressions that include all the information embedded in LOINC concepts. All information can thus be transferred smoothly to SNOMED CT.
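The abstract does not reproduce the expressions themselves. As an illustration only, the Python sketch below renders a postcoordinated expression in the general shape of SNOMED CT compositional grammar: a focus concept, a colon, then comma-separated `attribute = value` refinements, with each concept written as an identifier followed by its term between pipes. The helper names `concept_ref` and `postcoordinate` are hypothetical, every identifier in the usage example is a placeholder rather than a real SNOMED CT code, and using LOINC axis names (Component, System, Scale) as attribute labels is an assumption made for readability.

```python
def concept_ref(concept):
    """Format a (concept_id, term) pair as 'id |term|'."""
    concept_id, term = concept
    return f"{concept_id} |{term}|"


def postcoordinate(focus, refinements):
    """Build 'focus : attr = value , attr = value ...' from concept pairs.

    focus       -- (concept_id, term) of the focus concept
    refinements -- list of (attribute, value) pairs, each a (concept_id, term)
    """
    expression = concept_ref(focus)
    if refinements:
        pairs = [f"{concept_ref(attr)} = {concept_ref(value)}" for attr, value in refinements]
        expression += " : " + " , ".join(pairs)
    return expression


if __name__ == "__main__":
    # Placeholder identifiers only (hypothetical): one refinement per LOINC axis
    # carried over; real attribute and value concepts must be looked up in SNOMED CT.
    print(postcoordinate(
        ("<focus-id>", "Observable entity"),
        [
            (("<component-attr-id>", "Component"), ("<glucose-id>", "Glucose")),
            (("<system-attr-id>", "System"), ("<serum-id>", "Serum")),
            (("<scale-attr-id>", "Scale"), ("<quantitative-id>", "Quantitative")),
        ],
    ))
```

Keeping each LOINC axis as its own refinement is what lets the postcoordinated form retain information that a single pre-coordinated target concept might drop.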
Affiliation(s)
- Nina Haffer
- Hochschule für Technik und Wirtschaft (HTW) - University of Applied Sciences, Berlin, Germany
- Berlin Institute of Health (BIH), Germany
- Sylvia Thun
- Berlin Institute of Health (BIH), Germany
- Charité - Universitätsmedizin Berlin, Germany
- Hochschule Niederrhein - University of Applied Sciences, Krefeld, Germany
44
Isaradech N, Khumrin P. Auto-mapping Clinical Documents to ICD-10 using SNOMED-CT. AMIA Jt Summits Transl Sci Proc 2021; 2021:296-304. [PMID: 34457144 PMCID: PMC8378640] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Excessive paperwork is a considerable issue that places additional burdens on health-care professionals. In Thai health-care systems, physicians manually review medical records to select an appropriate principal diagnosis and other co-morbidities and convert them into ICD-10 codes to claim financial support from the government. Each year, physicians at Maharaj Nakorn Chiang Mai hospital document 160,000 ICD-10 codes and 46,000 in-patient discharge summaries. To reduce this manual paperwork burden, we created a new approach that automatically analyses discharge summary notes and maps the diagnoses to ICD-10 codes. The approach combines SNOMED-CT and natural language processing techniques in 3 steps: cleaning data; extracting keywords from discharge summary notes; and matching keywords to ICD-10. In this paper, we show that mapping clinical documents using approximate matching and SNOMED-CT has potential for automating the ICD-10 mapping process.
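The abstract names approximate matching as the final step but does not specify the algorithm or the intermediate SNOMED-CT lookup. A minimal sketch of the cleaning and matching steps follows, assuming Python's standard-library `difflib.get_close_matches` as a stand-in for the authors' matcher and a hypothetical three-entry table (`ICD10_INDEX`) mapping diagnosis terms directly to ICD-10 codes, in place of the SNOMED-CT-mediated mapping described in the paper.

```python
import difflib
import re

# Hypothetical toy lookup table: normalised diagnosis term -> ICD-10 code.
# The paper routes terms through SNOMED-CT; that step is omitted here.
ICD10_INDEX = {
    "type 2 diabetes mellitus": "E11",
    "essential hypertension": "I10",
    "pneumonia": "J18.9",
}


def clean(text):
    """Lower-case and collapse runs of non-alphanumeric characters to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def map_to_icd10(keyword, cutoff=0.8):
    """Return the ICD-10 code of the closest index term, or None if nothing is close enough."""
    matches = difflib.get_close_matches(clean(keyword), list(ICD10_INDEX), n=1, cutoff=cutoff)
    return ICD10_INDEX[matches[0]] if matches else None


if __name__ == "__main__":
    for keyword in ["Essential hypertention", "Type-2 Diabetes Mellitus", "fracture of femur"]:
        print(keyword, "->", map_to_icd10(keyword))
```

The `cutoff` of 0.8 is an arbitrary choice for the sketch: lower values tolerate more misspellings in free-text notes at the cost of more false matches.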
Affiliation(s)
- Piyapong Khumrin
- Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
- Biomedical Informatics Center, Department of Family Medicine, Faculty of Medicine, Chiang Mai University, Chiang Mai, Thailand
45
López-Úbeda P, Pomares-Quimbaya A, Díaz-Galiano MC, Schulz S. Collecting specialty-related medical terms: Development and evaluation of a resource for Spanish. BMC Med Inform Decis Mak 2021; 21:145. [PMID: 33947365 PMCID: PMC8094531 DOI: 10.1186/s12911-021-01495-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2020] [Accepted: 04/03/2021] [Indexed: 11/20/2022]
Abstract
BACKGROUND Controlled vocabularies are fundamental resources for information extraction from clinical texts using natural language processing (NLP). Standard language resources available in the healthcare domain, such as the UMLS Metathesaurus or SNOMED CT, are widely used for this purpose, but they have limitations such as the lexical ambiguity of clinical terms. However, most such terms are unambiguous within text limited to a given clinical specialty. This is one rationale, among others, for classifying clinical texts by the clinical specialty to which they belong. RESULTS This paper addresses this limitation by proposing and applying a method that automatically extracts Spanish medical terms classified and weighted per sub-domain, using Spanish MEDLINE titles and abstracts as input. The hypothesis is that biomedical NLP tasks benefit from collections of domain terms that are specific to clinical subdomains. We use PubMed queries that generate sub-domain-specific corpora from Spanish titles and abstracts, from which token n-grams are collected and metrics of relevance, discriminatory power, and broadness per sub-domain are computed. The generated term set, called Spanish core vocabulary about clinical specialties (SCOVACLIS), was made available to the scientific community and used in a text classification problem, obtaining an improvement of 6 percentage points in F-measure over the baseline using a Multilayer Perceptron, thus demonstrating the hypothesis that a specialized term set improves NLP tasks. CONCLUSION The creation and validation of SCOVACLIS support the hypothesis that specific term sets reduce the level of ambiguity when compared to a specialty-independent and broad-scope vocabulary.
Affiliation(s)
- Stefan Schulz
- Medical University of Graz, Auenbruggerpl No 2, 8036 Graz, Austria
46
Millares Martin P. Non-systematic review: Correspondence quality and interoperability between family physicians and hospital clinicians. Int J Clin Pract 2021; 75:e13984. [PMID: 33484081 DOI: 10.1111/ijcp.13984] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/04/2020] [Accepted: 01/03/2021] [Indexed: 11/28/2022]
Abstract
BACKGROUND Medical correspondence between physicians working in the community and in hospital is paramount to provide continuity of care, but there is no agreement on what constitutes a good-quality letter, and some clinicians show little interest in this interface. Information may flow faster electronically than on paper, but is content improving? What defines a good letter? AIM (a) To assess what information should be shared between family doctors and hospital physicians and whether it could be shared better. (b) To assess the possibility of linking the sections of the letter to SNOMED-CT codes to improve interoperability. RESULTS Authors differ on what should be included in communications, and because needs also differ among services, the result is a very long list of possible items to consider. Standardised templates with their corresponding SNOMED-CT codes are presented. CONCLUSION Standardised correspondence could improve continuity of care. Appropriately coded, it could facilitate the information sharing and data manipulation required for adequate provision of services between primary care or family physicians and hospitals or secondary care organisations. It could also serve as a tool to assess clinicians' performance.
47
Jones PG, Gardener M. Referral for investigation: a redundant SNOMED-CT chief presenting complaint. N Z Med J 2021; 134:39-44. [PMID: 33582706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
AIM The Ministry of Health has mandated that all emergency department (ED) presentations are coded using the Systematised Nomenclature of Medicine - Clinical Terms (SNOMED-CT) from 2021. The current ED reference set contains the non-specific term 'Referral for investigation' in the list of available chief presenting complaints (CPCs). The aim of this study was to determine the rate of use of this term and how often a more specific (and therefore more clinically useful) term was used. METHOD This was a cross-sectional audit of routinely collected presenting complaint data, supplemented by a retrospective case note review. RESULTS 'Referral for investigation' was used for 497/9,067 (5.5%, 95%CI 5-6%) presentations, with increased use for urgent cases. An alternative CPC was available from the existing reference set in 467/497 (94.0%, 95%CI 92-96%) of cases. Of 98 different CPCs, the common alternatives were: 'Chest pain' (6.4%), 'Shortness of breath' (4.2%), 'Abdominal pain' (3.6%), 'Altered mental status' (3.4%) and 'Postoperative complication' (3.2%). Six of 13 cardiac arrests and eight of 63 multiple trauma cases were coded as 'Referral for investigation'. With the addition of two new terms to the New Zealand reference set ('Abnormal blood test' and 'Radiology request'), each of the remaining 30 presentations would have an alternative and more accurate CPC. CONCLUSION 'Referral for investigation' should be removed from the New Zealand emergency department reference set for chief presenting complaints to improve data quality.
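The reported intervals can be cross-checked with the normal-approximation (Wald) interval for a proportion, p ± 1.96·sqrt(p(1 − p)/n). The abstract does not state which interval method the authors used; the sketch below only shows that this formula, applied to the counts given above, agrees with the published figures once rounded to whole percentages. The function name `wald_ci` is an illustrative choice.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Return (proportion, lower, upper) using the normal-approximation interval."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts taken from the abstract.
for label, k, n in [
    ("'Referral for investigation' used", 497, 9067),
    ("Alternative CPC available", 467, 497),
]:
    p, lower, upper = wald_ci(k, n)
    print(f"{label}: {p:.1%} (95% CI {lower:.0%}-{upper:.0%})")
```

The Wald interval is adequate here because both counts are far from 0 and n; for rare events a Wilson or exact interval would be the safer choice.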
Affiliation(s)
- Peter G Jones
- Adult Emergency Department, Auckland City Hospital, Auckland District Health Board; Department of Surgery, School of Medicine, University of Auckland
- Mark Gardener
- Adult Emergency Department, Auckland City Hospital, Auckland District Health Board
48
Zheng F, Shi J, Cui L. A lexical-based approach for exhaustive detection of missing hierarchical IS-A relations in SNOMED CT. AMIA Annu Symp Proc 2021; 2020:1392-1401. [PMID: 33936515 PMCID: PMC8075518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Incompleteness of ontologies affects the quality of downstream ontology-based applications. In this paper, we introduce a novel lexical-based approach to automatically detect potentially missing hierarchical IS-A relations in SNOMED CT. We model each concept with an enriched set of lexical features, by leveraging words and noun phrases in the name of the concept itself and the concept's ancestors. Then we perform subset inclusion checking to suggest potentially missing IS-A relations between concepts. We applied our approach to the September 2017 release of SNOMED CT (US edition), which suggested a total of 38,615 potentially missing IS-A relations. For evaluation, a domain expert reviewed a random sample of 100 missing IS-A relations selected from the "Clinical finding" sub-hierarchy and confirmed that 90 were valid (a precision of 90%). Additional review of invalid suggestions further revealed incorrect existing IS-A relations. Our results demonstrate that systematic analysis of the enriched lexical features of concepts is an effective approach to identify potentially missing hierarchical IS-A relations in SNOMED CT.
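The core of the approach is a subset-inclusion test over enriched lexical feature sets. A minimal Python sketch follows under simplifying assumptions: concepts are identified by their names, features are whitespace-separated words only (the paper also uses noun phrases), and the toy hierarchy is hypothetical. A concept whose enriched features are a proper subset of another concept's features, and which is not already among that concept's ancestors, is suggested as a missing parent.

```python
from itertools import permutations


def words(name):
    """Lower-cased word set of a concept name (noun phrases are omitted in this sketch)."""
    return set(name.lower().split())


def ancestors(concept, parents):
    """All transitive IS-A ancestors of a concept."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def suggest_missing_isa(parents):
    """Return (child, parent) suggestions where parent's features are a proper subset of child's."""
    concepts = set(parents) | {p for ps in parents.values() for p in ps}
    anc = {c: ancestors(c, parents) for c in concepts}
    # Enriched features: words from the concept's own name plus all ancestors' names.
    feats = {c: words(c).union(*(words(a) for a in anc[c])) for c in concepts}
    suggestions = []
    for general, specific in permutations(sorted(concepts), 2):
        if general in anc[specific]:
            continue  # relation already present, directly or transitively
        if feats[general] < feats[specific]:  # proper subset
            suggestions.append((specific, general))
    return suggestions


# Hypothetical toy hierarchy: child -> list of direct parents.
# The IS-A link from "infectious lung disorder" to "infectious disorder" is missing.
parents = {
    "lung disorder": ["disorder"],
    "infectious disorder": ["disorder"],
    "infectious lung disorder": ["lung disorder"],
}
print(suggest_missing_isa(parents))
```

Running it flags the one absent link; real use would also need to filter suggestions that merely reflect shared modifiers rather than genuine subsumption.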
Affiliation(s)
- Fengbo Zheng
- Department of Computer Science, University of Kentucky, Lexington, KY
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX
- Jay Shi
- Department of Internal Medicine, University of Kentucky, Lexington, KY
- Licong Cui
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX
49
Ostropolets A, Reich C, Ryan P, Weng C, Molinaro A, DeFalco F, Jonnagaddala J, Liaw ST, Jeon H, Park RW, Spotnitz ME, Natarajan K, Argyriou G, Kostka K, Miller R, Williams A, Minty E, Posada J, Hripcsak G. Characterizing database granularity using SNOMED-CT hierarchy. AMIA Annu Symp Proc 2021; 2020:983-992. [PMID: 33936474 PMCID: PMC8075504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Multi-center observational studies require recognition and reconciliation of differences in patient representations arising from underlying populations, disparate coding practices and specifics of data capture. This leads to different granularity or detail of concepts representing the clinical facts. For researchers studying certain populations of interest, it is important to ensure that concepts at the right level are used for the definition of these populations. We studied the granularity of concepts within 22 data sources in the OHDSI network and calculated a composite granularity score for each dataset. Three alternative SNOMED-based approaches for such a score showed consistency in classifying data sources into three levels of granularity (low, moderate and high), which correlated with the provenance of data and country of origin. However, they performed unsatisfactorily in ordering data sources within these groups and showed inconsistency for small data sources. Further studies on examining approaches to data source granularity are needed.
Affiliation(s)
- Patrick Ryan
- Columbia University, New York, NY, USA
- Janssen Epidemiology Analytics, Janssen Research & Development, Titusville, NJ, USA
- Anthony Molinaro
- Janssen Epidemiology Analytics, Janssen Research & Development, Titusville, NJ, USA
- Frank DeFalco
- Janssen Epidemiology Analytics, Janssen Research & Development, Titusville, NJ, USA
- Hokyun Jeon
- Ajou University Graduate School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
- Rae Woong Park
- Ajou University Graduate School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
- Robert Miller
- Tufts Medical Center, Institute for Clinical Research and Health Policy Studies, Boston, MA, USA
- Andrew Williams
- Tufts Medical Center, Institute for Clinical Research and Health Policy Studies, Boston, MA, USA
- Evan Minty
- O'Brien Centre for Population Health, Faculty of Medicine, University of Calgary, Canada
- Jose Posada
- Stanford Center for Biomedical Informatics Research, Stanford, CA, USA
- George Hripcsak
- Columbia University, New York, NY, USA
- Medical Informatics Services, New York-Presbyterian Hospital, New York, NY, USA
50
Abstract
Biological and biomedical ontologies and terminologies are used to organize and store various domain-specific knowledge, to standardize terminology usage and to improve interoperability. The growing number of such ontologies and terminologies and their increasing adoption in clinical, research and healthcare settings call for effective and efficient quality assurance and semantic enrichment techniques for these ontologies and terminologies. In this editorial, we provide an introductory summary of nine articles included in this supplement issue on quality assurance and enrichment of biological and biomedical ontologies and terminologies. The articles cover a range of standards including SNOMED CT, National Cancer Institute Thesaurus, Unified Medical Language System, North American Association of Central Cancer Registries and OBO Foundry Ontologies.
Affiliation(s)
- Ankur Agrawal
- Department of Computer Science, Manhattan College, New York, USA
- Licong Cui
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.