1
Seong D, Choi YH, Shin SY, Yi BK. Deep learning approach to detection of colonoscopic information from unstructured reports. BMC Med Inform Decis Mak 2023; 23:28. [PMID: 36750932 PMCID: PMC9903463 DOI: 10.1186/s12911-023-02121-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 01/23/2023] [Indexed: 02/09/2023] Open
Abstract
BACKGROUND Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information. METHODS This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared the long short-term memory (LSTM) and BioBERT model to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embedding using large unlabeled data (280,668 reports) to the selected model. RESULTS The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers. CONCLUSIONS This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes.
Affiliation(s)
- Donghyeong Seong
  - Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, 06355 Republic of Korea
- Yoon Ho Choi
  - Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea
- Soo-Yong Shin
  - Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea
  - Research Institute for Future Medicine, Samsung Medical Center, Seoul, 06351 Republic of Korea
- Byoung-Kee Yi
  - Department of Artificial Intelligence Convergence, Kangwon National University, 1 Kangwondaehak-Gil, Chuncheon-si, Gangwon-do, 24341, Republic of Korea
2
Dholakia D, Kalra A, Misir BR, Kanga U, Mukerji M. HLA-SPREAD: a natural language processing based resource for curating HLA association from PubMed abstracts. BMC Genomics 2022; 23:10. [PMID: 34991484 PMCID: PMC8740486 DOI: 10.1186/s12864-021-08239-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2021] [Accepted: 12/07/2021] [Indexed: 11/16/2022] Open
Abstract
Extreme complexity in the Human Leukocyte Antigens (HLA) system and its nomenclature makes it difficult to interpret and integrate relevant information for HLA associations with diseases, Adverse Drug Reactions (ADR) and Transplantation. A PubMed search displays ~146,000 studies on HLA reported from diverse locations. Currently, the IPD-IMGT/HLA database (Robinson et al., Nucleic Acids Research 48:D948-D955, 2019) houses data on 28,320 HLA alleles. We developed an automated pipeline with a unified graphical user interface, HLA-SPREAD, that provides structured information on SNPs, Populations, REsources, ADRs and Diseases. Information on HLA was extracted from ~28 million PubMed abstracts using Natural Language Processing (NLP). Python scripts were used to mine and curate information on diseases, filter false positives and categorize them into 24 tree hierarchical groups, and Named Entity Recognition (NER) algorithms followed by semantic analysis were used to infer HLA association(s). This resource, covering 109 countries and 40 ethnic groups, provides interesting insights on markers associated with allelic/haplotypic association in autoimmune, cancer, viral and skin diseases, transplantation outcome and ADRs for hypersensitivity. Summary information on clinically relevant biomarkers related to HLA disease associations with mapped susceptible/risk alleles is readily retrievable from HLA-SPREAD. The resource is available at http://hla-spread.igib.res.in/. This resource is the first of its kind and can help uncover novel patterns in HLA gene-disease associations.
Affiliation(s)
- Dhwani Dholakia
  - Institute of Genomics and Integrative Biology-Council of Scientific and Industrial Research, New Delhi, 110025, India
  - Academy of Scientific and Innovative Research, Ghaziabad, 201002, India
- Ankit Kalra
  - Netaji Subhas University of Technology, New Delhi, 110078, India
- Bishnu Raman Misir
  - Centre of Excellence for Applied Development of Ayurveda, Prakriti and Genomics, CSIR-IGIB, Delhi, 110007, India
- Uma Kanga
  - All India Institute of Medical Sciences, New Delhi, 110029, India
- Mitali Mukerji
  - Institute of Genomics and Integrative Biology-Council of Scientific and Industrial Research, New Delhi, 110025, India
  - Centre of Excellence for Applied Development of Ayurveda, Prakriti and Genomics, CSIR-IGIB, Delhi, 110007, India
  - Present Address: Department of Bioscience and Bioengineering, Indian Institute of Technology, Jodhpur, Rajasthan, 342037, India
3
Koumakis L, Schera F, Parker H, Bonotis P, Chatzimina M, Argyropaidas P, Zacharioudakis G, Schäfer M, Kakalou C, Karamanidou C, Didi J, Kazantzaki E, Scarfo L, Marias K, Natsiavas P. Fostering Palliative Care Through Digital Intervention: A Platform for Adult Patients With Hematologic Malignancies. Front Digit Health 2021; 3:730722. [PMID: 34977857 PMCID: PMC8718505 DOI: 10.3389/fdgth.2021.730722] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2021] [Accepted: 11/16/2021] [Indexed: 11/13/2022] Open
Abstract
Patient-reported outcomes (PROs) are an emerging paradigm in clinical research and healthcare, aiming to capture the patient's self-assessed health status in order to gauge efficacy of treatment from their perspective. As these patient-generated health data provide insights into the effects of healthcare processes in real-life settings beyond the clinical setting, they can also be viewed as a resolution beyond what can be gleaned directly by the clinician. To this end, patients are identified as a key stakeholder of the healthcare decision-making process, instead of passively following their doctor's guidance. As this joint decision-making process requires constant and high-quality communication between the patient and his/her healthcare providers, novel methodologies and tools have been proposed to promote richer and preemptive communication to facilitate earlier recognition of potential complications. As PROs can be used to quantify the patient impact (especially important for chronic conditions such as cancer), they can play a prominent role in providing patient-centric care. In this paper, we introduce the MyPal platform that aims to support adults suffering from hematologic malignancies, focusing on the technical design and highlighting the respective challenges. MyPal is a Horizon 2020 European project aiming to support palliative care for cancer patients via the electronic PROs (ePROs) paradigm, building upon modern eHealth technologies. The MyPal project evaluates the proposed eHealth intervention via clinical studies and assesses its potential impact on the provided palliative care. More specifically, the MyPal platform provides specialized applications supporting the regular answering of well-defined and standardized questionnaires, spontaneous symptom reporting, educational material provision, notifications, etc.
The presented platform has been validated by end-users and is currently in the phase of pilot testing in a clinical study to evaluate its feasibility and its potential impact on the quality of life of palliative care patients with hematologic malignancies.
Affiliation(s)
- Lefteris Koumakis
  - Institute of Computer Science, Foundation for Research and Technology–Hellas (FORTH), Heraklion, Greece
  - *Correspondence: Lefteris Koumakis
- Fatima Schera
  - Institute for Biomedical Engineering, Sulzbach, Germany
- Panos Bonotis
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Maria Chatzimina
  - Institute of Computer Science, Foundation for Research and Technology–Hellas (FORTH), Heraklion, Greece
- Panagiotis Argyropaidas
  - Institute of Computer Science, Foundation for Research and Technology–Hellas (FORTH), Heraklion, Greece
- Giorgos Zacharioudakis
  - Institute of Computer Science, Foundation for Research and Technology–Hellas (FORTH), Heraklion, Greece
- Christine Kakalou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Christina Karamanidou
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
- Jana Didi
  - Center for Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czechia
- Eleni Kazantzaki
  - Department of Hematology, University Hospital of Heraklion, Heraklion, Greece
- Lydia Scarfo
  - Universita Vita-Salute San Raffaele, Milan, Italy
- Kostas Marias
  - Institute of Computer Science, Foundation for Research and Technology–Hellas (FORTH), Heraklion, Greece
- Pantelis Natsiavas
  - Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
4
Medical social networks content mining for a semantic annotation. Social Network Analysis and Mining 2021. [DOI: 10.1007/s13278-021-00848-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
5
Alag S. Unique insights from ClinicalTrials.gov by mining protein mutations and RSids in addition to applying the Human Phenotype Ontology. PLoS One 2020; 15:e0233438. [PMID: 32459809 PMCID: PMC7252633 DOI: 10.1371/journal.pone.0233438] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/20/2019] [Accepted: 05/05/2020] [Indexed: 01/31/2023] Open
Abstract
Researchers and clinicians face a significant challenge in keeping up-to-date with the rapid rate of new associations between genetic mutations and diseases. To remedy this problem, this research mined the ClinicalTrials.gov corpus to extract relevant biological insights, produce unique reports to summarize findings, and make the meta-data available via APIs. An automated text-analysis pipeline performed the following features: parsing the ClinicalTrials.gov files, extracting and analyzing mutations from the corpus, mapping clinical trials to Human Phenotype Ontology (HPO), and finding associations between clinical trials and HPO nodes. Unique reports were created for each mutation (SNPs and protein mutations) mentioned in the corpus, as well as for each clinical trial that references a mutation. These reports, which have been run over multiple time points, along with APIs to access meta-data, are freely available at http://snpminertrials.com. Additionally, HPO was used to normalize disease terms and associate clinical trials with relevant genes. The creation of the pipeline and reports, the association of clinical trials with HPO terms, and the insights, public repository, and APIs produced are all novel in this work. The freely-available resources present relevant biological information and novel insights between biomedical entities in a robust and accessible manner, mitigating the challenge of being informed about new associations between mutations, genes, and diseases.
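As a rough illustration of the mutation-extraction step mentioned in this abstract, the sketch below pulls dbSNP-style rsIDs and simple single-residue protein substitutions out of free text. It is not the pipeline from the paper: the regular expressions, the function name `extract_mutations`, and the sample sentence are assumptions made for this example, and real trial records would need far more careful handling (three-letter residue codes, deletions, false positives such as product codes).

```python
import re

# Assumption: rsIDs appear as "rs" followed by digits (dbSNP-style identifiers).
RSID_RE = re.compile(r"\brs\d+\b")

# Assumption: protein substitutions use one-letter amino acid codes,
# e.g. "V600E" (reference residue, position, altered residue).
AMINO = "ACDEFGHIKLMNPQRSTVWY"
PROTEIN_MUTATION_RE = re.compile(rf"\b[{AMINO}]\d+[{AMINO}]\b")


def extract_mutations(text):
    """Return the distinct rsIDs and protein substitutions found in text."""
    return {
        "rsids": sorted(set(RSID_RE.findall(text))),
        "protein_mutations": sorted(set(PROTEIN_MUTATION_RE.findall(text))),
    }


if __name__ == "__main__":
    # Hypothetical sentence, written only to exercise the two patterns.
    sample = (
        "Patients with BRAF V600E or the variant rs113488022 were eligible; "
        "rs113488022 was genotyped."
    )
    print(extract_mutations(sample))
```

Deduplicating with a set before sorting keeps repeated mentions within one record from inflating the result; counting mentions instead would be an equally reasonable design for report generation.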
Affiliation(s)
- Shray Alag
  - The Harker School, San Jose, CA, United States of America
6
Chen D, Zhang R, Qiu RG. Leveraging Semantics in WordNet to Facilitate the Computer-Assisted Coding of ICD-11. IEEE J Biomed Health Inform 2020; 24:1469-1476. [DOI: 10.1109/jbhi.2019.2949567] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
7
Kondylakis H, Bucur A, Crico C, Dong F, Graf N, Hoffman S, Koumakis L, Manenti A, Marias K, Mazzocco K, Pravettoni G, Renzi C, Schera F, Triberti S, Tsiknakis M, Kiefer S. Patient empowerment for cancer patients through a novel ICT infrastructure. J Biomed Inform 2019; 101:103342. [PMID: 31816400 DOI: 10.1016/j.jbi.2019.103342] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 11/09/2018] [Revised: 11/15/2019] [Accepted: 11/21/2019] [Indexed: 12/17/2022]
Abstract
As a result of recent advances in cancer research and "precision medicine" approaches, i.e. the idea of treating each patient with the right drug at the right time, more and more cancer patients are being cured, or might have to cope with a life with cancer. For many people, cancer survival today means living with a complex and chronic condition. Surviving and living with or beyond cancer requires the long-term management of the disease, leading to a significant need for active rehabilitation of the patients. In this paper, we present a novel methodology employed in the iManageCancer project for cancer patient empowerment in which personal health systems, serious games, psychoemotional monitoring and other novel decision-support tools are combined into an integrated patient empowerment platform. We present in detail the ICT infrastructure developed and our evaluation with the involvement of cancer patients on two sites, a large-scale pilot for adults and a small-scale test for children. The evaluation showed mixed evidence on the improvement of patient empowerment, while ability to cope with cancer, including improvement in mood and resilience to cancer, increased for the participants of the adult pilot.
Affiliation(s)
- Anca Bucur
  - PHILIPS Research Europe, Eindhoven, The Netherlands
- Feng Dong
  - Department of Computer Science and Technology, University of Bedfordshire, Luton, UK
- Norbert Graf
  - Saarland University, Pediatric Oncology and Hematology, Homburg, Germany
- Kostas Marias
  - Computational BioMedicine Laboratory, FORTH-ICS, Heraklion, Greece
- Fatima Schera
  - Fraunhofer Institute for Biomedical Engineering, Germany
- Stephan Kiefer
  - Fraunhofer Institute for Biomedical Engineering, Germany
8
Desterke C, Chiappini F. Lipid Related Genes Altered in NASH Connect Inflammation in Liver Pathogenesis Progression to HCC: A Canonical Pathway. Int J Mol Sci 2019; 20:ijms20225594. [PMID: 31717414 PMCID: PMC6888337 DOI: 10.3390/ijms20225594] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 10/09/2019] [Revised: 11/03/2019] [Accepted: 11/04/2019] [Indexed: 02/06/2023] Open
Abstract
Nonalcoholic steatohepatitis (NASH) is becoming a public health problem worldwide. While the number of research studies on NASH progression rises every year, their findings are sometimes controversial. To identify the most important and commonly described findings related to NASH progression, we used an original integrative bioinformatics text-mining approach that combines PubMed database querying with available Gene Expression Omnibus datasets. We have identified a signature of 25 genes that are commonly found to be dysregulated during steatosis progression to NASH and cancer. These genes are implicated in lipid metabolism, insulin resistance, inflammation, and cancer. They are functionally connected, forming the basis necessary for steatosis progression to NASH and further progression to hepatocellular carcinoma (HCC). We also show that five of the identified genes have genome alterations present in HCC patients. Patients carrying genome alterations in these genes have a poor prognosis. In conclusion, using an integrative literature- and data-mining approach, we have identified and described a canonical pathway underlying progression of NASH. Other parameters (e.g., polymorphisms) that also contribute to the progression of the disease to cancer can be added to this pathway. This work improves our understanding of the molecular basis of NASH progression and will help to develop new therapeutic approaches.
Affiliation(s)
- Franck Chiappini
  - Laboratoire Croissance, Régénération, Réparation et Régénération Tissulaires (CRRET)/ EAC CNRS 7149, Univ Paris-Est Créteil (UPEC), F-94010 Créteil, France
  - Correspondence: Tel.: +33-(0)1-45177080; Fax: +33-(0)1-45171816
9
Sarwar DM, Kalbasi R, Gennari JH, Carlson BE, Neal ML, Bono BD, Atalag K, Hunter PJ, Nickerson DP. Model annotation and discovery with the Physiome Model Repository. BMC Bioinformatics 2019; 20:457. [PMID: 31492098 PMCID: PMC6731580 DOI: 10.1186/s12859-019-2987-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/24/2019] [Accepted: 07/09/2019] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Mathematics and physics-based simulation models have the potential to help interpret and encapsulate biological phenomena in a computable and reproducible form. Similarly, comprehensive descriptions of such models help to ensure that such models are accessible, discoverable, and reusable. To this end, researchers have developed tools and standards to encode mathematical models of biological systems enabling reproducibility and reuse, tools and guidelines to facilitate semantic description of mathematical models, and repositories in which to archive, share, and discover models. Scientists can leverage these resources to investigate specific questions and hypotheses in a more efficient manner. RESULTS We have comprehensively annotated a cohort of models with biological semantics. These annotated models are freely available in the Physiome Model Repository (PMR). To demonstrate the benefits of this approach, we have developed a web-based tool which enables users to discover models relevant to their work, with a particular focus on epithelial transport. Based on a semantic query, this tool will help users discover relevant models, suggesting similar or alternative models that the user may wish to explore or use. CONCLUSION The semantic annotation and the web tool we have developed are a new contribution enabling scientists to discover relevant models in the PMR as candidates for reuse in their own scientific endeavours. This approach demonstrates how semantic web technologies and methodologies can contribute to biomedical and clinical research. The source code and links to the web tool are available at https://github.com/dewancse/model-discovery-tool.
Affiliation(s)
- Dewan M Sarwar
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Reza Kalbasi
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- John H Gennari
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
- Brian E Carlson
  - Molecular & Integrative Physiology, University of Michigan, Ann Arbor, Michigan, USA
- Maxwell L Neal
  - Center for Global Infectious Disease Research, Seattle Children's Research Institute, Seattle, Washington, USA
- Bernard de Bono
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Koray Atalag
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- Peter J Hunter
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
- David P Nickerson
  - Auckland Bioengineering Institute, University of Auckland, Auckland, New Zealand
10
Follett L, Geletta S, Laugerman M. Quantifying risk associated with clinical trial termination: A text mining approach. Inf Process Manag 2019. [DOI: 10.1016/j.ipm.2018.11.009] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/16/2022]
11
Iatraki G, Kondylakis H, Koumakis L, Chatzimina M, Kazantzaki E, Marias K, Tsiknakis M. Personal Health Information Recommender: implementing a tool for the empowerment of cancer patients. Ecancermedicalscience 2018; 12:851. [PMID: 30079113 PMCID: PMC6057655 DOI: 10.3332/ecancer.2018.851] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 12/16/2017] [Indexed: 11/25/2022] Open
Abstract
Nowadays, patients have a wealth of information available on the Internet. Despite the potential benefits of Internet health information seeking, several concerns have been raised about the quality of information and about the patient’s capability to evaluate medical information and to relate it to their own disease and treatment. As such, novel tools are required to effectively guide patients and provide high-quality medical information in an intelligent and personalised manner. With this aim, this paper presents the Personal Health Information Recommender (PHIR), a system to empower patients by enabling them to search in a high-quality document repository selected by experts, avoiding the information overload of the Internet. In addition, the information provided to the patients is personalised, based on individual preferences, medical conditions and other profiling information. Despite the generality of our approach, we apply the PHIR to a personal health record system constructed for cancer patients and we report on the design, the implementation and a preliminary validation of the platform. To the best of our knowledge, our platform is the only one combining natural language processing, ontologies and personal information to offer a unique user experience.
Affiliation(s)
- Galatia Iatraki
  - Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
- Lefteris Koumakis
  - Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
- Maria Chatzimina
  - Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
- Eleni Kazantzaki
  - Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
- Kostas Marias
  - Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
  - Department of Informatics Engineering, Technological Educational Institute of Crete, Heraklion GR71004, Greece
- Manolis Tsiknakis
  - Computational BioMedicine Laboratory, FORTH-ICS, Heraklion GR70013, Greece
  - Department of Informatics Engineering, Technological Educational Institute of Crete, Heraklion GR71004, Greece
12
Desterke C, Slim R, Candelier JJ. A bioinformatics transcriptome meta-analysis highlights the importance of trophoblast differentiation in the pathology of hydatidiform moles. Placenta 2018; 65:29-36. [DOI: 10.1016/j.placenta.2018.04.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/23/2018] [Revised: 03/26/2018] [Accepted: 04/06/2018] [Indexed: 11/25/2022]
13
Gu H, He Z, Wei D, Elhanan G, Chen Y. Validating UMLS Semantic Type Assignments Using SNOMED CT Semantic Tags. Methods Inf Med 2018; 57:43-53. [PMID: 29621830 DOI: 10.3414/me17-01-0120] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/09/2022]
Abstract
BACKGROUND The UMLS assigns semantic types to all its integrated concepts. The semantic types are widely used in various natural language processing tasks in the biomedical domain, such as named entity recognition, semantic disambiguation, and semantic annotation. Due to the size of the UMLS, erroneous semantic type assignments are hard to detect. It is imperative to devise automated techniques to identify errors and inconsistencies in semantic type assignments. OBJECTIVES Designing a methodology to perform programmatic checks to detect semantic type assignment errors for UMLS concepts with one or more SNOMED CT terms and evaluating concepts in a selected set of SNOMED CT hierarchies to verify our hypothesis that UMLS semantic type assignment errors may exist in concepts residing in semantically inconsistent groups. METHODS Our methodology is a four-stage process: 1) partitioning concepts in a SNOMED CT hierarchy into semantically uniform groups based on their assigned semantic tags; 2) partitioning concepts in each group from 1) into disjoint sub-groups based on their semantic type assignments; 3) mapping all SNOMED CT semantic tags into one or more semantic types in the UMLS; 4) identifying semantically inconsistent groups that have inconsistent assignments between semantic tags and semantic types according to the mapping from 3) and providing concepts in such groups to the domain experts for review. RESULTS We applied our method to the UMLS 2013AA release. Concepts of the semantically inconsistent groups in the PHYSICAL FORCE and RECORD ARTIFACT hierarchies have error rates of 33% and 62.5%, respectively, much larger than the error rates of 0.6% and 1% in semantically consistent groups of the two hierarchies. CONCLUSION Concepts in semantically inconsistent groups are more likely to contain semantic type assignment errors. Our methodology can make auditing more efficient by focusing auditing resources on concepts of semantically inconsistent groups.
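The four-stage process described in the METHODS of this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the concept records, the tag-to-type mapping, and the function name `find_inconsistent_groups` are all invented for the example, and a real audit would read concepts and semantic types from UMLS and SNOMED CT release files.

```python
from collections import defaultdict

# Invented toy records: (concept name, SNOMED CT semantic tag, UMLS semantic types).
CONCEPTS = [
    ("concept A", "physical force", frozenset({"Natural Phenomenon or Process"})),
    ("concept B", "physical force", frozenset({"Natural Phenomenon or Process"})),
    ("concept C", "physical force", frozenset({"Substance"})),
]

# Stage 3 (assumed mapping): semantic types considered consistent with each tag.
TAG_TO_TYPES = {"physical force": {"Natural Phenomenon or Process"}}


def find_inconsistent_groups(concepts, tag_to_types):
    """Group concepts by tag and type assignment, then flag mismatching groups."""
    # Stages 1 and 2: partition by semantic tag, then by semantic type set.
    groups = defaultdict(list)
    for name, tag, types in concepts:
        groups[(tag, types)].append(name)

    # Stage 4: a group is inconsistent when any of its semantic types
    # falls outside the types mapped to its semantic tag.
    flagged = {}
    for (tag, types), names in groups.items():
        allowed = tag_to_types.get(tag, set())
        if not types <= allowed:
            flagged[(tag, types)] = sorted(names)
    return flagged


if __name__ == "__main__":
    for (tag, types), names in find_inconsistent_groups(CONCEPTS, TAG_TO_TYPES).items():
        print(tag, sorted(types), names)
```

Only the flagged groups would be handed to domain experts, which is how the approach narrows the auditing effort to the concepts most likely to carry assignment errors.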
14
Nursing Education and the 21st Century Library. Nurse Educ 2018; 43:170-172. [DOI: 10.1097/nne.0000000000000461] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
15
Jovanović J, Bagheri E. Semantic annotation in biomedicine: the current landscape. J Biomed Semantics 2017; 8:44. [PMID: 28938912 PMCID: PMC5610427 DOI: 10.1186/s13326-017-0153-x] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 12/13/2016] [Accepted: 09/17/2017] [Indexed: 01/12/2023] Open
Abstract
The abundance and unstructured nature of biomedical texts, be it clinical or research content, impose significant challenges for the effective and efficient use of information and knowledge stored in such texts. Annotation of biomedical documents with machine intelligible semantics facilitates advanced, semantics-based text management, curation, indexing, and search. This paper focuses on annotation of biomedical entity mentions with concepts from relevant biomedical knowledge bases such as UMLS. As a result, the meaning of those mentions is unambiguously and explicitly defined, and thus made readily available for automated processing. This process is widely known as semantic annotation, and the tools that perform it are known as semantic annotators. Over the last dozen years, the biomedical research community has invested significant efforts in the development of biomedical semantic annotation technology. Aiming to establish grounds for further developments in this area, we review a selected set of state of the art biomedical semantic annotators, focusing particularly on general purpose annotators, that is, semantic annotation tools that can be customized to work with texts from any area of biomedicine. We also examine potential directions for further improvements of today's annotators which could make them even more capable of meeting the needs of real-world applications. To motivate and encourage further developments in this area, along the suggested and/or related directions, we review existing and potential practical applications and benefits of semantic annotators.
Affiliation(s)
- Jelena Jovanović
  - Department of Software Engineering, University of Belgrade, 154 Jove Ilica Street, Belgrade, Serbia
- Ebrahim Bagheri
  - Department of Electrical Engineering, Ryerson University, 245 Church Street, Toronto, Canada