1
Wang A, Liu C, Yang J, Weng C. Fine-tuning Large Language Models for Rare Disease Concept Normalization. bioRxiv 2024:2023.12.28.573586. [PMID: 38234802 PMCID: PMC10793431 DOI: 10.1101/2023.12.28.573586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Objective We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). Methods We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of each concept's synonyms, as well as identifiers. Subsequently, we fine-tuned Llama 2 (Llama2-7B) on each sentence set and evaluated the models using a range of sentence prompts and various phenotype terms. Results When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 achieved only ~20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced into the phenotype terms, the accuracy of NAME and NAME+SYN was 10.2% and 36.1%, respectively, but increased to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For unseen synonyms sourced from the HPO vocabularies, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. Conclusion Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for using LLMs to identify named medical entities in clinical narratives while successfully normalizing them to standard concepts in a controlled vocabulary.
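The NAME and NAME+SYN corpora described above can be sketched as template-filled sentences. This is a minimal illustration, not the authors' in-house script: the two sentence templates, the example concepts and their synonym lists, rounding "half of the synonyms" down, and a substitution-only model of single-character typos are all assumptions made for the sketch.

```python
import random

# Example concepts as (HPO ID, name, synonyms); the synonym lists are illustrative.
TERMS = [
    ("HP:0001250", "Seizure", ["Seizures", "Epileptic seizure"]),
    ("HP:0001263", "Global developmental delay", ["Developmental delay", "Delayed development"]),
]

# Hypothetical sentence templates; the paper's actual templates are not reproduced here.
TEMPLATES = [
    "The HPO ID of {term} is {hpo_id}.",
    "{term} is represented by {hpo_id} in the Human Phenotype Ontology.",
]

def build_corpus(terms, include_synonyms):
    """Fine-tuning sentences for the NAME (False) or NAME+SYN (True) setting."""
    sentences = []
    for hpo_id, name, synonyms in terms:
        surface_forms = [name]
        if include_synonyms:
            # Keep half of the synonyms (rounded down, an assumption); the held-out
            # half could then serve as unseen synonyms at evaluation time.
            surface_forms += synonyms[: len(synonyms) // 2]
        for form in surface_forms:
            for template in TEMPLATES:
                sentences.append(template.format(term=form, hpo_id=hpo_id))
    return sentences

def add_typo(term, rng):
    """Replace one randomly chosen character with a different lowercase letter."""
    i = rng.randrange(len(term))
    letters = [c for c in "abcdefghijklmnopqrstuvwxyz" if c != term[i].lower()]
    return term[:i] + rng.choice(letters) + term[i + 1:]
```

With two templates, each concept contributes two sentences per surface form, so NAME+SYN doubles the corpus here because each concept gains one synonym.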
Affiliation(s)
- Andy Wang
- Peddie School, Hightstown, NJ, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Cong Liu
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Jingye Yang
- Department of Mathematics, University of Pennsylvania, Philadelphia, PA, USA
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, NY, USA

2
Gérardin C, Xiong Y, Wajsbürt P, Carrat F, Tannier X. Impact of Translation on Biomedical Information Extraction: Experiment on Real-Life Clinical Notes. JMIR Med Inform 2024; 12:e49607. [PMID: 38596859 DOI: 10.2196/49607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2023] [Revised: 01/07/2024] [Accepted: 01/10/2024] [Indexed: 03/03/2024]
Abstract
Background Biomedical natural language processing tasks are best performed with English models, and translation tools have undergone major improvements. On the other hand, building annotated biomedical data sets remains a challenge. Objective The aim of our study is to determine whether the use of English tools to extract and normalize French medical concepts based on translations provides comparable performance to that of French models trained on a set of annotated French clinical notes. Methods We compared 2 methods: 1 involving French-language models and 1 involving English-language models. For the native French method, the named entity recognition and normalization steps were performed separately. For the translated English method, after the first translation step, we compared a 2-step method and a terminology-oriented method that performs extraction and normalization at the same time. We used French, English, and bilingual annotated data sets to evaluate all stages (named entity recognition, normalization, and translation) of our algorithms. Results The native French method outperformed the translated English method, with an overall F1-score of 0.51 (95% CI 0.47-0.55), compared with 0.39 (95% CI 0.34-0.44) and 0.38 (95% CI 0.36-0.40) for the 2 English methods tested. Conclusions Despite recent improvements in translation models, there is a significant difference in performance between the 2 approaches in favor of the native French method, which is more effective on French medical texts, even with few annotated documents.
Affiliation(s)
- Christel Gérardin
- Institut Pierre Louis d'Epidémiologie et de Santé Publique, Sorbonne Université, Institut National de la Santé et de la Recherche Médicale, Paris, France
- Yuhan Xiong
- Institut Pierre Louis d'Epidémiologie et de Santé Publique, Sorbonne Université, Institut National de la Santé et de la Recherche Médicale, Paris, France
- Shanghai Jiaotong University, Shanghai, China
- Perceval Wajsbürt
- Innovation and Data Unit, Assistance Publique Hôpitaux de Paris, Paris, France
- Fabrice Carrat
- Institut Pierre Louis d'Epidémiologie et de Santé Publique, Sorbonne Université, Institut National de la Santé et de la Recherche Médicale, Paris, France
- Department of Public Health, Assistance Publique Hôpitaux de Paris, Hôpital Saint-Antoine, Paris, France
- Xavier Tannier
- Sorbonne Université, Institut National de la Santé et de la Recherche Médicale, Université Sorbonne Paris-Nord, Laboratoire d'Informatique Médicale et de Connaissance en e-Santé, Paris, France

3
Li Y, Lundin SK, Li J, Tao W, Dang Y, Chen Y, Tao C. Unpacking adverse events and associations post COVID-19 vaccination: a deep dive into vaccine adverse event reporting system data. Expert Rev Vaccines 2024; 23:53-59. [PMID: 38063069 PMCID: PMC10872386 DOI: 10.1080/14760584.2023.2292203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Accepted: 11/30/2023] [Indexed: 12/18/2023]
Abstract
INTRODUCTION The rapid development of COVID-19 vaccines has provided crucial tools for pandemic control, but the occurrence of vaccine-related adverse events (AEs) underscores the need for comprehensive monitoring. METHODS This study analyzed Vaccine Adverse Event Reporting System (VAERS) data from 2020-2022 using statistical methods such as zero-truncated Poisson regression and logistic regression to assess associations with age group, gender, and vaccine manufacturer. RESULTS Logistic regression identified 26 System Organ Classes (SOCs) significantly associated with age and gender. Females had notably higher odds in SOC 19 (Pregnancy, puerperium and perinatal conditions), while males had higher odds in SOC 25 (Surgical and medical procedures). Older adults (>65 years) were more prone to AEs such as Cardiac disorders, whereas those aged 18-65 years were more susceptible to AEs such as Skin and subcutaneous tissue disorders. Moderna and Pfizer vaccines induced fewer SOC symptoms than Janssen and Novavax. The zero-truncated Poisson regression model estimated an average of 4.243 symptoms per individual. CONCLUSION These findings offer vital insights into vaccine safety, guiding evidence-based vaccination strategies and monitoring programs for precise and effective outcomes.
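A zero-truncated Poisson model fits counts that can never be 0 (every VAERS report lists at least one symptom). With rate lambda, its mean is lambda / (1 - exp(-lambda)). As a sketch, assuming an intercept-only model with no covariates (unlike the regression in the study), the maximum-likelihood lambda is the value whose truncated mean equals the sample mean; bisection works because that mean increases monotonically in lambda.

```python
import math

def ztp_mean(lam):
    """Mean of a zero-truncated Poisson with rate lam > 0: lam / (1 - exp(-lam))."""
    return lam / -math.expm1(-lam)  # expm1 keeps 1 - exp(-lam) accurate for small lam

def ztp_mle_rate(counts, tol=1e-12):
    """Intercept-only MLE of lam, i.e. the root of ztp_mean(lam) == mean(counts)."""
    if not counts or any(c < 1 for c in counts):
        raise ValueError("zero-truncated data must be non-empty with all counts >= 1")
    ybar = sum(counts) / len(counts)
    if ybar == 1:
        return 0.0  # all counts equal 1: the MLE lies on the boundary lam -> 0
    # ztp_mean tends to 1 as lam -> 0 and always exceeds lam, so the root is in (0, ybar).
    lo, hi = 0.0, ybar
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if ztp_mean(mid) < ybar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Because truncation removes the zeros, the fitted lambda is always somewhat below the observed mean count per report.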
Affiliation(s)
- Yiming Li
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Sori K Lundin
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Jianfu Li
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, USA
- Wei Tao
- Department of Biostatistics & Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yifang Dang
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Yong Chen
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Cui Tao
- McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL 32224, USA

4
Cuffy C, French E, Fehrmann S, McInnes BT. Exploring Representations for Singular and Multi-Concept Relations for Biomedical Named Entity Normalization. Proc Int World Wide Web Conf 2022; 2022:823-832. [PMID: 37465200 PMCID: PMC10353314 DOI: 10.1145/3487553.3524701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/20/2023]
Abstract
Since the rise of the COVID-19 pandemic, peer-reviewed biomedical repositories have experienced a surge in chemical- and disease-related queries. These queries span a wide variety of naming conventions and nomenclatures, ranging from trademark and generic names to mentions of chemical composition. Normalizing or disambiguating these mentions within texts lets researchers and data curators retrieve more relevant articles for their search queries. Named entity normalization aims to automate this disambiguation process by linking entity mentions to their appropriate candidate concepts within a biomedical knowledge base or ontology. We explore several term embedding aggregation techniques, as well as how a term's context affects evaluation performance. We also evaluate our embedding approaches for normalizing term instances containing one or many relations within unstructured texts.
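Term embedding aggregation, as explored in this work, pools the vectors of a term's tokens into one vector that can be compared with concept embeddings. The sketch below uses made-up 3-dimensional token vectors and hypothetical concept IDs, and compares mean, sum, and max pooling under cosine similarity; a real system would load pretrained embeddings, and this does not reproduce the authors' specific representations.

```python
import math

# Made-up 3-dimensional token vectors; a real system would load pretrained embeddings.
TOKEN_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "attack": [0.7, 0.3, 0.1],
    "myocardial": [0.8, 0.2, 0.1],
    "infarction": [0.8, 0.3, 0.0],
    "lung": [0.1, 0.9, 0.2],
    "cancer": [0.2, 0.6, 0.7],
}

# Hypothetical concept IDs mapped to their names.
CONCEPTS = {"CONCEPT_MI": "myocardial infarction", "CONCEPT_LC": "lung cancer"}

def aggregate(text, method="mean"):
    """Pool the vectors of the known tokens in text into a single term vector."""
    vectors = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vectors:
        raise ValueError(f"no known tokens in {text!r}")
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    if method == "mean":
        return [sum(c) / len(c) for c in columns]
    if method == "sum":
        return [sum(c) for c in columns]
    if method == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown aggregation method: {method}")

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def normalize(mention, concepts=CONCEPTS, method="mean"):
    """Link a mention to the concept whose pooled name vector is most similar."""
    m = aggregate(mention, method)
    return max(concepts, key=lambda cid: cosine(m, aggregate(concepts[cid], method)))
```

Mean and sum pooling point in the same direction, so they always agree under cosine similarity; max pooling can differ when one token dominates a dimension.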
Affiliation(s)
- Clint Cuffy
- Virginia Commonwealth University, Richmond, Virginia, USA
- Evan French
- Virginia Commonwealth University, Richmond, Virginia, USA

5
Luo YF, Henry S, Wang Y, Shen F, Uzuner O, Rumshisky A. The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task on clinical concept normalization for clinical records. J Am Med Inform Assoc 2020; 27:1529-1537. [PMID: 32968800 PMCID: PMC7647359 DOI: 10.1093/jamia/ocaa106] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/10/2020] [Revised: 05/01/2020] [Accepted: 05/14/2020] [Indexed: 01/19/2023]
Abstract
OBJECTIVE The 2019 National Natural language processing (NLP) Clinical Challenges (n2c2)/Open Health NLP (OHNLP) shared task track 3 focused on medical concept normalization (MCN) in clinical records. This track aimed to assess the state of the art in identifying and matching salient medical concepts to a controlled vocabulary. In this paper, we describe the task and the data set used, compare the participating systems, present results, identify the strengths and limitations of the current state of the art, and identify directions for future research. MATERIALS AND METHODS Participating teams were provided with narrative discharge summaries in which text spans corresponding to medical concepts were identified. This paper refers to these text spans as mentions. Teams were tasked with normalizing these mentions to concepts, represented by concept unique identifiers, within the Unified Medical Language System. Submitted systems represented 4 broad categories of approaches: cascading dictionary matching, cosine distance, deep learning, and retrieve-and-rank systems. Disambiguation modules were common across all approaches. RESULTS A total of 33 teams participated in the MCN task. The best-performing team achieved an accuracy of 0.8526. The median and mean performances among all teams were 0.7733 and 0.7426, respectively. CONCLUSIONS Overall performance among the top 10 teams was high. However, several mention types were challenging for all teams. These included mentions requiring disambiguation of misspelled words, acronyms, abbreviations, and mentions with more than 1 possible semantic type. Also challenging were complex mentions of long, multi-word terms that may require new ways of extracting and representing mention meaning, the use of domain knowledge, parse trees, or hand-crafted rules.
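Cascading dictionary matching, one of the four approach categories listed in this abstract, tries progressively looser string normalizations until a dictionary entry matches. A minimal sketch follows; the sieve order, the hypothetical concept IDs, and the `UNMAPPED` fallback label are assumptions, and a real system would index UMLS strings rather than a two-entry dictionary. Accuracy is computed as the fraction of mentions whose predicted identifier equals the gold identifier, the metric reported for the task.

```python
import re

def strip_punct(text):
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def sort_tokens(text):
    """Order-insensitive key, so 'pain, chest' matches 'chest pain'."""
    return " ".join(sorted(strip_punct(text).split()))

# Sieves run from strictest to loosest; this order is an assumption for the sketch.
SIEVES = [lambda s: s, str.lower, strip_punct, sort_tokens]

def build_indices(dictionary):
    """dictionary maps surface string -> concept ID; builds one lookup table per sieve."""
    indices = []
    for normalize in SIEVES:
        index = {}
        for surface, concept_id in dictionary.items():
            index.setdefault(normalize(surface), concept_id)  # first entry wins on collisions
        indices.append(index)
    return indices

def cascade_match(mention, indices, fallback="UNMAPPED"):
    """Return the concept from the first sieve that matches, else the fallback label."""
    for normalize, index in zip(SIEVES, indices):
        key = normalize(mention)
        if key in index:
            return index[key]
    return fallback

def accuracy(predicted, gold):
    """Fraction of mentions whose predicted concept equals the gold concept."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

DICTIONARY = {"Diabetes mellitus": "CONCEPT_DM", "Chest pain": "CONCEPT_CP"}  # hypothetical IDs
```

Stricter sieves run first so that an exact surface match is never overridden by a looser, more collision-prone key.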
Affiliation(s)
- Yen-Fu Luo
- Department of Computer Science, University of Massachusetts Lowell, Lowell, Massachusetts, USA
- Sam Henry
- Department of Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA
- Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Ozlem Uzuner
- Department of Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Anna Rumshisky
- Department of Computer Science, University of Massachusetts Lowell, Lowell, Massachusetts, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

6
Xu D, Gopale M, Zhang J, Brown K, Begoli E, Bethard S. Unified Medical Language System resources improve sieve-based generation and Bidirectional Encoder Representations from Transformers (BERT)-based ranking for concept normalization. J Am Med Inform Assoc 2020; 27:1510-1519. [PMID: 32719838 PMCID: PMC7566510 DOI: 10.1093/jamia/ocaa080] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/09/2020] [Revised: 03/25/2020] [Accepted: 04/27/2020] [Indexed: 12/02/2022]
Abstract
OBJECTIVE Concept normalization, the task of linking phrases in text to concepts in an ontology, is useful for many downstream tasks, such as relation extraction and information retrieval. We present a generate-and-rank concept normalization system based on our participation in the 2019 National NLP Clinical Challenges Shared Task Track 3 Concept Normalization. MATERIALS AND METHODS The shared task provided 13 609 concept mentions drawn from 100 discharge summaries. We first design a sieve-based system that uses Lucene indices over the training data, Unified Medical Language System (UMLS) preferred terms, and UMLS synonyms to generate a list of possible concepts for each mention. We then design a listwise classifier based on the BERT (Bidirectional Encoder Representations from Transformers) neural network to rank the candidate concepts, integrating UMLS semantic types through a regularizer. RESULTS Our generate-and-rank system ranked third of 33 in the competition, outperforming the candidate generator alone (81.66% vs 79.44%) and the previous state of the art (76.35%). During postevaluation, the model's accuracy increased to 83.56% through improvements to how training data are generated from UMLS and the incorporation of our UMLS semantic type regularizer. DISCUSSION Analysis of the model shows that prioritizing UMLS preferred terms yields better performance, that the UMLS semantic type regularizer results in qualitatively better concept predictions, and that the model performs well even on concepts not seen during training. CONCLUSIONS Our generate-and-rank framework for UMLS concept normalization integrates key UMLS features like preferred terms and semantic types with a neural network-based ranking model to accurately link phrases in text to UMLS concepts.
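The generate-and-rank pattern in this abstract can be illustrated with a toy lexicon. In this sketch, candidate generation is a single sieve (any lexicon string sharing a token with the mention), and token-set Jaccard similarity stands in for the paper's BERT-based listwise ranker; ties are broken in favor of preferred terms, loosely echoing the finding that prioritizing UMLS preferred terms helps. Concept IDs and strings are hypothetical.

```python
def tokens(text):
    return set(text.lower().split())

def generate_candidates(mention, lexicon):
    """Keep lexicon entries (concept_id, string, is_preferred) sharing a token with the mention."""
    m = tokens(mention)
    return [entry for entry in lexicon if m & tokens(entry[1])]

def rank(mention, candidates):
    """Return the best concept ID, or None when there are no candidates."""
    m = tokens(mention)

    def score(entry):
        _, string, is_preferred = entry
        s = tokens(string)
        # Jaccard similarity first; preferred terms win ties (True > False).
        return (len(m & s) / len(m | s), is_preferred)

    return max(candidates, key=score)[0] if candidates else None

LEXICON = [
    ("CONCEPT_HTN", "hypertension", True),
    ("CONCEPT_HTN", "high blood pressure", False),
    ("CONCEPT_HYPO", "hypotension", True),
    ("CONCEPT_HYPO", "low blood pressure", False),
]
```

Separating the two stages means a cheap, high-recall generator bounds the work of a more expensive ranker, which only ever scores the short candidate list.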
Affiliation(s)
- Dongfang Xu
- School of Information, University of Arizona, Tucson, Arizona, USA
- Manoj Gopale
- Department of Electrical and Computer Engineering, University of Arizona, Tucson, Arizona, USA
- Jiacheng Zhang
- Department of Computer Science, University of Arizona, Tucson, Arizona, USA
- Kris Brown
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Edmon Begoli
- National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
- Steven Bethard
- School of Information, University of Arizona, Tucson, Arizona, USA

7
Chen L, Fu W, Gu Y, Sun Z, Li H, Li E, Jiang L, Gao Y, Huang Y. Clinical concept normalization with a hybrid natural language processing system combining multilevel matching and machine learning ranking. J Am Med Inform Assoc 2020; 27:1576-1584. [PMID: 33029642 PMCID: PMC7647369 DOI: 10.1093/jamia/ocaa155] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/10/2020] [Revised: 06/10/2020] [Accepted: 07/20/2020] [Indexed: 11/13/2022]
Abstract
OBJECTIVE Normalizing clinical mentions to concepts in standardized medical terminologies is, in general, challenging due to the complexity and variety of the terms in narrative medical records. In this article, we introduce our work on a clinical natural language processing (NLP) system to automatically normalize clinical mentions to concept unique identifiers in the Unified Medical Language System. This work was part of the 2019 n2c2 (National NLP Clinical Challenges) Shared-Task and Workshop on Clinical Concept Normalization. MATERIALS AND METHODS We developed a hybrid clinical NLP system that combines a generic multilevel matching framework, customizable matching components, and machine learning ranking systems. We explored 2 machine learning ranking systems, based on either an ensemble of various similarity features extracted from pretrained encoders or a Siamese attention network, targeting efficient and fast semantic search and ranking. We also evaluated the performance of a general-purpose clinical NLP system based on the Unstructured Information Management Architecture. RESULTS The systems were evaluated as part of the 2019 n2c2 challenge, and our original best system in the challenge obtained an accuracy of 0.8101, ranking fifth. The improved system, with newly designed machine learning ranking based on a Siamese attention network, raised the accuracy to 0.8209. CONCLUSIONS We demonstrate the successful combination of multilevel matching and machine learning ranking for clinical concept normalization. Our results indicate the capability and interpretability of the proposed approach, as well as its limitations, and suggest opportunities to achieve better performance by combining it with general clinical NLP systems.
Affiliation(s)
- Long Chen
- Med Data Quest, San Diego, California, USA
- Wenbo Fu
- Med Data Quest, San Diego, California, USA
- Yu Gu
- Med Data Quest, San Diego, California, USA
- Haodan Li
- Med Data Quest, San Diego, California, USA
- Enyu Li
- Med Data Quest, San Diego, California, USA
- Li Jiang
- Med Data Quest, San Diego, California, USA
- Yuan Gao
- Med Data Quest, San Diego, California, USA
- Yang Huang
- Med Data Quest, San Diego, California, USA

8
Yuan C, Wang Y, Shang N, Li Z, Zhao R, Weng C. A graph-based method for reconstructing entities from coordination ellipsis in medical text. J Am Med Inform Assoc 2020; 27:1364-1373. [PMID: 32719840 DOI: 10.1093/jamia/ocaa109] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/08/2019] [Revised: 04/21/2020] [Accepted: 05/12/2020] [Indexed: 11/14/2022]
Abstract
OBJECTIVE Coordination ellipsis is a linguistic phenomenon that abounds in medical text and is challenging for concept normalization because elliptical expressions referencing 2 or more entities are difficult to recognize accurately. To resolve this bottleneck, we aim to contribute a generalizable method to reconstruct concepts from medical coordinated elliptical expressions in a variety of biomedical corpora. MATERIALS AND METHODS We proposed a graph-based representation model and built a pipeline to reconstruct concepts from coordinated elliptical expressions in medical text (RECEEM). There are 4 modules: (1) identify all possible candidate conjunct pairs from original coordinated elliptical expressions, (2) calculate coefficients for candidate conjuncts using an embedding model, (3) select the most appropriate decompositions by global optimization, and (4) rebuild concepts based on a pathfinding algorithm. We evaluated the pipeline's performance on 2658 coordinated elliptical expressions from 3 different medical corpora (ie, biomedical literature, clinical narratives, and eligibility criteria from clinical trials). Precision, recall, and F1 score were calculated. RESULTS The F1 scores for biomedical publications, clinical narratives, and research eligibility criteria were 0.862, 0.721, and 0.870, respectively. RECEEM outperformed 2 previously released methods. By incorporating RECEEM into 2 existing NLP tools, the F1 scores increased from 0.248 to 0.460 and from 0.287 to 0.630 on concept mapping of 1125 coordination ellipses. CONCLUSIONS RECEEM improves concept normalization for medical coordinated elliptical expressions in a variety of biomedical corpora. It outperformed existing methods and significantly enhanced the performance of 2 notable NLP systems for mapping coordination ellipses in the evaluation. The algorithm is open source and available online (https://github.com/chiyuan1126/RECEEM).
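To make the problem concrete: a coordinated elliptical expression such as "breast and ovarian cancer" mentions two concepts but spells out the shared head only once. The naive rule below, a hypothetical helper and not RECEEM's graph-based method, distributes the last token of the final conjunct over every conjunct. It handles the simple cases; the note after it shows where a fixed rule breaks down.

```python
import re

def expand_shared_head(phrase):
    """Expand 'A, B and C head' into ['A head', 'B head', 'C head'].

    Hypothetical helper: assumes the shared head is the single last token of the
    final conjunct and returns [phrase] unchanged when that pattern is absent.
    """
    match = re.fullmatch(r"(.+?),? (?:and|or) (.+)", phrase.strip())
    if not match:
        return [phrase]
    left, right = match.groups()
    *last_modifier, head = right.split()
    if not last_modifier:
        return [phrase]  # e.g. "nausea and vomiting": no shared head to distribute
    modifiers = [m.strip() for m in left.split(",") if m.strip()]
    modifiers.append(" ".join(last_modifier))
    return [f"{m} {head}" for m in modifiers]
```

For "chronic kidney and liver disease" this rule returns `["chronic kidney disease", "liver disease"]`; a fixed rule cannot decide whether "chronic" also applies to the second conjunct, which is the kind of ambiguity RECEEM addresses by scoring candidate conjuncts with an embedding model and choosing decompositions by global optimization.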
Affiliation(s)
- Chi Yuan
- Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Yongli Wang
- Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China
- Ning Shang
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Ziran Li
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Ruxin Zhao
- Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China
- Chunhua Weng
- Department of Biomedical Informatics, Columbia University, New York, New York, USA

9
Boguslav M, Cohen KB, Baumgartner WA, Hunter LE. Improving precision in concept normalization. Pac Symp Biocomput 2018; 23:566-577. [PMID: 29218915 PMCID: PMC5730334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Most natural language processing applications exhibit a trade-off between precision and recall. In some use cases for natural language processing, there are reasons to prefer to tilt that trade-off toward high precision. Relying on the Zipfian distribution of false positive results, we describe a strategy for increasing precision, using a variety of both pre-processing and post-processing methods. They draw on both knowledge-based and frequentist approaches to modeling language. Based on an existing high-performance biomedical concept recognition pipeline and a previously published manually annotated corpus, we apply this hybrid rationalist/empiricist strategy to concept normalization for eight different ontologies. Which approaches did and did not improve precision varied widely between the ontologies.
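The Zipfian observation in this abstract suggests one simple precision-oriented post-processing step: because a small number of concepts account for most false positives, concepts that are predicted often but rarely correct on annotated development data can be suppressed. The sketch below is one such filter under assumed thresholds (`min_count`, `max_precision`) and hypothetical annotation tuples; it is not the authors' specific set of pre- and post-processing methods.

```python
from collections import Counter

def learn_blocklist(predictions, gold, min_count=3, max_precision=0.2):
    """Concept IDs that are predicted often on dev data but are rarely correct.

    predictions and gold are sets of (doc_id, span, concept_id) tuples.
    """
    predicted = Counter(cid for _, _, cid in predictions)
    correct = Counter(cid for _, _, cid in predictions & gold)
    return {
        cid
        for cid, n in predicted.items()
        if n >= min_count and correct[cid] / n <= max_precision
    }

def apply_blocklist(predictions, blocklist):
    """Drop every prediction whose concept ID is on the blocklist."""
    return {p for p in predictions if p[2] not in blocklist}

def precision(predictions, gold):
    return len(predictions & gold) / len(predictions) if predictions else 0.0
```

The `min_count` floor keeps rare concepts from being blocked on a single error, and in practice the blocklist should be learned on development data and evaluated on held-out documents so that the precision gain is not overstated.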
Affiliation(s)
- Mayla Boguslav
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA

10
Bashyam V, Taira RK. Indexing anatomical phrases in neuro-radiology reports to the UMLS 2005AA. AMIA Annu Symp Proc 2005; 2005:26-30. [PMID: 16778995 PMCID: PMC1560562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/10/2023]
Abstract
This work describes a methodology to index anatomical phrases to the 2005AA release of the Unified Medical Language System (UMLS). A phrase chunking tool based on Natural Language Processing (NLP) was developed to identify semantically coherent phrases within medical reports. Using this phrase chunker, a set of 2,551 unique anatomical phrases was extracted from brain radiology reports. These phrases were mapped to the 2005AA release of the UMLS using a vector space model. Precision for the task of indexing unique phrases was 0.87.
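A vector space model of the kind described here represents both the extracted phrase and each candidate concept string as term-weighted vectors and picks the most similar string. The sketch uses TF-IDF weights with a +1 smoothing term and cosine similarity over a few hypothetical anatomical strings; the weighting scheme and concept IDs are assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter

def tfidf(tokens, idf):
    """Term frequency times IDF; tokens unseen in the index get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def index_phrase(phrase, concept_strings):
    """Return (concept_id, cosine score) for the concept string most similar to phrase."""
    ids = list(concept_strings)
    docs = [concept_strings[c].lower().split() for c in ids]
    df = Counter(t for doc in docs for t in set(doc))
    # Smoothed IDF (+1) so terms shared by every string still carry some weight.
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in df.items()}
    vectors = [tfidf(doc, idf) for doc in docs]
    query = tfidf(phrase.lower().split(), idf)
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(ids)), key=scores.__getitem__)
    return ids[best], scores[best]

# Hypothetical concept IDs and strings standing in for UMLS anatomical concepts.
ANATOMY = {
    "ANAT_1": "left frontal lobe",
    "ANAT_2": "right frontal lobe",
    "ANAT_3": "left temporal lobe",
    "ANAT_4": "cerebellum",
}
```

Because the model is bag-of-words, word order does not matter: `index_phrase("frontal lobe left", ANATOMY)` maps to `ANAT_1` with a score of 1.0, while IDF weighting lets the rarer word "right" pull "right frontal" toward `ANAT_2`.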
Affiliation(s)
- Vijayaraghavan Bashyam
- Medical Imaging Informatics (MII) Group, University of California, Los Angeles (UCLA), USA.