1. Simple but Effective Knowledge-Based Query Reformulations for Precision Medicine Retrieval. INFORMATION 2021. [DOI: 10.3390/info12100402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022] Open
Abstract
In Information Retrieval (IR), the semantic gap represents the mismatch between users’ queries and how retrieval models answer these queries. In this paper, we explore how to use external knowledge resources to enhance bag-of-words representations and reduce the effect of the semantic gap between queries and documents. In this regard, we propose several simple but effective knowledge-based query expansion and reduction techniques, and we evaluate them for the medical domain. The query reformulations proposed are used to increase the probability of retrieving relevant documents through the addition to, or the removal from, the original query of highly specific terms. The experimental analyses on different test collections for Precision Medicine IR show the effectiveness of the developed techniques. In particular, a specific subset of query reformulations allows retrieval models to achieve top-performing results in all the considered test collections.
2. Balaneshinkordan S, Kotov A. Bayesian approach to incorporating different types of biomedical knowledge bases into information retrieval systems for clinical decision support in precision medicine. J Biomed Inform 2019; 98:103238. [DOI: 10.1016/j.jbi.2019.103238] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/11/2018] [Revised: 06/15/2019] [Accepted: 06/21/2019] [Indexed: 10/26/2022]
3. Mu X, Lu K, Ryu H. Explicitly integrating MeSH thesaurus help into health information retrieval systems: An empirical user study. Inf Process Manag 2014. [DOI: 10.1016/j.ipm.2013.03.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 11/25/2022]
4. Zeng QT, Redd D, Rindflesch T, Nebeker J. Synonym, topic model and predicate-based query expansion for retrieving clinical documents. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2012; 2012:1050-1059. [PMID: 23304381 PMCID: PMC3540443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
We present a study that developed and tested three query expansion methods for the retrieval of clinical documents. Finding relevant documents in a large clinical data warehouse is a challenging task. To address this issue, first, we implemented a synonym expansion strategy that used a few selected vocabularies. Second, we trained a topic model on a large set of clinical documents, which was then used to identify related terms for query expansion. Third, we obtained related terms from a large predicate database derived from Medline abstracts for query expansion. The three expansion methods were tested on a set of clinical notes. All three methods successfully achieved higher average recalls and average F-measures when compared with the baseline method. The average precisions and precision at 10, however, decreased with all expansions. Amongst the three expansion methods, the topic model-based method performed the best in terms of recall and F-measure.
5. Chawla S. Semantic Query Expansion using Cluster Based Domain Ontologies. INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH 2012. [DOI: 10.4018/ijirr.2012040102] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
Abstract
Information on the web has been growing at a very rapid pace and has become quite voluminous over the past few years. A user's search query on the web often fails to retrieve enough relevant documents, which results in low precision of search results. To improve the precision of search results, an algorithm is proposed in this paper for semantic query expansion using domain ontology based on clustered web query sessions. Domain ontology is created for each cluster of query sessions. The input query of a user is used to select the most similar cluster. The domain ontology of the selected cluster is used to suggest the related concepts for query expansion and the expanded query is used for information retrieval to test its effectiveness. The experiment was conducted on the captured user query sessions on the web and results prove the efficacy of the proposed approach.
Affiliation(s)
- Suruchi Chawla
- Department of Computer Science, Shaheed Rajguru College of Applied Science, University of Delhi, Delhi, India
6. Yoo S, Choi J. Evaluation of Term Ranking Algorithms for Pseudo-Relevance Feedback in MEDLINE Retrieval. Healthc Inform Res 2011; 17:120-30. [PMID: 21886873 PMCID: PMC3155169 DOI: 10.4258/hir.2011.17.2.120] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 01/17/2011] [Accepted: 04/29/2011] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVES The purpose of this study was to investigate the effects of query expansion algorithms for MEDLINE retrieval within a pseudo-relevance feedback framework. METHODS A number of query expansion algorithms were tested using various term ranking formulas, focusing on query expansion based on pseudo-relevance feedback. The OHSUMED test collection, which is a subset of the MEDLINE database, was used as a test corpus. Various ranking algorithms were tested in combination with different term re-weighting algorithms. RESULTS Our comprehensive evaluation showed that the local context analysis ranking algorithm, when used in combination with one of the reweighting algorithms - Rocchio, the probabilistic model, and our variants - significantly outperformed other algorithm combinations by up to 12% (paired t-test; p < 0.05). In a pseudo-relevance feedback framework, effective query expansion would be achieved by the careful consideration of term ranking and re-weighting algorithm pairs, at least in the context of the OHSUMED corpus. CONCLUSIONS Comparative experiments on term ranking algorithms were performed in the context of a subset of MEDLINE documents. With medical documents, local context analysis, which uses co-occurrence with all query terms, significantly outperformed various term ranking methods based on both frequency and distribution analyses. Furthermore, the results of the experiments demonstrated that the term rank-based re-weighting method contributed to a remarkable improvement in mean average precision.
Affiliation(s)
- Sooyoung Yoo
- Medical Information Center, Seoul National University Bundang Hospital, Seongnam, Korea
7. Aronson AR, Lang FM. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc 2010; 17:229-36. [PMID: 20442139 DOI: 10.1136/jamia.2009.002733] [Citation(s) in RCA: 673] [Impact Index Per Article: 48.1] [Indexed: 11/03/2022] Open
Abstract
MetaMap is a widely available program providing access to the concepts in the unified medical language system (UMLS) Metathesaurus from biomedical text. This study reports on MetaMap's evolution over more than a decade, concentrating on those features arising out of the research needs of the biomedical informatics community both within and outside of the National Library of Medicine. Such features include the detection of author-defined acronyms/abbreviations, the ability to browse the Metathesaurus for concepts even tenuously related to input text, the detection of negation in situations in which the polarity of predications is important, word sense disambiguation (WSD), and various technical and algorithmic features. Near-term plans for MetaMap development include the incorporation of chemical name recognition and enhanced WSD.
Affiliation(s)
- Alan R Aronson
- Lister Hill National Center for Biomedical Communications, US National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
8. Yoo S, Choi J. On the query reformulation technique for effective MEDLINE document retrieval. J Biomed Inform 2010; 43:686-93. [PMID: 20394839 DOI: 10.1016/j.jbi.2010.04.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 06/01/2009] [Revised: 03/27/2010] [Accepted: 04/08/2010] [Indexed: 11/17/2022]
Abstract
Improving the retrieval accuracy of MEDLINE documents is still a challenging issue due to low retrieval precision. Focusing on a query expansion technique based on pseudo-relevance feedback (PRF), this paper addresses the problem by systematically examining the effects of expansion term selection and adjustment of the term weights of the expanded query using a set of MEDLINE test documents called OHSUMED. Implementing a baseline information retrieval system based on the Okapi BM25 retrieval model, we compared six well-known term ranking algorithms for useful expansion term selection and then compared traditional term reweighting algorithms with our new variant of the standard Rocchio's feedback formula, which adopts a group-based weighting scheme. Our experimental results on the OHSUMED test collection showed a maximum improvement of 20.2% and 20.4% for mean average precision and recall measures over unexpanded queries when terms were expanded using a co-occurrence analysis-based term ranking algorithm in conjunction with our term reweighting algorithm (p-value<0.05). Our study shows the behaviors of different query reformulation techniques that can be utilized for more effective MEDLINE document retrieval.
Affiliation(s)
- Sooyoung Yoo
- Medical Information Center, Seoul National University Bundang Hospital, Gyeonggi-Do, Republic of Korea
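The abstract above refers to the standard Rocchio feedback formula used within pseudo-relevance feedback. As a minimal sketch only (this is the textbook form, not the authors' group-based variant, which the abstract does not specify), the expanded query can be built by adding a weighted centroid of the top-ranked documents to the original query vector. The function name `rocchio_prf`, the dictionary representation of term weights, and the default values of `alpha`, `beta`, and `n_expansion_terms` are illustrative assumptions.

```python
from collections import Counter


def rocchio_prf(query_weights, feedback_docs, alpha=1.0, beta=0.75,
                n_expansion_terms=5):
    """Textbook Rocchio update for pseudo-relevance feedback (sketch).

    query_weights: dict mapping term -> weight in the original query.
    feedback_docs: list of dicts (term -> weight), one per top-ranked
                   document that is assumed to be relevant.
    Returns a dict mapping term -> weight for the reformulated query.
    """
    if not feedback_docs:
        return dict(query_weights)

    # Centroid (mean vector) of the pseudo-relevant documents.
    centroid = Counter()
    for doc in feedback_docs:
        for term, weight in doc.items():
            centroid[term] += weight / len(feedback_docs)

    # Rank candidate expansion terms by centroid weight (ties broken
    # alphabetically) and keep only the best few.
    candidates = sorted(
        (t for t in centroid if t not in query_weights),
        key=lambda t: (-centroid[t], t),
    )[:n_expansion_terms]

    # Reweight original terms and add the selected expansion terms.
    new_query = {}
    for term in list(query_weights) + candidates:
        new_query[term] = (alpha * query_weights.get(term, 0.0)
                           + beta * centroid.get(term, 0.0))
    return new_query


if __name__ == "__main__":
    query = {"asthma": 1.0}
    top_docs = [
        {"asthma": 1.0, "inhaler": 2.0},
        {"inhaler": 2.0, "steroid": 1.0},
    ]
    print(rocchio_prf(query, top_docs, n_expansion_terms=1))
```

In this sketch the term ranking step is a plain sort on centroid weight; the study compares several ranking algorithms, any of which could replace that sort key.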
9. Trieschnigg D, Pezik P, Lee V, de Jong F, Kraaij W, Rebholz-Schuhmann D. MeSH Up: effective MeSH text classification for improved document retrieval. Bioinformatics 2009; 25:1412-8. [PMID: 19376821 PMCID: PMC2682526 DOI: 10.1093/bioinformatics/btp249] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.4] [Received: 11/20/2008] [Revised: 04/02/2009] [Accepted: 04/07/2009] [Indexed: 11/27/2022] Open
Abstract
MOTIVATION Controlled vocabularies such as the Medical Subject Headings (MeSH) thesaurus and the Gene Ontology (GO) provide an efficient way of accessing and organizing biomedical information by reducing the ambiguity inherent to free-text data. Different methods of automating the assignment of MeSH concepts have been proposed to replace manual annotation, but they are either limited to a small subset of MeSH or have only been compared with a limited number of other systems. RESULTS We compare the performance of six MeSH classification systems [MetaMap, EAGL, a language and a vector space model-based approach, a K-Nearest Neighbor (KNN) approach and MTI] in terms of reproducing and complementing manual MeSH annotations. A KNN system clearly outperforms the other published approaches and scales well with large amounts of text using the full MeSH thesaurus. Our measurements demonstrate to what extent manual MeSH annotations can be reproduced and how they can be complemented by automatic annotations. We also show that a statistically significant improvement can be obtained in information retrieval (IR) when the text of a user's query is automatically annotated with MeSH concepts, compared to using the original textual query alone. CONCLUSIONS The annotation of biomedical texts using controlled vocabularies such as MeSH can be automated to improve text-only IR. Furthermore, the automatic MeSH annotation system we propose is highly scalable and it generates improvements in IR comparable with those observed for manual annotations.
10. Fiszman M, Ortiz E, Bray BE, Rindflesch TC. Semantic processing to support clinical guideline development. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2008; 2008:187-191. [PMID: 18999127 PMCID: PMC2656081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2008] [Revised: 07/16/2008] [Indexed: 05/27/2023]
Abstract
Clinical practice guidelines are one of the main resources for communicating evidence-based practice to health professionals. During guideline development, questions that express a knowledge gap are answered by finding relevant citations in MEDLINE and other biomedical databases. Determining citation relevance involves extensive manual review. We propose an automated method for finding relevant citations based on guideline question classification, semantic processing, and rules that match question classes with semantic predications. In this initial study, we focused on a pediatric cardiovascular risk factor guideline. The overall performance of the system was 40% recall, 88% precision (F0.5-score 0.71), and 98% specificity. We show that relevant and nonrelevant citations have clinically different semantic characteristics and suggest that this method has the potential to improve the efficiency of the literature review process in guideline development.
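The reported F0.5-score can be checked from the stated precision and recall. The snippet below is a small sketch of the general F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); the function name `f_beta` is an illustrative choice, and only the three figures from the abstract (40% recall, 88% precision, beta = 0.5) are taken from the source.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """General F-beta score; beta < 1 weights precision more than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Figures from the abstract: 88% precision, 40% recall, beta = 0.5.
    score = f_beta(precision=0.88, recall=0.40, beta=0.5)
    print(round(score, 2))  # 0.71, matching the reported F0.5-score
```

With beta = 0.5 the score sits closer to precision than the balanced F1 would, which is consistent with a task where reviewing irrelevant citations is the main cost.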
11. Fiszman M, Demner-Fushman D, Kilicoglu H, Rindflesch TC. Automatic summarization of MEDLINE citations for evidence-based medical treatment: a topic-oriented evaluation. J Biomed Inform 2008; 42:801-13. [PMID: 19022398 DOI: 10.1016/j.jbi.2008.10.002] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Received: 05/27/2008] [Revised: 09/30/2008] [Accepted: 10/15/2008] [Indexed: 11/18/2022]
Abstract
As the number of electronic biomedical textual resources increases, it becomes harder for physicians to find useful answers at the point of care. Information retrieval applications provide access to databases; however, little research has been done on using automatic summarization to help navigate the documents returned by these systems. After presenting a semantic abstraction automatic summarization system for MEDLINE citations, we concentrate on evaluating its ability to identify useful drug interventions for 53 diseases. The evaluation methodology uses existing sources of evidence-based medicine as surrogates for a physician-annotated reference standard. Mean average precision (MAP) and a clinical usefulness score developed for this study were computed as performance metrics. The automatic summarization system significantly outperformed the baseline in both metrics. The MAP gain was 0.17 (p<0.01) and the increase in the overall score of clinical usefulness was 0.39 (p<0.05).
Affiliation(s)
- Marcelo Fiszman
- National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bldg 38A, Rm B1N-28J, Bethesda, MD 20894, USA.
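Mean average precision (MAP), one of the two metrics reported in the abstract above, averages the per-topic average precision over all topics (here, diseases). The following is a minimal sketch of the usual definition, assuming binary relevance judgments and assuming that every relevant item appears somewhere in the ranked list unless `n_relevant` is given; the function names and the toy rankings are hypothetical and are not data from the study.

```python
def average_precision(ranked_relevance, n_relevant=None):
    """Average precision for one topic.

    ranked_relevance: sequence of booleans, True where the item at that
                      rank is relevant (rank 1 first).
    n_relevant: total number of relevant items for the topic; defaults
                to the number of relevant items found in the ranking.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denominator = n_relevant if n_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(rankings):
    """Mean of average precision over a list of per-topic rankings."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # Two toy topics (hypothetical relevance judgments).
    topics = [[True, False, True], [False, True]]
    print(mean_average_precision(topics))
```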
12. Stokes N, Li Y, Cavedon L, Zobel J. Exploring criteria for successful query expansion in the genomic domain. INFORM RETRIEVAL J 2008. [DOI: 10.1007/s10791-008-9073-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 10/21/2022]
13. Moskovitch R, Shahar Y. Vaidurya: a multiple-ontology, concept-based, context-sensitive clinical-guideline search engine. J Biomed Inform 2008; 42:11-21. [PMID: 18721900 DOI: 10.1016/j.jbi.2008.07.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 10/10/2007] [Revised: 07/09/2008] [Accepted: 07/15/2008] [Indexed: 11/19/2022]
Abstract
We designed and implemented a generic search engine (Vaidurya), as part of our Digital clinical-Guideline Library (DeGeL) framework. Two search methods were implemented in addition to full-text search: (1) concept-based search, which relies on pre-indexing the guidelines in a clinically meaningful fashion, and (2) context-sensitive search, which relies on first semi-structuring the guidelines according to a given ontology, then searching for terms within specific labeled text segments. The Vaidurya engine is fully functional and is used within the DeGeL system. We describe the Vaidurya ontological and algorithmic framework; we also briefly summarize the results of a detailed evaluation in the clinical-guideline domain, demonstrating that both concept-based and context-sensitive ontology-independent search are highly feasible and significantly improve on free text search retrieval performance. We conclude by analyzing the limitations and advantages of the approach, and the steps that we have started to take to extend it based on user feedback.
Affiliation(s)
- Robert Moskovitch
- Medical Informatics Research Center, Department of Information Systems Engineering, Ben Gurion University, P.O. Box 653, Beer Sheva 84105, Israel.
14. Bodenreider O. Biomedical ontologies in action: role in knowledge management, data integration and decision support. Yearb Med Inform 2008:67-79. [PMID: 18660879 PMCID: PMC2592252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023] Open
Abstract
OBJECTIVES To provide typical examples of biomedical ontologies in action, emphasizing the role played by biomedical ontologies in knowledge management, data integration and decision support. METHODS Biomedical ontologies selected for their practical impact are examined from a functional perspective. Examples of applications are taken from operational systems and the biomedical literature, with a bias towards recent journal articles. RESULTS The ontologies under investigation in this survey include SNOMED CT, the Logical Observation Identifiers, Names, and Codes (LOINC), the Foundational Model of Anatomy, the Gene Ontology, RxNorm, the National Cancer Institute Thesaurus, the International Classification of Diseases, the Medical Subject Headings (MeSH) and the Unified Medical Language System (UMLS). The roles played by biomedical ontologies are classified into three major categories: knowledge management (indexing and retrieval of data and information, access to information, mapping among ontologies); data integration, exchange and semantic interoperability; and decision support and reasoning (data selection and aggregation, decision support, natural language processing applications, knowledge discovery). CONCLUSIONS Ontologies play an important role in biomedical research through a variety of applications. While ontologies are used primarily as a source of vocabulary for standardization and integration purposes, many applications also use them as a source of computable knowledge. Barriers to the use of ontologies in biomedical applications are discussed.
Affiliation(s)
- O Bodenreider
- National Library of Medicine, 8600 Rockville Pike - MS 3841 (Bldg 38A, Rm B1N28U), Bethesda, MD 20894, USA.
15. Sneiderman CA, Demner-Fushman D, Fiszman M, Ide NC, Rindflesch TC. Knowledge-based methods to help clinicians find answers in MEDLINE. J Am Med Inform Assoc 2007; 14:772-80. [PMID: 17712086 PMCID: PMC2213491 DOI: 10.1197/jamia.m2407] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022] Open
Abstract
OBJECTIVES Large databases of published medical research can support clinical decision making by providing physicians with the best available evidence. The time required to obtain optimal results from these databases using traditional systems often makes accessing the databases impractical for clinicians. This article explores whether a hybrid approach of augmenting traditional information retrieval with knowledge-based methods facilitates finding practical clinical advice in the research literature. DESIGN Three experimental systems were evaluated for their ability to find MEDLINE citations providing answers to clinical questions of different complexity. The systems (SemRep, Essie, and CQA-1.0), which rely on domain knowledge and semantic processing to varying extents, were evaluated separately and in combination. Fifteen therapy and prevention questions in three categories (general, intermediate, and specific questions) were searched. The first 10 citations retrieved by each system were randomized, anonymized, and evaluated on a three-point scale. The reasons for ratings were documented. MEASUREMENTS Metrics evaluating the overall performance of a system (mean average precision, binary preference) and metrics evaluating the number of relevant documents in the first several presented to a physician were used. RESULTS Scores (mean average precision = 0.57, binary preference = 0.71) for fusion of the retrieval results of the three systems are significantly (p < 0.01) better than those for any individual system. All three systems present three to four relevant citations in the first five for any question type. CONCLUSION The improvements in finding relevant MEDLINE citations due to knowledge-based processing show promise in assisting physicians to answer questions in clinical practice.
16. Ide NC, Loane RF, Demner-Fushman D. Essie: a concept-based search engine for structured biomedical text. J Am Med Inform Assoc 2007; 14:253-63. [PMID: 17329729 PMCID: PMC2244877 DOI: 10.1197/jamia.m2233] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Indexed: 11/10/2022] Open
Abstract
This article describes the algorithms implemented in the Essie search engine that is currently serving several Web sites at the National Library of Medicine. Essie is a phrase-based search engine with term and concept query expansion and probabilistic relevancy ranking. Essie's design is motivated by an observation that query terms are often conceptually related to terms in a document, without actually occurring in the document text. Essie's performance was evaluated using data and standard evaluation methods from the 2003 and 2006 Text REtrieval Conference (TREC) Genomics track. Essie was the best-performing search engine in the 2003 TREC Genomics track and achieved results comparable to those of the highest-ranking systems on the 2006 TREC Genomics track task. Essie shows that a judicious combination of exploiting document structure, phrase searching, and concept based query expansion is a useful approach for information retrieval in the biomedical domain.
Affiliation(s)
- Nicholas C Ide
- Lister Hill National Center for Biomedical Communications, National Library of Medicine, Bethesda, MD 20894, USA.
17. Zhou W, Smalheiser NR, Yu C. A tutorial on information retrieval: basic terms and concepts. JOURNAL OF BIOMEDICAL DISCOVERY AND COLLABORATION 2006; 1:2. [PMID: 16722601 PMCID: PMC1459215 DOI: 10.1186/1747-5333-1-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 11/23/2005] [Accepted: 03/13/2006] [Indexed: 11/18/2022]
Abstract
This informal tutorial is intended for investigators and students who would like to understand the workings of information retrieval systems, including the most frequently used search engines: PubMed and Google. Having a basic knowledge of the terms and concepts of information retrieval should improve the efficiency and productivity of searches. As well, this knowledge is needed in order to follow current research efforts in biomedical information retrieval and text mining that are developing new systems not only for finding documents on a given topic, but extracting and integrating knowledge across documents.
Affiliation(s)
- Wei Zhou
- Department of Computer Science, University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL 60607, USA
- Neil R Smalheiser
- Department of Psychiatry and Psychiatric Institute, MC912, University of Illinois at Chicago, Chicago, IL 60612, USA
- Clement Yu
- Department of Computer Science, University of Illinois at Chicago, 851 South Morgan Street, Chicago, IL 60607, USA
18. Zeng QT, Crowell J, Plovnick RM, Kim E, Ngo L, Dibble E. Assisting consumer health information retrieval with query recommendations. J Am Med Inform Assoc 2005; 13:80-90. [PMID: 16221944 PMCID: PMC1380203 DOI: 10.1197/jamia.m1820] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.3] [Indexed: 11/10/2022] Open
Abstract
OBJECTIVE Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. DESIGN We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. MEASUREMENTS An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. RESULTS The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16-2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. CONCLUSION Providing semantic-distance-based query recommendations can help consumers with query formation during HIR.
Affiliation(s)
- Qing T Zeng
- Department of Radiology, Decision Systems Group, Thorn 309, Brigham and Women's Hospital, Harvard Medical School, 75 Francis St, Boston, MA 02115, USA.
19. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp 2001:17-21. [PMID: 11825149 PMCID: PMC2243666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023] Open
Abstract
The UMLS Metathesaurus, the largest thesaurus in the biomedical domain, provides a representation of biomedical knowledge consisting of concepts classified by semantic type and both hierarchical and non-hierarchical relationships among the concepts. This knowledge has proved useful for many applications including decision support systems, management of patient records, information retrieval (IR) and data mining. Gaining effective access to the knowledge is critical to the success of these applications. This paper describes MetaMap, a program developed at the National Library of Medicine (NLM) to map biomedical text to the Metathesaurus or, equivalently, to discover Metathesaurus concepts referred to in text. MetaMap uses a knowledge intensive approach based on symbolic, natural language processing (NLP) and computational linguistic techniques. Besides being applied for both IR and data mining applications, MetaMap is one of the foundations of NLM's Indexing Initiative System which is being applied to both semi-automatic and fully automatic indexing of the biomedical literature at the library.
Affiliation(s)
- A R Aronson
- National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
20. Nadkarni P, Chen R, Brandt C. UMLS concept indexing for production databases: a feasibility study. J Am Med Inform Assoc 2001; 8:80-91. [PMID: 11141514 PMCID: PMC134593 DOI: 10.1136/jamia.2001.0080080] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.5] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVES To explore the feasibility of using the National Library of Medicine's Unified Medical Language System (UMLS) Metathesaurus as the basis for a computational strategy to identify concepts in medical narrative text preparatory to indexing. To quantitatively evaluate this strategy in terms of true positives, false positives (spuriously identified concepts) and false negatives (concepts missed by the identification process). METHODS Using the 1999 UMLS Metathesaurus, the authors processed a training set of 100 documents (50 discharge summaries, 50 surgical notes) with a concept-identification program, whose output was manually analyzed. They flagged concepts that were erroneously identified and added new concepts that were not identified by the program, recording the reason for failure in such cases. After several refinements to both their algorithm and the UMLS subset on which it operated, they deployed the program on a test set of 24 documents (12 of each kind). RESULTS Of 8,745 matches in the training set, 7,227 (82.6 percent) were true positives, whereas of 1,701 matches in the test set, 1,298 (76.3 percent) were true positives. Matches other than true positive indicated potential problems in production-mode concept indexing. Examples of causes of problems were redundant concepts in the UMLS, homonyms, acronyms, abbreviations and elisions, concepts that were missing from the UMLS, proper names, and spelling errors. CONCLUSIONS The error rate was too high for concept indexing to be the only production-mode means of preprocessing medical narrative. Considerable curation needs to be performed to define a UMLS subset that is suitable for concept matching.
Affiliation(s)
- P Nadkarni
- Yale University School of Medicine, New Haven, CT 06520-8009, USA.
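The true-positive percentages in the abstract above follow directly from the reported match counts. The snippet below is a small arithmetic check; the function name `true_positive_rate_of_matches` is a hypothetical helper, and only the four counts come from the abstract.

```python
def true_positive_rate_of_matches(true_positives: int, total_matches: int) -> float:
    """Share of program matches that were judged true positives."""
    if total_matches <= 0:
        raise ValueError("total_matches must be positive")
    return true_positives / total_matches


if __name__ == "__main__":
    training = true_positive_rate_of_matches(7227, 8745)
    test = true_positive_rate_of_matches(1298, 1701)
    print(round(100 * training, 1))  # 82.6
    print(round(100 * test, 1))      # 76.3
```

The remaining matches (about 17 percent in training and 24 percent in testing) are the ones the authors attribute to issues such as homonyms, acronyms, and missing concepts.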
21. Srinivasan P. MeSHmap: a text mining tool for MEDLINE. Proc AMIA Symp 2001:642-6. [PMID: 11825264 PMCID: PMC2243391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023] Open
Abstract
Our research goal is to explore text mining from the metadata included in MEDLINE documents. We present MeSHmap our prototype text mining system that exploits the MeSH indexing accompanying MEDLINE records. MeSHmap supports searches via PubMed followed by user driven exploration of the MeSH terms and subheadings in the retrieved set. The potential of the system goes beyond text retrieval. It may also be used to compare entities of the same type such as pairs of drugs or pairs of procedures etc. In addition there is the potential to generate maps of entities (drugs or diseases etc.) such that the strength of the link between two entities in the map represents their similarity as expressed in the MeSH metadata of the MEDLINE documents. Higher level operators have been proposed to support these comparison and mapping functions. This paper motivates and describes MeSHmap. Future work will include user evaluations of the system.
Affiliation(s)
- P Srinivasan
- School of Library & Information Science, University of Iowa, Iowa City, IA 52242, USA
|
22
|
Wilcox A, Hripcsak G. Classification algorithms applied to narrative reports. Proc AMIA Symp 1999:455-9. [PMID: 10566400 PMCID: PMC2232569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
Narrative text reports represent a significant source of clinical data. However, the information stored in these reports is inaccessible to many automated decision support systems. Data mining techniques can assist in extracting information from narrative data. Multiple classification methods, such as rule generation, decision trees, Bayesian classifiers, and information retrieval were used to classify a set of 200 chest X-ray reports according to 6 clinical conditions indicated. A general-purpose natural language processor was used to convert the narrative text into a coded form that could be used by the classification algorithms. Significant differences in performance were found between algorithms. The best performing algorithm applied to the processor output was significantly better than information retrieval applied to raw text. Predictor variables from the coded processor output were limited to avoid overfitting. Methods that limited by domain knowledge performed significantly better than those that limited by conditional probabilities of the variables in the training set. Algorithms were also shown to be dependent on training set size.
|
23
|
Westberg EE, Miller RA. The basis for using the Internet to support the information needs of primary care. J Am Med Inform Assoc 1999; 6:6-25. [PMID: 9925225 PMCID: PMC61341 DOI: 10.1136/jamia.1999.0060006] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.1] [Received: 02/02/1998] [Accepted: 09/22/1998] [Indexed: 11/03/2022]
Abstract
Synthesizing the state of the art from the published literature, this review assesses the basis for employing the Internet to support the information needs of primary care. The authors survey what has been published about the information needs of clinical practice, including primary care, and discuss currently available information resources potentially relevant to primary care. Potential methods of linking information needs with appropriate information resources are described in the context of previous classifications of clinical information needs. Also described is the role that existing terminology mapping systems, such as the National Library of Medicine's Unified Medical Language System, may play in representing and linking information needs to answers.
Affiliation(s)
- E E Westberg
- Vanderbilt University, Nashville, Tennessee 37232-8340, USA.
|
24
|
Wilcox A, Hripcsak G, Johnson SB, Hwang JJ, Wu M. Developing online support for clinical information system developers: the FAQ approach. Computers and Biomedical Research, an International Journal 1998; 31:112-21. [PMID: 9570902 DOI: 10.1006/cbmr.1998.1470] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/22/2022]
Abstract
OBJECTIVE We investigate a knowledge-based help system for developers of an integrated clinical information system (CIS). The first objective in the study was to determine the system's ability to answer users' questions effectively. User performance and behavior were studied. The second objective was to evaluate the effect of using questions and answers to augment or replace traditional program documentation. DESIGN A comparative study of user and system effectiveness using a collection of 47 veritable questions regarding the CIS, solicited from various CIS developers, is conducted. Most questions were concerning the clinical data model and acquiring the data. MEASUREMENTS Answers using current documentation known by users were compared to answers found using the help system. Answers existing within traditional documentation were compared to answers existing within question-answer exchanges (Q-A's). RESULTS The support system augmented 39% of users' answers to test questions. Though the Q-A's were less than 5% of the total documentation collected, these files contained answers to nearly 50% of the questions in the test group. The rest of the documentation contained about 75% of the answers. CONCLUSIONS A knowledge-based help system built by collecting questions and answers can be a viable alternative to large documentation files, providing the questions and answers can be collected effectively.
Affiliation(s)
- A Wilcox
- Department of Medical Informatics, Columbia University, New York, USA
|
25
|
Aronson AR, Rindflesch TC. Query expansion using the UMLS Metathesaurus. Proceedings: a Conference of the American Medical Informatics Association. AMIA Fall Symposium 1997:485-9. [PMID: 9357673 PMCID: PMC2233565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Recent work has demonstrated the importance of query expansion for improving retrieval effectiveness when applying statistically-based systems to MEDLINE citations. The research has suggested the use of retrieval feedback for enhancing the original text of users' queries. As an alternative method of query expansion, we propose the use of the MetaMap program for associating UMLS Metathesaurus concepts with the original query. Our experiments show that query expansion based on MetaMap compares favorably with retrieval feedback. We conclude that the optimal strategy would be to combine the two techniques.
Affiliation(s)
- A R Aronson
- National Library of Medicine, Bethesda, MD 20894, USA
|
26
|
|