51
A Survey of Direct Users and Uses of SNOMED CT: 2010 Status. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2010; 2010:207-211. [PMID: 21346970 PMCID: PMC3041279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
SNOMED CT is gaining momentum and endorsements as an international clinical terminology. However, many vendors await a clearer business case and client demand. We conducted a survey of direct users of SNOMED CT to determine the current profile of users, modes of use, and attitudes towards different aspects of the terminology. A web-based survey consisting of 43 questions was distributed in January 2010, and 215 responses were elicited. This paper summarizes findings regarding the profiles of users and their SNOMED CT use. The results indicate significant use by non-researchers and by the industry and government sectors. Many users are relative newcomers with fewer than 3 years of experience with SNOMED CT, and production-related use was reported by 39% of respondents. Most users are satisfied with the level of content coverage. The results indicate that SNOMED CT has a solid footing in production systems and that it is mostly used for concept searches and clinical coding.
52
Source authenticity in the UMLS--a case study of the Minimal Standard Terminology. J Biomed Inform 2010; 43:988-97. [PMID: 20692366 DOI: 10.1016/j.jbi.2010.07.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/22/2010] [Revised: 07/14/2010] [Accepted: 07/25/2010] [Indexed: 10/24/2022]
Abstract
As the UMLS integrates multiple source vocabularies, the integration process requires that certain adaptations be applied to each source. Our interest is in examining the relationship between the UMLS representation of a source vocabulary and the source vocabulary itself. We investigated the integration of the Minimal Standard Terminology (MST) into the UMLS in order to examine how closely its UMLS representation matches the source MST. The MST was conceived as a "minimal" list of terms and structure intended for use within computer systems to facilitate standardized reporting of gastrointestinal endoscopic examinations. Although the MST has an overall schema and an implied relationship structure, many of the UMLS-integrated MST terms were found to be hierarchically orphaned and to have lateral relationships that do not closely adhere to the source MST. Thus, the MST representation within the UMLS differs significantly from the source MST. These representation discrepancies may affect the usability of the UMLS representation of the MST for knowledge acquisition. Furthermore, they pose a problem for application developers. While these findings may not necessarily apply to other source terminologies, they highlight the conflict between preserving authentic concept orientation and the UMLS's overall goal of providing fully specified names for all source terms.
53
Abstract
Gene terminologies are playing an increasingly important role in the ever-growing field of genomic research. While errors in large, complex terminologies are inevitable, gene terminologies are even more susceptible to them due to the rapid growth of genomic knowledge and the nature of its discovery. It is therefore very important to establish quality-assurance protocols for such genomic-knowledge repositories. Different kinds of terminologies oftentimes require auditing methodologies adapted to their particular structures. In light of this, an auditing methodology tailored to the characteristics of the NCI Thesaurus’s (NCIT’s) Gene hierarchy is presented. The Gene hierarchy is of particular interest to the NCIT’s designers due to the primary role of genomics in current cancer research. This multiphase methodology focuses on detecting role-errors, such as missing roles or roles with incorrect or incomplete target structures, occurring within that hierarchy. The methodology is based on two kinds of abstraction networks, called taxonomies, that highlight the role distribution among concepts within the IS-A (subsumption) hierarchy. These abstract views tend to highlight portions of the hierarchy having a higher concentration of errors. The errors found during an application of the methodology are reported. Hypotheses pertaining to the efficacy of our methodology are investigated.
54
Auditing SNOMED relationships using a converse abstraction network. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2009; 2009:685-689. [PMID: 20351941 PMCID: PMC2815489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
In SNOMED CT, a given kind of attribute relationship is defined between two hierarchies, a source and a target. Certain hierarchies (or subhierarchies) serve only as targets, with no outgoing relationships of their own. However, converse relationships (those pointing in the direction opposite to the defined relationships), while not explicitly represented in SNOMED's inferred view, can be used to form an alternative view of a source. In particular, they can help shed light on a source hierarchy's overall relationship structure. Toward this end, an abstraction network called the converse abstraction network (CAN), derived automatically from a given SNOMED hierarchy, is presented. An auditing methodology based on the CAN is formulated. The methodology is applied to SNOMED's Device subhierarchy and the related device relationships of the Procedure hierarchy. The results indicate that the CAN is useful for finding opportunities to refine and improve SNOMED.
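The first step of a converse view can be sketched in Python as follows. This is a minimal illustration, assuming (as an analogy to how areas group concepts by their outgoing relationships) that concepts of a target-only hierarchy are grouped by the exact set of relationship types pointing into them. The function name `converse_groups`, the relationship names, and the concepts are hypothetical toy data, not SNOMED CT content, and the sketch does not reproduce the full CAN construction.

```python
from collections import defaultdict

def converse_groups(relationships):
    """Group target concepts by the set of relationship types pointing INTO them.

    relationships: iterable of (source, rel_type, target) triples in the
    defined (forward) direction. Reading each triple backwards gives the
    converse relationship from the target to the source.
    Returns a dict: frozenset of rel types -> sorted list of target concepts.
    """
    incoming = defaultdict(set)
    for _source, rel_type, target in relationships:
        incoming[target].add(rel_type)

    groups = defaultdict(list)
    for target, rel_types in incoming.items():
        groups[frozenset(rel_types)].append(target)
    return {key: sorted(members) for key, members in groups.items()}


# Hypothetical toy triples; not actual SNOMED CT content.
rels = [
    ("Catheter insertion", "using device", "Catheter"),
    ("Catheter removal", "direct device", "Catheter"),
    ("Stent placement", "direct device", "Stent"),
]
for rel_set, members in converse_groups(rels).items():
    print(sorted(rel_set), members)
```

Target concepts that no relationship points to do not appear in any group here; a fuller treatment would place them in a group with the empty set.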
55
Comparing inconsistent relationship configurations indicating UMLS errors. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2009; 2009:193-7. [PMID: 20351848 PMCID: PMC2815406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2023]
Abstract
The goal of this paper is to audit null-annotated parent-child pairs in the UMLS Metathesaurus. We have developed techniques for identifying suspicious pairs with a high likelihood of errors by using inconsistencies between the hierarchical relationships of the Metathesaurus and the Semantic Network. Two formal conditions, called semantic inversion and lack of ancestry, are investigated. Analysis of two corresponding samples shows that semantic inversion is significantly more likely to indicate an error than lack of ancestry, which in turn is more likely to indicate an error than a consistent configuration. We also discuss parent-child pairs with semantic inversion that may be corrected by disambiguating the child.
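The two conditions can be sketched as a small classifier. This is a minimal Python illustration under stated assumptions: each concept has a single semantic type; a pair is consistent when the child's type equals or lies below the parent's type in the Semantic Network; semantic inversion means the child's type is a proper ancestor of the parent's type; and lack of ancestry means neither type is an ancestor of the other. These readings of the definitions, the function name `classify_pair`, and the simplified network fragment are assumptions for illustration.

```python
def classify_pair(parent_st, child_st, st_ancestors):
    """Classify a Metathesaurus parent-child pair by its concepts' semantic types.

    st_ancestors: dict mapping each semantic type to the set of its proper
    ancestors in the Semantic Network IS-A hierarchy (assumed precomputed).
    Assumes one semantic type per concept to keep the sketch short.
    """
    if child_st == parent_st or parent_st in st_ancestors[child_st]:
        return "consistent"          # child's type equals or lies below the parent's
    if child_st in st_ancestors[parent_st]:
        return "semantic inversion"  # child's type lies strictly above the parent's
    return "lack of ancestry"        # neither type is an ancestor of the other


# Simplified, hypothetical fragment of the Semantic Network.
ST_ANCESTORS = {
    "Entity": set(),
    "Physical Object": {"Entity"},
    "Organism": {"Physical Object", "Entity"},
    "Conceptual Entity": {"Entity"},
}
print(classify_pair("Organism", "Physical Object", ST_ANCESTORS))  # semantic inversion
```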
56
Special issue on auditing of terminologies. J Biomed Inform 2009; 42:407-11. [PMID: 19465342 DOI: 10.1016/j.jbi.2009.04.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 04/17/2009] [Revised: 04/28/2009] [Accepted: 04/28/2009] [Indexed: 10/20/2022]
57
Abstract
The UMLS's integration of more than 100 source vocabularies, not necessarily consistent with one another, causes some inconsistencies. The purpose of auditing the UMLS is to detect such inconsistencies and to suggest how to resolve them while observing the requirement of fully representing the content of each source in the UMLS. A software tool, called the Neighborhood Auditing Tool (NAT), that facilitates UMLS auditing is presented. The NAT supports "neighborhood-based" auditing, where, at any given time, an auditor concentrates on a single-focus concept and one of a variety of neighborhoods of its closely related concepts. Typical diagrammatic displays of concept networks have a number of shortcomings, so the NAT utilizes a hybrid diagram/text interface that features stylized neighborhood views which retain some of the best features of both the diagrammatic layouts and text windows while avoiding the shortcomings. The NAT allows an auditor to display knowledge from both the Metathesaurus (concept) level and the Semantic Network (semantic type) level. Various additional features of the NAT that support the auditing process are described. The usefulness of the NAT is demonstrated through a group of case studies. Its impact is tested with a study involving a select group of auditors.
58
Expanding the extent of a UMLS semantic type via group neighborhood auditing. J Am Med Inform Assoc 2009; 16:746-57. [PMID: 19567802 DOI: 10.1197/jamia.m2951] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Each Unified Medical Language System (UMLS) concept is assigned one or more semantic types (STs). A dynamic methodology for aiding an auditor in finding concepts that are missing the assignment of a given ST, S, is presented. DESIGN The first part of the methodology exploits the previously introduced Refined Semantic Network and its accompanying refined semantic types (RSTs) to help narrow the search space for offending concepts. The auditing is focused on a neighborhood surrounding the extent of an RST T (of S), called an envelope, which consists of the parents and children of concepts in the extent. The audit moves outward as long as missing assignments are discovered. In the second part, concepts not reached previously are processed and reassigned T as needed during the processing of S's other RSTs. The set of such concepts is expanded in a way similar to that of the first part. MEASUREMENTS The number of errors discovered is reported. To measure the methodology's efficiency, "error hit rates" (i.e., errors found per concept examined) are computed. RESULTS The methodology was applied to three STs: Experimental Model of Disease (EMD), Environmental Effect of Humans, and Governmental or Regulatory Activity. The EMD experienced the most drastic change. For its RST "EMD intersection Neoplastic Process" (RST "EMD"), with only 33 (31) original concepts, 915 (134) concepts were found by the first (second) part to be missing the EMD assignment. Changes to the other two STs were smaller. CONCLUSION The results show that the proposed auditing methodology can help to effectively and efficiently identify concepts lacking the assignment of a particular semantic type.
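The outward-moving envelope audit and its error hit rate can be sketched as a breadth-first expansion. This is a minimal Python illustration, not the paper's implementation: the callback `auditor_says_missing` is a hypothetical stand-in for the human auditor's judgment, the toy hierarchy is invented, and the sketch covers only the first part of the methodology (expansion from one extent), not the second part's processing of other RSTs.

```python
from collections import deque

def envelope_audit(extent, parents, children, auditor_says_missing):
    """Expand a semantic type's extent outward through parent/child envelopes.

    extent: set of concepts already assigned the type.
    parents, children: dicts mapping a concept to a list of related concepts.
    auditor_says_missing: hypothetical callback standing in for the human
        auditor; returns True if the concept should also have the type.
    Returns (newly_found, examined_count).
    """
    assigned = set(extent)
    examined = set()
    found = set()
    frontier = deque(extent)
    while frontier:
        concept = frontier.popleft()
        # The envelope of a concept: its parents and its children.
        for nbr in (*parents.get(concept, ()), *children.get(concept, ())):
            if nbr in assigned or nbr in examined:
                continue
            examined.add(nbr)
            if auditor_says_missing(nbr):
                assigned.add(nbr)
                found.add(nbr)
                frontier.append(nbr)  # keep moving outward only from new finds
    return found, len(examined)


# Hypothetical toy hierarchy: P is the parent of A; A has children B and C; B has child D.
parents = {"A": ["P"], "B": ["A"], "C": ["A"], "D": ["B"]}
children = {"P": ["A"], "A": ["B", "C"], "B": ["D"]}
found, examined = envelope_audit({"A"}, parents, children, lambda c: c in {"B", "D"})
print(sorted(found), f"error hit rate = {len(found) / examined:.2f}")
```

In this toy run the audit stops once an envelope yields no new missing assignments, which mirrors the stopping rule stated in the abstract.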
59
Using WordNet synonym substitution to enhance UMLS source integration. Artif Intell Med 2009; 46:97-109. [PMID: 19117739 PMCID: PMC2755556 DOI: 10.1016/j.artmed.2008.11.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 05/27/2008] [Revised: 08/15/2008] [Accepted: 11/09/2008] [Indexed: 11/21/2022]
Abstract
OBJECTIVE Synonym-substitution algorithms have been developed for the purpose of matching source vocabulary terms with existing Unified Medical Language System (UMLS) terms during the integration process. A drawback is the possible explosion in the number of newly generated (potential) synonyms, which can tax computational and expert review resources. Experiments are run using a synonym-substitution approach based on WordNet to see how constraining two methodological parameters, namely, "maximum number of substitutions per term" and "maximum term length," affects performance. Our hypothesis is that these values can be constrained rather tightly--thus greatly speeding up the methodology--without a marked decline in the additional matches produced. Furthermore, we investigate whether a limitation on only the first of the two parameters is sufficient to achieve the same results. METHODS A four-stage synonym-substitution methodology using WordNet is presented. A group of experiments is carried out in which the two methodological parameters "maximum number of substitutions per term" and "maximum term length" are varied. The purpose is to examine their effect on the growth in the number of potential synonyms generated and the associated loss of results. The experiments are based on the re-integration of the "Minimal Standard Terminology" (MST) into the UMLS. Synonym-substitution matches found to be inconsistent with the current content of the UMLS and thus deemed to be incorrect are further manually scrutinized as an audit of the original integration of the MST. RESULTS An increase of 11% in the number of "MST term/UMLS term" matches was achieved using the synonym-substitution methodology. Importantly, this result prevailed when tight threshold values (such as a maximum of two synonym substitutions per term) were imposed on the parameters. 
Furthermore, it was found that limiting only the "maximum number of substitutions per term" parameter was sufficient to obtain the performance enhancement. During the additional audit phase, a number of the reported mismatches were actually found to be correct, representing an additional 10% increase in the number of matches obtained. CONCLUSION A synonym-substitution methodology that utilizes WordNet is a useful automated aid in UMLS source integration. Experiments showed a significant speed-up, with no degradation in match results, when the methodology's "maximum number of substitutions per term" parameter was relatively tightly constrained. The methodology also helped to discover errors in the MST's original integration and to improve the quality of the UMLS's conceptual content.
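How the two parameters bound the number of generated variants can be sketched in Python. This is a minimal illustration, not the paper's four-stage methodology: a small hypothetical dictionary stands in for WordNet, and `substitution_variants`, `SYNONYMS`, and `existing_terms` are invented names and data.

```python
from itertools import combinations, product

def substitution_variants(term, synonyms, max_subs=2, max_len=6):
    """Generate candidate synonyms of a term by word-level synonym substitution.

    synonyms: dict word -> list of alternative words (a toy stand-in for WordNet).
    max_subs: maximum number of words replaced in any single variant.
    max_len: terms with more words than this are skipped entirely.
    """
    words = term.lower().split()
    if len(words) > max_len:
        return set()
    slots = [i for i, w in enumerate(words) if w in synonyms]
    variants = set()
    for k in range(1, max_subs + 1):
        for chosen in combinations(slots, k):  # which word positions to replace
            for replacements in product(*(synonyms[words[i]] for i in chosen)):
                new_words = list(words)
                for i, rep in zip(chosen, replacements):
                    new_words[i] = rep
                variants.add(" ".join(new_words))
    return variants


# Hypothetical synonym table and target vocabulary.
SYNONYMS = {"large": ["big"], "gastric": ["stomach"], "ulcer": ["ulceration", "sore"]}
existing_terms = {"large stomach ulcer"}
for k in (1, 2, 3):
    variants = substitution_variants("Large gastric ulcer", SYNONYMS, max_subs=k)
    print(k, len(variants), sorted(variants & existing_terms))
```

In this toy case the variant count grows from 4 to 9 to 11 as `max_subs` rises, while the single match against `existing_terms` is already found with one substitution.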
60
Structural group-based auditing of missing hierarchical relationships in UMLS. J Biomed Inform 2009; 42:452-67. [PMID: 18824248 PMCID: PMC2714188 DOI: 10.1016/j.jbi.2008.08.006] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 06/13/2008] [Revised: 08/13/2008] [Accepted: 08/14/2008] [Indexed: 11/23/2022]
Abstract
The Metathesaurus of the UMLS was created by integrating various source terminologies. The inter-concept relationships were either integrated into the UMLS from the source terminologies or specially generated. Due to the extensive size and inherent complexity of the Metathesaurus, the accidental omission of some hierarchical relationships was inevitable. We present a recursive procedure that allows a human expert, with the support of an algorithm, to locate missing hierarchical relationships. The procedure starts with a group of concepts with exactly the same (correct) semantic type assignments. It then partitions the concepts, based on child-of hierarchical relationships, into smaller, singly rooted, hierarchically connected subgroups. The auditor need only focus on the subgroups with very few concepts and on their concepts with semantic type reassignments. The procedure was evaluated by comparison with a comprehensive manual audit, and it exhibited perfect error recall.
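The partition into singly rooted subgroups can be sketched in Python. This is a minimal illustration under stated assumptions: a root is a group member with no parent inside the group, each subgroup is a root plus the group members reachable below it, and a concept with parents in two subgroups is simply placed in both, which is a simplification rather than the paper's exact partitioning rule. The group, parent map, and size threshold are hypothetical.

```python
def rooted_subgroups(group, parents):
    """Split a semantic-type group into singly rooted, hierarchically connected subgroups.

    group: set of concepts with exactly the same semantic type assignments.
    parents: dict concept -> list of its parents (child-of relationships).
    A root is a group member with no parent inside the group; its subgroup is
    the root plus every group member reachable below it. In this sketch a
    concept with parents in two subgroups is placed in both.
    """
    kids = {c: [] for c in group}
    for c in group:
        for p in parents.get(c, ()):
            if p in group:
                kids[p].append(c)
    roots = sorted(c for c in group if not any(p in group for p in parents.get(c, ())))
    subgroups = {}
    for root in roots:
        seen, stack = {root}, [root]
        while stack:
            for child in kids[stack.pop()]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        subgroups[root] = seen
    return subgroups


# Hypothetical group; "Z" is outside the group, so "X" becomes its own root.
group = {"A", "B", "C", "X"}
parents = {"B": ["A"], "C": ["B"], "X": ["Z"]}
subgroups = rooted_subgroups(group, parents)
small = {root: members for root, members in subgroups.items() if len(members) <= 1}
print(small)  # singleton subgroups are the first candidates for review
```

A singleton such as `X` may indicate a missing hierarchical relationship (for example, a missing parent inside the group), which is the kind of case the auditor would then check by hand.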
61
Structural group auditing of a UMLS semantic type’s extent. J Biomed Inform 2009; 42:41-52. [DOI: 10.1016/j.jbi.2008.06.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 11/27/2007] [Revised: 06/09/2008] [Accepted: 06/09/2008] [Indexed: 11/24/2022]
62
Automated comparative auditing of NCIT genomic roles using NCBI. J Biomed Inform 2008; 41:904-13. [PMID: 18486558 PMCID: PMC2630966 DOI: 10.1016/j.jbi.2008.03.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 08/07/2007] [Revised: 03/20/2008] [Accepted: 03/21/2008] [Indexed: 11/18/2022]
Abstract
Biomedical research has identified many human genes and accumulated a wealth of knowledge about them. The National Cancer Institute Thesaurus (NCIT) represents such knowledge as concepts and roles (relationships). Due to the rapid advances in this field, it is to be expected that the NCIT's Gene hierarchy will contain role errors. A comparative methodology to audit the Gene hierarchy with the use of the National Center for Biotechnology Information's (NCBI's) Entrez Gene database is presented. The two knowledge sources are accessed via a pair of Web crawlers to ensure up-to-date data. Our algorithms then compare the knowledge gathered from each, identify discrepancies that represent probable errors, and suggest corrective actions. The primary focus is on two kinds of gene-roles: (1) the chromosomal locations of genes, and (2) the biological processes in which genes play a role. Regarding chromosomal locations, the discrepancies revealed are striking and systematic, suggesting a structurally common origin. In regard to the biological processes, difficulties arise because genes frequently play roles in multiple processes, and processes may have many designations (such as synonymous terms). Our algorithms make use of the roles defined in the NCIT Biological Process hierarchy to uncover many probable gene-role errors in the NCIT. These results show that automated comparative auditing is a promising technique that can identify a large number of probable errors, and corrections for them, in a terminological genomic knowledge repository, thus facilitating its overall maintenance.
63
Complexity measures to track the evolution of a SNOMED hierarchy. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2008; 2008:778-782. [PMID: 18998922 PMCID: PMC2655969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2008] [Revised: 07/08/2008] [Indexed: 05/27/2023]
Abstract
SNOMED CT is an extensive terminology with an attendant amount of complexity. Two measures are proposed for quantifying that complexity. Both are based on abstraction networks, called the area taxonomy and the partial-area taxonomy, that provide, for example, distributions of the relationships within a SNOMED hierarchy. The complexity measures are employed specifically to track the complexity of versions of the Specimen hierarchy of SNOMED before and after it is put through an auditing process. The pre-audit and post-audit versions are compared. The results show that the auditing process indeed leads to a simplification of the terminology's structure.
64
Auditing complex concepts in overlapping subsets of SNOMED. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2008; 2008:273-277. [PMID: 18998838 PMCID: PMC2656006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2008] [Revised: 07/07/2008] [Indexed: 05/27/2023]
Abstract
Limited resources and the sheer volume of concepts make auditing a large terminology, such as SNOMED CT, a daunting task. It is essential to devise techniques that can aid an auditor by automatically identifying concepts that deserve attention. A methodology for this purpose, based on a previously introduced abstraction network (called the p-area taxonomy) for a SNOMED CT hierarchy, is presented. The methodology algorithmically gathers concepts appearing in certain overlapping subsets, defined exclusively with respect to the p-area taxonomy, for review. The results of applying the methodology to SNOMED's Specimen hierarchy are presented. These results are compared against a control sample composed of concepts residing in subsets without the overlaps. With the use of the double bootstrap, the concept group produced by our methodology is shown to yield a statistically significantly higher proportion of error discoveries.
65
Modeling multi-typed structurally viewed chemicals with the UMLS Refined Semantic Network. J Am Med Inform Assoc 2008; 16:116-31. [PMID: 18952946 DOI: 10.1197/jamia.m2604] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
OBJECTIVE Chemical concepts assigned multiple "Chemical Viewed Structurally" semantic types (STs) in the Unified Medical Language System (UMLS) are subject to ambiguous interpretation. The multiple assignments may denote that a specific represented chemical (combination) is either a conjugate, derived via a chemical reaction of chemicals of the different types, or a complex, composed of a mixture of such chemicals. The previously introduced Refined Semantic Network (RSN) is modified to properly model these varied multi-typed chemical combinations. DESIGN The RSN was previously introduced as an enhanced abstraction of the UMLS's concepts. It features new types, called intersection semantic types (ISTs), each of which explicitly captures a unique combination of ST assignments in one abstract unit. The ambiguous ISTs combining different "Chemical Viewed Structurally" STs are replaced with two varieties of new types, called conjugate types and complex types, which explicitly denote the nature of the chemical interactions. Additional semantic relationships help further refine the new portion of the RSN rooted at the ST "Chemical Viewed Structurally." MEASUREMENTS The number of new conjugate and complex types and the amount of change to the type assignments of chemical concepts are presented. RESULTS The modified RSN, consisting of 35 types and featuring 22 new conjugate and complex types, is presented. A total of 800 (about 98%) chemical concepts representing multi-typed chemical combinations from "Chemical Viewed Structurally" STs are uniquely assigned one of the new types. An additional benefit is the identification of a number of illegal ISTs and ST assignment errors, some of which are direct violations of exclusion rules defined by the UMLS Semantic Network. CONCLUSION The modified RSN provides an enhanced abstract view of the UMLS's chemical content.
Its array of conjugate and complex types provides a more accurate model of the variety of combinations involving chemicals viewed structurally. This framework will help streamline the process of type assignments for such chemical concepts and improve user orientation to the richness of the chemical content of the UMLS.
66
Comparing and consolidating two heuristic metaschemas. J Biomed Inform 2008; 41:293-317. [DOI: 10.1016/j.jbi.2007.11.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/05/2007] [Revised: 10/09/2007] [Accepted: 11/07/2007] [Indexed: 11/29/2022]
67
Analysis of error concentrations in SNOMED. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2007; 2007:314-318. [PMID: 18693849 PMCID: PMC2655786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2007] [Revised: 07/13/2007] [Accepted: 10/11/2007] [Indexed: 05/26/2023]
Abstract
Two high-level abstraction networks for the knowledge content of a terminology, known respectively as the "area taxonomy" and "p-area taxonomy," have previously been defined. Both are derived automatically from partitions of the terminology's concepts. An important application of these networks is in auditing, where a number of systematic regimens have been formulated utilizing them. In particular, the taxonomies tend to highlight certain kinds of concept groups where errors are more likely to be found. Using results garnered from applications of our auditing regimens to SNOMED CT, an investigation into the concentration of errors among such groups is carried out. Three hypotheses pertaining to the error distributions are put forth. The results support the hypothesis that certain groups presented by the taxonomies show higher error percentages than other groups. The bootstrap is used to assess the statistical significance of these differences. This knowledge will help direct auditing efforts to increase their impact.
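A bootstrap comparison of error proportions between two concept groups can be sketched with Python's standard library. This is a minimal illustration of a plain percentile bootstrap for the difference in error rates; it is not claimed to be the exact procedure used in the study. The function name `bootstrap_diff_ci`, the resample count, and the review outcomes are hypothetical.

```python
import random

def bootstrap_diff_ci(errors_a, errors_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in error proportions (A minus B).

    errors_a, errors_b: lists of 0/1 flags, one per reviewed concept
    (1 = an error was found). Returns (low, high).
    """
    rng = random.Random(seed)  # seeded so the interval is reproducible
    diffs = []
    for _ in range(n_resamples):
        sample_a = rng.choices(errors_a, k=len(errors_a))  # resample with replacement
        sample_b = rng.choices(errors_b, k=len(errors_b))
        diffs.append(sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b))
    diffs.sort()
    low = diffs[int((alpha / 2) * n_resamples)]
    high = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Hypothetical review outcomes: 30 of 50 concepts in error in the highlighted
# groups versus 10 of 50 in the comparison groups.
flagged = [1] * 30 + [0] * 20
others = [1] * 10 + [0] * 40
low, high = bootstrap_diff_ci(flagged, others)
print(f"95% CI for difference in error rate: [{low:.2f}, {high:.2f}]")
```

An interval that lies entirely above zero would support the claim that the highlighted groups have a higher error proportion.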
68
Evaluation of a UMLS Auditing Process of Semantic Type Assignments. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2007; 2007:294-298. [PMID: 18693845 PMCID: PMC2655790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2007] [Revised: 07/17/2007] [Accepted: 10/11/2007] [Indexed: 05/26/2023]
Abstract
The UMLS is a terminological system that integrates many source terminologies. Each concept in the UMLS is assigned one or more semantic types from the Semantic Network, an upper level ontology for biomedicine. Due to the complexity of the UMLS, errors exist in the semantic type assignments. Finding assignment errors may unearth modeling errors. Even with sophisticated tools, discovering assignment errors requires manual review. In this paper we describe the evaluation of an auditing project of UMLS semantic type assignments. We studied the performance of the auditors who reviewed potential errors. We found that four auditors, interacting according to a multi-step protocol, identified a high rate of errors (one or more errors in 81% of concepts studied) and that results were sufficiently reliable (0.67 to 0.70) for the two most common types of errors. However, reliability was low for each individual auditor, suggesting that review of potential errors is resource-intensive.
69
Updating the genomic component of the UMLS Semantic Network. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2007; 2007:150-154. [PMID: 18693816 PMCID: PMC2655921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2007] [Revised: 07/15/2007] [Accepted: 10/11/2007] [Indexed: 05/26/2023]
Abstract
The UMLS Metathesaurus and the Semantic Network (SN) were created in the absence of a comprehensive curated genomics terminology and before the recent quantitative and qualitative explosion of genomic knowledge. In this paper we evaluate the internal consistency of the SN's categories relevant to genomics and propose changes to improve its ability to express genomic knowledge. We evaluate the completeness of the SN with respect to genomic concepts by extracting genomics vocabulary from leading texts and databases of genomic information and comparing the extracted vocabulary to the SN. We propose corresponding extensions to the SN to fill identified gaps.
70
Analysis of a study of the users, uses, and future agenda of the UMLS. J Am Med Inform Assoc 2007; 14:221-31. [PMID: 17213497 PMCID: PMC2213464 DOI: 10.1197/jamia.m2202] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 07/12/2006] [Accepted: 12/12/2006] [Indexed: 11/10/2022]
Abstract
OBJECTIVES The UMLS constitutes the largest existing collection of medical terms. However, little has been published about the users and uses of the UMLS. This study sheds light on these issues. DESIGN We designed a questionnaire consisting of 26 questions and distributed it to the UMLS user mailing list. Participants were assured complete confidentiality of their replies. To further encourage list members to respond, we promised to provide them with early results prior to publication. Sector analysis of the responses, according to the respondents' employing organizations, is used to obtain insights into some responses. RESULTS We received 70 responses. The study confirms two intended uses of the UMLS: access to source terminologies (75%) and mapping among them (44%). However, most access is to just a few sources, led by SNOMED, MeSH, and ICD. Out of 119 reported purposes of use, terminology research (37), information retrieval (19), and terminology translation (14) lead. Four important observations emerge: the UMLS is widely used as a terminology (77%), even though it was not designed as one; many users (73%) want the NLM to mark concepts with multiple parents in an indented hierarchy; many (73%) also want the NLM to derive a terminology from the UMLS; and auditing the UMLS is a top budget priority (35%) for users. CONCLUSIONS The study reports many uses of the UMLS in a variety of subjects, from terminology research to decision support and phenotyping. The study confirms that the UMLS is used to access its source terminologies and to map among them. Two primary concerns of the existing user base are auditing the UMLS and the design of a UMLS-based derived terminology.
71
Structural methodologies for auditing SNOMED. J Biomed Inform 2006; 40:561-81. [PMID: 17276736 DOI: 10.1016/j.jbi.2006.12.003] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.1] [Received: 09/15/2006] [Revised: 12/09/2006] [Accepted: 12/12/2006] [Indexed: 11/30/2022]
Abstract
SNOMED is one of the leading health care terminologies being used worldwide. As such, quality assurance is an important part of its maintenance cycle. Methodologies for auditing SNOMED based on structural aspects of its organization are presented. In particular, automated techniques for partitioning SNOMED into smaller groups of concepts, based primarily on relationship patterns, are defined. Two abstraction networks, the area taxonomy and p-area taxonomy, are derived from the partitions. The high-level views afforded by these abstraction networks form the basis for systematic auditing. The networks tend to highlight errors that manifest themselves as irregularities at the abstract level. They also support group-based auditing, where sets of purportedly similar concepts are focused on for review. The auditing methodologies are demonstrated on one of SNOMED's top-level hierarchies. Errors discovered during the auditing process are reported.
72
Auditing as part of the terminology design life cycle. J Am Med Inform Assoc 2006; 13:676-90. [PMID: 16929044 PMCID: PMC1656963 DOI: 10.1197/jamia.m2036] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Received: 12/16/2005] [Accepted: 07/16/2006] [Indexed: 11/10/2022]
Abstract
OBJECTIVE To develop and test an auditing methodology for detecting errors in medical terminologies satisfying systematic inheritance. This methodology is based on various abstraction taxonomies that provide high-level views of a terminology and highlight potentially erroneous concepts. DESIGN Our auditing methodology is based on dividing concepts of a terminology into smaller, more manageable units. First, we divide the terminology's concepts into areas according to their relationships/roles. Then each multi-rooted area is further divided into partial-areas (p-areas) that are singly-rooted. Each p-area contains a set of structurally and semantically uniform concepts. Two kinds of abstraction networks, called the area taxonomy and p-area taxonomy, are derived. These taxonomies form the basis for the auditing approach. Taxonomies tend to highlight potentially erroneous concepts in areas and p-areas. Human reviewers can focus their auditing efforts on the limited number of problematic concepts following two hypotheses on the probable concentration of errors. RESULTS A sample of the area taxonomy and p-area taxonomy for the Biological Process (BP) hierarchy of the National Cancer Institute Thesaurus (NCIT) was derived from the application of our methodology to its concepts. These views led to the detection of a number of different kinds of errors that are reported, and to confirmation of the hypotheses on error concentration in this hierarchy. CONCLUSION Our auditing methodology based on area and p-area taxonomies is an efficient tool for detecting errors in terminologies satisfying systematic inheritance of roles, and thus facilitates their maintenance. This methodology concentrates a domain expert's manual review on portions of the concepts with a high likelihood of errors.
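The two partitioning steps in the design (areas by relationship set, then singly rooted p-areas within each area) can be sketched in Python. This is a minimal illustration, assuming that each concept's relationship set already includes inherited relationships and that a p-area root is an area member with no IS-A parent inside its own area. The function name `area_and_pareas` and the mini-hierarchy are hypothetical, and the sketch builds only the concept groups, not the taxonomy's links between them.

```python
from collections import defaultdict

def area_and_pareas(rels, parents):
    """Derive areas and partial-areas (p-areas) for one hierarchy.

    rels: dict concept -> set of relationship types (inherited ones included).
    parents: dict concept -> list of IS-A parents.
    Returns (areas, pareas):
      areas:  frozenset of relationship types -> set of concepts
      pareas: p-area root -> set of concepts in that p-area
    """
    areas = defaultdict(set)
    for concept, rel_types in rels.items():
        areas[frozenset(rel_types)].add(concept)

    pareas = {}
    for members in areas.values():
        kids = defaultdict(list)
        for c in members:
            for p in parents.get(c, ()):
                if p in members:
                    kids[p].append(c)
        # A p-area root has no parent inside its own area.
        roots = [c for c in members if not any(p in members for p in parents.get(c, ()))]
        for root in roots:
            seen, stack = {root}, [root]
            while stack:
                for child in kids[stack.pop()]:
                    if child not in seen:
                        seen.add(child)
                        stack.append(child)
            pareas[root] = seen
    return dict(areas), pareas


# Hypothetical mini-hierarchy; names are illustrative, not SNOMED or NCIT content.
rels = {
    "Specimen": set(),
    "Tissue specimen": {"has topography"},
    "Fluid specimen": {"has topography"},
    "Mixed specimen": {"has topography"},
}
parents = {
    "Tissue specimen": ["Specimen"],
    "Fluid specimen": ["Specimen"],
    "Mixed specimen": ["Tissue specimen", "Fluid specimen"],
}
areas, pareas = area_and_pareas(rels, parents)
counts = defaultdict(int)
for members in pareas.values():
    for c in members:
        counts[c] += 1
print(sorted(c for c, n in counts.items() if n > 1))  # concepts in overlapping p-areas
```

Here the multi-rooted area for `{"has topography"}` splits into two p-areas, and the concept with a parent in each (`Mixed specimen`) falls in both, which is the kind of overlap these abstraction networks tend to surface for review.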
73
Relationship structures and semantic type assignments of the UMLS Enriched Semantic Network. J Am Med Inform Assoc 2005; 12:657-66. [PMID: 16049233 PMCID: PMC1294037 DOI: 10.1197/jamia.m1605] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 04/14/2004] [Accepted: 07/25/2005] [Indexed: 11/10/2022]
Abstract
OBJECTIVE The Enriched Semantic Network (ESN) was introduced as an extension of the Unified Medical Language System (UMLS) Semantic Network (SN). Its multiple subsumption configuration and concomitant multiple inheritance make the ESN's relationship structures and semantic type assignments different from those of the SN. A technique for deriving the relationship structures of the ESN's semantic types and an automated technique for deriving the ESN's semantic type assignments from those of the SN are presented. DESIGN The technique to derive the ESN's relationship structures finds all newly inherited relationships in the ESN. All such relationships are audited for semantic validity, and the blocking mechanism is used to block invalid relationships. The mapping technique to derive the ESN's semantic type assignments uses current SN semantic type assignments and preserves nonredundant categorizations, while preventing new redundant categorizations. RESULTS Among the 426 newly inherited relationships, 326 are deemed valid. Seven blockings are applied to avoid inheritance of the 100 invalid relationships. Sixteen semantic types have different relationship structures in the ESN as compared to those in the SN. The mapping of semantic type assignments from the SN to the ESN avoids the generation of 26,950 redundant categorizations. The resulting ESN contains 138 semantic types, 149 IS-A links, 7,303 relationships, and 1,013,876 semantic type assignments. CONCLUSION The ESN's multiple inheritance provides more complete relationship structures than in the SN. The ESN's semantic type assignments avoid the existing redundant categorizations appearing in the SN and prevent new ones that might arise due to multiple parents. Compared to the SN, the ESN provides a more accurate unifying semantic abstraction of the UMLS Metathesaurus.
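The DESIGN step of finding relationships that a semantic type newly inherits once it gains extra parents, and then blocking the invalid ones, can be sketched in Python under several assumptions. The hierarchy, relationship names, and targets below are hypothetical toy data; a block is modeled as a per-type set of relationships removed at that type (so descendants reaching the relationship only through it do not inherit it); and SN-side blocks are ignored for brevity. This is an illustrative reading of the abstract, not the authors' technique.

```python
# Toy hierarchy (hypothetical): in the SN tree, C has the single parent A;
# in the ESN DAG, C also gets B as a second parent.
SN_PARENTS = {"C": ["A"]}
ESN_PARENTS = {"C": ["A", "B"]}
# Relationships introduced at each semantic type, as (relationship, target) pairs.
INTRODUCED = {
    "A": {("rel1", "X")},
    "B": {("rel2", "Y"), ("rel3", "Z")},
}


def inherited_relationships(sem_type, parents, introduced, blocked, memo=None):
    """Relationships of a semantic type: its own plus all ancestors', minus blocked ones."""
    if memo is None:
        memo = {}
    if sem_type in memo:
        return memo[sem_type]
    rels = set(introduced.get(sem_type, ()))
    for parent in parents.get(sem_type, ()):
        rels |= inherited_relationships(parent, parents, introduced, blocked, memo)
    # Assumed blocking semantics: the relationship is removed at this type, so
    # descendants that reach it only through this type do not inherit it either.
    rels -= blocked.get(sem_type, set())
    memo[sem_type] = rels
    return rels


def newly_inherited(sem_type, sn_parents, esn_parents, introduced, esn_blocked):
    """Relationships a type holds in the ESN but not in the SN (SN blocks ignored)."""
    in_sn = inherited_relationships(sem_type, sn_parents, introduced, {})
    in_esn = inherited_relationships(sem_type, esn_parents, introduced, esn_blocked)
    return in_esn - in_sn


# Step 1: candidate relationships to audit, before any blocking.
candidates = newly_inherited("C", SN_PARENTS, ESN_PARENTS, INTRODUCED, {})
# Step 2: suppose the audit judges ("rel3", "Z") invalid for C, so it is blocked at C.
after_blocking = newly_inherited(
    "C", SN_PARENTS, ESN_PARENTS, INTRODUCED, {"C": {("rel3", "Z")}}
)
```

Memoization keeps each type's relationship set from being recomputed when several descendants share an ancestor, which matters once the hierarchy is a DAG with multiple paths to the same type.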
74
An expert study evaluating the UMLS lexical metaschema. Artif Intell Med 2005; 34:219-33. [PMID: 15996860 DOI: 10.1016/j.artmed.2005.01.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/24/2004] [Revised: 01/10/2005] [Accepted: 01/10/2005] [Indexed: 10/25/2022]
Abstract
OBJECTIVE A metaschema is an abstraction network of the UMLS's semantic network (SN) obtained from a connected partition of its collection of semantic types. A lexical metaschema was previously derived from a lexical partition that divided the SN into semantic-type groups using identical word-usage among the names of semantic types and the definitions of their respective children. In this paper, a statistical analysis methodology is presented to evaluate the lexical metaschema based on a study involving a group of established UMLS experts. METHODS In the study, each expert was asked to identify subject areas of the SN based on his or her understanding of the various semantic types. For this purpose, the expert scans the SN hierarchy top-down, identifying as group roots those semantic types that are important and sufficiently different from their parent semantic types. From the response of each expert, an "expert metaschema" is constructed. Because the experts' metaschemas can vary widely, additional metaschemas are obtained from aggregations of the experts' responses. Of special interest is the consensus metaschema, which represents an aggregation of a simple majority of the experts' responses. Statistical analysis comparing the lexical metaschema with the experts' metaschemas and the consensus metaschema is presented. RESULTS The analysis shows that 17 out of the 21 meta-semantic types in the lexical metaschema also appear in the consensus metaschema (about 81%). There are 107 semantic types (about 79%) covered by identical meta-semantic types and refinements. These results show a high similarity between the two metaschemas. Furthermore, the statistical analysis shows that the lexical metaschema did not grossly underperform compared to the experts.
CONCLUSION Our study shows that the lexical metaschema provides a good approximation for a partition of meaningful subject areas in the SN, when compared to the consensus metaschema capturing the aggregation of a simple majority of the human experts' opinions.
75
A lexical metaschema for the UMLS semantic network. Artif Intell Med 2005; 33:41-59. [PMID: 15617981 DOI: 10.1016/j.artmed.2004.06.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 02/26/2004] [Revised: 05/13/2004] [Accepted: 06/02/2004] [Indexed: 11/29/2022]
Abstract
OBJECTIVE A metaschema is a high-level abstraction network of the UMLS's semantic network (SN) obtained from a partition of the SN's collection of semantic types. Every metaschema has nodes, called meta-semantic types, each of which denotes a group of semantic types constituting a subject area of the SN. A new kind of metaschema, called the lexical metaschema, is derived from a lexical partition of the SN. The lexical metaschema is compared to previously derived metaschemas, e.g., the cohesive metaschema. DESIGN A new lexical partitioning methodology is presented based on identical word-usage among the names of semantic types and the definitions of their respective children. The lexical metaschema is derived from the application of the methodology. We compare the constituent meta-semantic types and their underlying semantic-type groups with the previously derived cohesive metaschema. A similar comparison of the lexical partition and a published partition of the SN is also carried out. RESULTS The lexical partition of the SN has 21 semantic-type groups, each of which represents a subject area. The lexical metaschema thus has 21 meta-semantic types, 19 meta-child-of hierarchical relationships, and 86 meta-relationships. Our comparison shows that 15 out of the 21 meta-semantic types in the lexical metaschema also appear in the cohesive metaschema, and 80 semantic types are covered by identical meta-semantic types or refinements between the two metaschemas. The comparison between the lexical partition and the semantic partition shows that they have very low similarity. CONCLUSION The algorithmically derived lexical metaschema serves as an abstraction of the SN and provides views representing different subject areas. It compares favorably with the cohesive metaschema derived via the SN's relationship configuration.
76
Research on structural issues of the UMLS--past, present, and future. J Biomed Inform 2004; 36:409-13. [PMID: 14759815 DOI: 10.1016/j.jbi.2003.11.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 11/11/2003] [Indexed: 10/26/2022]
77
78
Auditing concept categorizations in the UMLS. Artif Intell Med 2004; 31:29-44. [PMID: 15182845 DOI: 10.1016/j.artmed.2004.02.002] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Received: 03/18/2003] [Revised: 02/12/2004] [Accepted: 02/27/2004] [Indexed: 11/24/2022]
Abstract
The Unified Medical Language System (UMLS) integrates about 880,000 concepts from 100 biomedical terminologies. Each concept is assigned to at least one semantic type of the Semantic Network. During the integration, it is unavoidable that some categorization errors and inconsistencies will be introduced. In this paper, we present an auditing technique to find such errors and inconsistencies. Our technique is based on an expert reviewing the pure intersections of meta-semantic types of a metaschema, a compact abstract view of the UMLS Semantic Network. We use a divide-and-conquer approach, handling small pure intersections differently from medium to large ones. This approach limits the review to a relatively small number of concepts, among which we expect a high percentage of errors. We reviewed all concepts in 657 pure intersections containing one to 10 concepts. Various kinds of errors are identified, and the analysis of the results is presented in the paper. We also checked the pure intersections containing more than 10 concepts for semantic soundness; the semantically suspicious ones are presented in the paper and their concepts are reviewed.
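The grouping of concepts into pure intersections, and the split between small and larger intersections, can be sketched in Python as follows. The sketch assumes hypothetical toy inputs: a mapping from each concept to its semantic types and a mapping from each semantic type to the meta-semantic type of a metaschema. A pure intersection is taken here to be the set of concepts whose semantic types fall under exactly the same two or more meta-semantic types; the size threshold of 10 follows the abstract.

```python
from collections import defaultdict

# Toy input (hypothetical): concept -> semantic types, semantic type -> meta-semantic type.
CONCEPT_TYPES = {
    "c1": {"T1", "T3"},
    "c2": {"T2", "T3"},
    "c3": {"T1"},
    "c4": {"T1", "T2"},
}
META_OF = {"T1": "M_a", "T2": "M_a", "T3": "M_b"}


def pure_intersections(concept_types, meta_of):
    """Group concepts by the exact set of meta-semantic types they fall under (2 or more)."""
    groups = defaultdict(set)
    for concept, types in concept_types.items():
        metas = frozenset(meta_of[t] for t in types)
        if len(metas) >= 2:  # concepts within a single meta-semantic type are not intersections
            groups[metas].add(concept)
    return dict(groups)


def split_for_review(groups, small_max=10):
    """Small intersections get a full concept review; larger ones a semantic-soundness check."""
    small = {k: v for k, v in groups.items() if len(v) <= small_max}
    large = {k: v for k, v in groups.items() if len(v) > small_max}
    return small, large


groups = pure_intersections(CONCEPT_TYPES, META_OF)
small, large = split_for_review(groups)
```

Note that c4 has two semantic types but both map to the same meta-semantic type, so in this sketch it does not belong to any pure intersection.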
79
An enriched unified medical language system semantic network with a multiple subsumption hierarchy. J Am Med Inform Assoc 2004; 11:195-206. [PMID: 14764611 PMCID: PMC400518 DOI: 10.1197/jamia.m1269] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
OBJECTIVE The Unified Medical Language System's (UMLS's) Semantic Network's (SN's) two-tree structure is restrictive because it does not allow a semantic type to be a specialization of several other semantic types. In this article, the SN is expanded into a multiple subsumption structure with a directed acyclic graph (DAG) IS-A hierarchy, allowing a semantic type to have multiple parents. New viable IS-A links are added as warranted. DESIGN Two methodologies are presented to identify and add new viable IS-A links. The first methodology is based on imposing the characteristic of connectivity on a previously presented partition of the SN. Four transformations are provided to find viable IS-A links in the process of converting the partition's disconnected groups into connected ones. The second methodology identifies new IS-A links through a string matching process involving names and definitions of various semantic types in the SN. A domain expert is needed to review all the results to determine the validity of the new IS-A links. RESULTS Nineteen new IS-A links are added to the SN, and four new semantic types are also created to support the multiple subsumption framework. The resulting network, called the Enriched Semantic Network (ESN), exhibits a DAG-structured hierarchy. A partition of the ESN containing 19 connected groups is also derived. CONCLUSION The ESN is an expanded abstraction of the UMLS compared with the original SN. Its multiple subsumption hierarchy can accommodate semantic types with multiple parents. Its representation thus provides direct access to a broader range of subsumption knowledge.
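Whatever method proposes a new IS-A link, the resulting hierarchy must remain a directed acyclic graph. The Python sketch below shows only that generic invariant check; it is not the paper's methodology (which uses partition-connectivity transformations and string matching, with expert review, to decide which links are viable). The hierarchy is a hypothetical dictionary mapping each semantic type to its list of parents, with toy names.

```python
def creates_cycle(parents, child, new_parent):
    """True if adding the IS-A link child -> new_parent would close a cycle."""
    # A cycle appears exactly when child is new_parent itself or one of its ancestors.
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return False


def add_is_a(parents, child, new_parent):
    """Add an IS-A link, keeping the hierarchy a DAG (multiple parents allowed)."""
    if creates_cycle(parents, child, new_parent):
        raise ValueError(f"IS-A link {child} -> {new_parent} would create a cycle")
    links = parents.setdefault(child, [])
    if new_parent not in links:
        links.append(new_parent)


# Toy tree (hypothetical names): B and D under A, C under B.
hierarchy = {"B": ["A"], "C": ["B"], "D": ["A"]}
add_is_a(hierarchy, "C", "D")  # C now has two parents, B and D
```

Checking reachability from the proposed parent up to the root is enough because a new edge can only create a cycle that passes through that edge.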
80
Consistency across the hierarchies of the UMLS Semantic Network and Metathesaurus. J Biomed Inform 2003; 36:450-61. [PMID: 14759818 DOI: 10.1016/j.jbi.2003.11.001] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.8] [Received: 10/21/2002] [Indexed: 11/22/2022]
Abstract
OBJECTIVE To develop and test a method for automatically detecting inconsistencies between the parent-child is-a relationships in the Metathesaurus and the ancestor-descendant relationships in the Semantic Network of the Unified Medical Language System (UMLS). METHODS We exploited the fact that each Metathesaurus concept is assigned one or more semantic types from the UMLS Semantic Network and that the semantic types are arranged in a hierarchy. We compared the semantic types of each pair of parent and child concepts to determine if the types "explained" the Metathesaurus is-a relationships. We considered cases where the semantic type of the parent was neither the same as, nor an ancestor of, the semantic type of the child to be "unexplained." We applied this method to the January 2002 release of the UMLS and examined the unexplained cases we discovered to determine their causes. RESULTS We found that 17,022 (24.3%) of the parent-child is-a relationships in the UMLS Metathesaurus could not be explained based on the semantic types of the concepts. Causes for these discrepancies included cases where the parent or child was missing a semantic type, cases where the semantic type of the child was too general or the semantic type of the parent was too specific, cases where the parent-child relationship was incorrect, and cases where an ancestor-descendant relationship should be added to the UMLS Semantic Network. In many cases, the specific cause of the discrepancy cannot be resolved without authoritative judgment by the UMLS developers. CONCLUSIONS Our method successfully detects inconsistencies between the hierarchies of the UMLS Metathesaurus and Semantic Network. We believe that our method should be added to the set of tools that the UMLS developers use to maintain and audit the UMLS knowledge sources.
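The "explained" test described in METHODS can be sketched in Python. The semantic-type chain below is a simplified excerpt of the Semantic Network tree with shortened labels (Disease stands for Disease or Syndrome), and the concepts and is-a pairs are hypothetical toy data. The abstract speaks of "the" semantic type of each concept; for concepts with several types, this sketch assumes a pair is explained if any parent type equals, or is an ancestor of, any child type.

```python
# Simplified Semantic Network excerpt (tree: each semantic type has at most one parent).
ST_PARENT = {"Neoplastic_Process": "Disease", "Disease": "Pathologic_Function"}
# Hypothetical toy concepts with their semantic types, and (child, parent) is-a pairs.
CONCEPT_TYPES = {
    "Lung neoplasm": {"Neoplastic_Process"},
    "Lung disease": {"Disease"},
    "Lung": {"Body_Part"},
}
IS_A_PAIRS = [("Lung neoplasm", "Lung disease"), ("Lung neoplasm", "Lung")]


def self_and_ancestors(sem_type, st_parent):
    """The semantic type plus every ancestor up to the root of the SN tree."""
    chain = {sem_type}
    while sem_type in st_parent:
        sem_type = st_parent[sem_type]
        chain.add(sem_type)
    return chain


def is_explained(parent_types, child_types, st_parent):
    """Assumed rule: some parent type equals, or is an ancestor of, some child type."""
    return any(
        p in self_and_ancestors(c, st_parent) for p in parent_types for c in child_types
    )


def unexplained_pairs(is_a_pairs, concept_types, st_parent):
    """Return the (child, parent) is-a pairs that the semantic types do not explain."""
    return [
        (child, parent)
        for child, parent in is_a_pairs
        if not is_explained(concept_types[parent], concept_types[child], st_parent)
    ]


print(unexplained_pairs(IS_A_PAIRS, CONCEPT_TYPES, ST_PARENT))
# prints [('Lung neoplasm', 'Lung')]
```

Here the flagged pair illustrates one of the causes listed in RESULTS: an incorrect parent-child relationship rather than a missing or misassigned semantic type.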
81
Abstract
The enriched semantic network (ESN) has previously been presented as an enhancement of the semantic network (SN) of the UMLS. The ESN's hierarchy is a DAG (Directed Acyclic Graph) structure allowing for multiple parents. The ESN is thus more complex than the SN and can be more difficult to view and comprehend. We have previously introduced the notion of a metaschema for the SN as a compact abstraction to support SN comprehension. We extend the definition of metaschema to make it applicable to a DAG classification hierarchy, such as the one exhibited by the ESN. We specify the requirements for and describe the general process of deriving such a metaschema. We derive two particular metaschemas of the ESN based on a pair of partitions. These two metaschemas and their underlying partitions are compared. Both metaschemas serve as compact representations of the ESN, allowing for convenient viewing of its hierarchy and easier comprehension.
82
Evaluation and application of a semantic network partition. IEEE Trans Inf Technol Biomed 2002; 6:109-15. [PMID: 12075664 DOI: 10.1109/titb.2002.1006297] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/06/2022]
Abstract
Semantic networks (SNs) are excellent knowledge representation structures. However, large semantic networks are hard to comprehend. To overcome this difficulty, several methods of partitioning have been developed that rely on different mixes of structural and semantic methods. However, little has appeared in the literature concerning the question whether a partition of a semantic network creates subnetworks that agree with human insight. We address this issue by presenting a comparison between the results of an algorithmic partitioning method and a partition created by a group of experts. Subsequently, we show how a network partition can be used to generate various partial views of a semantic network, which facilitate user orientation. Examples from the Unified Medical Language System (UMLS) SN are used to demonstrate partial views.
83
Partitioning the UMLS semantic network. IEEE Trans Inf Technol Biomed 2002; 6:102-8. [PMID: 12075663 DOI: 10.1109/titb.2002.1006296] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
Abstract
The unified medical language system (UMLS) integrates many well-established biomedical terminologies. The UMLS semantic network (SN) can help orient users to the vast knowledge content of the UMLS Metathesaurus (META) via its abstract conceptual view. However, the SN itself is large and complex and may still be difficult to comprehend. Our technique partitions the SN into smaller meaningful units amenable to display on limited-sized computer screens. The basis for the partitioning is the distribution of the relationships within the SN. Three rules are applied to transform the original partition into a second, more cohesive partition.
84
Abstract
The Unified Medical Language System (UMLS) joins together a group of established medical terminologies in a unified knowledge representation framework. Two major resources of the UMLS are its Metathesaurus, containing a large number of concepts, and the Semantic Network (SN), containing semantic types and forming an abstraction of the Metathesaurus. However, the SN itself is large and complex and may still be difficult to view and comprehend. Our structural partitioning technique partitions the SN into structurally uniform sets of semantic types based on the distribution of the relationships within the SN. An enhancement of the structural partition results in cohesive, singly rooted sets of semantic types. Each such set is named after its root, which represents the common nature of the group. These sets of semantic types are represented by higher-level components called meta-semantic types. A network, called a metaschema, which consists of the meta-semantic types connected by hierarchical and semantic relationships, is obtained and provides an abstract view supporting orientation to the SN. The metaschema is utilized to audit the UMLS classifications. We present a set of graphical views of the SN based on the metaschema to help in user orientation to the SN. A study compares the cohesive metaschema to metaschemas derived semantically by UMLS experts.
85
A metaschema of the UMLS based on a partition of its semantic network. Proc AMIA Symp 2002:234-8. [PMID: 11825187 PMCID: PMC2243491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2023]
Abstract
The Unified Medical Language System's (UMLS's) Semantic Network (SN) provides an important conceptual abstraction that helps orient users to the vast knowledge content of its Metathesaurus. However, the SN is itself large and complex, and can also benefit from an additional abstract view of its own. In this paper, we present a metaschema that serves such a purpose. This metaschema is derived from a previously developed partitioning methodology for the SN. The metaschema is formally defined, and used to provide partial compact views of the SN.
86
Auditing the UMLS for redundant classifications. Proc AMIA Symp 2002:612-6. [PMID: 12463896 PMCID: PMC2244162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2023]
Abstract
The UMLS's Semantic Network (SN) serves as a valuable abstraction for the underlying concept repository called the Metathesaurus (META). Specifically, the SN forms a classification layer for the META, with each of the META's constituent concepts assigned to one or more semantic types in the SN. The rule in the design of the SN is to have concepts explicitly assigned to the lowest possible semantic types in the SN's IS-A hierarchy. Implicit assignment to higher semantic types can be inferred via the IS-A relationships. However, in subsequent versions of the UMLS, unnecessary, simultaneous assignments to descendant and ancestor semantic types have been discovered (e.g., 8,622 in the UMLS 1998 version and 12,657 in the 2001 version). The assignment of concepts to such ancestor semantic types is called redundant classification. There is a need for an automated auditing tool that can identify all these redundant classifications. In this paper, an efficient algorithm for this auditing task is introduced. Details of its application to the current (2001) version of the UMLS are presented and the results are discussed.
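Detecting redundant classifications amounts to checking, for each concept, whether any of its assigned semantic types is an ancestor of another assigned type. The Python sketch below is one straightforward way to do this and is not claimed to be the paper's algorithm. It assumes a hypothetical dictionary mapping each semantic type to its list of parents (so it works for a tree or a DAG) and toy concept assignments; Disease stands for the SN type Disease or Syndrome.

```python
def ancestors(sem_type, st_parents, cache):
    """All proper ancestors of a semantic type in the IS-A hierarchy (tree or DAG)."""
    if sem_type not in cache:
        result = set()
        for parent in st_parents.get(sem_type, ()):
            result.add(parent)
            result |= ancestors(parent, st_parents, cache)
        cache[sem_type] = result
    return cache[sem_type]


def redundant_classifications(concept_types, st_parents):
    """Map each concept to the assigned types that are ancestors of another assigned type."""
    cache = {}  # semantic types are few, so each ancestor set is computed once and reused
    found = {}
    for concept, types in concept_types.items():
        implied = set()
        for t in types:
            implied |= ancestors(t, st_parents, cache)
        redundant = set(types) & implied
        if redundant:
            found[concept] = redundant
    return found


# Hypothetical toy data (simplified SN labels).
ST_PARENTS = {"Neoplastic_Process": ["Disease"], "Disease": ["Pathologic_Function"]}
CONCEPT_TYPES = {
    "C1": {"Neoplastic_Process", "Pathologic_Function"},  # the ancestor type is redundant
    "C2": {"Disease"},
}
print(redundant_classifications(CONCEPT_TYPES, ST_PARENTS))
# prints {'C1': {'Pathologic_Function'}}
```

Caching per semantic type keeps the cost roughly linear in the number of concept-type assignments, since the number of semantic types is small compared with the number of concepts.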
87
Enriching the structure of the UMLS semantic network. Proc AMIA Symp 2002:939-43. [PMID: 12463963 PMCID: PMC2244261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2023]
Abstract
The Unified Medical Language System's (UMLS's) Semantic Network (SN), consisting of a network of semantic types, has a two-tree structure, where each semantic type has at most one parent semantic type. This arrangement is restrictive because some semantic types are, by their definition, specializations of several parents. As a proposed enhancement to the SN, its semantic types have previously been partitioned into groups, each of which contains semantic types of some specific area. However, some groups of this proposed partition contain forest (i.e., multiple-tree) structures or even isolated semantic types. Both situations imply a disconnected internal structure. Connectivity is actually one way to assess the proposed "semantic validity" principle for partitions. It is a desired, although not required, property. In this paper, we introduce a methodology for identifying "missing" IS-A links and adding them to the SN. This process transforms the SN into a Directed Acyclic Graph (DAG) structure, with semantic types permitted to have multiple parents. A result of our methodology is the transformation of the proposed SN partition into groups satisfying the connectivity property.
88
Using the metaschema to audit UMLS classification errors. Proc AMIA Symp 2002:310-4. [PMID: 12463837 PMCID: PMC2244443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2023]
Abstract
The Unified Medical Language System integrates about 800,000 concepts from 99 biomedical terminologies. Each concept is assigned to at least one semantic type of the Semantic Network. During the integration, it is unavoidable that some classification errors and inconsistencies will be introduced. In this paper, we present an auditing technique to find such errors and inconsistencies. Our technique is based on an expert reviewing the pure intersections of meta-semantic types of the metaschema, a compact abstract view of the Semantic Network. Results regarding the pure intersections are reported. The analysis results for pure intersections with 1 to 6 concepts are presented. Various kinds of errors are identified.
89
Partitioning an object-oriented terminology schema. Methods Inf Med 2001; 40:204-12. [PMID: 11501633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2023]
Abstract
Controlled medical terminologies are increasingly becoming strategic components of various healthcare enterprises. However, the typical medical terminology can be difficult to exploit due to its extensive size and high density. The schema of a medical terminology offered by an object-oriented representation is a valuable tool in providing an abstract view of the terminology, enhancing comprehensibility and making it more usable. However, schemas themselves can be large and unwieldy. We present a methodology for partitioning a medical terminology schema into manageably sized fragments that promote increased comprehension. Our methodology has a refinement process for the subclass hierarchy of the terminology schema. The methodology is carried out by a medical domain expert in conjunction with a computer. The expert is guided by a set of three modeling rules, which guarantee that the resulting partitioned schema consists of a forest of trees. This makes it easier to understand and consequently use the medical terminology. The application of our methodology to the schema of the Medical Entities Dictionary (MED) is presented.
90
Abstract
OBJECTIVE The Unified Medical Language System (UMLS) combines many well-established authoritative medical informatics terminologies in one knowledge representation system. Such a resource is very valuable to the health care community and industry. However, the UMLS is very large and complex and poses serious comprehension problems for users and maintenance personnel. The authors present a representation to support the user's comprehension and navigation of the UMLS. DESIGN An object-oriented database (OODB) representation is used to represent the two major components of the UMLS, the Metathesaurus and the Semantic Network, as a unified system. The semantic types of the Semantic Network are modeled as semantic type classes. Intersection classes are defined to model concepts assigned multiple semantic types; such concepts are removed from the semantic type classes. RESULTS The authors provide examples of how the intersection classes help expose omissions of concepts, highlight errors of semantic type classification, and uncover ambiguities of concepts in the UMLS. The resulting UMLS OODB schema is deeper and more refined than the Semantic Network, since intersection classes are introduced. The Metathesaurus is classified into more mutually exclusive, uniform sets of concepts. The schema improves the user's comprehension and navigation of the Metathesaurus. CONCLUSIONS The UMLS OODB schema supports the user's comprehension and navigation of the Metathesaurus. It also helps expose and resolve modeling problems in the UMLS.
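The split between semantic type classes and intersection classes can be pictured with a short Python sketch. It assumes a hypothetical mapping from each concept to its set of assigned semantic types, with toy identifiers, and uses plain Python sets in place of OODB classes; it illustrates the classification rule only, not the authors' schema or database.

```python
from collections import defaultdict

# Hypothetical toy input: concept -> set of assigned semantic types.
CONCEPT_TYPES = {
    "c1": {"T1", "T2"},
    "c2": {"T1", "T2"},
    "c3": {"T1"},
    "c4": {"T2", "T3"},
}


def build_classes(concept_types):
    """Split concepts into semantic type classes and intersection classes.

    Assumes every concept has at least one semantic type. A concept with exactly
    one type stays in that type's class; a concept with several types is moved to
    the intersection class for that exact set of types.
    """
    type_classes = defaultdict(set)
    intersection_classes = defaultdict(set)
    for concept, types in concept_types.items():
        if len(types) == 1:
            (only_type,) = types
            type_classes[only_type].add(concept)
        else:
            intersection_classes[frozenset(types)].add(concept)
    return dict(type_classes), dict(intersection_classes)


type_classes, intersection_classes = build_classes(CONCEPT_TYPES)
```

Because each concept lands in exactly one class, the resulting sets are mutually exclusive, which mirrors the abstract's point that the Metathesaurus becomes classified into more uniform, disjoint sets of concepts.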
91
Benefits of an object-oriented database representation for controlled medical terminologies. J Am Med Inform Assoc 1999; 6:283-303. [PMID: 10428002 PMCID: PMC61370 DOI: 10.1136/jamia.1999.0060283] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.8] [Indexed: 11/04/2022]
Abstract
OBJECTIVE Controlled medical terminologies (CMTs) have been recognized as important tools in a variety of medical informatics applications, ranging from patient-record systems to decision-support systems. Controlled medical terminologies are typically organized in semantic network structures consisting of tens to hundreds of thousands of concepts. This overwhelming size and complexity can be a serious barrier to their maintenance and widespread utilization. The authors propose the use of object-oriented databases to address the problems posed by the extensive scope and high complexity of most CMTs for maintenance personnel and general users alike. DESIGN The authors present a methodology that allows an existing CMT, modeled as a semantic network, to be represented as an equivalent object-oriented database. Such a representation is called an object-oriented health care terminology repository (OOHTR). RESULTS The major benefit of an OOHTR is its schema, which provides an important layer of structural abstraction. Using the high-level view of a CMT afforded by the schema, one can gain insight into the CMT's overarching organization and begin to better comprehend it. The authors' methodology is applied to the Medical Entities Dictionary (MED), a large CMT developed at Columbia-Presbyterian Medical Center. Examples of how the OOHTR schema facilitated updating, correcting, and improving the design of the MED are presented. CONCLUSION The OOHTR schema can serve as an important abstraction mechanism for enhancing comprehension of a large CMT, and thus promotes its usability.
92
Modeling the UMLS using an OODB. Proc AMIA Symp 1999:82-6. [PMID: 10566325 PMCID: PMC2232519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/14/2023]
Abstract
The Unified Medical Language System combines many well-established authoritative medical informatics terminologies in one system. Such a resource is very valuable to the healthcare industry. However, the UMLS is very large and complex and poses serious comprehension problems for users and maintenance personnel. Furthermore, the sets of concepts of semantic types are not semantically uniform and thus are difficult to study. We describe a method to represent two components of the UMLS, the Metathesaurus (META) and the Semantic Network, as an OODB. The resulting UMLS OODB schema is deeper and more refined than the Semantic Network. It offers semantically uniform classes, which improves support for comprehension and navigation of META. The UMLS OODB also exposes problems in the semantic type classifications.
93
Abstract
Controlled medical vocabularies are useful in application areas such as medical information systems and decision-support systems. However, such vocabularies are large and complex, and working with them can be daunting. It is important to provide a means for orienting vocabulary designers and users to the vocabulary's contents. We describe a methodology for partitioning a vocabulary based on an IS-A hierarchy into small meaningful pieces. The methodology uses our disciplined modeling framework to refine the IS-A hierarchy according to prescribed rules in a process carried out by a user in conjunction with the computer. The partitioning of the hierarchy implies a partitioning of the vocabulary. We demonstrate the methodology with respect to a complex sample of the MED, an existing medical vocabulary.
94
95
Converting an integrated hospital formulary into an object-oriented database representation. Proc AMIA Symp 1998:770-4. [PMID: 9929323 PMCID: PMC2232291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/10/2023]
Abstract
Controlled Medical Vocabularies (CMVs) have proven to be extremely useful in their support of the tasks of information sharing and integration, communication among various software applications, and decision support. Modeling a CMV as an Object-Oriented Database (OODB) provides additional benefits such as increased support for vocabulary comprehension and flexible access. In this paper, we describe the process of modeling and converting an existing integrated hospital formulary (i.e., set of pharmacological concepts) into an equivalent OODB representation, which, in general, we refer to as an Object-Oriented Healthcare Vocabulary Repository (OOHVR). The source for our example OOHVR is a formulary provided by the Connecticut Healthcare Research and Education Foundation (CHREF). Utilizing this source formulary together with the semantic hierarchy composed of major and minor drug classes defined as part of the National Drug Code (NDC) directory, we constructed a CMV that was eventually converted into its OOHVR form (the CHREF-OOHVR). The actual conversion step was carried out automatically by a program, called the OOHVR Generator, that we have developed. At present, the CHREF-OOHVR is running on top of ONTOS, a commercial OODB management system, and is accessible on the Web.
96
Partitioning a vocabulary's IS-A hierarchy into trees. Proc AMIA Annu Fall Symp 1997:630-4. [PMID: 9357702 PMCID: PMC2233245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Abstract
Controlled medical vocabularies are useful in application areas such as medical information systems and decision support. However, such vocabularies are large and complex, and working with them can be daunting. It is important to provide a means of orienting users to a vocabulary's contents. This paper introduces a methodology for partitioning a vocabulary into small, meaningful pieces. The partitioning is done with respect to the vocabulary's IS-A hierarchy. The methodology, based on a set of rules for refining the IS-A hierarchy, is a process carried out by a user in conjunction with the computer. The methodology is demonstrated on a complex portion of a vocabulary.
97
Computing access relevance for path-method generation in OODB and IM-OODB. J Intell Inf Syst 1996. [DOI: 10.1007/bf00125523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
98
Utilizing OODB schema modeling for vocabulary management. Proc AMIA Annu Fall Symp 1996:274-8. [PMID: 8947671 PMCID: PMC2233034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
Comprehension of complex controlled vocabularies is often difficult. We present a method, facilitated by an object-oriented database, for depicting such a vocabulary (the Medical Entities Dictionary (MED) from the Columbia-Presbyterian Medical Center) in a schematic way that uses a sparse inheritance network of area classes. The resulting Object-Oriented Health Vocabulary Repository (OOHVR) allows visualization of the 43,000 MED concepts as 90 area classes. This view has provided valuable information to those responsible for maintaining the MED. As a result, the MED organization has been improved and some previously unrecognized errors and inconsistencies have been removed. We believe that this schematic approach allows improved comprehension of the gestalt of a large controlled medical vocabulary.
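The abstract does not spell out how 43,000 concepts collapse into 90 area classes. The sketch below is a hedged illustration that assumes an area class collects all concepts carrying exactly the same set of properties; that grouping criterion, the function name `group_into_area_classes`, the input format, and the toy data are all assumptions for illustration, not the authors' implementation or MED content.

```python
from collections import defaultdict


def group_into_area_classes(concept_properties):
    """Group concepts whose property sets are identical.

    ASSUMPTION: an "area class" is taken here to mean the set of all
    concepts that have exactly the same set of properties.
    concept_properties: hypothetical dict mapping a concept name to an
    iterable of property names.
    Returns a dict mapping each distinct frozenset of properties to the
    sorted list of concepts carrying exactly that set.
    """
    areas = defaultdict(list)
    for concept, props in concept_properties.items():
        # frozenset makes the property set hashable and order-independent
        areas[frozenset(props)].append(concept)
    return {key: sorted(members) for key, members in areas.items()}


# Toy, made-up vocabulary (not MED data).
toy = {
    "Aspirin": {"has_ingredient", "has_dose_form"},
    "Ibuprofen": {"has_ingredient", "has_dose_form"},
    "Serum glucose test": {"measures", "has_specimen"},
    "Serum sodium test": {"measures", "has_specimen"},
    "Drug": set(),
}
areas = group_into_area_classes(toy)
print(len(toy), "concepts ->", len(areas), "area classes")
```

Under this assumption, the compression ratio depends only on how many distinct property combinations the vocabulary uses, which is why a large vocabulary can reduce to a small schema.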
99
A Graphical Schema Representation for Object-Oriented Databases. ACTA ACUST UNITED AC 1993. [DOI: 10.1007/978-1-4471-3423-7_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2023]
100