1. Rappaport N, Twik M, Plaschkes I, Nudel R, Iny Stein T, Levitt J, Gershoni M, Morrey CP, Safran M, Lancet D. MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search. Nucleic Acids Res 2016; 45:D877-D887. [PMID: 27899610 PMCID: PMC5210521 DOI: 10.1093/nar/gkw1012] [Citation(s) in RCA: 375] [Received: 08/17/2016] [Revised: 10/14/2016] [Accepted: 10/29/2016]
Abstract
The MalaCards human disease database (http://www.malacards.org/) is an integrated compendium of annotated diseases mined from 68 data sources. MalaCards has a web card for each of ∼20 000 disease entries, in six global categories. It portrays a broad array of annotation topics in 15 sections, including Summaries, Symptoms, Anatomical Context, Drugs, Genetic Tests, Variations and Publications. The Aliases and Classifications section reflects an algorithm for disease name integration across often-conflicting sources, providing effective annotation consolidation. A central feature is a balanced Genes section, with scores reflecting the strength of disease-gene associations. This is accompanied by other gene-related disease information such as pathways, mouse phenotypes and GO-terms, stemming from MalaCards’ affiliation with the GeneCards Suite of databases. MalaCards’ capacity to inter-link information from complementary sources, along with its elaborate search function, relational database infrastructure and convenient data dumps, allows it to tackle its rich disease annotation landscape, and facilitates systems analyses and genome sequence interpretation. MalaCards adopts a ‘flat’ disease-card approach, but each card is mapped to popular hierarchical ontologies (e.g. International Classification of Diseases, Human Phenotype Ontology and Unified Medical Language System) and also contains information about multi-level relations among diseases, thereby providing an optimal tool for disease representation and scrutiny.
Publication type: Research Support, Non-U.S. Gov't
2. Rappaport N, Nativ N, Stelzer G, Twik M, Guan-Golan Y, Stein TI, Bahir I, Belinky F, Morrey CP, Safran M, Lancet D. MalaCards: an integrated compendium for diseases and their annotation. Database (Oxford) 2013; 2013:bat018. [PMID: 23584832 PMCID: PMC3625956 DOI: 10.1093/database/bat018] [Citation(s) in RCA: 156]
Abstract
Comprehensive disease classification, integration and annotation are crucial for biomedical discovery. At present, disease compilation is incomplete, heterogeneous and often lacking systematic inquiry mechanisms. We introduce MalaCards, an integrated database of human maladies and their annotations, modeled on the architecture and strategy of the GeneCards database of human genes. MalaCards mines and merges 44 data sources to generate a computerized card for each of 16 919 human diseases. Each MalaCard contains disease-specific prioritized annotations, as well as inter-disease connections, empowered by the GeneCards relational database, its searches and GeneDecks set analyses. First, we generate a disease list from 15 ranked sources, using disease-name unification heuristics. Next, we use four schemes to populate MalaCards sections: (i) directly interrogating disease resources, to establish integrated disease names, synonyms, summaries, drugs/therapeutics, clinical features, genetic tests and anatomical context; (ii) searching GeneCards for related publications, and for associated genes with corresponding relevance scores; (iii) analyzing disease-associated gene sets in GeneDecks to yield affiliated pathways, phenotypes, compounds and GO terms, sorted by a composite relevance score and presented with GeneCards links; and (iv) searching within MalaCards itself, e.g. for additional related diseases and anatomical context. The latter forms the basis for the construction of a disease network, based on shared MalaCards annotations, embodying associations based on etiology, clinical features and clinical conditions. This broadly disposed network has a power-law degree distribution, suggesting that this might be an inherent property of such networks. Work in progress includes hierarchical malady classification, ontological mapping and disease set analyses, striving to make MalaCards an even more effective tool for biomedical research. 
Database URL: http://www.malacards.org/
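The network construction described above can be sketched in Python, purely for illustration. This is not the MalaCards implementation: the input format (a dict from disease name to a set of annotation terms), the rule that any single shared annotation creates an edge, and the function names are all assumptions. A power-law degree distribution means the fraction of diseases with k neighbours falls off roughly as k^(−γ), so the first step in checking it is to tabulate the degrees.

```python
from collections import Counter
from itertools import combinations


def build_disease_network(annotations):
    """Link two diseases whenever they share at least one annotation term.

    annotations: dict of disease name -> set of annotation terms
    (hypothetical input format, not the MalaCards schema).
    Returns an adjacency dict: disease -> set of neighbouring diseases.
    """
    graph = {disease: set() for disease in annotations}
    for a, b in combinations(annotations, 2):
        if annotations[a] & annotations[b]:  # any shared term creates an edge
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_distribution(graph):
    """Return {degree k: number of diseases with exactly k neighbours}."""
    return dict(Counter(len(neighbours) for neighbours in graph.values()))


# Toy data, invented for illustration only.
toy = {
    "disease_A": {"GENE1", "fever"},
    "disease_B": {"GENE1"},
    "disease_C": {"fever", "GENE2"},
    "disease_D": {"GENE3"},
}
print(degree_distribution(build_disease_network(toy)))  # {2: 1, 1: 2, 0: 1}
```

On a network with thousands of diseases, plotting these counts on log-log axes and looking for a roughly straight line is a common first check; a rigorous power-law fit needs more care than a straight-line regression.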
Publication type: Research Support, Non-U.S. Gov't
3. Morrey CP, Geller J, Halper M, Perl Y. The Neighborhood Auditing Tool: a hybrid interface for auditing the UMLS. J Biomed Inform 2009; 42:468-89. [PMID: 19475725 DOI: 10.1016/j.jbi.2009.01.006] [Citation(s) in RCA: 23]
Abstract
The UMLS's integration of more than 100 source vocabularies, not necessarily consistent with one another, causes some inconsistencies. The purpose of auditing the UMLS is to detect such inconsistencies and to suggest how to resolve them while observing the requirement of fully representing the content of each source in the UMLS. A software tool, called the Neighborhood Auditing Tool (NAT), that facilitates UMLS auditing is presented. The NAT supports "neighborhood-based" auditing, where, at any given time, an auditor concentrates on a single-focus concept and one of a variety of neighborhoods of its closely related concepts. Typical diagrammatic displays of concept networks have a number of shortcomings, so the NAT utilizes a hybrid diagram/text interface that features stylized neighborhood views which retain some of the best features of both the diagrammatic layouts and text windows while avoiding the shortcomings. The NAT allows an auditor to display knowledge from both the Metathesaurus (concept) level and the Semantic Network (semantic type) level. Various additional features of the NAT that support the auditing process are described. The usefulness of the NAT is demonstrated through a group of case studies. Its impact is tested with a study involving a select group of auditors.
Publication type: Research Support, N.I.H., Extramural
4. Gu HH, Elhanan G, Perl Y, Hripcsak G, Cimino JJ, Xu J, Chen Y, Geller J, Morrey CP. A study of terminology auditors' performance for UMLS semantic type assignments. J Biomed Inform 2012; 45:1042-8. [PMID: 22687822 DOI: 10.1016/j.jbi.2012.05.006] [Citation(s) in RCA: 13] [Received: 02/20/2012] [Revised: 05/26/2012] [Accepted: 05/31/2012]
Abstract
Auditing healthcare terminologies for errors requires human experts. In this paper, we present a study of the performance of auditors looking for errors in the semantic type assignments of complex UMLS concepts. In this study, concepts are considered complex whenever they are assigned combinations of semantic types. Past research has shown that complex concepts have a higher likelihood of errors. The results of this study indicate that individual auditors are not reliable when auditing such concepts and their performance is low, according to various metrics. These results confirm the outcomes of an earlier pilot study. They imply that to achieve an acceptable level of reliability and performance, when auditing such concepts of the UMLS, several auditors need to be assigned the same task. A mechanism is then needed to combine the possibly differing opinions of the different auditors into a final determination. In the current study, in contrast to our previous work, we used a majority mechanism for this purpose. For a sample of 232 complex UMLS concepts, the majority opinion was found reliable and its performance for accuracy, recall, precision and the F-measure was found statistically significantly higher than the average performance of individual auditors.
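The majority mechanism and the metrics named above can be sketched in Python. The sketch assumes each auditor gives a yes/no judgement on whether a concept's semantic type assignment is erroneous, that a reference standard exists, and that a tie yields no decision; these details, the data and the function names are hypothetical and not taken from the study.

```python
from collections import Counter


def majority_opinion(votes):
    """Return the label given by a strict majority of auditors, else None (tie)."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


def evaluate(predicted, reference):
    """Accuracy, precision, recall and F-measure for 'error' (True) judgements.

    predicted: concept -> True/False/None; None (no majority) counts as 'no error'.
    reference: concept -> True/False from a reference standard.
    """
    pairs = [(bool(predicted.get(c)), truth) for c, truth in reference.items()]
    tp = sum(p and t for p, t in pairs)
    fp = sum(p and not t for p, t in pairs)
    fn = sum(t and not p for p, t in pairs)
    accuracy = sum(p == t for p, t in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f_measure


# Three hypothetical auditors judging three invented concepts.
votes = {"C1": [True, True, False], "C2": [False, False, True], "C3": [True, False, True]}
majority = {concept: majority_opinion(v) for concept, v in votes.items()}
reference = {"C1": True, "C2": False, "C3": False}
print(evaluate(majority, reference))
```

Using an odd number of auditors avoids ties for binary judgements; with an even panel, the sketch simply leaves tied concepts undecided.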
Publication type: Research Support, N.I.H., Extramural
5. Morrey CP, Perl Y, Halper M, Chen L, Gu HH. A chemical specialty semantic network for the Unified Medical Language System. J Cheminform 2012; 4:9. [PMID: 22577759 PMCID: PMC3428652 DOI: 10.1186/1758-2946-4-9] [Citation(s) in RCA: 10] [Received: 01/08/2012] [Accepted: 05/11/2012]
Abstract
BACKGROUND Terms representing chemical concepts found in the Unified Medical Language System (UMLS) are used to derive an expanded semantic network with mutually exclusive semantic types. The UMLS Semantic Network (SN) is composed of a collection of broad categories called semantic types (STs) that are assigned to concepts. Within the UMLS's coverage of the chemical domain, we find a large number of concepts assigned more than one ST. As a result, the extent of a given ST may contain concepts with heterogeneous semantics. A methodology for expanding the chemical subhierarchy of the SN into a finer-grained categorization of mutually exclusive types with semantically uniform extents is presented. We call this network a Chemical Specialty Semantic Network (CSSN). A CSSN is derived automatically from the existing chemical STs and their assignments. The methodology incorporates a threshold value governing the minimum size of a type's extent needed for inclusion in the CSSN. Thus, different CSSNs can be created by choosing different threshold values based on varying requirements. RESULTS A complete CSSN, comprising 68 STs, is derived using a threshold value of 300. It is used effectively to provide high-level categorizations for a random sample of compounds from the "Chemical Entities of Biological Interest" (ChEBI) ontology. The effect of threshold values between 1 and 500 on the size of the CSSN is shown. CONCLUSIONS The methodology has several potential applications, including deriving a pre-coordinated guide for ST assignments to new UMLS chemical concepts, auditing existing concepts, inter-terminology mapping, and serving as an upper-level network for ChEBI.
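The role of the threshold can be illustrated with a simplified Python sketch. It is not the CSSN derivation from the paper: it only groups concepts by their exact combination of STs and keeps a combination as a candidate type when its extent reaches the threshold, setting smaller groups aside. The data format, concept identifiers and function name are hypothetical; the ST names serve only as labels.

```python
from collections import defaultdict


def candidate_types(assignments, threshold):
    """Keep ST combinations whose extent has at least `threshold` concepts.

    assignments: dict of concept id -> iterable of ST names (hypothetical format).
    Returns (kept, leftover): kept maps frozenset of STs -> sorted concept ids;
    leftover is a sorted list of concepts whose combination fell below threshold.
    """
    extents = defaultdict(list)
    for concept, sts in assignments.items():
        extents[frozenset(sts)].append(concept)
    kept = {combo: sorted(cs) for combo, cs in extents.items() if len(cs) >= threshold}
    leftover = sorted(c for cs in extents.values() if len(cs) < threshold for c in cs)
    return kept, leftover


# Toy data: five invented concept ids.
example = {
    "c1": ["Organic Chemical"],
    "c2": ["Organic Chemical"],
    "c3": ["Organic Chemical", "Pharmacologic Substance"],
    "c4": ["Organic Chemical", "Pharmacologic Substance"],
    "c5": ["Lipid"],
}
for t in (1, 2, 3):
    kept, leftover = candidate_types(example, t)
    print(t, len(kept), leftover)
```

Raising the threshold shrinks the set of retained types, which is the trade-off the abstract describes when it varies the threshold between 1 and 500.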
Publication type: Research article
6. Chen L, Morrey CP, Gu H, Halper M, Perl Y. Modeling multi-typed structurally viewed chemicals with the UMLS Refined Semantic Network. J Am Med Inform Assoc 2008; 16:116-31. [PMID: 18952946 DOI: 10.1197/jamia.m2604] [Citation(s) in RCA: 10]
Abstract
OBJECTIVE Chemical concepts assigned multiple "Chemical Viewed Structurally" semantic types (STs) in the Unified Medical Language System (UMLS) are subject to ambiguous interpretation. The multiple assignments may denote that a specific represented chemical (combination) is either a conjugate, derived via a chemical reaction between chemicals of the different types, or a complex, composed of a mixture of such chemicals. The previously introduced Refined Semantic Network (RSN) is modified to properly model these varied multi-typed chemical combinations. DESIGN The RSN was previously introduced as an enhanced abstraction of the UMLS's concepts. It features new types, called intersection semantic types (ISTs), each of which explicitly captures a unique combination of ST assignments in one abstract unit. The ambiguous "Chemical Viewed Structurally" ISTs of the RSN are replaced with two varieties of new types, called conjugate types and complex types, which explicitly denote the nature of the chemical interactions. Additional semantic relationships further refine the new portion of the RSN rooted at the ST "Chemical Viewed Structurally." MEASUREMENTS The number of new conjugate and complex types and the extent of changes to the type assignments of chemical concepts are presented. RESULTS The modified RSN, consisting of 35 types and featuring 22 new conjugate and complex types, is presented. A total of 800 chemical concepts (about 98%) representing multi-typed chemical combinations from "Chemical Viewed Structurally" STs are uniquely assigned one of the new types. An additional benefit is the identification of a number of illegal ISTs and ST assignment errors, some of which directly violate exclusion rules defined by the UMLS Semantic Network. CONCLUSION The modified RSN provides an enhanced abstract view of the UMLS's chemical content. Its array of conjugate and complex types provides a more accurate model of the variety of combinations involving chemicals viewed structurally. This framework will help streamline the process of type assignments for such chemical concepts and improve user orientation to the richness of the chemical content of the UMLS.
Publication type: Research Support, N.I.H., Extramural
7. Morrey CP, Chen L, Halper M, Perl Y. Resolution of redundant semantic type assignments for organic chemicals in the UMLS. Artif Intell Med 2011; 52:141-51. [PMID: 21646001 DOI: 10.1016/j.artmed.2011.05.003] [Citation(s) in RCA: 5] [Received: 10/29/2009] [Revised: 05/03/2011] [Accepted: 05/09/2011]
Abstract
OBJECTIVE The Unified Medical Language System (UMLS) integrates terms from different sources into concepts and supplements these with the assignment of one or more high-level semantic types (STs) from its Semantic Network (SN). For a composite organic chemical concept, multiple assignments of organic chemical STs often serve to enumerate the types of the composite's underlying chemical constituents. This practice sometimes leads to the introduction of a forbidden redundant ST assignment, where both an ST and one of its descendants are assigned to the same concept. A methodology for resolving redundant ST assignments for organic chemicals is presented; it captures the essence of such composite chemicals better than the typical omission of the more general ST. MATERIALS AND METHODS The typical SN resolution of a redundant ST assignment is to retain only the more specific ST assignment and omit the more general one. However, with organic chemicals, that is not always the correct strategy. A methodology that resolves the redundancy based on the relative sizes of the chemical components is presented. It is more accurate to use the ST of the larger chemical component to capture the category of the concept, even if that means using the more general ST. RESULTS A sample of 254 chemical concepts having redundant ST assignments in older UMLS releases was audited to analyze the accuracy of current ST assignments. For 81 (32%) of them, our chemical analysis-based approach yielded a different recommendation from the UMLS (2009AA). New UMLS usage notes capturing the rules of this methodology are proposed. CONCLUSIONS Redundant ST assignments have typically arisen for organic composite chemical concepts. A methodology for dealing with this kind of erroneous configuration, capturing the proper category for a composite chemical, is presented and demonstrated.
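Detecting a redundant ST assignment, and applying the size-based rule described above, can be sketched in Python. The small SN fragment is illustrative only and should be checked against the current Semantic Network; the component-size values are a hypothetical stand-in for whatever chemical measure an auditor would use, and breaking ties toward the more specific ST is an assumption.

```python
# Illustrative fragment of the SN "isa" tree: child ST -> parent ST.
PARENT = {
    "Organic Chemical": "Chemical Viewed Structurally",
    "Lipid": "Organic Chemical",
    "Steroid": "Lipid",
    "Carbohydrate": "Organic Chemical",
}


def ancestors(st, parent=PARENT):
    """All ancestors of semantic type `st`, walking up the tree."""
    found = set()
    while st in parent:
        st = parent[st]
        found.add(st)
    return found


def redundant_pairs(assigned, parent=PARENT):
    """(general, specific) pairs where an ST and one of its descendants are both assigned."""
    assigned = set(assigned)
    return sorted((anc, st) for st in assigned for anc in ancestors(st, parent) if anc in assigned)


def resolve_redundancy(general, specific, component_size):
    """Keep the ST of the larger chemical component (hypothetical size measure).

    Ties fall back to the more specific ST (an assumption for this sketch).
    """
    return general if component_size[general] > component_size[specific] else specific


print(redundant_pairs({"Organic Chemical", "Steroid"}))  # [('Organic Chemical', 'Steroid')]
```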
Publication type: Research Support, N.I.H., Extramural
8. Gu HH, Hripcsak G, Chen Y, Morrey CP, Elhanan G, Cimino J, Geller J, Perl Y. Evaluation of a UMLS auditing process of semantic type assignments. AMIA Annu Symp Proc 2007; 2007:294-298. [PMID: 18693845 PMCID: PMC2655790] [Citation(s) in RCA: 0] [Received: 03/13/2007] [Revised: 07/17/2007] [Accepted: 10/11/2007]
Abstract
The UMLS is a terminological system that integrates many source terminologies. Each concept in the UMLS is assigned one or more semantic types from the Semantic Network, an upper level ontology for biomedicine. Due to the complexity of the UMLS, errors exist in the semantic type assignments. Finding assignment errors may unearth modeling errors. Even with sophisticated tools, discovering assignment errors requires manual review. In this paper we describe the evaluation of an auditing project of UMLS semantic type assignments. We studied the performance of the auditors who reviewed potential errors. We found that four auditors, interacting according to a multi-step protocol, identified a high rate of errors (one or more errors in 81% of concepts studied) and that results were sufficiently reliable (0.67 to 0.70) for the two most common types of errors. However, reliability was low for each individual auditor, suggesting that review of potential errors is resource-intensive.
Publication type: Evaluation Study
9. Halper M, Morrey CP, Chen Y, Elhanan G, Hripcsak G, Perl Y. Auditing hierarchical cycles to locate other inconsistencies in the UMLS. AMIA Annu Symp Proc 2011; 2011:529-36. [PMID: 22195107 PMCID: PMC3243212] [Citation(s) in RCA: 0]
Abstract
A cycle in the parent relationship hierarchy of the UMLS is a configuration that effectively makes each concept on the cycle an ancestor of itself. Such a structural inconsistency can easily be found automatically. A previous strategy for disconnecting cycles is to break them by deleting one or more parent relationships, irrespective of the correctness of the deleted relationships. A methodology is introduced for auditing cycles that seeks to discover and delete only erroneous relationships. Cycles involving three concepts are the primary consideration. Hypotheses about the high probability of locating an erroneous parent relationship in a cycle are proposed and confirmed with statistical confidence, lending credence to the auditing approach. A cycle may also serve as an indicator of other, non-structural inconsistencies that are otherwise difficult to detect automatically. An extensive auditing example shows how a cycle can indicate further inconsistencies.
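Because a cycle is purely structural, finding the three-concept cycles that the methodology focuses on is straightforward to automate; deciding which relationship is actually wrong still needs an auditor. A minimal Python sketch, assuming the parent relationships are available as a dict from each concept to its set of parents (a hypothetical format, not a UMLS file layout):

```python
def three_cycles(parents):
    """Return each cycle A -> B -> C -> A of three distinct concepts once.

    parents: dict of concept -> set of its parent concepts (hypothetical format).
    Each cycle is rotated to start at its smallest concept, so the three
    rotations of the same cycle collapse into one entry.
    """
    cycles = set()
    for a, a_parents in parents.items():
        for b in a_parents:
            for c in parents.get(b, ()):
                if len({a, b, c}) == 3 and a in parents.get(c, ()):
                    triple = (a, b, c)
                    i = triple.index(min(triple))
                    cycles.add(triple[i:] + triple[:i])
    return sorted(cycles)


# Toy hierarchy: A's parent is B, B's parent is C, and C lists A (and D) as parents.
toy_parents = {"A": {"B"}, "B": {"C"}, "C": {"A", "D"}, "D": set()}
print(three_cycles(toy_parents))  # [('A', 'B', 'C')]
```

Each reported triple is a candidate for manual review: removing any one of its three parent links breaks that cycle, and the auditing question is which link, if any, is genuinely erroneous.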
Publication type: Research Support, N.I.H., Extramural
10. Geller J, Morrey CP, Xu J, Halper M, Elhanan G, Perl Y, Hripcsak G. Comparing inconsistent relationship configurations indicating UMLS errors. AMIA Annu Symp Proc 2009; 2009:193-7. [PMID: 20351848 PMCID: PMC2815406] [Citation(s) in RCA: 0]
Abstract
The goal of this paper is to audit null-annotated parent-child pairs in the UMLS Metathesaurus. We have developed techniques for identifying suspicious pairs with a high likelihood of errors by using inconsistencies between the hierarchical relationships of the Metathesaurus and the Semantic Network. Two formal conditions, called semantic inversion and lack of ancestry, are investigated. Analyzing two corresponding samples shows that semantic inversion is significantly more likely to indicate an error than lack of ancestry, which in turn is more likely to indicate errors than a consistent configuration. We also discuss cases of parent-child pairs with semantic inversion that may be corrected by disambiguating the child.
Publication type: Research Support, N.I.H., Extramural
11. Li L, Morrey CP, Baorto D. Cross-mapping clinical notes between hospitals: an application of the LOINC Document Ontology. AMIA Annu Symp Proc 2011; 2011:777-783. [PMID: 22195135 PMCID: PMC3243240] [Citation(s) in RCA: 0]
Abstract
Standardization of document titles is essential for management as the volume of electronic clinical notes increases. The two campuses of the New York Presbyterian Hospital have over 2,700 distinct document titles. The LOINC Document Ontology (DO) provides a standard for the naming of clinical documents in a multi-axis structure. We have represented the latest LOINC DO structure in the MED and developed an automated process that maps the clinical documents from both the West (Columbia) and East (Cornell) campuses to the LOINC DO. We find that the LOINC DO can represent the majority of our documents, and that about half of the documents map between campuses using the LOINC DO as a reference. We evaluated the possibility of using current LOINC codes in document exchange between different institutions. While the LOINC DO clearly succeeds in representing documents and facilitating exchange, we find that there are granularity issues.
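Using the LOINC DO as a shared reference can be sketched as exact matching on the ontology's axes (Kind of Document, Type of Service, Setting, Subject Matter Domain, Role). The local titles, axis values and function name below are invented for illustration, and requiring identical values on every axis is a simplifying assumption, not the mapping process the authors built.

```python
from typing import NamedTuple, Optional


class DOCode(NamedTuple):
    """LOINC Document Ontology axes; None means the axis is left unspecified."""
    kind_of_document: str
    type_of_service: Optional[str] = None
    setting: Optional[str] = None
    subject_matter_domain: Optional[str] = None
    role: Optional[str] = None


def cross_map(west, east):
    """Pair local titles from two sites whose DO axis values are all identical.

    west, east: dict of local title -> DOCode (hypothetical mappings).
    Returns a sorted list of (west title, east title) pairs.
    """
    by_code = {}
    for title, code in east.items():
        by_code.setdefault(code, []).append(title)
    return sorted((w, e) for w, code in west.items() for e in by_code.get(code, []))


# Invented example titles and axis values.
west = {
    "Cardiology Consult": DOCode("Note", "Consultation", None, "Cardiology"),
    "Misc Note": DOCode("Note"),
}
east = {
    "Cardiac Consultation": DOCode("Note", "Consultation", None, "Cardiology"),
    "Discharge Summary": DOCode("Summary", "Discharge"),
}
print(cross_map(west, east))  # [('Cardiology Consult', 'Cardiac Consultation')]
```

Exact matching makes the granularity problem visible: a title coded only at a coarse level (here "Misc Note") finds no partner, even where a human reviewer might accept a broader match.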
Publication type: Research article