1
Reich C, Ostropolets A, Ryan P, Rijnbeek P, Schuemie M, Davydov A, Dymshyts D, Hripcsak G. OHDSI Standardized Vocabularies - a large-scale centralized reference ontology for international data harmonization. J Am Med Inform Assoc 2024; 31:583-590. [PMID: 38175665 PMCID: PMC10873827 DOI: 10.1093/jamia/ocad247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2023] [Revised: 11/30/2023] [Accepted: 12/23/2023] [Indexed: 01/05/2024]
Abstract
IMPORTANCE The Observational Health Data Sciences and Informatics (OHDSI) is the largest distributed data network in the world, encompassing more than 331 data sources with 2.1 billion patient records across 34 countries. It enables large-scale observational research by standardizing the data into a common data model (CDM) (Observational Medical Outcomes Partnership [OMOP] CDM) and requires a comprehensive, efficient, and reliable ontology system to support data harmonization. MATERIALS AND METHODS We created the OHDSI Standardized Vocabularies, a common reference ontology mandatory to all data sites in the network. It comprises imported and de novo-generated ontologies containing concepts and relationships between them, and the praxis of converting the source data to the OMOP CDM based on these. It enables harmonization through assigned domains according to clinical categories, comprehensive coverage of entities within each domain, support for commonly used international coding schemes, and standardization of semantically equivalent concepts. RESULTS The OHDSI Standardized Vocabularies comprise over 10 million concepts from 136 vocabularies. They are used by hundreds of groups and several large data networks. More than 8600 users have performed 50 000 downloads of the system. This open-source resource has proven to address an impediment of large-scale observational research: the dependence on the context of source data representation. With that, it has enabled efficient phenotyping, covariate construction, patient-level prediction, population-level estimation, and standard reporting. DISCUSSION AND CONCLUSION OHDSI has made available a comprehensive, open vocabulary system that is unmatched in its ability to support global observational research. We encourage researchers to exploit it and contribute their use cases to this dynamic resource.
Affiliation(s)
- Christian Reich
  - Coordinating Center, Observational Health Data Sciences and Informatics, New York City NY 10032, United States
  - OHDSI Center at the Roux Institute, Northeastern University, Portland ME 04101, United States
  - Department of Medical Informatics, Erasmus University Medical Center, 3015 GD Rotterdam, The Netherlands
- Anna Ostropolets
  - Coordinating Center, Observational Health Data Sciences and Informatics, New York City NY 10032, United States
  - Department of Biomedical Informatics, Columbia University Medical Center, New York City NY 10032, United States
  - Odysseus Data Services, Cambridge MA 02142, United States
- Patrick Ryan
  - Coordinating Center, Observational Health Data Sciences and Informatics, New York City NY 10032, United States
  - Department of Biomedical Informatics, Columbia University Medical Center, New York City NY 10032, United States
  - Observational Health Data Analytics, Janssen Research & Development, Titusville NJ 08560, United States
- Peter Rijnbeek
  - Coordinating Center, Observational Health Data Sciences and Informatics, New York City NY 10032, United States
  - Department of Medical Informatics, Erasmus University Medical Center, 3015 GD Rotterdam, The Netherlands
- Martijn Schuemie
  - Coordinating Center, Observational Health Data Sciences and Informatics, New York City NY 10032, United States
  - Observational Health Data Analytics, Janssen Research & Development, Titusville NJ 08560, United States
- Alexander Davydov
  - Coordinating Center, Observational Health Data Sciences and Informatics, New York City NY 10032, United States
  - Odysseus Data Services, Cambridge MA 02142, United States
- Dmitry Dymshyts
  - Coordinating Center, Observational Health Data Sciences and Informatics, New York City NY 10032, United States
  - Observational Health Data Analytics, Janssen Research & Development, Titusville NJ 08560, United States
- George Hripcsak
  - Coordinating Center, Observational Health Data Sciences and Informatics, New York City NY 10032, United States
  - Department of Biomedical Informatics, Columbia University Medical Center, New York City NY 10032, United States
2
Noll R, Berger A, Facchinello C, Güngöze O, von Wagner M, Hoehl S, Neff M, Storf H, Schaaf J. Translation of Ontological Concepts from English into German Using Commercial Translation Software and Expert Evaluation. Stud Health Technol Inform 2024; 310:89-93. [PMID: 38269771 DOI: 10.3233/shti230933] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]
Abstract
Medical ontologies are mostly available in English. This presents a language barrier that is a limitation in research and automated processing of patient data. The manual translation of ontologies is complex and time-consuming. However, there are commercial translation tools that have shown promising results in the field of medical terminology translation. The aim of this study is to translate selected terms of the Human Phenotype Ontology (HPO) from English into German using commercial translators. Six medical experts evaluated the translation candidates in an iterative process. The results show commercial translators, with DeepL in the lead, provide translations that are positively evaluated by experts. With a broader study scope and additional optimization techniques, commercial translators could support and facilitate the process of translating medical ontologies.
Affiliation(s)
- Richard Noll
  - Goethe University Frankfurt, University Hospital Frankfurt, Institute of Medical Informatics, Frankfurt, Germany
- Alexandra Berger
  - Frankfurt Reference Centre for Rare Diseases, Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt, Germany
- Carlo Facchinello
  - Department of Internal Medicine 1, Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt, Germany
- Oya Güngöze
  - Department of Internal Medicine 1, Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt, Germany
  - Department of Medical Information Systems and Digitalization, Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt, Germany
- Michael von Wagner
  - Department of Internal Medicine 1, Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt, Germany
  - Department of Medical Information Systems and Digitalization, Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt, Germany
- Sebastian Hoehl
  - Institute of Medical Virology, Goethe University Frankfurt, University Hospital Frankfurt, Frankfurt, Germany
- Michaela Neff
  - Goethe University Frankfurt, University Hospital Frankfurt, Institute of Medical Informatics, Frankfurt, Germany
- Holger Storf
  - Goethe University Frankfurt, University Hospital Frankfurt, Institute of Medical Informatics, Frankfurt, Germany
- Jannik Schaaf
  - Goethe University Frankfurt, University Hospital Frankfurt, Institute of Medical Informatics, Frankfurt, Germany
3
Stöhr MR, Günther A, Majeed RW. Definition, Composition, and Harmonization of Core Datasets Within the German Center for Lung Research. Stud Health Technol Inform 2023; 302:696-700. [PMID: 37203472 DOI: 10.3233/shti230242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2023]
Abstract
Core datasets are the composition of essential data items for a certain research scope. As they state commonalities between heterogeneous data collections, they serve as a basis for cross-site and cross-disease research. Therefore, researchers at the national and international levels have addressed the problem of missing core datasets. The German Center for Lung Research (DZL) comprises five sites and eight disease areas and aims to gain further scientific knowledge by continuously promoting collaborations. In this study, we elaborated a methodology for defining core datasets in the field of lung health science. Additionally, through support of domain experts, we have utilized our method and compiled core datasets for each DZL disease area and a general core dataset for lung research. All included data items were annotated with metadata and where possible they were assigned references to international classification systems. Our findings will support future scientific collaborations and meaningful data collections.
Affiliation(s)
- Mark R Stöhr
  - UGMLC, German Center for Lung Research (DZL), Justus-Liebig-University, Giessen, Germany
- Andreas Günther
  - UGMLC, German Center for Lung Research (DZL), Justus-Liebig-University, Giessen, Germany
- Raphael W Majeed
  - UGMLC, German Center for Lung Research (DZL), Justus-Liebig-University, Giessen, Germany
  - Institute of Medical Informatics, Medical Faculty of RWTH Aachen, Aachen, Germany
4
Alpi KM, Martin CL, Plasek JM, Sittig S, Smith CA, Weinfurter EV, Wells JK, Wong R, Austin RR. Characterizing terminology applied by authors and database producers to informatics literature on consumer engagement with wearable devices. J Am Med Inform Assoc 2023:7172839. [PMID: 37203425 DOI: 10.1093/jamia/ocad082] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 04/24/2023] [Accepted: 04/28/2023] [Indexed: 05/20/2023]
Abstract
OBJECTIVE Identifying consumer health informatics (CHI) literature is challenging. To recommend strategies to improve discoverability, we aimed to characterize controlled vocabulary and author terminology applied to a subset of CHI literature on wearable technologies. MATERIALS AND METHODS To retrieve articles from PubMed that addressed patient/consumer engagement with wearables, we developed a search strategy of textwords and Medical Subject Headings (MeSH). To refine our methodology, we used a random sample of 200 articles from 2016 to 2018. A descriptive analysis of articles (N = 2522) from 2019 identified 308 (12.2%) CHI-related articles, for which we characterized their assigned terminology. We visualized the 100 most frequent terms assigned to the articles from MeSH, author keywords, CINAHL, and Engineering Databases (Compendex and Inspec together). We assessed the overlap of CHI terms among sources and evaluated terms related to consumer engagement. RESULTS The 308 articles were published in 181 journals, more in health journals (82%) than informatics (11%). Only 44% were indexed with the MeSH term "wearable electronic devices." Author keywords were common (91%) but rarely represented consumer engagement with device data, eg, self-monitoring (n = 12, 0.7%) or self-management (n = 9, 0.5%). Only 10 articles (3%) had terminology from all sources (authors, PubMed, CINAHL, Compendex, and Inspec). DISCUSSION Our main finding was that consumer engagement was not well represented in health and engineering database thesauri. CONCLUSIONS Authors of CHI studies should indicate consumer/patient engagement and the specific technology investigated in titles, abstracts, and author keywords to facilitate discovery by readers and expand vocabularies and indexing.
Affiliation(s)
- Kristine M Alpi
  - Icahn School of Medicine at Mount Sinai, Levy Library, Annenberg, New York, New York, USA
- Christie L Martin
  - University of Minnesota School of Nursing, Minneapolis, Minnesota, USA
- Joseph M Plasek
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Scott Sittig
  - College of Nursing and Health Sciences, University of Louisiana at Lafayette, Lafayette, Louisiana, USA
- Rachel Wong
  - Department of Biomedical Informatics, Stony Brook University Hospital, Stony Brook, New York, USA
- Robin R Austin
  - University of Minnesota School of Nursing, Minneapolis, Minnesota, USA
5
Soares F, Tateisi Y, Takatsuki T, Yamaguchi A. O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information. Genomics Inform 2021; 19:e26. [PMID: 34638173 PMCID: PMC8510863 DOI: 10.5808/gi.21014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/17/2021] [Accepted: 09/10/2021] [Indexed: 11/25/2022]
Abstract
Previous approaches to create a controlled vocabulary for Japanese have resorted to existing bilingual dictionary and transformation rules to allow such mappings. However, given the possible new terms introduced due to coronavirus disease 2019 (COVID-19) and the emphasis on respiratory and infection-related terms, coverage might not be guaranteed. We propose creating a Japanese bilingual controlled vocabulary based on MeSH terms assigned to COVID-19 related publications in this work. For such, we resorted to manual curation of several bilingual dictionaries and a computational approach based on machine translation of sentences containing such terms and the ranking of possible translations for the individual terms by mutual information. Our results show that we achieved nearly 99% occurrence coverage in LitCovid, while our computational approach presented average accuracy of 63.33% for all terms, and 84.51% for drugs and chemicals.
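The abstract above mentions ranking candidate translations by mutual information but does not give the formulation. As a rough illustration only, the sketch below scores candidates with pointwise mutual information (PMI) computed from sentence-level co-occurrence counts. The function name `rank_by_pmi`, the toy sentence pairs, and the choice of PMI itself are assumptions made for this example and are not taken from the article.

```python
import math

def rank_by_pmi(pairs, source_term, candidates):
    """Rank candidate translations of source_term by PMI.

    pairs: list of (source_tokens, target_tokens) sets, one per aligned
    sentence pair. This layout is an assumption for the sketch.
    """
    n = len(pairs)
    n_src = sum(1 for src, _ in pairs if source_term in src)
    scores = {}
    for cand in candidates:
        n_cand = sum(1 for _, tgt in pairs if cand in tgt)
        n_both = sum(1 for src, tgt in pairs
                     if source_term in src and cand in tgt)
        if n_both == 0:
            continue  # never co-occurs, so PMI is undefined; skip it
        # PMI = log( p(s, t) / (p(s) * p(t)) ), with counts over n pairs
        scores[cand] = math.log((n_both * n) / (n_src * n_cand))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy aligned data (romanized Japanese tokens, purely illustrative).
toy_pairs = [
    ({"fever", "cough"}, {"netsu", "seki"}),
    ({"fever", "pain"}, {"netsu", "itami"}),
    ({"cough", "pain"}, {"seki", "itami"}),
    ({"fever"}, {"netsu"}),
]
ranking = rank_by_pmi(toy_pairs, "fever", ["netsu", "seki", "itami"])
```

On this toy data the candidate that co-occurs with "fever" in every sentence pair is ranked first; real corpora would need smoothing and frequency thresholds, which are omitted here.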
Affiliation(s)
- Felipe Soares
  - Computer Science Department, The University of Sheffield, Western Bank, Sheffield S10 2TN, UK
- Yuka Tateisi
  - National Bioscience Database Center, Japan Science and Technology Agency, Tokyo 102-8666, Japan
- Terue Takatsuki
  - Database Center for Life Science, Research Organization of Information and Systems, Kashiwa 277-0871, Japan
- Atsuko Yamaguchi
  - Graduate School of Integrative Science and Engineering, Tokyo City University, Tokyo 158-8557, Japan
6
Holmgren SD, Boyles RR, Cronk RD, Duncan CG, Kwok RK, Lunn RM, Osborn KC, Thessen AE, Schmitt CP. Catalyzing Knowledge-Driven Discovery in Environmental Health Sciences through a Community-Driven Harmonized Language. Int J Environ Res Public Health 2021; 18:8985. [PMID: 34501574 PMCID: PMC8430534 DOI: 10.3390/ijerph18178985] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/07/2021] [Revised: 08/13/2021] [Accepted: 08/19/2021] [Indexed: 01/10/2023]
Abstract
Harmonized language is critical for helping researchers to find data, collecting scientific data to facilitate comparison, and performing pooled and meta-analyses. Using standard terms to link data to knowledge systems facilitates knowledge-driven analysis, allows for the use of biomedical knowledge bases for scientific interpretation and hypothesis generation, and increasingly supports artificial intelligence (AI) and machine learning. Due to the breadth of environmental health sciences (EHS) research and the continuous evolution in scientific methods, the gaps in standard terminologies, vocabularies, ontologies, and related tools hamper the capabilities to address large-scale, complex EHS research questions that require the integration of disparate data and knowledge sources. The results of prior workshops to advance a harmonized environmental health language demonstrate that future efforts should be sustained and grounded in scientific need. We describe a community initiative whose mission was to advance integrative environmental health sciences research via the development and adoption of a harmonized language. The products, outcomes, and recommendations developed and endorsed by this community are expected to enhance data collection and management efforts for NIEHS and the EHS community, making data more findable and interoperable. This initiative will provide a community of practice space to exchange information and expertise, be a coordination hub for identifying and prioritizing activities, and a collaboration platform for the development and adoption of semantic solutions. We encourage anyone interested in advancing this mission to engage in this community.
Affiliation(s)
- Stephanie D. Holmgren
  - Office of Data Science, National Institute of Environmental Health Sciences (NIEHS), Durham, NC 27709, USA
- Christopher G. Duncan
  - Genes, Environment, and Health Branch, Division of Extramural Research and Training, NIEHS, Durham, NC 27709, USA
- Richard K. Kwok
  - Epidemiology Branch, Division of Intramural Research, NIEHS, Durham, NC 27709, USA
  - Office of the Director, NIEHS, Bethesda, MD 20892, USA
- Ruth M. Lunn
  - Integrative Health Assessment Branch, Division of the National Toxicology Program, NIEHS, Durham, NC 27709, USA
- Anne E. Thessen
  - Environmental and Molecular Toxicology Department, Oregon State University, Corvallis, OR 97331, USA
- Charles P. Schmitt
  - Office of Data Science, National Institute of Environmental Health Sciences (NIEHS), Durham, NC 27709, USA
7
Torres FBG, Gomes DC, Hino AAF, Moro C, Cubas MR. Comparison of the Results of Manual and Automated Processes of Cross-Mapping Between Nursing Terms: Quantitative Study. JMIR Nurs 2020; 3:e18501. [PMID: 34345784 PMCID: PMC8293700 DOI: 10.2196/18501] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2020] [Revised: 04/26/2020] [Accepted: 05/05/2020] [Indexed: 12/05/2022]
Abstract
Background Cross-mapping establishes equivalence between terms from different terminology systems, which is useful for interoperability, updated terminological versions, and reuse of terms. Due to the number of terms to be mapped, this work can be extensive, tedious, and thorough, and it is susceptible to errors; this can be minimized by automated processes, which use computational tools. Objective The aim of this study was to compare the results of manual and automated term mapping processes. Methods In this descriptive, quantitative study, we used the results of two mapping processes as an empirical basis: manual, which used 2638 terms of nurses’ records from a university hospital in southern Brazil and the International Classification for Nursing Practice (ICNP); and automated, which used the same university hospital terms and the primitive terms of the ICNP through MappICNP, an algorithm based on rules of natural language processing. The two processes were compared via equality and exclusivity assessments of new terms of the automated process and of candidate terms. Results The automated process mapped 569/2638 (21.56%) of the source bank’s terms as identical, and the manual process mapped 650/2638 (24.63%) as identical. Regarding new terms, the automated process mapped 1031/2638 (39.08%) of the source bank’s terms as new, while the manual process mapped 1251 (47.42%). In particular, manual mapping identified 101/2638 (3.82%) terms as identical and 429 (16.26%) as new, whereas the automated process identified 20 (0.75%) terms as identical and 209 (7.92%) as new. Of the 209 terms mapped as new by the automated process, it was possible to establish an equivalence with ICNP terms in 48 (23.0%) cases. An analysis of the candidate terms offered by the automated process to the 429 new terms mapped exclusively by the manual process resulted in 100 (23.3%) candidates that had a semantic relationship with the source term. 
Conclusions The automated and manual processes map identical and new terms in similar ways and can be considered complementary. Direct identification of identical terms and the offering of candidate terms through the automated process facilitate and enhance the results of the mapping; confirmation of the precision of the automated mapping requires further analysis by researchers.
Affiliation(s)
- Denilsen Carvalho Gomes
  - Graduate Program in Health Technology, Pontificia Universidade Católica do Paraná, Curitiba, Brazil
- Claudia Moro
  - Graduate Program in Health Technology, Pontificia Universidade Católica do Paraná, Curitiba, Brazil
- Marcia Regina Cubas
  - Graduate Program in Health Technology, Pontificia Universidade Católica do Paraná, Curitiba, Brazil
8
Ammari M, McCarthy F, Nanduri B. Leveraging Experimental Details for an Improved Understanding of Host-Pathogen Interactome. Curr Protoc Bioinformatics 2019; 61:8.26.1-8.26.12. [PMID: 30040202 DOI: 10.1002/cpbi.44] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 01/08/2023]
Abstract
An increasing proportion of curated host-pathogen interaction (HPI) information is becoming available in interaction databases. These data represent detailed, experimentally verified molecular interaction data, which may be used to better understand infectious diseases. By their very nature, HPIs are context dependent: whether two proteins are reported as interacting depends on the precise biological conditions studied and the approaches used for identifying these interactions. The associated biology and the technical details of the experiments identifying interacting protein molecules are increasingly being curated using defined curation standards but are overlooked in current HPI network modeling. Given the increase in data size and complexity, awareness of the process and variables included in HPI identification and curation, and their effect on data analysis and interpretation, is crucial in understanding pathogenesis. We describe the use of HPI data for network modeling, aspects of curation that can help researchers to more accurately model specific infection conditions, and provide examples to illustrate these principles. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Mais Ammari
  - School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona
- Fiona McCarthy
  - School of Animal and Comparative Biomedical Sciences, University of Arizona, Tucson, Arizona
- Bindu Nanduri
  - Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University, Mississippi State, Mississippi
  - College of Veterinary Medicine, Mississippi State University, Mississippi State, Mississippi
9
Abstract
Logical Observation Identifiers Names and Codes (LOINC) is the most widely used controlled vocabulary to identify laboratory tests. A given laboratory test can often be reported in more than 1 unit of measure (eg, grams or moles), and LOINC defines unique codes for each unit. Consequently, an identical laboratory test performed by 2 different clinical laboratories may have different LOINC codes. The absence of unit conversions between compatible LOINC codes impedes data aggregation and analysis of laboratory results. To develop such conversions, a computational process was developed to review the LOINC standard for potential conversions, and multiple expert reviewers oversaw and finalized the conversion list. In all, 285 bidirectional conversions were identified, including conversions for routine clinical tests such as sodium, magnesium, and human immunodeficiency virus (HIV). Unit conversions were applied to the aggregation of laboratory test results to demonstrate their usefulness. Diverse informatics projects may benefit from the ability to interconvert compatible results.
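To make the grams-versus-moles point concrete, here is a minimal sketch of a bidirectional conversion between a mass concentration (mg/dL) and a molar concentration (mmol/L). It is not the conversion list described in the abstract: the function names and the small `MOLAR_MASS_G_PER_MOL` table are assumptions for illustration, and no actual LOINC codes are used.

```python
# Approximate molar masses in g/mol (standard chemistry values).
MOLAR_MASS_G_PER_MOL = {
    "sodium": 22.99,
    "magnesium": 24.305,
    "glucose": 180.16,
}

def mg_per_dl_to_mmol_per_l(value_mg_dl: float, analyte: str) -> float:
    """Convert a mass concentration to a molar concentration."""
    # mg/dL * 10 = mg/L; mg/L divided by g/mol gives mmol/L.
    return value_mg_dl * 10.0 / MOLAR_MASS_G_PER_MOL[analyte]

def mmol_per_l_to_mg_per_dl(value_mmol_l: float, analyte: str) -> float:
    """Inverse of mg_per_dl_to_mmol_per_l."""
    return value_mmol_l * MOLAR_MASS_G_PER_MOL[analyte] / 10.0

glucose_mmol = mg_per_dl_to_mmol_per_l(90.0, "glucose")
sodium_mg_dl = mmol_per_l_to_mg_per_dl(140.0, "sodium")
```

Because each conversion has an exact inverse, results reported under either unit can be pooled after mapping them to a single target unit.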
Affiliation(s)
- Ronald G Hauser
  - Veterans Affairs Connecticut Healthcare, West Haven, CT, USA
  - Department of Laboratory Medicine, Yale University School of Medicine, New Haven, CT, USA
- Douglas B Quine
  - Veterans Affairs Connecticut Healthcare, West Haven, CT, USA
  - Main Laboratory, Bridgeport Hospital, Bridgeport, CT, USA
- Alex Ryder
  - Children's Foundation Research Institute, Le Bonheur Children's Hospital, Memphis, TN, USA
  - Department of Pediatrics and Department of Pathology, University of Tennessee Health Science Center, Memphis, TN, USA
- Sheldon Campbell
  - Veterans Affairs Connecticut Healthcare, West Haven, CT, USA
  - Department of Laboratory Medicine, Yale University School of Medicine, New Haven, CT, USA
10
Hauser RG, Quine DB, Ryder A. LabRS: A Rosetta stone for retrospective standardization of clinical laboratory test results. J Am Med Inform Assoc 2019; 25:121-126. [PMID: 28505339 DOI: 10.1093/jamia/ocx046] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 10/12/2016] [Accepted: 04/13/2017] [Indexed: 11/13/2022]
Abstract
Objective Clinical laboratories in the United States do not have an explicit result standard to report the 7 billion laboratory test results they produce each year. The absence of standardized test results creates inefficiencies and ambiguities for secondary data users. We developed and tested a tool to standardize the results of laboratory tests in a large, multicenter clinical data warehouse. Methods Laboratory records, each of which consisted of a laboratory result and a test identifier, from 27 diverse facilities were captured from 2000 through 2015. Each record underwent a standardization process to convert the original result into a format amenable to secondary data analysis. The standardization process included the correction of typos, normalization of categorical results, separation of inequalities from numbers, and conversion of numbers represented by words (eg, "million") to numerals. Quality control included expert review. Results We obtained 1.266 × 10^9 laboratory records and standardized 1.252 × 10^9 records (98.9%). Of the unique unstandardized records (78.887 × 10^3), most appeared <5 times (96%, eg, typos), did not have a test identifier (47%), or belonged to an esoteric test with <100 results (2%). Overall, these 3 reasons accounted for nearly all unstandardized results (98%). Conclusion Current results suggest that the tool is both scalable and generalizable among diverse clinical laboratories. Based on observed trends, the tool will require ongoing maintenance to stay current with new tests and result formats. Future work to develop and implement an explicit standard for test results would reduce the need to retrospectively standardize test results.
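Two of the standardization steps named in the Methods, separating inequalities from numbers and converting number words to numerals, can be sketched as follows. This is not the LabRS tool itself: `parse_result`, the `ParsedResult` type, and the small multiplier table are hypothetical names chosen for this example, and typo correction and categorical normalization are left out.

```python
import re
from typing import NamedTuple, Optional

class ParsedResult(NamedTuple):
    comparator: Optional[str]  # "<", "<=", ">", ">=" or None
    value: Optional[float]     # numeric value, or None if unparseable

# Number words handled by this sketch (an illustrative subset).
_MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}

_PATTERN = re.compile(
    r"^\s*(<=|>=|<|>)?\s*"            # optional inequality sign
    r"([0-9][0-9,]*\.?[0-9]*)\s*"     # number, commas allowed
    r"(thousand|million|billion)?\s*$",
    re.IGNORECASE,
)

def parse_result(raw: str) -> ParsedResult:
    """Split a raw result string into a comparator and a numeric value."""
    match = _PATTERN.match(raw)
    if match is None:
        # Not numeric (eg, "positive"); leave it for other steps.
        return ParsedResult(None, None)
    comparator, number, word = match.groups()
    value = float(number.replace(",", ""))
    if word:
        value *= _MULTIPLIERS[word.lower()]
    return ParsedResult(comparator, value)

examples = [parse_result(s) for s in ["<5", "1.5 million", "> 1,000", "positive"]]
```

Keeping the comparator in its own field lets downstream analyses decide how to treat censored values instead of silently discarding the inequality.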
Affiliation(s)
- Ronald George Hauser
  - Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
  - Department of Laboratory Medicine, Yale University School of Medicine, New Haven, CT, USA
- Douglas B Quine
  - Veterans Affairs Connecticut Healthcare System, West Haven, CT, USA
  - Main Laboratory, Bridgeport Hospital, Bridgeport, CT, USA
- Alex Ryder
  - Children's Foundation Research Institute, Le Bonheur Children's Hospital, Memphis, TN, USA
  - Department of Pediatrics and Department of Pathology, University of Tennessee Health Science Center, Memphis, TN, USA
11
Abstract
Close collaboration between specialists from diverse backgrounds and working in different scientific domains is an effective strategy to overcome challenges in areas that interface between biology, chemistry, physics and engineering. Communication in such collaborations can itself be challenging. Even when projects are successfully concluded, resulting publications — necessarily multi-authored — have the potential to be disjointed. Few, both in the field and outside, may be able to fully understand the work as a whole. This needs to be addressed to facilitate efficient working, peer review, accessibility and impact to larger audiences. We are an interdisciplinary team working in a nascent scientific area, the repurposing of DNA as a storage medium for digital information. In this note, we highlight some of the difficulties that arise from such collaborations and outline our efforts to improve communication through a glossary and a controlled vocabulary and accessibility via short plain-language summaries. We hope to stimulate early discussion within this emerging field of how our community might improve the description and presentation of our work to facilitate clear communication within and between research groups and increase accessibility to those not familiar with our respective fields — be it molecular biology, computer science, information theory or others that might become relevant in future. To enable an open and inclusive discussion we have created a glossary and controlled vocabulary as a cloud-based shared document and we invite other scientists to critique our suggestions and contribute their own ideas.
Affiliation(s)
- Jossy Sayir
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
12
Zheng S, Lu JJ, Ghasemzadeh N, Hayek SS, Quyyumi AA, Wang F. Effective Information Extraction Framework for Heterogeneous Clinical Reports Using Online Machine Learning and Controlled Vocabularies. JMIR Med Inform 2017; 5:e12. [PMID: 28487265 PMCID: PMC5442348 DOI: 10.2196/medinform.7235] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 12/28/2016] [Revised: 03/16/2017] [Accepted: 03/20/2017] [Indexed: 01/15/2023]
Abstract
BACKGROUND Extracting structured data from narrated medical reports is challenged by the complexity of heterogeneous structures and vocabularies and often requires significant manual effort. Traditional machine-based approaches lack the capability to take user feedback for improving the extraction algorithm in real time. OBJECTIVE Our goal was to provide a generic information extraction framework that can support diverse clinical reports and enables a dynamic interaction between a human and a machine that produces highly accurate results. METHODS A clinical information extraction system, IDEAL-X, has been built on top of online machine learning. It processes one document at a time, and user interactions are recorded as feedback to update the learning model in real time. The updated model is used to predict values for extraction in subsequent documents. Once prediction accuracy reaches a user-acceptable threshold, the remaining documents may be batch processed. A customizable controlled vocabulary may be used to support extraction. RESULTS Three datasets were used for experiments based on report styles: 100 cardiac catheterization procedure reports, 100 coronary angiographic reports, and 100 integrated reports, each of which combines a history and physical report, discharge summary, outpatient clinic notes, outpatient clinic letter, and inpatient discharge medication report. Data extraction was performed by 3 methods: online machine learning, controlled vocabularies, and a combination of these. The system delivers results with F1 scores greater than 95%. CONCLUSIONS IDEAL-X adopts a unique online machine learning-based approach combined with controlled vocabularies to support data extraction for clinical reports. The system can quickly learn and improve, thus it is highly adaptable.
Affiliation(s)
- Shuai Zheng
  - Department of Biomedical Informatics, Emory University, Atlanta, GA, United States
- James J Lu
  - Department of Mathematics and Computer Science, Emory University, Atlanta, GA, United States
- Nima Ghasemzadeh
  - Division of Cardiology, Emory School of Medicine, Emory University, Atlanta, GA, United States
- Salim S Hayek
  - Division of Cardiology, Emory School of Medicine, Emory University, Atlanta, GA, United States
- Arshed A Quyyumi
  - Division of Cardiology, Emory School of Medicine, Emory University, Atlanta, GA, United States
- Fusheng Wang
  - Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, United States
|
13
|
Jamoulle M, Grosjean J, Resnick M, Ittoo A, Treuherz A, Vander Stichele R, Cardillo E, Darmoni SJ, Shamenek FS, Vanmeerbeek M. A Terminology in General Practice/Family Medicine to Represent Non-Clinical Aspects for Various Usages: The Q-Codes. Stud Health Technol Inform 2017; 235:471-475. [PMID: 28423837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
The terminology proposed here, called "Q-Codes", can be defined as an extension of the International Classification of Primary Care (ICPC-2). It deals with non-clinical concepts that are relevant in General Practice/Family Medicine (GP/FM). This terminology is a good way to put an emphasis on underestimated topics such as Teaching, Patient issues or Ethics. It aims at indexing GP/FM documents such as congress abstracts and theses to get a more comprehensive view of the GP/FM domain. The 182 identified Q-Codes have been very precisely defined by a college of experts (physicians and terminologists) from twelve countries. The result is available on the Health Terminology/Ontology Portal (http://www.hetop.org/Q), is formatted in OWL-2 for further semantic considerations, and will be used to index the 2016 WONCA World congress communications.
Affiliation(s)
- Marc Jamoulle
  - Department of General Practice, University of Liège, Belgium
- Julien Grosjean
  - Département d'Information et d'Informatique Médicale, University of Rouen, France
- Melissa Resnick
  - University of Texas, Health Science Center at Houston, TX USA
- Ashwin Ittoo
  - HEC Management School, University of Liège, Belgium
- Arthur Treuherz
  - Department of Health Sciences Terminology, BIREME/PAHO/WHO, Sao Paulo, Brazil
- Stéfan J Darmoni
  - Département d'Information et d'Informatique Médicale, University of Rouen, France
|
14
|
Park MS, He Z, Chen Z, Oh S, Bian J. Consumers' Use of UMLS Concepts on Social Media: Diabetes-Related Textual Data Analysis in Blog and Social Q&A Sites. JMIR Med Inform 2016; 4:e41. [PMID: 27884812 PMCID: PMC5146325 DOI: 10.2196/medinform.5748] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 03/10/2016] [Revised: 08/02/2016] [Accepted: 10/22/2016] [Indexed: 11/24/2022] Open
Abstract
Background The widely known terminology gap between health professionals and health consumers hinders effective information seeking for consumers. Objective The aim of this study was to better understand consumers’ usage of medical concepts by evaluating the coverage of concepts and semantic types of the Unified Medical Language System (UMLS) on diabetes-related postings in 2 types of social media: blogs and social question and answer (Q&A). Methods We collected 2 types of social media data: (1) a total of 3711 blogs tagged with “diabetes” on Tumblr posted between February and October 2015; and (2) a total of 58,422 questions and associated answers posted between 2009 and 2014 in the diabetes category of Yahoo! Answers. We analyzed the datasets using a widely adopted biomedical text processing framework Apache cTAKES and its extension YTEX. First, we applied the named entity recognition (NER) method implemented in YTEX to identify UMLS concepts in the datasets. We then analyzed the coverage and the popularity of concepts in the UMLS source vocabularies across the 2 datasets (ie, blogs and social Q&A). Further, we conducted a concept-level comparative coverage analysis between SNOMED Clinical Terms (SNOMED CT) and Open-Access Collaborative Consumer Health Vocabulary (OAC CHV), the top 2 UMLS source vocabularies that have the most coverage on our datasets. We also analyzed the UMLS semantic types that were frequently observed in our datasets. Results We identified 2415 UMLS concepts from blog postings, 6452 UMLS concepts from social Q&A questions, and 10,378 UMLS concepts from the answers. The medical concepts identified in the blogs can be covered by 56 source vocabularies in the UMLS, while those in questions and answers can be covered by 58 source vocabularies. SNOMED CT was the dominant vocabulary in terms of coverage across all the datasets, ranging from 84.9% to 95.9%. It was followed by OAC CHV (between 73.5% and 80.0%) and Metathesaurus Names (MTH) (between 55.7% and 73.5%). All of the social media datasets shared frequent semantic types such as “Amino Acid, Peptide, or Protein,” “Body Part, Organ, or Organ Component,” and “Disease or Syndrome.” Conclusions Although the 3 social media datasets vary greatly in size, they exhibited similar conceptual coverage among UMLS source vocabularies and the identified concepts showed similar semantic type distributions. As such, concepts that are both frequently used by consumers and also found in professional vocabularies such as SNOMED CT can be suggested to OAC CHV to improve its coverage.
Affiliation(s)
- Min Sook Park
  - School of Information, Florida State University, Tallahassee, FL, United States
- Zhe He
  - School of Information, Florida State University, Tallahassee, FL, United States; Institute for Successful Longevity, Florida State University, Tallahassee, FL, United States
- Zhiwei Chen
  - Department of Computer Science, Florida State University, Tallahassee, FL, United States
- Sanghee Oh
  - School of Information, Florida State University, Tallahassee, FL, United States
- Jiang Bian
  - Department of Health Outcomes and Policy, University of Florida, Gainesville, FL, United States
|
15
|
Judkins J, Utecht J, Brochhausen M. Easy Extraction of Terms and Definitions with OWL2TL. CEUR Workshop Proc 2016; 1747:D205. [PMID: 28035214 PMCID: PMC5189984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Facilitating good communication between semantic web specialists and domain experts is necessary for efficient ontology development. This development may be hindered by the fact that domain experts tend to be unfamiliar with tools used to create and edit OWL files. This is true in particular when changes to definitions need to be reviewed as often as multiple times a day. We developed "OWL to Term List" (OWL2TL) with the goal of allowing domain experts to view the terms and definitions of an OWL file organized in a list that is updated each time the OWL file is updated. The tool is available online and currently generates a list of terms, along with additional annotation properties that are chosen by the user, in a format that allows easy copying into a spreadsheet.
Affiliation(s)
- John Judkins
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, USA
- Joseph Utecht
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, USA
- Mathias Brochhausen
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, USA
|
16
|
Abstract
National policies in the United States require the use of standard terminology for data exchange between clinical information systems. However, most electronic health record systems continue to use local and idiosyncratic ways of representing clinical observations. To improve mappings between local terms and standard vocabularies, we sought to make existing mappings (wisdom) from health care organizations (the Crowd) available to individuals engaged in mapping processes. We developed new functionality to display counts of local terms and organizations that had previously mapped to a given Logical Observation Identifiers Names and Codes (LOINC) code. Further, we enabled users to view the details of those mappings, including local term names and the organizations that created the mappings. Users would also have the capacity to contribute their local mappings to a shared mapping repository. In this article, we describe the new functionality and its availability to implementers who desire resources to make mapping more efficient and effective.
Affiliation(s)
- Brian E Dixon
  - Richard M. Fairbanks School of Public Health at Indiana University-Purdue University Indianapolis, Regenstrief Institute, Inc., and Center for Health Information and Communication, Department of Veterans Affairs, Veterans Health Administration, Health Services Research and Development Service, Indianapolis, IN
- John Hook
  - Regenstrief Institute, Inc., Indianapolis, IN
- Daniel J Vreeman
  - Indiana University School of Medicine, Regenstrief Institute, Inc., Indianapolis, IN
|
17
|
Chen ES, Sarkar IN. *informatics: Identifying and Tracking Informatics Sub-Discipline Terms in the Literature. Methods Inf Med 2015; 54:530-9. [PMID: 25998007 DOI: 10.3414/me14-01-0088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/28/2014] [Accepted: 04/16/2015] [Indexed: 11/09/2022]
Abstract
OBJECTIVE To identify the breadth of informatics sub-discipline terms used in the literature for enabling subsequent organization and searching by sub-discipline. METHODS Titles in five literature sources were analyzed to extract terms for informatics sub-disciplines: 1) United States (U.S.) Library of Congress Online Catalog, 2) English Wikipedia, 3) U.S. National Library of Medicine (NLM) Catalog, 4) PubMed, and 5) PubMed Central. The extracted terms were combined and standardized with those in four vocabulary sources to create an integrated list: 1) Library of Congress Subject Headings (LCSH), 2) Medical Subject Headings (MeSH), 3) U.S. National Cancer Institute Thesaurus (NCIt), and 4) EMBRACE Data and Methods (EDAM). Searches for terms in titles from each literature source were conducted to obtain frequency counts and start years for characterizing established and potentially emerging sub-disciplines. RESULTS Analysis of 6,949 titles from literature sources and 67 terms from vocabulary sources resulted in an integrated list of 382 terms for informatics sub-disciplines mapped to 292 preferred terms. In the last five decades, "bioinformatics", "medical informatics", "health informatics", "nursing informatics", and "biomedical informatics" were associated with the most literature. In the current decade, potentially emerging sub-disciplines include "disability informatics", "neonatal informatics", and "nanoinformatics" based on literature from the last five years. CONCLUSIONS As the field of informatics continues to expand and advance, keeping up-to-date with historical and current trends will become increasingly challenging. The ability to track the accomplishments and evolution of a particular sub-discipline in the literature could be valuable for supporting informatics research, education, and training.
Affiliation(s)
- E S Chen
  - Elizabeth S. Chen, PhD, Center for Clinical and Translational Science, 89 Beaumont Avenue, Given Courtyard S356, Burlington, VT 05405, USA, E-mail:
|
18
|
Abstract
The American Association for Respiratory Care has declared a benchmark for competency in mechanical ventilation that includes the ability to "apply to practice all ventilation modes currently available on all invasive and noninvasive mechanical ventilators." This level of competency presupposes the ability to identify, classify, compare, and contrast all modes of ventilation. Unfortunately, current educational paradigms do not supply the tools to achieve such goals. To fill this gap, we expand and refine a previously described taxonomy for classifying modes of ventilation and explain how it can be understood in terms of 10 fundamental constructs of ventilator technology: (1) defining a breath, (2) defining an assisted breath, (3) specifying the means of assisting breaths based on control variables specified by the equation of motion, (4) classifying breaths in terms of how inspiration is started and stopped, (5) identifying ventilator-initiated versus patient-initiated start and stop events, (6) defining spontaneous and mandatory breaths, (7) defining breath sequences, (8) combining control variables and breath sequences into ventilatory patterns, (9) describing targeting schemes, and (10) constructing a formal taxonomy for modes of ventilation composed of control variable, breath sequence, and targeting schemes. Having established the theoretical basis of the taxonomy, we demonstrate a step-by-step procedure to classify any mode on any mechanical ventilator.
Affiliation(s)
- Robert L Chatburn
  - Respiratory Institute, Cleveland Clinic, Cleveland, Ohio and the Lerner College of Medicine of Case Western Reserve University, Cleveland, Ohio
- Mohamad El-Khatib
  - Department of Anesthesiology, American University of Beirut Medical Center, Beirut, Lebanon
- Eduardo Mireles-Cabodevila
  - Respiratory Institute, Cleveland Clinic, Cleveland, Ohio and the Lerner College of Medicine of Case Western Reserve University, Cleveland, Ohio
|
19
|
Famiglietti ML, Estreicher A, Gos A, Bolleman J, Géhant S, Breuza L, Bridge A, Poux S, Redaschi N, Bougueleret L, Xenarios I. Genetic variations and diseases in UniProtKB/Swiss-Prot: the ins and outs of expert manual curation. Hum Mutat 2014; 35:927-35. [PMID: 24848695 PMCID: PMC4107114 DOI: 10.1002/humu.22594] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 01/31/2014] [Accepted: 05/09/2014] [Indexed: 11/25/2022]
Abstract
During the last few years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized, and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.
Affiliation(s)
- Maria Livia Famiglietti
  - Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
|
20
|
Hendel RC, Bozkurt B, Fonarow GC, Jacobs JP, Lichtman JH, Smith EE, Tcheng JE, Wang TY, Weintraub WS. ACC/AHA 2013 methodology for developing clinical data standards: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Data Standards. J Am Coll Cardiol 2013; 63:2323-34. [PMID: 24246166 DOI: 10.1016/j.jacc.2013.11.006] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
|
21
|
Fung KW, Jao CS, Demner-Fushman D. Extracting drug indication information from structured product labels using natural language processing. J Am Med Inform Assoc 2013; 20:482-8. [PMID: 23475786 PMCID: PMC3628062 DOI: 10.1136/amiajnl-2012-001291] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 08/23/2012] [Revised: 12/28/2012] [Accepted: 02/17/2013] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE To extract drug indications from structured drug labels and represent the information using codes from standard medical terminologies. MATERIALS AND METHODS We used MetaMap and other publicly available resources to extract information from the indications section of drug labels. Drugs and indications were encoded by RxNorm and UMLS identifiers, respectively. A sample was manually reviewed. We also compared the results with two independent information sources: National Drug File-Reference Terminology and the Semantic Medline project. RESULTS A total of 6797 drug labels were processed, resulting in 19 473 unique drug-indication pairs. Manual review of the 298 most frequently prescribed drugs by seven physicians showed a recall of 0.95 and precision of 0.77. Inter-rater agreement (Fleiss κ) was 0.713. The precision of the subset of results corroborated by Semantic Medline extractions increased to 0.93. DISCUSSION Correlation of a patient's medical problems and drugs in an electronic health record has been used to improve data quality and reduce medication errors. Authoritative drug indication information is available from drug labels, but not in a format readily usable by computer applications. Our study shows that it is feasible to use publicly available natural language processing resources to extract drug indications from drug labels. The same method can be applied to other sections of the drug label, for example, adverse effects and contraindications. CONCLUSIONS It is feasible to use publicly available natural language processing tools to extract indication information from freely available drug labels. Named entity recognition sources (eg, MetaMap) provide reasonable recall. Combination with other data sources provides higher precision.
Affiliation(s)
- Kin Wah Fung
  - Lister Hill National Center for Biomedical Communications, National Library of Medicine, US National Institutes of Health, Bethesda, MD 20894, USA
|
22
|
Abstract
BACKGROUND Ideally each Life Science article should get a 'structured digital abstract'. This is a structured summary of the paper's findings that is both human-verified and machine-readable. But articles can contain a large variety of information types and contextual details that all need to be reconciled with appropriate names, terms and identifiers, which poses a challenge to any curator. Current approaches mostly use tagging or limited entry-forms for semantic encoding. FINDINGS We implemented a 'controlled language' as a more expressive representation method. We studied how usable this format was for wet-lab biologists who volunteered as curators. We assessed some issues that arise with the usability of ontologies and other controlled vocabularies for the encoding of structured information by 'untrained' curators. We take a user-oriented viewpoint, and make recommendations that may prove useful for creating a better curation environment: one that can engage a large community of volunteer curators. CONCLUSIONS Entering information in a biocuration environment could improve in expressiveness and user-friendliness if curators were enabled to use synonymous and polysemous terms literally, whereby each term stays linked to an identifier.
Affiliation(s)
- Steven Vercruysse
  - Systems Biology group, Department of Biology, Norwegian University of Science and Technology, Høgskoleringen 5, 7491 Trondheim, Norway
- Martin Kuiper
  - Systems Biology group, Department of Biology, Norwegian University of Science and Technology, Høgskoleringen 5, 7491 Trondheim, Norway
|
23
|
Ovezmyradov G, Lu Q, Göpfert MC. Mining Gene Ontology Data with AGENDA. Bioinform Biol Insights 2012; 6:63-7. [PMID: 22553422 PMCID: PMC3337784 DOI: 10.4137/bbi.s9101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022] Open
Abstract
The Gene Ontology (GO) initiative is a collaborative effort that uses controlled vocabularies for annotating genetic information. We here present AGENDA (Application for mining Gene Ontology Data), a novel web-based tool for accessing the GO database. AGENDA allows the user to simultaneously retrieve and compare gene lists linked to different GO terms in diverse species using batch queries, facilitating comparative approaches to genetic information. The web-based application offers diverse search options and allows the user to bookmark, visualize, and download the results. AGENDA is an open source web-based application that is freely available for non-commercial use at the project homepage. URL: http://sourceforge.net/projects/bioagenda.
Affiliation(s)
- Guvanch Ovezmyradov
  - Department of Cellular Neurobiology, Georg-August-University of Göttingen, Schwann-Schleiden Research Centre for Molecular Cell Biology, Julia-Lermontowa-Weg 3, 37077 Göttingen, Germany
|
24
|
Stewart RF, Edgar H, Tatlock C, Kroth PJ. Developing a standardized cephalometric vocabulary: choices and possible strategies. J Dent Educ 2008; 72:989-97. [PMID: 18768441 PMCID: PMC2755070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2023]
Abstract
The science of cephalometry has been invaluable for guiding orthodontic diagnosis, treatment planning, and outcomes tracking. Though software packages easily calculate most cephalometric measurements, the ability to exchange cephalometric data between software packages is poorly developed. Hindering this effort is the lack of an agreed-upon standard for electronic exchange of cephalometric measurements. Unlike more technological issues, the problem of creating such a standard is one of formalizing decisions already established through historical precedent. Solving this problem will require education, cooperation, and consensus in order to reap the potential improvements to patient care, dental education, and research. The first step in overcoming these remaining issues is awareness. This article reviews those factors that place cephalometric measurements in an excellent position for standardization, outlines those decisions that must be made in order to realize the goal of electronic exchange of cephalometric information, and describes some of the options for these decisions as well as some advantages and disadvantages of each.
Affiliation(s)
- Randall F Stewart
  - Health Sciences Library and Informatics Center, MSC09 5100, 1 University of New Mexico, Albuquerque, NM 87131-0001, USA
|
25
|
Gehanno JF, Thirion B, Darmoni SJ. Evaluation of meta-concepts for information retrieval in a quality-controlled health gateway. AMIA Annu Symp Proc 2007; 2007:269-273. [PMID: 18693840 PMCID: PMC2655924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2007] [Revised: 07/10/2007] [Accepted: 10/11/2007] [Indexed: 05/26/2023]
Abstract
BACKGROUND CISMeF is a French quality-controlled health gateway that uses the MeSH thesaurus. We introduced two new concepts, metaterms (medical specialty which has semantic links with one or more MeSH terms, subheadings and resource types) and resource types. OBJECTIVE Evaluate precision and recall of metaterms. METHODS We created 16 pairs of queries. Each pair concerned the same topic, but one used metaterms and one MeSH terms. To assess precision, each document retrieved by the query was classified as irrelevant, partly relevant or fully relevant. RESULTS The 16 queries yielded 943 documents for metaterm queries and 139 for MeSH term queries. The recall of MeSH term queries was 0.44 (compared to 1 for metaterm queries) and the precision was identical for MeSH term and metaterm queries. CONCLUSION Metaconcepts such as CISMeF metaterms allow better recall than MeSH terms, with similar precision, in a quality-controlled health gateway.
|