1
Thieu T, Maldonado JC, Ho PS, Ding M, Marr A, Brandt D, Newman-Griffis D, Zirikly A, Chan L, Rasch E. A comprehensive study of mobility functioning information in clinical notes: Entity hierarchy, corpus annotation, and sequence labeling. Int J Med Inform 2020; 147:104351. [PMID: 33401169 DOI: 10.1016/j.ijmedinf.2020.104351] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/11/2020] [Revised: 08/10/2020] [Accepted: 11/22/2020] [Indexed: 01/19/2023]
Abstract
BACKGROUND Secondary use of Electronic Health Records (EHRs) has mostly focused on health conditions (diseases and drugs). Function is an important health indicator in addition to morbidity and mortality. Nevertheless, function has been overlooked in assessing patients' health status. The World Health Organization (WHO)'s International Classification of Functioning, Disability and Health (ICF) is considered the international standard for describing and coding function and health states. We present the first comprehensive analysis and identification of functioning concepts in the Mobility domain of the ICF. RESULTS Using physical therapy notes at the National Institutes of Health's Clinical Center, we induced a hierarchical order of mobility-related entities including 5 entity types, 3 relations, 8 attributes, and 33 attribute values. Two domain experts manually curated a gold standard corpus of 14,281 nested entity mentions from 400 clinical notes. Inter-annotator agreement (IAA) of exact matching averaged 92.3% F1-score on mention text spans, and 96.6% Cohen's kappa on attribute assignments. A high-performance Ensemble machine learning model for named entity recognition (NER) was trained and evaluated using the gold standard corpus. Average F1-score on exact entity matching of our Ensemble method (84.90%) outperformed popular NER methods: Conditional Random Field (80.4%), Recurrent Neural Network (81.82%), and Bidirectional Encoder Representations from Transformers (82.33%). CONCLUSIONS The results of this study show that mobility functioning information can be reliably captured from clinical notes once adequate resources are provided for sequence labeling methods. We expect that functioning concepts in other domains of the ICF can be identified in a similar fashion.
Affiliation(s)
- Thanh Thieu
- Oklahoma State University, Stillwater, OK, United States.
- Pei-Shu Ho
- National Institutes of Health Clinical Center, Bethesda, MD, United States
- Min Ding
- National Institute of Standards and Technology, Gaithersburg, MD, United States
- Alex Marr
- National Institutes of Health Clinical Center, Bethesda, MD, United States
- Diane Brandt
- Social Security Advisory Board, Washington, DC, United States
- Denis Newman-Griffis
- National Institutes of Health Clinical Center, Bethesda, MD, United States; Ohio State University, Columbus, OH, United States
- Ayah Zirikly
- National Institutes of Health Clinical Center, Bethesda, MD, United States
- Leighton Chan
- National Institutes of Health Clinical Center, Bethesda, MD, United States
- Elizabeth Rasch
- National Institutes of Health Clinical Center, Bethesda, MD, United States
2
Doing-Harris K, Bray BE, Thackeray A, Shah RU, Shao Y, Cheng Y, Zeng-Treitler Q, Garvin JH, Weir C. Development of a cardiac-centered frailty ontology. J Biomed Semantics 2019; 10:3. [PMID: 30658684 PMCID: PMC6339414 DOI: 10.1186/s13326-019-0195-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/02/2018] [Accepted: 01/01/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND A Cardiac-centered Frailty Ontology can be an important foundation for using NLP to assess patient frailty. Frailty is an important consideration when making patient treatment decisions, particularly in older adults, those with a cardiac diagnosis, or when major surgery is a consideration. Clinicians often report patients' frailty in progress notes and other documentation. Frailty is recorded in many different ways in patient records and many different validated frailty-measuring instruments are available, with little consistency across instruments. We specifically explored concepts relevant to decisions regarding cardiac interventions. We based our work on text found in a large corpus of clinical notes from the Department of Veterans Affairs (VA) national Electronic Health Record (EHR) database. RESULTS The full ontology has 156 concepts, with 246 terms. It includes 86 concepts we expect to find in clinical documents, with 12 qualifier values. The remaining 58 concepts represent hierarchical groups (e.g., physical function findings). Our top-level class is clinical finding, which has children clinical history finding, instrument finding, and physical examination finding, reflecting the OGMS definition of clinical finding. Instrument finding is any score found for the existing frailty instruments. Within our ontology, we used SNOMED-CT concepts where possible. Some of the 86 concepts we expect to find in clinical documents are associated with properties such as ability interpretation. For example, the concept ability to walk can be able, assisted, or unable. Each concept-property level pairing gets a different frailty score. Each scored concept received three scores: a frailty score, a relevance to cardiac decisions score, and a likelihood of resolving after the recommended intervention score. The ontology includes the relationship between scores from ten frailty instruments and frailty as assessed using ontology concepts. It also includes rules for mapping ontology elements to instrument items for three common frailty assessment instruments. Ontology elements are used in two clinical NLP systems. CONCLUSIONS We developed and validated a Cardiac-centered Frailty Ontology, which is a machine-interoperable description of frailty that reflects all the areas that clinicians consider when deciding which cardiac intervention will best serve the patient as well as frailty indications generally relevant to medical decisions. The ontology OWL file is available on BioPortal at http://bioportal.bioontology.org/ontologies/CCFO .
Affiliation(s)
- Bruce E. Bray
- Division of Cardiovascular Medicine, University of Utah, Salt Lake City, UT USA
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- Anne Thackeray
- Physical Therapy and Athletic Training Department, University of Utah, Salt Lake City, UT USA
- Rashmee U. Shah
- Division of Cardiovascular Medicine, University of Utah, Salt Lake City, UT USA
- Yijun Shao
- Medical Informatics Center, George Washington University, Washington DC, USA
- Yan Cheng
- Medical Informatics Center, George Washington University, Washington DC, USA
- Qing Zeng-Treitler
- Medical Informatics Center, George Washington University, Washington DC, USA
- Jennifer H. Garvin
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- VA Healthcare System, Salt Lake City, UT USA
- Charlene Weir
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT USA
- VA Healthcare System, Salt Lake City, UT USA
3
Schwanke J, Rienhoff O, Schulze TG, Nussbeck SY. Suitability of customer relationship management systems for the management of study participants in biomedical research. Methods Inf Med 2013; 52:340-50. [PMID: 23877579 DOI: 10.3414/me12-02-0012] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 11/29/2012] [Accepted: 04/15/2013] [Indexed: 01/26/2023]
Abstract
BACKGROUND Longitudinal biomedical research projects study patients or participants over a course of time. No IT solution is known that can manage study participants, enhance quality of data, support re-contacting of participants, plan study visits, and keep track of informed consent procedures and recruitments that may be subject to change over time. In business settings, management of personal is one of the major aspects of customer relationship management systems (CRMS). OBJECTIVES To evaluate whether CRMS are suitable IT solutions for study participant management in biomedical research. METHODS Three boards of experts in the field of biomedical research were consulted to get an insight into recent IT developments regarding study participant management systems (SPMS). Subsequently, a requirements analysis was performed with stakeholders of a major biomedical research project. The subsequent suitability evaluation was based on the comparison of the identified requirements with the features of six CRMS. RESULTS Independently of each other, the interviewed expert boards confirmed that there is no generic IT solution for the management of participants. Sixty-four requirements were identified and prioritized in a requirements analysis. The best CRMS was able to fulfill forty-two of these requirements. The non-fulfilled requirements demand an adaptation of the CRMS, consuming time and resources, reducing the update compatibility, the system's suitability, and the security of the CRMS. CONCLUSIONS A specific solution for the SPMS is favored instead of a generic and commercially-oriented CRMS. Therefore, the development of a small and specific SPMS solution was commenced and is currently on the way to completion.
Affiliation(s)
- J Schwanke
- Department of Medical Informatics, University Medical Center Göttingen, Georg-August-University, Göttingen, Germany.
4
Pyysalo S, Ohta T, Miwa M, Cho HC, Tsujii J, Ananiadou S. Event extraction across multiple levels of biological organization. Bioinformatics 2013; 28:i575-i581. [PMID: 22962484 PMCID: PMC3436834 DOI: 10.1093/bioinformatics/bts407] [Citation(s) in RCA: 61] [Impact Index Per Article: 5.5] [Indexed: 02/03/2023]
Abstract
Motivation: Event extraction using expressive structured representations has been a significant focus of recent efforts in biomedical information extraction. However, event extraction resources and methods have so far focused almost exclusively on molecular-level entities and processes, limiting their applicability. Results: We extend the event extraction approach to biomedical information extraction to encompass all levels of biological organization from the molecular to the whole organism. We present the ontological foundations, target types and guidelines for entity and event annotation and introduce the new multi-level event extraction (MLEE) corpus, manually annotated using a structured representation for event extraction. We further adapt and evaluate named entity and event extraction methods for the new task, demonstrating that both can be achieved with performance broadly comparable with that for established molecular entity and event extraction tasks. Availability: The resources and methods introduced in this study are available from http://nactem.ac.uk/MLEE/. Contact: pyysalos@cs.man.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sampo Pyysalo
- National Centre for Text Mining and School of Computer Science, University of Manchester, Manchester, UK.
5
Bada M, Eckert M, Evans D, Garcia K, Shipley K, Sitnikov D, Baumgartner WA, Cohen KB, Verspoor K, Blake JA, Hunter LE. Concept annotation in the CRAFT corpus. BMC Bioinformatics 2012; 13:161. [PMID: 22776079 PMCID: PMC3476437 DOI: 10.1186/1471-2105-13-161] [Citation(s) in RCA: 109] [Impact Index Per Article: 9.1] [Received: 11/17/2011] [Accepted: 06/08/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND Manually annotated corpora are critical for the training and evaluation of automated methods to identify concepts in biomedical text. RESULTS This paper presents the concept annotations of the Colorado Richly Annotated Full-Text (CRAFT) Corpus, a collection of 97 full-length, open-access biomedical journal articles that have been annotated both semantically and syntactically to serve as a research resource for the biomedical natural-language-processing (NLP) community. CRAFT identifies all mentions of nearly all concepts from nine prominent biomedical ontologies and terminologies: the Cell Type Ontology, the Chemical Entities of Biological Interest ontology, the NCBI Taxonomy, the Protein Ontology, the Sequence Ontology, the entries of the Entrez Gene database, and the three subontologies of the Gene Ontology. The first public release includes the annotations for 67 of the 97 articles, reserving two sets of 15 articles for future text-mining competitions (after which these too will be released). Concept annotations were created based on a single set of guidelines, which has enabled us to achieve consistently high interannotator agreement. CONCLUSIONS As the initial 67-article release contains more than 560,000 tokens (and the full set more than 790,000 tokens), our corpus is among the largest gold-standard annotated biomedical corpora. Unlike most others, the journal articles that comprise the corpus are drawn from diverse biomedical disciplines and are marked up in their entirety. Additionally, with a concept-annotation count of nearly 100,000 in the 67-article subset (and more than 140,000 in the full collection), the scale of conceptual markup is also among the largest of comparable corpora. The concept annotations of the CRAFT Corpus have the potential to significantly advance biomedical text mining by providing a high-quality gold standard for NLP systems. The corpus, annotation guidelines, and other associated resources are freely available at http://bionlp-corpora.sourceforge.net/CRAFT/index.shtml.
Affiliation(s)
- Michael Bada
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Miriam Eckert
- Department of Linguistics, University of Colorado Boulder, Boulder, CO, USA
- Donald Evans
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Kristin Garcia
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Krista Shipley
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Dmitry Sitnikov
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
- William A Baumgartner
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- K Bretonnel Cohen
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Karin Verspoor
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
- Victoria Research Lab, National ICT Australia, Melbourne, VIC, 3010, Australia
- Judith A Blake
- Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, ME, USA
- Lawrence E Hunter
- Department of Pharmacology, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
6
Thomas DG, Klaessig F, Harper SL, Fritts M, Hoover MD, Gaheen S, Stokes TH, Reznik-Zellen R, Freund ET, Klemm JD, Paik DS, Baker NA. Informatics and standards for nanomedicine technology. Wiley Interdiscip Rev Nanomed Nanobiotechnol 2011; 3:511-532. [PMID: 21721140 PMCID: PMC3189420 DOI: 10.1002/wnan.152] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/30/2022]
Abstract
There are several issues to be addressed concerning the management and effective use of information (or data) generated from nanotechnology studies in biomedical research and medicine. These data are large in volume, diverse in content, and are beset with gaps and ambiguities in the description and characterization of nanomaterials. In this work, we have reviewed three areas of nanomedicine informatics: information resources; taxonomies, controlled vocabularies, and ontologies; and information standards. Informatics methods and standards in each of these areas are critical for enabling collaboration; data sharing; unambiguous representation and interpretation of data; semantic (meaningful) search and integration of data; and for ensuring data quality, reliability, and reproducibility. In particular, we have considered four types of information standards in this article: standard characterization protocols, common terminology standards, minimum information standards, and standard data communication (exchange) formats. Currently, because of gaps and ambiguities in the data, it is also difficult to apply computational methods and machine learning techniques to analyze, interpret, and recognize patterns in data that are high dimensional in nature, and also to relate variations in nanomaterial properties to variations in their chemical composition, synthesis, characterization protocols, and so on. Progress toward resolving the issues of information management in nanomedicine using informatics methods and standards discussed in this article will be essential to the rapidly growing field of nanomedicine informatics.
Affiliation(s)
- Dennis G. Thomas
- Knowledge Discovery and Informatics Group, Pacific Northwest National Laboratory.
- Stacey L. Harper
- Environmental and Molecular Toxicology & School of Chemical, Biological and Environmental Engineering. Oregon State University.
- Todd H. Stokes
- Department of Biomedical Engineering, Emory University and Georgia Tech.
- Juli D. Klemm
- Center for Biomedical Informatics and Information Technology, National Cancer Institute.
- David S. Paik
- Radiological Sciences Laboratory, Stanford University.
- Nathan A. Baker
- Pacific Northwest National Laboratory, 902 Battelle Blvd. P.O. Box 999, MSIN K7-28, Richland, WA 99352 USA
7
Smith B, Scheuermann RH. Ontologies for clinical and translational research: Introduction. J Biomed Inform 2011; 44:3-7. [PMID: 21241822 DOI: 10.1016/j.jbi.2011.01.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 11/30/2010] [Revised: 01/05/2011] [Accepted: 01/08/2011] [Indexed: 10/18/2022]