1
Kaelin VC, Bosak DL, Saluja S, Newman-Griffis D, Boyd AD, Khetani MA. Representation of child and youth participation within the Unified Medical Language System (UMLS). Disabil Rehabil 2024:1-6. [PMID: 38596871 DOI: 10.1080/09638288.2024.2338191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Accepted: 03/28/2024] [Indexed: 04/11/2024]
Abstract
PURPOSE To examine (1) how much participation is represented in the benchmark Unified Medical Language System (UMLS) resource, and (2) to what extent that representation reflects the definition of child and youth participation and/or its related constructs per the family of Participation-Related Constructs framework. MATERIALS AND METHODS We searched and analysed UMLS concepts related to the term "participation." Identified UMLS concepts were rated according to their representation of participation (i.e., attendance, involvement, both) as well as participation-related constructs using deductive content analysis. RESULTS 363 UMLS concepts were identified. Of those, 68 had at least one English definition, resulting in 81 definitions that were further analysed. Results revealed 2 definitions (2/81; 3%; 2/68 UMLS concepts) representing participation "attendance" and 18 definitions (18/81; 22%; 14/68 UMLS concepts) representing participation "involvement." No UMLS concept definition represented both attendance and involvement (i.e., participation). Most of the definitions (11/20; 55%; 9/16 UMLS concepts) representing attendance or involvement also represent a participation-related construct. CONCLUSION(S) The representation of participation within the UMLS is limited and poorly aligned with the contemporary definition of child and youth participation. Expanding ontological resources to represent child and youth participation is needed to enable better data analytics that reflect contemporary paediatric rehabilitation practice.
Affiliation(s)
- Vera C Kaelin
  - Department of Occupational Therapy, University of Illinois Chicago, Chicago, IL, USA
  - Children's Participation in Environment Research Lab, University of Illinois Chicago, Chicago, IL, USA
  - Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA
  - Department of Computing Science, Umeå University, Umeå, Sweden
- Dianna L Bosak
  - Children's Participation in Environment Research Lab, University of Illinois Chicago, Chicago, IL, USA
- Shivani Saluja
  - Children's Participation in Environment Research Lab, University of Illinois Chicago, Chicago, IL, USA
- Andrew D Boyd
  - Department of Biomedical and Health Information Sciences, University of Illinois Chicago, Chicago, IL, USA
- Mary A Khetani
  - Department of Occupational Therapy, University of Illinois Chicago, Chicago, IL, USA
  - Children's Participation in Environment Research Lab, University of Illinois Chicago, Chicago, IL, USA
  - CanChild Centre for Childhood Disability Research, McMaster University, Hamilton, ON, Canada
2
Penn JA, Newman-Griffis D. Half the picture: Word frequencies reveal racial differences in clinical documentation, but not their causes. AMIA Annu Symp Proc 2022; 2022:386-395. [PMID: 35854748 PMCID: PMC9285139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
Clinical notes are the best record of a provider's perceptions of their patients, but their use in studying racial bias in clinical documentation has typically been limited to manual evaluation of small datasets. We investigated the use of computational methods to scale these insights to large, heterogeneous clinical text data. We found significant differences in negative emotional tone and language implying social dominance in clinical notes between Black and White patients, but identified multiple contributing factors in addition to potential provider bias, including mis-categorization of some healthcare vocabulary as emotion-related. We further found that notes for Black patients were significantly less likely to mention opioids than for White patients, potentially reflecting both inequitable access to medication and provider bias. Our analysis showed that computational tools have significant potential for studying racial bias in large clinical corpora, and identified key challenges to providing a nuanced analysis of bias in clinical documentation.
3
Newman-Griffis D, Camacho Maldonado J, Ho PS, Sacco M, Jimenez Silva R, Porcino J, Chan L. Linking Free Text Documentation of Functioning and Disability to the ICF With Natural Language Processing. Front Rehabilit Sci 2021; 2. [PMID: 35694445 PMCID: PMC9180751 DOI: 10.3389/fresc.2021.742702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Background: Invaluable information on patient functioning and the complex interactions that define it is recorded in free text portions of the Electronic Health Record (EHR). Leveraging this information to improve clinical decision-making and conduct research requires natural language processing (NLP) technologies to identify and organize the information recorded in clinical documentation. Methods: We used natural language processing methods to analyze information about patient functioning recorded in two collections of clinical documents pertaining to claims for federal disability benefits from the U.S. Social Security Administration (SSA). We grounded our analysis in the International Classification of Functioning, Disability, and Health (ICF), and used the Activities and Participation domain of the ICF to classify information about functioning in three key areas: mobility, self-care, and domestic life. After annotating functional status information in our datasets through expert clinical review, we trained machine learning-based NLP models to automatically assign ICF categories to mentions of functional activity. Results: We found that rich and diverse information on patient functioning was documented in the free text records. Annotation of 289 documents for Mobility information yielded 2,455 mentions of Mobility activities and 3,176 specific actions corresponding to 13 ICF-based categories. Annotation of 329 documents for Self-Care and Domestic Life information yielded 3,990 activity mentions and 4,665 specific actions corresponding to 16 ICF-based categories. NLP systems for automated ICF coding achieved over 80% macro-averaged F-measure on both datasets, indicating strong performance across all ICF categories used. Conclusions: Natural language processing can help to navigate the tradeoff between flexible and expressive clinical documentation of functioning and standardizable data for comparability and learning. 
The ICF has practical limitations for classifying functional status information in clinical documentation but presents a valuable framework for organizing the information recorded in health records about patient functioning. This study advances the development of robust, ICF-based NLP technologies to analyze information on patient functioning and has significant implications for NLP-powered analysis of functional status information in disability benefits management, clinical care, and research.
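The macro-averaged F-measure reported in this abstract weights every ICF category equally, however many mentions each one has. A minimal sketch of that metric follows, assuming one gold label and one predicted label per mention; the function name `macro_f1` and the toy category labels are illustrative and are not taken from the study.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over every class seen in gold or predicted."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    classes = set(gold) | set(predicted)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical labels standing in for ICF-based categories.
gold = ["walking", "walking", "lifting", "standing"]
pred = ["walking", "lifting", "lifting", "standing"]
score = macro_f1(gold, pred)
```

Because each class contributes equally to the mean, a rare category that the model handles badly pulls the score down as much as a frequent one would, which is why the metric is a common choice for reporting performance "across all categories".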
Affiliation(s)
- Denis Newman-Griffis
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
  - *Correspondence: Denis Newman-Griffis
- Jonathan Camacho Maldonado
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Pei-Shu Ho
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Maryanne Sacco
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Rafael Jimenez Silva
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Julia Porcino
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Leighton Chan
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
4
Newman-Griffis D, Divita G, Desmet B, Zirikly A, Rosé CP, Fosler-Lussier E. Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets. J Am Med Inform Assoc 2021; 28:516-532. [PMID: 33319905 DOI: 10.1093/jamia/ocaa269] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/11/2020] [Revised: 09/13/2020] [Accepted: 11/17/2020] [Indexed: 12/18/2022]
Abstract
OBJECTIVES Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research. MATERIALS AND METHODS We identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative available datasets are of ambiguity in clinical language. RESULTS We found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena. DISCUSSION Existing datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods.
CONCLUSIONS Our findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization.
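The dataset-level ambiguity figure in this abstract can be read as the share of distinct strings that are annotated with more than one concept. The sketch below counts that share under stated assumptions: annotations arrive as (string, concept ID) pairs, strings are compared case-insensitively, and the helper name and placeholder IDs are hypothetical. It is not the authors' code.

```python
from collections import defaultdict


def ambiguous_share(annotations):
    """Fraction of distinct strings that map to more than one concept ID."""
    concepts_by_string = defaultdict(set)
    for text, concept_id in annotations:
        # Case-folding is an assumption of this sketch, not a documented choice.
        concepts_by_string[text.lower()].add(concept_id)
    if not concepts_by_string:
        return 0.0
    ambiguous = sum(1 for ids in concepts_by_string.values() if len(ids) > 1)
    return ambiguous / len(concepts_by_string)


# Placeholder concept IDs; real datasets would carry UMLS identifiers.
sample = [
    ("cold", "ID_TEMPERATURE"),
    ("Cold", "ID_COMMON_COLD"),
    ("fever", "ID_FEVER"),
    ("cough", "ID_COUGH"),
    ("cough", "ID_COUGH"),
]
share = ambiguous_share(sample)
```

Running the same count against a vocabulary's full string-to-concept table instead of a dataset's annotations would give the "potential ambiguity" side of the comparison.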
Affiliation(s)
- Denis Newman-Griffis
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA
- Guy Divita
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Bart Desmet
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Ayah Zirikly
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
- Carolyn P Rosé
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, Maryland, USA
  - Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
- Eric Fosler-Lussier
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA
5
Vashishth S, Newman-Griffis D, Joshi R, Dutt R, Rosé CP. Improving broad-coverage medical entity linking with semantic type prediction and large-scale datasets. J Biomed Inform 2021; 121:103880. [PMID: 34390853 PMCID: PMC8952339 DOI: 10.1016/j.jbi.2021.103880] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/30/2021] [Revised: 07/31/2021] [Accepted: 07/31/2021] [Indexed: 10/28/2022]
Abstract
OBJECTIVES Biomedical natural language processing tools are increasingly being applied for broad-coverage information extraction: extracting medical information of all types in a scientific document or a clinical note. In such broad-coverage settings, linking mentions of medical concepts to standardized vocabularies requires choosing the best candidate concepts from large inventories covering dozens of types. This study presents a novel semantic type prediction module for biomedical NLP pipelines and two automatically-constructed, large-scale datasets with broad coverage of semantic types. METHODS We experiment with five off-the-shelf biomedical NLP toolkits on four benchmark datasets for medical information extraction from scientific literature and clinical notes. All toolkits adopt a staged approach of mention detection followed by two stages of medical entity linking: (1) generating a list of candidate concepts, and (2) picking the best concept among them. We introduce a semantic type prediction module to alleviate the problem of overgeneration of candidate concepts by filtering out irrelevant candidate concepts based on the predicted semantic type of a mention. We present MedType, a fully modular semantic type prediction model which we integrate into the existing NLP toolkits. To address the dearth of broad-coverage training data for medical information extraction, we further present WikiMed and PubMedDS, two large-scale datasets for medical entity linking. RESULTS Semantic type filtering improves medical entity linking performance across all toolkits and datasets, often by several percentage points of F-1. Further, pretraining MedType on our novel datasets achieves state-of-the-art performance for semantic type prediction in biomedical text. CONCLUSIONS Semantic type prediction is a key part of building accurate NLP pipelines for broad-coverage information extraction from biomedical text.
We make our source code and novel datasets publicly available to foster reproducible research.
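The filtering step described in this abstract sits between candidate generation and concept selection: candidates whose semantic type disagrees with the type predicted for the mention are dropped before the best one is picked. The following is an illustrative sketch of that idea only, not MedType's actual interface. The `Candidate` class, the fallback to the unfiltered list when nothing matches, and the placeholder IDs and type names are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    concept_id: str            # placeholder ID, not a real vocabulary code
    semantic_types: frozenset  # types the concept belongs to
    score: float               # score from a hypothetical candidate generator


def filter_by_type(candidates, predicted_type):
    """Keep candidates whose semantic types include the predicted type."""
    kept = [c for c in candidates if predicted_type in c.semantic_types]
    # Assumption: if filtering removes everything, fall back to the full list.
    return kept if kept else list(candidates)


def link(candidates, predicted_type):
    """Pick the highest-scoring candidate after type filtering, or None."""
    pool = filter_by_type(candidates, predicted_type)
    return max(pool, key=lambda c: c.score).concept_id if pool else None


# Two hypothetical candidates for the mention "cold".
cands = [
    Candidate("ID_TEMPERATURE", frozenset({"phenomenon"}), 0.9),
    Candidate("ID_COMMON_COLD", frozenset({"disorder"}), 0.7),
]
chosen = link(cands, "disorder")
```

Without the filter, the higher-scoring temperature sense would win; with a predicted type of "disorder", the common-cold candidate is selected instead, which is the kind of overgeneration error the abstract describes.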
Affiliation(s)
- Rishabh Joshi
  - Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, USA
- Ritam Dutt
  - Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, USA
- Carolyn P Rosé
  - Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, USA
6
Zirikly A, Desmet B, Newman-Griffis D, Marfeo EE, McDonough C, Goldman H, Chan L. Viewpoint: An Information Extraction Framework for Disability Determination Using a Mental Functioning Use-Case (Preprint). JMIR Med Inform 2021; 10:e32245. [PMID: 35302510 PMCID: PMC8976250 DOI: 10.2196/32245] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Revised: 10/08/2021] [Accepted: 01/16/2022] [Indexed: 01/08/2023]
Abstract
Natural language processing (NLP) in health care enables transformation of complex narrative information into high value products such as clinical decision support and adverse event monitoring in real time via the electronic health record (EHR). However, information technologies for mental health have consistently lagged because of the complexity of measuring and modeling mental health and illness. The use of NLP to support management of mental health conditions is a viable topic that has not been explored in depth. This paper provides a framework for the advanced application of NLP methods to identify, extract, and organize information on mental health and functioning to inform the decision-making process applied to assessing mental health. We present a use-case related to work disability, guided by the disability determination process of the US Social Security Administration (SSA). From this perspective, the following questions must be addressed about each problem that leads to a disability benefits claim: When did the problem occur and how long has it existed? How severe is it? Does it affect the person’s ability to work? and What is the source of the evidence about the problem? Our framework includes 4 dimensions of medical information that are central to assessing disability—temporal sequence and duration, severity, context, and information source. We describe key aspects of each dimension and promising approaches for application in mental functioning. For example, to address temporality, a complete functional timeline must be created with all relevant aspects of functioning such as intermittence, persistence, and recurrence. Severity of mental health symptoms can be successfully identified and extracted on a 4-level ordinal scale from absent to severe. Some NLP work has been reported on the extraction of context for specific cases of wheelchair use in clinical settings. 
We discuss the links between the task of information source assessment and work on source attribution, coreference resolution, event extraction, and rule-based methods. Gaps were identified in NLP applications that directly applied to the framework and in existing relevant annotated data sets. We highlighted NLP methods with the potential for advanced application in the field of mental functioning. Findings of this work will inform the development of instruments for supporting SSA adjudicators in their disability determination process. The 4 dimensions of medical information may have relevance for a broad array of individuals and organizations responsible for assessing mental health function and ability. Further, our framework with 4 specific dimensions presents significant opportunity for the application of NLP in the realm of mental health and functioning beyond the SSA setting, and it may support the development of robust tools and methods for decision-making related to clinical care, program implementation, and other outcomes.
Affiliation(s)
- Ayah Zirikly
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, United States
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States
- Bart Desmet
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
- Denis Newman-Griffis
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
- Elizabeth E Marfeo
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - Department of Occupational Therapy, Tufts University, Medford, MA, United States
- Christine McDonough
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - School of Health and Rehabilitation Science, University of Pittsburgh, Pittsburgh, PA, United States
- Howard Goldman
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - Department of Psychiatry, School of Medicine, University of Maryland, Baltimore, MD, United States
- Leighton Chan
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
7
Newman-Griffis D, Sivaraman V, Perer A, Fosler-Lussier E, Hochheiser H. TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora. Proc Conf 2021; 2021:106-115. [PMID: 34151319 PMCID: PMC8212692] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence can be found at https://textessence.github.io.
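One way to read "embedding confidence based on nearest neighborhood overlap" is to train several embedding replicates on the same corpus and check how stable an item's nearest neighbors are across them. The sketch below follows that reading, using the mean pairwise Jaccard overlap of top-k cosine neighbors. The exact formula used by TextEssence may differ, and every name and vector here is illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_neighbors(embeddings, key, k):
    """Set of the k items most similar to `key` within one replicate."""
    others = [w for w in embeddings if w != key]
    others.sort(key=lambda w: cosine(embeddings[key], embeddings[w]), reverse=True)
    return set(others[:k])


def neighbor_overlap_confidence(replicates, key, k):
    """Mean pairwise Jaccard overlap of `key`'s top-k neighbors across replicates."""
    neighbor_sets = [top_k_neighbors(rep, key, k) for rep in replicates]
    pairs = list(combinations(neighbor_sets, 2))
    if not pairs:
        # Assumption: with fewer than two replicates there is nothing to compare.
        return 1.0
    overlaps = [len(a & b) / len(a | b) if a | b else 1.0 for a, b in pairs]
    return sum(overlaps) / len(overlaps)


# Two toy replicates with 2-dimensional vectors.
rep1 = {"virus": [1, 0], "infection": [0.9, 0.1], "vaccine": [0.8, 0.3], "bank": [0, 1]}
rep2 = {"virus": [1, 0], "infection": [0.95, 0.05], "vaccine": [0.1, 1.0], "bank": [0.7, 0.2]}
confidence = neighbor_overlap_confidence([rep1, rep2], "virus", k=2)
```

A value near 1 means the item's neighborhood is stable across training runs; a low value suggests the embedding is too noisy to support conclusions about how the item is used in the corpus.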
Affiliation(s)
- Adam Perer
  - Human-Computer Interaction Institute, Carnegie Mellon University
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh
  - Intelligent Systems Program, University of Pittsburgh
8
Newman-Griffis D, Lehman JF, Rosé C, Hochheiser H. Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research. Proc Conf 2021; 2021:4125-4138. [PMID: 34179899 PMCID: PMC8223521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.
Affiliation(s)
- Jill Fain Lehman
  - Human-Computer Interaction Institute, Carnegie Mellon University, USA
- Carolyn Rosé
  - Language Technologies Institute, Carnegie Mellon University, USA
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, USA
9
Newman-Griffis D, Fosler-Lussier E. Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health. Front Digit Health 2021; 3:620828. [PMID: 33791684 PMCID: PMC8009547 DOI: 10.3389/fdgth.2021.620828] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 10/23/2020] [Accepted: 02/16/2021] [Indexed: 11/13/2022]
Abstract
Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts, such as functional outcomes and social determinants of health, lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of medical information in under-studied domains, and demonstrate its applicability through a case study on physical mobility function. Mobility function is a component of many health measures, from post-acute care and surgical outcomes to chronic frailty and disability, and is represented as one domain of human activity in the International Classification of Functioning, Disability, and Health (ICF). However, mobility and other types of functional activity remain under-studied in the medical informatics literature, and neither the ICF nor commonly-used medical terminologies capture functional status terminology in practice. We investigated two data-driven paradigms, classification and candidate selection, to link narrative observations of mobility status to standardized ICF codes, using a dataset of clinical narratives from physical therapy encounters. Recent advances in language modeling and word embedding were used as features for established machine learning models and a novel deep learning approach, achieving a macro-averaged F-1 score of 84% on linking mobility activity reports to ICF codes. Both classification and candidate selection approaches present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated data set; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems. 
This research has implications for continued development of language technologies to analyze functional status information, and the ongoing growth of NLP tools for a variety of specialized applications in clinical care and research.
Affiliation(s)
- Denis Newman-Griffis
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
  - Epidemiology & Biostatistics Section, Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Eric Fosler-Lussier
  - Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
10
Thieu T, Maldonado JC, Ho PS, Ding M, Marr A, Brandt D, Newman-Griffis D, Zirikly A, Chan L, Rasch E. A comprehensive study of mobility functioning information in clinical notes: Entity hierarchy, corpus annotation, and sequence labeling. Int J Med Inform 2020; 147:104351. [PMID: 33401169 DOI: 10.1016/j.ijmedinf.2020.104351] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/11/2020] [Revised: 08/10/2020] [Accepted: 11/22/2020] [Indexed: 01/19/2023]
Abstract
BACKGROUND Secondary use of Electronic Health Records (EHRs) has mostly focused on health conditions (diseases and drugs). Function is an important health indicator in addition to morbidity and mortality. Nevertheless, function has been overlooked in accessing patients' health status. The World Health Organization (WHO)'s International Classification of Functioning, Disability and Health (ICF) is considered the international standard for describing and coding function and health states. We pioneer the first comprehensive analysis and identification of functioning concepts in the Mobility domain of the ICF. RESULTS Using physical therapy notes at the National Institutes of Health's Clinical Center, we induced a hierarchical order of mobility-related entities including 5 entities types, 3 relations, 8 attributes, and 33 attribute values. Two domain experts manually curated a gold standard corpus of 14,281 nested entity mentions from 400 clinical notes. Inter-annotator agreement (IAA) of exact matching averaged 92.3 % F1-score on mention text spans, and 96.6 % Cohen's kappa on attributes assignments. A high-performance Ensemble machine learning model for named entity recognition (NER) was trained and evaluated using the gold standard corpus. Average F1-score on exact entity matching of our Ensemble method (84.90 %) outperformed popular NER methods: Conditional Random Field (80.4 %), Recurrent Neural Network (81.82 %), and Bidirectional Encoder Representations from Transformers (82.33 %). CONCLUSIONS The results of this study show that mobility functioning information can be reliably captured from clinical notes once adequate resources are provided for sequence labeling methods. We expect that functioning concepts in other domains of the ICF can be identified in similar fashion.
Affiliation(s)
- Thanh Thieu
  - Oklahoma State University, Stillwater, OK, United States
- Pei-Shu Ho
  - National Institutes of Health Clinical Center, Bethesda, MD, United States
- Min Ding
  - National Institute of Standards and Technology, Gaithersburg, MD, United States
- Alex Marr
  - National Institutes of Health Clinical Center, Bethesda, MD, United States
- Diane Brandt
  - Social Security Advisory Board, Washington, DC, United States
- Denis Newman-Griffis
  - National Institutes of Health Clinical Center, Bethesda, MD, United States
  - Ohio State University, Columbus, OH, United States
- Ayah Zirikly
  - National Institutes of Health Clinical Center, Bethesda, MD, United States
- Leighton Chan
  - National Institutes of Health Clinical Center, Bethesda, MD, United States
- Elizabeth Rasch
  - National Institutes of Health Clinical Center, Bethesda, MD, United States
11
Newman-Griffis D, Porcino J, Zirikly A, Thieu T, Camacho Maldonado J, Ho PS, Ding M, Chan L, Rasch E. Broadening horizons: the case for capturing function and the role of health informatics in its use. BMC Public Health 2019; 19:1288. [PMID: 31615472 PMCID: PMC6794808 DOI: 10.1186/s12889-019-7630-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/05/2018] [Accepted: 09/16/2019] [Indexed: 12/18/2022]
Abstract
Background: Human activity and the interaction between health conditions and activity is a critical part of understanding the overall function of individuals. The World Health Organization’s International Classification of Functioning, Disability and Health (ICF) models function as all aspects of an individual’s interaction with the world, including organismal concepts such as individual body structures, functions, and pathologies, as well as the outcomes of the individual’s interaction with their environment, referred to as activity and participation. Function, particularly activity and participation outcomes, is an important indicator of health at both the level of an individual and the population level, as it is highly correlated with quality of life and a critical component of identifying resource needs. Since it reflects the cumulative impact of health conditions on individuals and is not disease specific, its use as a health indicator helps to address major barriers to holistic, patient-centered care that result from multiple, and often competing, disease specific interventions. While the need for better information on function has been widely endorsed, this has not translated into its routine incorporation into modern health systems.
Purpose: We present the importance of capturing information on activity as a core component of modern health systems and identify specific steps and analytic methods that can be used to make it more available to utilize in improving patient care. We identify challenges in the use of activity and participation information, such as a lack of consistent documentation and diversity of data specificity and representation across providers, health systems, and national surveys. We describe how activity and participation information can be more effectively captured, and how health informatics methodologies, including natural language processing (NLP), can enable automatically locating, extracting, and organizing this information on a large scale, supporting standardization and utilization with minimal additional provider burden. We examine the analytic requirements and potential challenges of capturing this information with informatics, and describe how data-driven techniques can combine with common standards and documentation practices to make activity and participation information standardized and accessible for improving patient care.
Recommendations: We recommend four specific actions to improve the capture and analysis of activity and participation information throughout the continuum of care: (1) make activity and participation annotation standards and datasets available to the broader research community; (2) define common research problems in automatically processing activity and participation information; (3) develop robust, machine-readable ontologies for function that describe the components of activity and participation information and their relationships; and (4) establish standards for how and when to document activity and participation status during clinical encounters. We further provide specific short-term goals to make significant progress in each of these areas within a reasonable time frame.
Affiliation(s)
- Denis Newman-Griffis
- Rehabilitation Medicine Department, National Institutes of Health, Mark O. Hatfield Clinical Research Center, 6707 Democracy Boulevard, Suite 856, MSC 5493, Bethesda, MD, 20892, USA; Department of Computer Science and Engineering, The Ohio State University, 2015 Neil Avenue, DL 395, Columbus, OH, 43210, USA
- Julia Porcino
- Rehabilitation Medicine Department, National Institutes of Health, Mark O. Hatfield Clinical Research Center, 6707 Democracy Boulevard, Suite 856, MSC 5493, Bethesda, MD, 20892, USA
- Ayah Zirikly
- Rehabilitation Medicine Department, National Institutes of Health, Mark O. Hatfield Clinical Research Center, 6707 Democracy Boulevard, Suite 856, MSC 5493, Bethesda, MD, 20892, USA
- Thanh Thieu
- Department of Computer Science, Oklahoma State University, 116-A MSCS, Stillwater, OK, 74078, USA
- Jonathan Camacho Maldonado
- Rehabilitation Medicine Department, National Institutes of Health, Mark O. Hatfield Clinical Research Center, 6707 Democracy Boulevard, Suite 856, MSC 5493, Bethesda, MD, 20892, USA
- Pei-Shu Ho
- Rehabilitation Medicine Department, National Institutes of Health, Mark O. Hatfield Clinical Research Center, 6707 Democracy Boulevard, Suite 856, MSC 5493, Bethesda, MD, 20892, USA
- Min Ding
- Information Technology Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899, USA
- Leighton Chan
- Rehabilitation Medicine Department, National Institutes of Health, Mark O. Hatfield Clinical Research Center, 6707 Democracy Boulevard, Suite 856, MSC 5493, Bethesda, MD, 20892, USA
- Elizabeth Rasch
- Rehabilitation Medicine Department, National Institutes of Health, Mark O. Hatfield Clinical Research Center, 6707 Democracy Boulevard, Suite 856, MSC 5493, Bethesda, MD, 20892, USA