1
Wieland-Jorna Y, van Kooten D, Verheij RA, de Man Y, Francke AL, Oosterveld-Vlug MG. Natural language processing systems for extracting information from electronic health records about activities of daily living. A systematic review. JAMIA Open 2024; 7:ooae044. [PMID: 38798774 PMCID: PMC11126158 DOI: 10.1093/jamiaopen/ooae044] [Received: 01/03/2024] [Revised: 03/21/2024] [Accepted: 05/07/2024] [Indexed: 05/29/2024] Open
Abstract
Objective Natural language processing (NLP) can enhance research on activities of daily living (ADL) by extracting structured information from unstructured electronic health record (EHR) notes. This review aims to give insight into the state of the art, usability, and performance of NLP systems that extract information on ADL from EHRs. Materials and Methods A systematic review was conducted based on searches in PubMed, Embase, CINAHL, Web of Science, and Scopus. Studies published between 2017 and 2022 were selected based on predefined eligibility criteria. Results The review identified 22 studies. Most studies (65%) used NLP for classifying unstructured EHR data on 1 or 2 ADL. Deep learning, combined with a rule-based method or machine learning, was the approach most commonly used. NLP systems varied widely in terms of pre-processing and algorithms. Common performance evaluation methods were cross-validation and train/test datasets, with F1, precision, and sensitivity as the most frequently reported evaluation metrics. Most studies reported relatively high overall scores on the evaluation metrics. Discussion NLP systems are valuable for the extraction of unstructured EHR data on ADL. However, comparing the performance of NLP systems is difficult due to the diversity of the studies and challenges related to the dataset, including restricted access to EHR data, inadequate documentation, lack of granularity, and small datasets. Conclusion This systematic review indicates that NLP is promising for deriving information on ADL from unstructured EHR notes. However, which NLP system performs best depends on the characteristics of the dataset, the research question, and the type of ADL.
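The abstract names F1, precision, and sensitivity (recall) as the most frequently reported metrics. As a rough illustration of how the three relate, here is a minimal Python sketch for a single positive class. The function name and the example labels are hypothetical and are not taken from any of the reviewed studies.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is also called sensitivity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: 1 = note mentions an ADL limitation, 0 = it does not.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

F1 is the harmonic mean of precision and recall, so it is only high when both are high, which is one reason it is commonly reported for imbalanced clinical text tasks.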
Affiliation(s)
- Yvonne Wieland-Jorna
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
  - Tranzo, School of Social Sciences and Behavioural Research, Tilburg University, Tilburg, Postbus 90153, 5000 LE, The Netherlands
- Daan van Kooten
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
- Robert A Verheij
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
  - Tranzo, School of Social Sciences and Behavioural Research, Tilburg University, Tilburg, Postbus 90153, 5000 LE, The Netherlands
- Yvonne de Man
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
- Anneke L Francke
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
  - Department of Public and Occupational Health, Location Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, Postbus 7057, 1007 MB, The Netherlands
- Mariska G Oosterveld-Vlug
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
2
Li C, Mowery DL, Ma X, Yang R, Vurgun U, Hwang S, Donnelly HK, Bandhey H, Akhtar Z, Senathirajah Y, Sadhu EM, Getzen E, Freda PJ, Long Q, Becich MJ. Realizing the Potential of Social Determinants Data: A Scoping Review of Approaches for Screening, Linkage, Extraction, Analysis and Interventions. medRxiv 2024:2024.02.04.24302242. [PMID: 38370703 PMCID: PMC10871446 DOI: 10.1101/2024.02.04.24302242] [Indexed: 02/20/2024]
Abstract
Background Social determinants of health (SDoH) like socioeconomics and neighborhoods strongly influence outcomes, yet standardized SDoH data is lacking in electronic health records (EHR), limiting research and care quality. Methods We searched PubMed using the keywords "SDOH" and "EHR" and conducted title/abstract and full-text screening. Included records were analyzed under five domains: 1) SDoH screening and assessment approaches, 2) SDoH data collection and documentation, 3) Use of natural language processing (NLP) for extracting SDoH, 4) SDoH data and health outcomes, and 5) SDoH-driven interventions. Results We identified 685 articles, of which 324 underwent full review. Key findings include tailored screening instruments implemented across settings, census and claims data linkage providing contextual SDoH profiles, rule-based and neural network systems extracting SDoH from notes using NLP, connections found between SDoH data and healthcare utilization/chronic disease control, and integrated care management programs executed. However, considerable variability persists across data sources, tools, and outcomes. Discussion Despite progress in identifying patient social needs, further development of standards, predictive models, and coordinated interventions is critical to fulfill the potential of SDoH-EHR integration. Additional database searches could strengthen this scoping review. Ultimately, widespread capture, analysis, and translation of multidimensional SDoH data into clinical care is essential for promoting health equity.
Affiliation(s)
- Chenyu Li
  - University of Pittsburgh School of Medicine Department of Biomedical Informatics
- Danielle L. Mowery
  - University of Pennsylvania, Institute for Biomedical Informatics
  - University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics
- Xiaomeng Ma
  - University of Toronto, Institute of Health Policy Management and Evaluations
- Rui Yang
  - Duke-NUS Medical School, Centre for Quantitative Medicine
- Ugurcan Vurgun
  - University of Pennsylvania, Institute for Biomedical Informatics
- Sy Hwang
  - University of Pennsylvania, Institute for Biomedical Informatics
- Harsh Bandhey
  - Cedars-Sinai Medical Center, Department of Computational Biomedicine
- Zohaib Akhtar
  - Northwestern University, Kellogg School of Management
- Yalini Senathirajah
  - University of Pittsburgh School of Medicine Department of Biomedical Informatics
- Eugene Mathew Sadhu
  - University of Pittsburgh School of Medicine Department of Biomedical Informatics
- Emily Getzen
  - University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics
- Philip J Freda
  - Cedars-Sinai Medical Center, Department of Computational Biomedicine
- Qi Long
  - University of Pennsylvania, Institute for Biomedical Informatics
  - University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics
- Michael J. Becich
  - University of Pittsburgh School of Medicine Department of Biomedical Informatics
3
Kaelin VC, Boyd AD, Werler MM, Parde N, Khetani MA. Natural Language Processing to Classify Caregiver Strategies Supporting Participation Among Children and Youth with Craniofacial Microsomia and Other Childhood-Onset Disabilities. Journal of Healthcare Informatics Research 2023; 7:480-500. [PMID: 37927374 PMCID: PMC10620347 DOI: 10.1007/s41666-023-00149-y] [Received: 09/02/2022] [Revised: 07/18/2023] [Accepted: 08/29/2023] [Indexed: 11/07/2023]
Abstract
Customizing participation-focused pediatric rehabilitation interventions is an important but also complex and potentially resource intensive process, which may benefit from automated and simplified steps. This research aimed at applying natural language processing to develop and identify a best performing predictive model that classifies caregiver strategies into participation-related constructs, while filtering out non-strategies. We created a dataset including 1,576 caregiver strategies obtained from 236 families of children and youth (11-17 years) with craniofacial microsomia or other childhood-onset disabilities. These strategies were annotated to four participation-related constructs and a non-strategy class. We experimented with manually created features (i.e., speech and dependency tags, predefined likely sets of words, dense lexicon features (i.e., Unified Medical Language System (UMLS) concepts)) and three classical methods (i.e., logistic regression, naïve Bayes, support vector machines (SVM)). We tested a series of binary and multinomial classification tasks applying 10-fold cross-validation on the training set (80%) to test the best performing model on the held-out test set (20%). SVM using term frequency-inverse document frequency (TF-IDF) was the best performing model for all four classification tasks, with accuracy ranging from 78.10 to 94.92% and a macro-averaged F1-score ranging from 0.58 to 0.83. Manually created features only increased model performance when filtering out non-strategies. Results suggest pipelined classification tasks (i.e., filtering out non-strategies; classification into intrinsic and extrinsic strategies; classification into participation-related constructs) for implementation into participation-focused pediatric rehabilitation interventions like Participation and Environment Measure Plus (PEM+) among caregivers who complete the Participation and Environment Measure for Children and Youth (PEM-CY). 
Supplementary Information The online version contains supplementary material available at 10.1007/s41666-023-00149-y.
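The best-performing model above used term frequency-inverse document frequency (TF-IDF) features. The sketch below shows one textbook TF-IDF variant in plain Python, to illustrate how terms that appear in few documents get higher weights. It is an assumption-laden illustration, not the authors' feature pipeline: the function name and the toy documents are hypothetical, and libraries such as scikit-learn use a smoothed formula by default that yields different numbers.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenised strategy texts (not from the study's dataset).
docs = [
    ["remind", "child", "to", "pack", "bag"],
    ["pack", "bag", "the", "night", "before"],
    ["talk", "to", "the", "coach"],
]
weights = tf_idf(docs)
```

In this toy corpus "remind" occurs in one document and "pack" in two, so "remind" receives the larger weight in the first document. The resulting vectors would then be passed to a classifier such as an SVM.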
Affiliation(s)
- Vera C. Kaelin
  - Department of Occupational Therapy, University of Illinois Chicago, 1919 West Taylor Street, Room 316A, Chicago, IL 60612-7250 USA
  - Department of Computer Science, University of Illinois Chicago, 851 South Morgan Street, Room 1132, Chicago, IL 60607-7042 USA
  - Children’s Participation in Environment Research Lab, University of Illinois Chicago, Chicago, IL USA
- Andrew D. Boyd
  - Biomedical and Health Information Sciences, University of Illinois Chicago, Chicago, IL USA
- Natalie Parde
  - Department of Computer Science, University of Illinois Chicago, 851 South Morgan Street, Room 1132, Chicago, IL 60607-7042 USA
  - Natural Language Processing Laboratory, University of Illinois Chicago, Chicago, IL USA
- Mary A. Khetani
  - Department of Occupational Therapy, University of Illinois Chicago, 1919 West Taylor Street, Room 316A, Chicago, IL 60612-7250 USA
  - Children’s Participation in Environment Research Lab, University of Illinois Chicago, Chicago, IL USA
  - CanChild Centre for Childhood Disability Research, McMaster University, Hamilton, ON Canada
4
Fernandes MB, Valizadeh N, Alabsi HS, Quadri SA, Tesh RA, Bucklin AA, Sun H, Jain A, Brenner LN, Ye E, Ge W, Collens SI, Lin S, Das S, Robbins GK, Zafar SF, Mukerji SS, Westover MB. Classification of neurologic outcomes from medical notes using natural language processing. Expert Systems with Applications 2023; 214:119171. [PMID: 36865787 PMCID: PMC9974159 DOI: 10.1016/j.eswa.2022.119171] [Indexed: 06/18/2023]
Abstract
Neurologic disability level at hospital discharge is an important outcome in many clinical research studies. Outside of clinical trials, neurologic outcomes must typically be extracted by labor intensive manual review of clinical notes in the electronic health record (EHR). To overcome this challenge, we set out to develop a natural language processing (NLP) approach that automatically reads clinical notes to determine neurologic outcomes, to make it possible to conduct larger scale neurologic outcomes studies. We obtained 7314 notes from 3632 patients hospitalized at two large Boston hospitals between January 2012 and June 2020, including discharge summaries (3485), occupational therapy (1472) and physical therapy (2357) notes. Fourteen clinical experts reviewed notes to assign scores on the Glasgow Outcome Scale (GOS) with 4 classes, namely 'good recovery', 'moderate disability', 'severe disability', and 'death' and on the Modified Rankin Scale (mRS), with 7 classes, namely 'no symptoms', 'no significant disability', 'slight disability', 'moderate disability', 'moderately severe disability', 'severe disability', and 'death'. For 428 patients' notes, 2 experts scored the cases generating interrater reliability estimates for GOS and mRS. After preprocessing and extracting features from the notes, we trained a multiclass logistic regression model using LASSO regularization and 5-fold cross validation for hyperparameter tuning. The model performed well on the test set, achieving a micro average area under the receiver operating characteristic and F-score of 0.94 (95% CI 0.93-0.95) and 0.77 (0.75-0.80) for GOS, and 0.90 (0.89-0.91) and 0.59 (0.57-0.62) for mRS, respectively. Our work demonstrates that an NLP algorithm can accurately assign neurologic outcomes based on free text clinical notes. This algorithm increases the scale of research on neurological outcomes that is possible with EHR data.
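The abstract mentions 5-fold cross-validation for hyperparameter tuning. As a rough sketch of the splitting step only, the Python below partitions sample indices into k folds so that each sample is used for validation exactly once. The function name, seed, and sample count are hypothetical. The sketch also ignores a practical point for clinical text: notes from the same patient are usually kept in the same fold to avoid leakage, and whether or how the authors handled this is not stated in the abstract.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducible folds
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


splits = list(k_fold_indices(n_samples=20, k=5))
```

For tuning, a candidate regularization strength would be scored on each validation fold after fitting on the corresponding training folds, and the value with the best average score would be kept.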
Affiliation(s)
- Marta B. Fernandes
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
- Navid Valizadeh
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Haitham S. Alabsi
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Syed A. Quadri
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
- Ryan A. Tesh
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
- Abigail A. Bucklin
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
- Haoqi Sun
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
- Aayushee Jain
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
- Laura N. Brenner
  - Harvard Medical School, Boston, MA, United States
  - Division of Pulmonary and Critical Care Medicine, MGH, Boston, MA, United States
  - Division of General Internal Medicine, MGH, Boston, MA, United States
- Elissa Ye
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
- Wendong Ge
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
- Sarah I. Collens
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Stacie Lin
  - Harvard Medical School, Boston, MA, United States
- Sudeshna Das
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Gregory K. Robbins
  - Harvard Medical School, Boston, MA, United States
  - Division of Infectious Diseases, MGH, Boston, MA, United States
- Sahar F. Zafar
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
- Shibani S. Mukerji
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Vaccine and Immunotherapy Center, Division of Infectious Diseases, MGH, Boston, MA, United States
- M. Brandon Westover
  - Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
  - Harvard Medical School, Boston, MA, United States
  - Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
  - McCance Center for Brain Health, MGH, Boston, MA, United States
5
Meskers CGM, van der Veen S, Kim J, Meskers CJW, Smit QTS, Verkijk S, Geleijn E, Widdershoven GAM, Vossen PTJM, van der Leeden M. Automated recognition of functioning, activity and participation in COVID-19 from electronic patient records by natural language processing: a proof-of-concept. Ann Med 2022; 54:235-243. [PMID: 35040376 PMCID: PMC8774059 DOI: 10.1080/07853890.2021.2025418] [Received: 09/09/2021] [Revised: 12/21/2021] [Accepted: 12/29/2021] [Indexed: 02/08/2023] Open
Abstract
PURPOSE To address the feasibility, reliability and internal validity of natural language processing (NLP) for automated functional assessment of hospitalised COVID-19 patients in key International Classification of Functioning, Disability and Health (ICF) categories and levels from unstructured text in electronic health records (EHR) from a large teaching hospital. MATERIALS AND METHODS Eight human annotators assigned four ICF categories to relevant sentences: Emotional functions, Exercise tolerance, Walking and Moving, Work and Employment and their ICF levels (Functional Ambulation Categories for Walking and Moving, metabolic equivalents for Exercise tolerance). A linguistic neural network-based model was trained on 80% of the annotated sentences; inter-annotator agreement (IAA, Cohen's kappa), a weighted score of precision and recall (F1) and RMSE for level detection were assessed for the remaining 20%. RESULTS In total 4112 sentences of non-COVID-19 and 1061 of COVID-19 patients were annotated. Average IAA was 0.81; F1 scores were 0.7 for Walking and Moving and Emotional functions; RMSE for Walking and Moving (5-level scale) was 1.17 for COVID-19 patients. CONCLUSION Using a limited amount of annotated EHR sentences, a proof-of-concept was obtained for automated functional assessment of COVID-19 patients in ICF categories and levels. This allows for instantaneous assessment of the functional consequences of new diseases like COVID-19 for large numbers of patients.
Key messages
- Hospitalised COVID-19 survivors may persistently suffer from low physical and mental functioning and a reduction in overall quality of life requiring appropriate and personalised rehabilitation strategies.
- For this, assessment of functioning within multiple domains and categories of the International Classification of Function is required, which is cumbersome using structured data.
- We show a proof-of-concept using Natural Language Processing techniques to automatically derive the aforementioned information from free-text notes within the Electronic Health Record of a large academic teaching hospital.
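The study reports inter-annotator agreement as Cohen's kappa and level-detection error as RMSE. The following Python sketch shows how both quantities are commonly computed for two annotators and for predicted ordinal levels. All function names, labels, and level values are hypothetical illustrations and are not data from the study.

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two annotators on the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if both raters labelled independently at their own rates.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def rmse(y_true, y_pred):
    """Root mean squared error between gold and predicted levels."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical sentence-level category labels from two annotators.
ann_a = ["walking", "walking", "emotion", "work", "emotion", "walking"]
ann_b = ["walking", "emotion", "emotion", "work", "emotion", "walking"]
kappa = cohens_kappa(ann_a, ann_b)

# Hypothetical gold and predicted levels on an ordinal scale.
gold_levels = [1, 2, 3, 5, 4]
pred_levels = [2, 2, 2, 5, 3]
error = rmse(gold_levels, pred_levels)
```

Kappa discounts the agreement two annotators would reach by chance, so it is lower than raw percent agreement. RMSE is expressed in the units of the level scale, which makes a value such as 1.17 interpretable as roughly one level of error.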
Affiliation(s)
- Carel G. M. Meskers
  - Department of Rehabilitation Medicine, Amsterdam University Medical Centers, Amsterdam Movement Sciences, Amsterdam, The Netherlands
- Sabina van der Veen
  - Department of Ethics, Law and Humanities, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Jenia Kim
  - Computational Lexicology and Terminology Lab, Faculty of Humanities, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Caroline J. W. Meskers
  - Department of Rehabilitation Medicine, Amsterdam University Medical Centers, Amsterdam Movement Sciences, Amsterdam, The Netherlands
- Quirine T. S. Smit
  - Computational Lexicology and Terminology Lab, Faculty of Humanities, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Stella Verkijk
  - Computational Lexicology and Terminology Lab, Faculty of Humanities, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Edwin Geleijn
  - Department of Rehabilitation Medicine, Amsterdam University Medical Centers, Amsterdam Movement Sciences, Amsterdam, The Netherlands
- Guy A. M. Widdershoven
  - Department of Ethics, Law and Humanities, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Piek T. J. M. Vossen
  - Computational Lexicology and Terminology Lab, Faculty of Humanities, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Marike van der Leeden
  - Department of Rehabilitation Medicine, Amsterdam University Medical Centers, Amsterdam Movement Sciences, Amsterdam, The Netherlands
6
Newman-Griffis DR, Hurwitz MB, McKernan GP, Houtrow AJ, Dicianno BE. A roadmap to reduce information inequities in disability with digital health and natural language processing. PLOS Digital Health 2022; 1:e0000135. [PMID: 36812573 PMCID: PMC9931310 DOI: 10.1371/journal.pdig.0000135] [Indexed: 11/19/2022]
Abstract
People with disabilities disproportionately experience negative health outcomes. Purposeful analysis of information on all aspects of the experience of disability across individuals and populations can guide interventions to reduce health inequities in care and outcomes. Such an analysis requires more holistic information on individual function, precursors and predictors, and environmental and personal factors than is systematically collected in current practice. We identify 3 key information barriers to more equitable information: (1) a lack of information on contextual factors that affect a person's experience of function; (2) underemphasis of the patient's voice, perspective, and goals in the electronic health record; and (3) a lack of standardized locations in the electronic health record to record observations of function and context. Through analysis of rehabilitation data, we have identified ways to mitigate these barriers through the development of digital health technologies to better capture and analyze information about the experience of function. We propose 3 directions for future research on using digital health technologies, particularly natural language processing (NLP), to facilitate capturing a more holistic picture of a patient's unique experience: (1) analyzing existing information on function in free text documentation; (2) developing new NLP-driven methods to collect information on contextual factors; and (3) collecting and analyzing patient-reported descriptions of personal perceptions and goals. Multidisciplinary collaboration between rehabilitation experts and data scientists to advance these research directions will yield practical technologies to help reduce inequities and improve care for all populations.
Affiliation(s)
- Denis R. Newman-Griffis
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Center for Health Equity Research and Promotion, VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, United States of America
  - Information School, University of Sheffield, Sheffield, United Kingdom
- Max B. Hurwitz
  - Department of Physical Medicine and Rehabilitation, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Gina P. McKernan
  - Department of Physical Medicine and Rehabilitation, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Human Engineering Research Laboratories, VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, United States of America
- Amy J. Houtrow
  - Department of Physical Medicine and Rehabilitation, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Brad E. Dicianno
  - Department of Physical Medicine and Rehabilitation, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Human Engineering Research Laboratories, VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, United States of America
7
Divita G, Coale K, Maldonado JC, Silva RJ, Rasch E. Extracting body function information using rule-based methods: Highlighting structure and formatting challenges in clinical text. Front Digit Health 2022; 4:914171. [PMID: 36148210 PMCID: PMC9485548 DOI: 10.3389/fdgth.2022.914171] [Received: 04/06/2022] [Accepted: 08/01/2022] [Indexed: 11/13/2022] Open
Abstract
This paper describes the identification of body function (BF) mentions within the clinical text within a large, national, heterogeneous corpus to highlight structural challenges presented by the clinical text. BF in clinical documents provides information on dysfunction or impairments in the function or structure of organ systems or organs. BF mentions are embedded in highly formatted structures where the formats include implied scoping boundaries that confound existing natural language processing segmentation and document decomposition techniques. This paper describes follow-up work to adapt a rule-based system created using National Institutes of Health records to a larger, more challenging corpus of Social Security Administration data. Results of these systems provide a baseline for future work to improve document decomposition techniques.
8
Dorr DA, Quiñones AR, King T, Wei MY, White K, Bejan CA. Prediction of Future Health Care Utilization Through Note-extracted Psychosocial Factors. Med Care 2022; 60:570-578. [PMID: 35658116 PMCID: PMC9262845 DOI: 10.1097/mlr.0000000000001742] [Indexed: 12/22/2022]
Abstract
BACKGROUND Persons with multimorbidity (≥2 chronic conditions) face an increased risk of poor health outcomes, especially as they age. Psychosocial factors such as social isolation, chronic stress, housing insecurity, and financial insecurity have been shown to exacerbate these outcomes, but are not routinely assessed during the clinical encounter. Our objective was to extract these concepts from chart notes using natural language processing and predict their impact on health care utilization for patients with multimorbidity. METHODS A cohort study to predict the 1-year likelihood of hospitalizations and emergency department visits for patients 65+ with multimorbidity with and without psychosocial factors. Psychosocial factors were extracted from narrative notes; all other covariates were extracted from electronic health record data from a large academic medical center using validated algorithms and concept sets. Logistic regression was performed to predict the likelihood of hospitalization and emergency department visit in the next year. RESULTS In all, 76,479 patients were eligible; the majority were White (89%), 54% were female, with mean age 73. Those with psychosocial factors were older, had higher baseline utilization, and more chronic illnesses. The 4 psychosocial factors all independently predicted future utilization (odds ratio=1.27-2.77, C-statistic=0.63). Accounting for demographics, specific conditions, and previous utilization, 3 of 4 of the extracted factors remained predictive (odds ratio=1.13-1.86) for future utilization. Compared with models with no psychosocial factors, they had improved discrimination. Individual predictions were mixed, with social isolation predicting depression and morbidity; stress predicting atherosclerotic cardiovascular disease onset; and housing insecurity predicting substance use disorder morbidity.
DISCUSSION Psychosocial factors are known to have adverse health impacts, but are rarely measured; using natural language processing, we extracted factors that identified a higher risk segment of older adults with multimorbidity. Combining these extraction techniques with other measures of social determinants may help catalyze population health efforts to address psychosocial factors to mitigate their health impacts.
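The results above are reported as odds ratios and a C-statistic from logistic regression. As a rough illustration of what these numbers mean, the Python sketch below converts a logistic regression coefficient into an odds ratio and computes the C-statistic as the share of (event, non-event) pairs in which the event case received the higher predicted risk. The function names, outcomes, and risk scores are hypothetical and unrelated to the study's data.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient (log-odds scale)."""
    return math.exp(coefficient)


def c_statistic(y_true, scores):
    """Probability that a random event case is scored above a random non-event case.

    Ties count as half-concordant; this equals the area under the ROC curve.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return concordant / (len(pos) * len(neg))


# Hypothetical outcomes (1 = hospitalised within a year) and predicted risks.
outcomes = [1, 0, 1, 0, 0, 1]
risks = [0.8, 0.3, 0.4, 0.5, 0.2, 0.6]
c = c_statistic(outcomes, risks)
```

A C-statistic of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, so a value such as 0.63 indicates modest discrimination.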
Affiliation(s)
- David A. Dorr
  - Department of Medical Informatics & Clinical Epidemiology; Oregon Health & Science University; Portland, OR
- Ana R. Quiñones
  - Department of Family Medicine; Oregon Health & Science University; Portland, OR
- Taylor King
  - Department of Medical Informatics & Clinical Epidemiology; Oregon Health & Science University; Portland, OR
- Kellee White
  - Department of Health Policy and Management; University of Maryland; College Park, MD
- Cosmin A. Bejan
  - Department of Biomedical Informatics; Vanderbilt University Medical Center; Nashville, TN, USA
9
Kaelin VC, Valizadeh M, Salgado Z, Sim JG, Anaby D, Boyd AD, Parde N, Khetani MA. Capturing and Operationalizing Participation in Pediatric Re/Habilitation Research Using Artificial Intelligence: A Scoping Review. Frontiers in Rehabilitation Sciences 2022; 3. [PMID: 35919375 PMCID: PMC9340801 DOI: 10.3389/fresc.2022.855240] [Indexed: 11/19/2022]
Abstract
Background There is increased interest in using artificial intelligence (AI) to provide participation-focused pediatric re/habilitation. Existing reviews on the use of AI in participation-focused pediatric re/habilitation focus on interventions and do not screen articles based on their definition of participation. AI-based assessments may help reduce provider burden and can support operationalization of the construct under investigation. To extend knowledge of the landscape on AI use in participation-focused pediatric re/habilitation, a scoping review on AI-based participation-focused assessments is needed. Objective To understand how the construct of participation is captured and operationalized in pediatric re/habilitation using AI. Methods We conducted a scoping review of literature published in PubMed, PsycInfo, ERIC, CINAHL, IEEE Xplore, ACM Digital Library, ProQuest Dissertation and Theses, ACL Anthology, AAAI Digital Library, and Google Scholar. Documents were screened by 2–3 independent researchers following a systematic procedure and using the following inclusion criteria: (1) focuses on capturing participation using AI; (2) includes data on children and/or youth with a congenital or acquired disability; and (3) published in English. Data from included studies were extracted [e.g., demographics, type(s) of AI used], summarized, and sorted into categories of participation-related constructs. Results Twenty-one out of 3,406 documents were included. Included assessment approaches mainly captured participation through annotated observations (n = 20; 95%), were administered in person (n = 17; 81%), and applied machine learning (n = 20; 95%) and computer vision (n = 13; 62%). None integrated the child or youth perspective and only one included the caregiver perspective. All assessment approaches captured behavioral involvement, and none captured emotional or cognitive involvement or attendance.
Additionally, 24% (n = 5) of the assessment approaches captured participation-related constructs like activity competencies and 57% (n = 12) captured aspects not included in contemporary frameworks of participation. Conclusions Main gaps for future research include lack of: (1) research reporting on common demographic factors and including samples representing the population of children and youth with a congenital or acquired disability; (2) AI-based participation assessment approaches integrating the child or youth perspective; (3) remotely administered AI-based assessment approaches capturing both child or youth attendance and involvement; and (4) AI-based assessment approaches aligning with contemporary definitions of participation.
Affiliation(s)
- Vera C. Kaelin
  - Rehabilitation Sciences, University of Illinois at Chicago, Chicago, IL, United States
  - Children's Participation in Environment Research Lab, University of Illinois at Chicago, Chicago, IL, United States
- Mina Valizadeh
  - Computer Science, University of Illinois at Chicago, Chicago, IL, United States
  - Natural Language Processing Laboratory, University of Illinois at Chicago, Chicago, IL, United States
- Zurisadai Salgado
  - Children's Participation in Environment Research Lab, University of Illinois at Chicago, Chicago, IL, United States
- Julia G. Sim
  - Children's Participation in Environment Research Lab, University of Illinois at Chicago, Chicago, IL, United States
- Dana Anaby
  - School of Physical and Occupational Therapy, McGill University, Montreal, QC, Canada
  - CanChild Centre for Childhood Disability Research, McMaster University, Hamilton, ON, Canada
- Andrew D. Boyd
  - Rehabilitation Sciences, University of Illinois at Chicago, Chicago, IL, United States
  - Biomedical and Health Information Sciences, University of Illinois at Chicago, Chicago, IL, United States
  - Physical Therapy, University of Illinois at Chicago, Chicago, IL, United States
- Natalie Parde
  - Computer Science, University of Illinois at Chicago, Chicago, IL, United States
  - Natural Language Processing Laboratory, University of Illinois at Chicago, Chicago, IL, United States
- Mary A. Khetani
  - Rehabilitation Sciences, University of Illinois at Chicago, Chicago, IL, United States
  - Children's Participation in Environment Research Lab, University of Illinois at Chicago, Chicago, IL, United States
  - CanChild Centre for Childhood Disability Research, McMaster University, Hamilton, ON, Canada
  - Occupational Therapy, University of Illinois at Chicago, Chicago, IL, United States
  - *Correspondence: Mary A. Khetani
10
Conic RRZ, Geis C, Vincent HK. Social Determinants of Health in Physiatry: Challenges and Opportunities for Clinical Decision Making and Improving Treatment Precision. Front Public Health 2021; 9:738253. [PMID: 34858922 PMCID: PMC8632538 DOI: 10.3389/fpubh.2021.738253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Accepted: 10/11/2021] [Indexed: 11/15/2022]
Abstract
Physiatry is a medical specialty focused on improving functional outcomes in patients with a variety of medical conditions that affect the brain, spinal cord, peripheral nerves, muscles, bones, joints, ligaments, and tendons. Social determinants of health (SDH) play a key role in determining the therapeutic process and patient functional outcomes. Big data and precision medicine have been used in other fields, and to some extent in physiatry, to predict patient outcomes; however, many challenges remain. The interplay between SDH and physiatry outcomes is highly variable depending on the phase of care, and patient profiles that are more favorable in acute care may be less favorable in the outpatient setting. Furthermore, SDH influence which treatments or interventional procedures are accessible to the patient and thus determine outcomes. This opinion paper describes how existing datasets, in combination with novel data such as movement, gait patterning, and patient-perceived outcomes, could be analyzed with artificial intelligence methods to determine the best treatment plan for individual patients in order to achieve maximal functional capacity.
Affiliation(s)
- Rosalynn R Z Conic
  - Department of Family Medicine and Public Health, University of California, San Diego, San Diego, CA, United States
- Carolyn Geis
  - Department of Physical Medicine and Rehabilitation, University of Florida, Gainesville, FL, United States
- Heather K Vincent
  - Department of Physical Medicine and Rehabilitation, University of Florida, Gainesville, FL, United States
11
Newman-Griffis D, Camacho Maldonado J, Ho PS, Sacco M, Jimenez Silva R, Porcino J, Chan L. Linking Free Text Documentation of Functioning and Disability to the ICF With Natural Language Processing. Frontiers in Rehabilitation Sciences 2021; 2. [PMID: 35694445 PMCID: PMC9180751 DOI: 10.3389/fresc.2021.742702] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Background: Invaluable information on patient functioning and the complex interactions that define it is recorded in free text portions of the Electronic Health Record (EHR). Leveraging this information to improve clinical decision-making and conduct research requires natural language processing (NLP) technologies to identify and organize the information recorded in clinical documentation. Methods: We used natural language processing methods to analyze information about patient functioning recorded in two collections of clinical documents pertaining to claims for federal disability benefits from the U.S. Social Security Administration (SSA). We grounded our analysis in the International Classification of Functioning, Disability, and Health (ICF), and used the Activities and Participation domain of the ICF to classify information about functioning in three key areas: mobility, self-care, and domestic life. After annotating functional status information in our datasets through expert clinical review, we trained machine learning-based NLP models to automatically assign ICF categories to mentions of functional activity. Results: We found that rich and diverse information on patient functioning was documented in the free text records. Annotation of 289 documents for Mobility information yielded 2,455 mentions of Mobility activities and 3,176 specific actions corresponding to 13 ICF-based categories. Annotation of 329 documents for Self-Care and Domestic Life information yielded 3,990 activity mentions and 4,665 specific actions corresponding to 16 ICF-based categories. NLP systems for automated ICF coding achieved over 80% macro-averaged F-measure on both datasets, indicating strong performance across all ICF categories used. Conclusions: Natural language processing can help to navigate the tradeoff between flexible and expressive clinical documentation of functioning and standardizable data for comparability and learning. 
The ICF has practical limitations for classifying functional status information in clinical documentation but presents a valuable framework for organizing the information recorded in health records about patient functioning. This study advances the development of robust, ICF-based NLP technologies to analyze information on patient functioning and has significant implications for NLP-powered analysis of functional status information in disability benefits management, clinical care, and research.
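As a brief illustration of the macro-averaged F-measure reported in the abstract above: the score is computed separately for each category and then averaged with equal weight, so rare categories count as much as frequent ones. The following is a minimal Python sketch, assuming one gold label and one predicted label per mention. The function name `macro_f1` and the category labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1: per-category F1, then the unweighted mean."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # correct assignment for category g
        else:
            fp[p] += 1   # p was predicted but is wrong
            fn[g] += 1   # g was the true category but was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Illustrative labels only (assumed, not the study's ICF categories)
gold = ["walking", "walking", "standing", "sitting"]
pred = ["walking", "standing", "standing", "sitting"]
print(round(macro_f1(gold, pred), 4))
```

Because every category contributes equally to the mean, a score above 80% suggests that no single category is performing very poorly, which is consistent with the abstract's reading of the result as strong performance across categories.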
Affiliation(s)
- Denis Newman-Griffis
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
  - *Correspondence: Denis Newman-Griffis
- Jonathan Camacho Maldonado
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Pei-Shu Ho
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Maryanne Sacco
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Rafael Jimenez Silva
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Julia Porcino
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Leighton Chan
  - Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
12
Zirikly A, Desmet B, Newman-Griffis D, Marfeo EE, McDonough C, Goldman H, Chan L. Viewpoint: An Information Extraction Framework for Disability Determination Using a Mental Functioning Use-Case (Preprint). JMIR Med Inform 2021; 10:e32245. [PMID: 35302510 PMCID: PMC8976250 DOI: 10.2196/32245] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Revised: 10/08/2021] [Accepted: 01/16/2022] [Indexed: 01/08/2023]
Abstract
Natural language processing (NLP) in health care enables transformation of complex narrative information into high value products such as clinical decision support and adverse event monitoring in real time via the electronic health record (EHR). However, information technologies for mental health have consistently lagged because of the complexity of measuring and modeling mental health and illness. The use of NLP to support management of mental health conditions is a viable topic that has not been explored in depth. This paper provides a framework for the advanced application of NLP methods to identify, extract, and organize information on mental health and functioning to inform the decision-making process applied to assessing mental health. We present a use-case related to work disability, guided by the disability determination process of the US Social Security Administration (SSA). From this perspective, the following questions must be addressed about each problem that leads to a disability benefits claim: When did the problem occur and how long has it existed? How severe is it? Does it affect the person’s ability to work? and What is the source of the evidence about the problem? Our framework includes 4 dimensions of medical information that are central to assessing disability—temporal sequence and duration, severity, context, and information source. We describe key aspects of each dimension and promising approaches for application in mental functioning. For example, to address temporality, a complete functional timeline must be created with all relevant aspects of functioning such as intermittence, persistence, and recurrence. Severity of mental health symptoms can be successfully identified and extracted on a 4-level ordinal scale from absent to severe. Some NLP work has been reported on the extraction of context for specific cases of wheelchair use in clinical settings. 
We discuss the links between the task of information source assessment and work on source attribution, coreference resolution, event extraction, and rule-based methods. Gaps were identified in NLP applications that directly applied to the framework and in existing relevant annotated data sets. We highlighted NLP methods with the potential for advanced application in the field of mental functioning. Findings of this work will inform the development of instruments for supporting SSA adjudicators in their disability determination process. The 4 dimensions of medical information may have relevance for a broad array of individuals and organizations responsible for assessing mental health function and ability. Further, our framework with 4 specific dimensions presents significant opportunity for the application of NLP in the realm of mental health and functioning beyond the SSA setting, and it may support the development of robust tools and methods for decision-making related to clinical care, program implementation, and other outcomes.
Affiliation(s)
- Ayah Zirikly
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, United States
  - Malone Center for Engineering in Healthcare, Johns Hopkins University, Baltimore, MD, United States
- Bart Desmet
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
- Denis Newman-Griffis
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
- Elizabeth E Marfeo
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - Department of Occupational Therapy, Tufts University, Medford, MA, United States
- Christine McDonough
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - School of Health and Rehabilitation Science, University of Pittsburgh, Pittsburgh, PA, United States
- Howard Goldman
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
  - Department of Psychiatry, School of Medicine, University of Maryland, Baltimore, MD, United States
- Leighton Chan
  - Rehabilitation Medicine Department, Clinical Center, National Institutes of Health, Bethesda, MD, United States
13
Newman-Griffis D, Lehman JF, Rosé C, Hochheiser H. Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research. Proceedings of the Conference. Association for Computational Linguistics. North American Chapter. Meeting 2021; 2021:4125-4138. [PMID: 34179899 PMCID: PMC8223521] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.
Affiliation(s)
- Jill Fain Lehman
  - Human-Computer Interaction Institute, Carnegie Mellon University, USA
- Carolyn Rosé
  - Language Technologies Institute, Carnegie Mellon University, USA
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, USA