1
Bai C, Mardini MT. Advances of artificial intelligence in predicting frailty using real-world data: A scoping review. Ageing Res Rev 2024; 101:102529. [PMID: 39369796 DOI: 10.1016/j.arr.2024.102529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Revised: 08/27/2024] [Accepted: 09/30/2024] [Indexed: 10/08/2024]
Abstract
BACKGROUND Frailty assessment is imperative for tailoring healthcare interventions for older adults, but its implementation remains challenging due to the effort and time needed. The advances of artificial intelligence (AI) and natural language processing (NLP) present a novel opportunity to harness real-world data (RWD) including electronic health records, administrative claims, and other routinely collected medical records for frailty assessments. METHODS We followed the PRISMA-ScR guideline and searched Embase, Web of Science, and PubMed databases for articles that predict frailty using AI through RWD from inception until October 2023. We synthesized and analyzed the selected publications according to their field of application, methodologies employed, validation processes, outcomes achieved, and their respective limitations and strengths. RESULTS A total of 23 publications were selected from the initial search (N=2067) and bibliography. The approaches to frailty prediction using RWD and AI were categorized into two groups based on the type of data utilized: 1) AI models using structured data and 2) NLP techniques applied to unstructured clinical notes. We found that AI models achieved moderate to high predictive performance in predicting frailty. However, to demonstrate their clinical utility, these models require further validation using external data and a comprehensive assessment of their impact on patients' health outcomes. Additionally, the application of NLP in frailty prediction is still in its early stages. Great potential exists to enhance frailty prediction by integrating structured data and clinical notes. CONCLUSION The combination of AI and RWD presents significant opportunities for advancing frailty assessment. To maximize the advantages of these technological advances, future research is needed to rigorously address the challenges associated with the validation of AI models and innovative data integration.
Affiliation(s)
- Chen Bai
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32611, United States
- Mamoun T Mardini
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL 32611, United States
2
Osman M, Cooper R, Sayer AA, Witham MD. The use of natural language processing for the identification of ageing syndromes including sarcopenia, frailty and falls in electronic healthcare records: a systematic review. Age Ageing 2024; 53:afae135. [PMID: 38970549 PMCID: PMC11227113 DOI: 10.1093/ageing/afae135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Indexed: 07/08/2024] Open
Abstract
BACKGROUND Recording and coding of ageing syndromes in hospital records is known to be suboptimal. Natural Language Processing algorithms may be useful to identify diagnoses in electronic healthcare records to improve the recording and coding of these ageing syndromes, but the feasibility and diagnostic accuracy of such algorithms are unclear. METHODS We conducted a systematic review according to a predefined protocol and in line with Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. Searches were run from the inception of each database to the end of September 2023 in PubMed, Medline, Embase, CINAHL, ACM digital library, IEEE Xplore and Scopus. Eligible studies were identified via independent review of search results by two coauthors and data extracted from each study to identify the computational method, source of text, testing strategy and performance metrics. Data were synthesised narratively by ageing syndrome and computational method in line with the Studies Without Meta-analysis guidelines. RESULTS From 1030 titles screened, 22 studies were eligible for inclusion. One study focussed on identifying sarcopenia, one frailty, twelve falls, five delirium, five dementia and four incontinence. Sensitivity (57.1%-100%) of algorithms compared with a reference standard was reported in 20 studies, and specificity (84.0%-100%) was reported in only 12 studies. Study design quality was variable with results relevant to diagnostic accuracy not always reported, and few studies undertaking external validation of algorithms. CONCLUSIONS Current evidence suggests that Natural Language Processing algorithms can identify ageing syndromes in electronic health records. However, algorithms require testing in rigorously designed diagnostic accuracy studies with appropriate metrics reported.
Affiliation(s)
- Mo Osman
  - AGE Research Group, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
  - NIHR Newcastle Biomedical Research Centre, Newcastle upon Tyne NHS Foundation Trust, Cumbria Northumberland Tyne and Wear NHS Foundation Trust and Newcastle University, Newcastle upon Tyne, UK
- Rachel Cooper
  - AGE Research Group, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
  - NIHR Newcastle Biomedical Research Centre, Newcastle upon Tyne NHS Foundation Trust, Cumbria Northumberland Tyne and Wear NHS Foundation Trust and Newcastle University, Newcastle upon Tyne, UK
- Avan A Sayer
  - AGE Research Group, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
  - NIHR Newcastle Biomedical Research Centre, Newcastle upon Tyne NHS Foundation Trust, Cumbria Northumberland Tyne and Wear NHS Foundation Trust and Newcastle University, Newcastle upon Tyne, UK
- Miles D Witham
  - AGE Research Group, Translational and Clinical Research Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
  - NIHR Newcastle Biomedical Research Centre, Newcastle upon Tyne NHS Foundation Trust, Cumbria Northumberland Tyne and Wear NHS Foundation Trust and Newcastle University, Newcastle upon Tyne, UK
3
Wieland-Jorna Y, van Kooten D, Verheij RA, de Man Y, Francke AL, Oosterveld-Vlug MG. Natural language processing systems for extracting information from electronic health records about activities of daily living. A systematic review. JAMIA Open 2024; 7:ooae044. [PMID: 38798774 PMCID: PMC11126158 DOI: 10.1093/jamiaopen/ooae044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Revised: 03/21/2024] [Accepted: 05/07/2024] [Indexed: 05/29/2024] Open
Abstract
Objective Natural language processing (NLP) can enhance research on activities of daily living (ADL) by extracting structured information from unstructured electronic health record (EHR) notes. This review aims to give insight into the state of the art, usability, and performance of NLP systems to extract information on ADL from EHRs. Materials and Methods A systematic review was conducted based on searches in PubMed, Embase, CINAHL, Web of Science, and Scopus. Studies published between 2017 and 2022 were selected based on predefined eligibility criteria. Results The review identified 22 studies. Most studies (65%) used NLP for classifying unstructured EHR data on 1 or 2 ADL. Deep learning, combined with a rule-based method or machine learning, was the approach most commonly used. NLP systems varied widely in terms of the pre-processing and algorithms. Common performance evaluation methods were cross-validation and train/test datasets, with F1, precision, and sensitivity as the most frequently reported evaluation metrics. Most studies reported relatively high overall scores on the evaluation metrics. Discussion NLP systems are valuable for the extraction of unstructured EHR data on ADL. However, comparing the performance of NLP systems is difficult due to the diversity of the studies and challenges related to the dataset, including restricted access to EHR data, inadequate documentation, lack of granularity, and small datasets. Conclusion This systematic review indicates that NLP is promising for deriving information on ADL from unstructured EHR notes. However, which NLP system performs best depends on characteristics of the dataset, research question, and type of ADL.
Affiliation(s)
- Yvonne Wieland-Jorna
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
  - Tranzo, School of Social Sciences and Behavioural Research, Tilburg University, Tilburg, Postbus 90153, 5000 LE, The Netherlands
- Daan van Kooten
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
- Robert A Verheij
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
  - Tranzo, School of Social Sciences and Behavioural Research, Tilburg University, Tilburg, Postbus 90153, 5000 LE, The Netherlands
- Yvonne de Man
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
- Anneke L Francke
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
  - Department of Public and Occupational Health, Location Vrije Universiteit Amsterdam, Amsterdam UMC, Amsterdam, Postbus 7057, 1007 MB, The Netherlands
- Mariska G Oosterveld-Vlug
  - Netherlands Institute for Health Services Research (Nivel), Utrecht, Postbus 1568, 3500 BN, The Netherlands
4
Linfield GH, Patel S, Ko HJ, Lacar B, Gottlieb LM, Adler-Milstein J, Singh NV, Pantell MS, De Marchis EH. Evaluating the comparability of patient-level social risk data extracted from electronic health records: A systematic scoping review. Health Informatics J 2023; 29:14604582231200300. [PMID: 37677012 DOI: 10.1177/14604582231200300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]
Abstract
Objective: To evaluate how and from where social risk data are extracted from EHRs for research purposes, and how observed differences may impact study generalizability. Methods: Systematic scoping review of peer-reviewed literature that used patient-level EHR data to assess at least 1 of 6 social risk domains: housing, transportation, food, utilities, safety, social support/isolation. Results: 111/9022 identified articles met inclusion criteria. By domain, social support/isolation was most often included (N = 68/111), predominantly defined by marital/partner status (N = 48/68) and extracted from structured sociodemographic data (N = 45/48). Housing risk was defined primarily by homelessness (N = 39/49). Structured housing data were extracted most often from billing codes and screening tools (N = 15/30, 13/30, respectively). Across domains, data were predominantly sourced from structured fields (N = 89/111) versus unstructured free text (N = 32/111). Conclusion: We identified wide variability in how social domains are defined and extracted from EHRs for research. More consistency, particularly in how domains are operationalized, would enable greater insights across studies.
Affiliation(s)
- Gaia H Linfield
  - School of Medicine, University of California, San Francisco, CA, USA
- Shyam Patel
  - School of Medicine, University of California, San Francisco, CA, USA
- Hee Joo Ko
  - School of Medicine, University of California, San Francisco, CA, USA
- Benjamin Lacar
  - Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA; Berkeley Institute for Data Science, University of California, Berkeley
- Laura M Gottlieb
  - Department of Family & Community Medicine, University of California, San Francisco, CA, USA
- Julia Adler-Milstein
  - School of Medicine, University of California, San Francisco, CA, USA; Center for Clinical Informatics and Improvement Research, University of California, San Francisco, CA, USA
- Nina V Singh
  - California School of Professional Psychology, Alliant International University, Emeryville, CA, USA
- Matthew S Pantell
  - Department of Pediatrics, University of California, San Francisco, CA, USA
- Emilia H De Marchis
  - Department of Family & Community Medicine, University of California, San Francisco, CA, USA
5
Hossain E, Rana R, Higgins N, Soar J, Barua PD, Pisani AR, Turner K. Natural Language Processing in Electronic Health Records in relation to healthcare decision-making: A systematic review. Comput Biol Med 2023; 155:106649. [PMID: 36805219 DOI: 10.1016/j.compbiomed.2023.106649] [Citation(s) in RCA: 28] [Impact Index Per Article: 28.0] [Received: 10/03/2022] [Revised: 01/04/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023]
Abstract
BACKGROUND Natural Language Processing (NLP) is widely used to extract clinical insights from Electronic Health Records (EHRs). However, the lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs. Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively. METHODOLOGY After screening 261 articles from 11 databases, we included 127 papers for full-text review covering seven categories of articles: (1) medical note classification, (2) clinical entity recognition, (3) text summarisation, (4) deep learning (DL) and transfer learning architecture, (5) information extraction, (6) medical language translation and (7) other NLP applications. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. RESULT AND DISCUSSION EHR was the most commonly used data type among the selected articles, and the datasets were primarily unstructured. Various ML and DL methods were used, with prediction or classification being the most common application of ML or DL. The most common use cases were: the International Classification of Diseases, Ninth Revision (ICD-9) classification, clinical note analysis, and named entity recognition (NER) for clinical descriptions and research on psychiatric disorders. CONCLUSION We find that the adopted ML models were not adequately assessed. In addition, data imbalance is an important underlying problem, and techniques to address it are still needed. Future studies should address key limitations in studies, primarily identifying Lupus Nephritis, Suicide Attempts, perinatal self-harm and ICD-9 classification.
Affiliation(s)
- Elias Hossain
  - School of Engineering & Physical Sciences, North South University, Dhaka 1229, Bangladesh
- Rajib Rana
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Niall Higgins
  - School of Management and Enterprise, University of Southern Queensland, Darling Heights QLD 4350, Australia; School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia; Metro North Mental Health, Herston QLD 4029, Australia
- Jeffrey Soar
  - School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Prabal Datta Barua
  - School of Business, University of Southern Queensland, Springfield Central QLD 4300, Australia
- Anthony R Pisani
  - Center for the Study and Prevention of Suicide, University of Rochester, Rochester, NY, United States
- Kathryn Turner
  - School of Nursing, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4000, Australia
6
Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023; 30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 31] [Impact Index Per Article: 31.0] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVE Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. MATERIALS AND METHODS We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. RESULTS Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. DISCUSSION Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. CONCLUSION Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Affiliation(s)
- Siyue Yang
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Ellen Stephenson
  - Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Karen Tu
  - Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
  - Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
7
Newman-Griffis DR, Hurwitz MB, McKernan GP, Houtrow AJ, Dicianno BE. A roadmap to reduce information inequities in disability with digital health and natural language processing. PLOS Digit Health 2022; 1:e0000135. [PMID: 36812573 PMCID: PMC9931310 DOI: 10.1371/journal.pdig.0000135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Abstract
People with disabilities disproportionately experience negative health outcomes. Purposeful analysis of information on all aspects of the experience of disability across individuals and populations can guide interventions to reduce health inequities in care and outcomes. Such an analysis requires more holistic information on individual function, precursors and predictors, and environmental and personal factors than is systematically collected in current practice. We identify 3 key information barriers to more equitable information: (1) a lack of information on contextual factors that affect a person's experience of function; (2) underemphasis of the patient's voice, perspective, and goals in the electronic health record; and (3) a lack of standardized locations in the electronic health record to record observations of function and context. Through analysis of rehabilitation data, we have identified ways to mitigate these barriers through the development of digital health technologies to better capture and analyze information about the experience of function. We propose 3 directions for future research on using digital health technologies, particularly natural language processing (NLP), to facilitate capturing a more holistic picture of a patient's unique experience: (1) analyzing existing information on function in free text documentation; (2) developing new NLP-driven methods to collect information on contextual factors; and (3) collecting and analyzing patient-reported descriptions of personal perceptions and goals. Multidisciplinary collaboration between rehabilitation experts and data scientists to advance these research directions will yield practical technologies to help reduce inequities and improve care for all populations.
Affiliation(s)
- Denis R. Newman-Griffis
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Center for Health Equity Research and Promotion, VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, United States of America
  - Information School, University of Sheffield, Sheffield, United Kingdom
- Max B. Hurwitz
  - Department of Physical Medicine and Rehabilitation, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Gina P. McKernan
  - Department of Physical Medicine and Rehabilitation, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Human Engineering Research Laboratories, VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, United States of America
- Amy J. Houtrow
  - Department of Physical Medicine and Rehabilitation, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Brad E. Dicianno
  - Department of Physical Medicine and Rehabilitation, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
  - Human Engineering Research Laboratories, VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania, United States of America
8
Kitchen C, Chang HY, Weiner JP, Kharrazi H. Assessing the Added Value of Vital Signs Extracted from Electronic Health Records in Healthcare Risk Adjustment Models. Risk Manag Healthc Policy 2022; 15:1671-1682. [PMID: 36092549 PMCID: PMC9462838 DOI: 10.2147/rmhp.s356080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2022] [Accepted: 03/26/2022] [Indexed: 11/24/2022] Open
Abstract
Purpose Patient vital signs are related to specific health risks and outcomes but are underutilized in the prediction of health-care utilization and cost. This study measured the added value of electronic health record (EHR)-extracted body mass index (BMI) and blood pressure (BP) values in improving healthcare risk and utilization predictions. Patients and Methods A sample of 12,820 adult outpatients from the Johns Hopkins Health System (JHHS), with high data quality and recorded values for BMI and BP, was identified between 2016 and 2017. We evaluated the added value of BMI and BP in predicting health-care utilization and cost through a retrospective cohort design. BMI, mean arterial pressure (MAP), and systolic and diastolic BPs were summarized as annual aggregated values. Concurrent annual BMI and MAP changes were quantified as the difference between maximum and minimum recorded values. Model performance was estimated with repeated 10-fold cross-validation and compared to point estimates from base models built on demographic and diagnostic coded events: (1) patient age and sex, (2) age, sex, and the Charlson weighted index, (3) age, sex and the Johns Hopkins ACG system’s DxPM risk score. Results Both categorical BMI and BP were progressively indicative of disease comorbidity, but not uniformly related to health-care utilization or cost. Annual change in BMI and MAP improved predictions for most concurrent year outcomes when compared to base models. Conclusion When a healthcare system lacks relevant diagnostic or risk assessment information for a patient, vital signs may be useful for a simple estimation of disease risk, cost and utilization.
Affiliation(s)
- Christopher Kitchen
  - Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Hsien-Yen Chang
  - Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Jonathan P Weiner
  - Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
- Hadi Kharrazi
  - Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA; Division of Health Sciences Informatics, Johns Hopkins School of Medicine, Baltimore, MD, USA
9
Bear Don’t Walk OJ, Reyes Nieva H, Lee SSJ, Elhadad N. A scoping review of ethics considerations in clinical natural language processing. JAMIA Open 2022; 5:ooac039. [PMID: 35663112 PMCID: PMC9154253 DOI: 10.1093/jamiaopen/ooac039] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 01/24/2022] [Revised: 05/05/2022] [Accepted: 05/12/2022] [Indexed: 11/12/2022] Open
Abstract
Objectives
To review through an ethics lens the state of research in clinical natural language processing (NLP) for the study of bias and fairness, and to identify gaps in research.
Methods
We queried PubMed and Google Scholar for articles published between 2015 and 2021 concerning clinical NLP, bias, and fairness. We analyzed articles using a framework that combines the machine learning (ML) development process (ie, design, data, algorithm, and critique) and bioethical concepts of beneficence, nonmaleficence, autonomy, justice, as well as explicability. Our approach further differentiated between biases of clinical text (eg, systemic or personal biases in clinical documentation towards patients) and biases in NLP applications.
Results
Out of 1162 articles screened, 22 met criteria for full text review. We categorized articles based on the design (N = 2), data (N = 12), algorithm (N = 14), and critique (N = 17) phases of the ML development process.
Discussion
Clinical NLP can be used to study bias in applications reliant on clinical text data as well as explore biases in the healthcare setting. We identify 3 areas of active research that require unique ethical considerations about the potential for clinical NLP to address and/or perpetuate bias: (1) selecting metrics that interrogate bias in models; (2) opportunities and risks of identifying sensitive patient attributes; and (3) best practices in reconciling individual autonomy, leveraging patient data, and inferring and manipulating sensitive information of subgroups. Finally, we address the limitations of current ethical frameworks to fully address concerns of justice. Clinical NLP is a rapidly advancing field, and assessing current approaches against ethical considerations can help the discipline use clinical NLP to explore both healthcare biases and equitable NLP applications.
Affiliation(s)
- Harry Reyes Nieva
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
- Sandra Soo-Jin Lee
  - Department of Medical Humanities and Ethics, Columbia University, New York, New York, USA
- Noémie Elhadad
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
10
Kharrazi H, Chang HY, Weiner JP, Gudzune KA. Assessing the Added Value of Blood Pressure Information Derived from Electronic Health Records in Predicting Health Care Cost and Utilization. Popul Health Manag 2021; 25:323-334. [PMID: 34847729 DOI: 10.1089/pop.2021.0250] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023] Open
Abstract
Health care providers are increasingly using clinical measures derived from electronic health records (EHRs) for risk stratification and predictive modeling. EHR-specific data elements such as prescriptions, laboratory results, and vital signs have been shown to improve risk prediction models. In this study, the value of EHR-based blood pressure (BP) values was assessed in predicting health care costs (ie, total, medical, and pharmacy) and key utilization end points (ie, hospitalization, emergency department use, and being among the highest utilizers). The study population included 37,451 patients of a large integrated delivery system in the mid-western United States with complete EHR data files, who were 18-64 years old, had continuous insurance at an affiliated health plan, and had eligible BP records. Both EHRs and insurance claims of the study population were used to extract the predictors (ie, demographics, diagnosis, and BP values) and outcomes (ie, costs and utilizations). Predictors were extracted from 2012 data, whereas concurrent and prospective outcomes were extracted from 2012 to 2013 data. Three base models (BMs) were constructed to predict each of the outcomes. The first BM no. 1 used demographics. The second BM no. 2 added the Charlson comorbidity index to BM no. 1, whereas the third BM no. 3 added the Adjusted Clinical Group Dx-PM case-mix score to BM no. 1. BP was specified as means, ranges, and classes. Adding BP ranges to BM no. 1 and BM no. 2 showed the greatest improvements when predicting costs and utilization. More specifically, adjusted R² and area under the curve of BM no. 2 improved by 32.9% and 14.1% when BP ranges were added to predict concurrent total cost and hospitalization, respectively. The effect of BP measures on improving the risk stratification models was diminished when predicting prospective outcomes after adding the measures to BM no. 3 (ie, the more comprehensive diagnostic model), specifically when represented as BP means.
Given the increasing availability of BP information, this research suggests that these data should be integrated into provider-based population health analytic activities. Future research should focus on subpopulations that benefit the most from incorporating vital signs such as BP measures in risk stratification models.
Affiliation(s)
- Hadi Kharrazi
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA; Division of General Internal Medicine, Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Hsien-Yen Chang
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Jonathan P Weiner
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Kimberly A Gudzune
  - Division of General Internal Medicine, Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA; Welch Center for Prevention, Epidemiology, and Clinical Research, Johns Hopkins Medical Institution, Baltimore, Maryland, USA
11
|
Martin JA, Crane-Droesch A, Lapite FC, Puhl JC, Kmiec TE, Silvestri JA, Ungar LH, Kinosian BP, Himes BE, Hubbard RA, Diamond JM, Ahya V, Sims MW, Halpern SD, Weissman GE. Development and validation of a prediction model for actionable aspects of frailty in the text of clinicians' encounter notes. J Am Med Inform Assoc 2021; 29:109-119. [PMID: 34791302 DOI: 10.1093/jamia/ocab248] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/22/2021] [Revised: 10/16/2021] [Accepted: 10/28/2021] [Indexed: 11/13/2022]
Abstract
OBJECTIVE Frailty is a prevalent risk factor for adverse outcomes among patients with chronic lung disease. However, identifying frail patients who may benefit from interventions is challenging using standard data sources. We therefore sought to identify phrases in clinical notes in the electronic health record (EHR) that describe actionable frailty syndromes. MATERIALS AND METHODS We used an active learning strategy to select notes from the EHR and annotated each sentence for 4 actionable aspects of frailty: respiratory impairment, musculoskeletal problems, fall risk, and nutritional deficiencies. We compared the performance of regression, tree-based, and neural network models to predict the labels for each sentence. We evaluated performance with the scaled Brier score (SBS), where 1 is perfect and 0 is uninformative, and the positive predictive value (PPV). RESULTS We manually annotated 155 952 sentences from 326 patients. Elastic net regression had the best performance across all 4 frailty aspects (SBS 0.52, 95% confidence interval [CI] 0.49-0.54) followed by random forests (SBS 0.49, 95% CI 0.47-0.51), and multi-task neural networks (SBS 0.39, 95% CI 0.37-0.42). For the elastic net model, the PPV for identifying the presence of respiratory impairment was 54.8% (95% CI 53.3%-56.6%) at a sensitivity of 80%. DISCUSSION Classification models using EHR notes can effectively identify actionable aspects of frailty among patients living with chronic lung disease. Regression performed better than random forest and neural network models. CONCLUSIONS NLP-based models offer promising support to population health management programs that seek to identify and refer community-dwelling patients with frailty for evidence-based interventions.
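The scaled Brier score reported in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition, SBS = 1 - BS / BS_ref, where BS_ref is the Brier score of a reference model that always predicts the observed prevalence; the abstract does not state the authors' exact computation, and the function names and toy data below are hypothetical.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 label."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def scaled_brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Scale the Brier score so 1 is perfect and 0 matches a prevalence-only model.

    Assumed definition: 1 - BS / BS_ref, with BS_ref computed by predicting
    the observed base rate for every example.
    """
    base_rate = sum(y_true) / len(y_true)
    reference = brier_score(y_true, [base_rate] * len(y_true))
    if reference == 0:
        raise ValueError("labels are all identical; the scaled score is undefined")
    return 1.0 - brier_score(y_true, y_prob) / reference


if __name__ == "__main__":
    # Hypothetical sentence-level labels (1 = frailty aspect present) and predictions.
    labels = [1, 0, 1, 0]
    probabilities = [0.9, 0.2, 0.7, 0.1]
    print(round(scaled_brier_score(labels, probabilities), 3))
```

Under this definition a model that only reproduces the prevalence scores 0 and a model that is worse than that reference scores below 0, which matches the abstract's description of 0 as uninformative.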
Affiliation(s)
- Jacob A Martin
- Division of Cardiology, Department of Medicine, New York University Grossman School of Medicine, New York, New York, USA; Palliative and Advanced Illness Research (PAIR) Center, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA; Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Andrew Crane-Droesch
- Palliative and Advanced Illness Research (PAIR) Center, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Joseph C Puhl
- Palliative and Advanced Illness Research (PAIR) Center, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Tyler E Kmiec
- Palliative and Advanced Illness Research (PAIR) Center, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Jasmine A Silvestri
- Palliative and Advanced Illness Research (PAIR) Center, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Lyle H Ungar
- Department of Computer and Information Science, University of Pennsylvania School of Engineering and Applied Science, Philadelphia, Pennsylvania, USA
- Bruce P Kinosian
- Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Division of Geriatrics, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA; Geriatrics and Extended Care Data Analysis Center, Corporal Michael J Crescenz VA Medical Center, Philadelphia, Pennsylvania, USA
- Blanca E Himes
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Rebecca A Hubbard
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Joshua M Diamond
- Pulmonary, Allergy, and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Vivek Ahya
- Pulmonary, Allergy, and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Michael W Sims
- Pulmonary, Allergy, and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Scott D Halpern
- Palliative and Advanced Illness Research (PAIR) Center, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA; Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA; Pulmonary, Allergy, and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Gary E Weissman
- Palliative and Advanced Illness Research (PAIR) Center, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA; Leonard Davis Institute of Health Economics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Pulmonary, Allergy, and Critical Care Division, Department of Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
12
|
Chowdhury M, Cervantes EG, Chan WY, Seitz DP. Use of Machine Learning and Artificial Intelligence Methods in Geriatric Mental Health Research Involving Electronic Health Record or Administrative Claims Data: A Systematic Review. Front Psychiatry 2021; 12:738466. [PMID: 34616322 PMCID: PMC8488098 DOI: 10.3389/fpsyt.2021.738466] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 07/08/2021] [Accepted: 08/26/2021] [Indexed: 11/13/2022]
Abstract
Introduction: Electronic health records (EHR) and administrative healthcare data (AHD) are frequently used in geriatric mental health research to answer various health research questions. However, there is an increasing amount and complexity of data available that may lend itself to alternative analytic approaches using machine learning (ML) or artificial intelligence (AI) methods. We performed a systematic review of the current application of ML or AI approaches to the analysis of EHR and AHD in geriatric mental health. Methods: We searched MEDLINE, Embase, and PsycINFO to identify potential studies. We included all articles that used ML or AI methods on topics related to geriatric mental health utilizing EHR or AHD data. We assessed study quality either by Prediction model Risk OF Bias ASsessment Tool (PROBAST) or Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) checklist. Results: We initially identified 391 articles through an electronic database and reference search, and 21 articles met inclusion criteria. Among the selected studies, EHR was the most used data type, and the datasets were mainly structured. A variety of ML and AI methods were used, with prediction or classification being the main application of ML or AI with the random forest as the most common ML technique. Dementia was the most common mental health condition observed. The relative advantages of ML or AI techniques compared to biostatistical methods were generally not assessed. Only in three studies, low risk of bias (ROB) was observed according to all the PROBAST domains but in none according to QUADAS-2 domains. The quality of study reporting could be further improved. Conclusion: There are currently relatively few studies using ML and AI in geriatric mental health research using EHR and AHD methods, although this field is expanding. Aside from dementia, there are few studies of other geriatric mental health conditions. 
The lack of consistent information in the selected studies precludes precise comparisons between them. Improving the quality of reporting of ML and AI work in the future would help improve research in the field. Other courses of improvement include using common data models to collect/organize data, and common datasets for ML model validation.
Affiliation(s)
- Mohammad Chowdhury
- Department of Psychiatry, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Eddie Gasca Cervantes
- Department of Electrical and Computer Engineering, Queen's University, Kingston, ON, Canada
- Wai-Yip Chan
- Department of Electrical and Computer Engineering, Queen's University, Kingston, ON, Canada
- Dallas P. Seitz
- Department of Psychiatry, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
13
|
Hatef E, Ma X, Shaikh Y, Kharrazi H, Weiner JP, Gaskin DJ. Internet Access, Social Risk Factors, and Web-Based Social Support Seeking Behavior: Assessing Correlates of the "Digital Divide" Across Neighborhoods in The State of Maryland. J Med Syst 2021; 45:94. [PMID: 34537892 PMCID: PMC8449832 DOI: 10.1007/s10916-021-01769-w] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/05/2021] [Accepted: 09/14/2021] [Indexed: 11/30/2022]
Abstract
We aimed to empirically measure the degree to which there is a “digital divide” in terms of access to the internet at the small-area community level within the State of Maryland and the City of Baltimore and to assess the relationship and association of this divide with community-level SDOH risk factors, community-based social service agency location, and web-mediated support service seeking behavior. To assess the socio-economic characteristics of the neighborhoods across the state, we calculated the Area Deprivation Index (ADI) using the U.S. Census, American Community Survey (5-year estimates) of 2017. To assess the digital divide, at the community level, we used the Federal Communications Commission (FCC) data on the number of residential fixed Internet access service connections. We assessed the availability of and web-based access to community-based social service agencies using data provided by the “Aunt Bertha” information platform. We performed community and regional level descriptive and special analyses for ADI social risk factors, connectivity, and both the availability of and web-based searches for community-based social services. To help assess potential neighborhood linked factors associated with the rates of web-based social services searches by individuals in need, we applied logistic regression using generalized estimating equation modeling. Baltimore City contained more disadvantaged neighborhoods compared to other areas in Maryland. In Baltimore City, 20.3% of neighborhoods (defined by census block groups) were disadvantaged with ADI at the 90th percentile while only 6.6% of block groups across Maryland were in this disadvantaged category. Across the State, more than half of all census tracts had 801–1000 households (per 1000 households) with internet subscription. In contrast, in Baltimore City about half of all census tracts had only 401–600 of the households (per 1000 households) with internet subscriptions. 
Most block groups in Maryland and Baltimore City lacked access to social services facilities (61% of block groups at the 90th percentile of disadvantage in Maryland and 61.3% of block groups at the 90th percentile of disadvantage in Baltimore City). After adjusting for other variables, a 1% increase in the ADI measure of social disadvantage resulted in a 1.7% increase in the number of individuals seeking social services. While more work is needed, our findings support the premise that the digital divide is closely associated with other SDOH factors. Policymakers must propose policies to address the digital divide on a national level and also in disadvantaged communities experiencing the digital divide in addition to other SDOH challenges.
Affiliation(s)
- Elham Hatef
- Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, US; Johns Hopkins Center for Health Disparities Solutions, Baltimore, MD, US
- Xiaomeng Ma
- Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, US
- Yahya Shaikh
- Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, US
- Hadi Kharrazi
- Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, US
- Jonathan P Weiner
- Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, US
- Darrell J Gaskin
- Johns Hopkins Center for Health Disparities Solutions, Baltimore, MD, US
14
|
Bompelli A, Wang Y, Wan R, Singh E, Zhou Y, Xu L, Oniani D, Kshatriya BSA, Balls-Berry JE, Zhang R. Social and Behavioral Determinants of Health in the Era of Artificial Intelligence with Electronic Health Records: A Scoping Review. HEALTH DATA SCIENCE 2021; 2021:9759016. [PMID: 38487504 PMCID: PMC10880156 DOI: 10.34133/2021/9759016] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/22/2021] [Accepted: 06/28/2021] [Indexed: 03/17/2024]
Abstract
Background. There is growing evidence that social and behavioral determinants of health (SBDH) have a substantial effect on a wide range of health outcomes. Electronic health records (EHRs) have been widely employed to conduct observational studies in the age of artificial intelligence (AI). However, there has been limited review into how to make the most of SBDH information from EHRs using AI approaches. Methods. A systematic search was conducted in six databases to find relevant peer-reviewed publications that had recently been published. Relevance was determined by screening and evaluating the articles. Based on selected relevant studies, a methodological analysis of AI algorithms leveraging SBDH information in EHR data was provided. Results. Our synthesis was driven by an analysis of SBDH categories, the relationship between SBDH and healthcare-related statuses, natural language processing (NLP) approaches for extracting SBDH from clinical notes, and predictive models using SBDH for health outcomes. Discussion. The associations between SBDH and health outcomes are complicated and diverse; several pathways may be involved. Using NLP technology to support the extraction of SBDH and other clinical ideas simplifies the identification and extraction of essential concepts from clinical data, efficiently unlocks unstructured data, and aids in the resolution of unstructured data-related issues. Conclusion. Despite known associations between SBDH and diseases, SBDH factors are rarely investigated as interventions to improve patient outcomes. Gaining knowledge about SBDH and how SBDH data can be collected from EHRs using NLP approaches and predictive models improves the chances of influencing health policy change for patient wellness, ultimately promoting health and health equity.
Affiliation(s)
- Anusha Bompelli
- Department of Pharmaceutical Care & Health Systems, University of Minnesota, USA
- Yanshan Wang
- Department of Health Information Management, University of Pittsburgh, USA
- Ruyuan Wan
- Department of Computer Science, University of Minnesota, USA
- Esha Singh
- Department of Computer Science, University of Minnesota, USA
- Yuqi Zhou
- Institute for Health Informatics and College of Pharmacy, University of Minnesota, USA
- Lin Xu
- Carlson School of Business, University of Minnesota, USA
- David Oniani
- Department of Computer Science and Mathematics, Luther College, USA
- Rui Zhang
- Institute for Health Informatics, Department of Pharmaceutical Care & Health Systems, University of Minnesota, USA
15
|
Newman-Griffis D, Fosler-Lussier E. Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health. Front Digit Health 2021; 3:620828. [PMID: 33791684 PMCID: PMC8009547 DOI: 10.3389/fdgth.2021.620828] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 10/23/2020] [Accepted: 02/16/2021] [Indexed: 11/13/2022]
Abstract
Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts, such as functional outcomes and social determinants of health, lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of medical information in under-studied domains, and demonstrate its applicability through a case study on physical mobility function. Mobility function is a component of many health measures, from post-acute care and surgical outcomes to chronic frailty and disability, and is represented as one domain of human activity in the International Classification of Functioning, Disability, and Health (ICF). However, mobility and other types of functional activity remain under-studied in the medical informatics literature, and neither the ICF nor commonly-used medical terminologies capture functional status terminology in practice. We investigated two data-driven paradigms, classification and candidate selection, to link narrative observations of mobility status to standardized ICF codes, using a dataset of clinical narratives from physical therapy encounters. Recent advances in language modeling and word embedding were used as features for established machine learning models and a novel deep learning approach, achieving a macro-averaged F-1 score of 84% on linking mobility activity reports to ICF codes. Both classification and candidate selection approaches present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated data set; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems. 
This research has implications for continued development of language technologies to analyze functional status information, and the ongoing growth of NLP tools for a variety of specialized applications in clinical care and research.
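The macro-averaged F-1 score cited in this abstract can be sketched as follows. This is a minimal illustration of the standard definition (per-class F1 averaged with equal weight per class), not the authors' evaluation code; the label strings are hypothetical placeholders standing in for ICF codes.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Average the per-class F1 scores, giving each class equal weight."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = set(y_true) | set(y_pred)
    per_class = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); a class with no support and no predictions scores 0.
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Hypothetical gold and predicted codes for four mobility reports.
    gold = ["code_a", "code_a", "code_b", "code_c"]
    predicted = ["code_a", "code_b", "code_b", "code_c"]
    print(round(macro_f1(gold, predicted), 3))
```

Because every class counts equally, rare codes influence the macro average as much as frequent ones, which is one reason the metric is often preferred when code frequencies are imbalanced.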
Affiliation(s)
- Denis Newman-Griffis
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
- Epidemiology & Biostatistics Section, Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States
- Eric Fosler-Lussier
- Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, United States
16
|
Wray CM, Vali M, Walter LC, Christensen L, Abdelrahman S, Chapman W, Keyhani S. Examining the Interfacility Variation of Social Determinants of Health in the Veterans Health Administration. Fed Pract 2021; 38:15-19. [PMID: 33574644 DOI: 10.12788/fp.0080] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023]
Abstract
Introduction Recently, numerous studies have linked social determinants of health (SDoH) with clinical outcomes. While this association is well known, the interfacility variability of these risk factors within the Veterans Health Administration (VHA) is not known. Such information could be useful to the VHA for resource and funding allocation. The aim of this study is to explore the interfacility variability of 5 SDoH within the VHA. Methods In a cohort of patients (aged ≥ 65 years) hospitalized at VHA acute care facilities with either acute myocardial infarction (AMI), heart failure (HF), or pneumonia in 2012, we assessed (1) the proportion of patients with any of the following five documented SDoH: lives alone, marginal housing, alcohol use disorder, substance use disorder, and use of substance use services, using administrative diagnosis codes and clinic stop codes; and (2) the documented facility-level variability of these SDoH. To examine whether variability was due to regional coding differences, we assessed the variation of living alone using a validated natural language processing (NLP) algorithm. Results The proportion of veterans admitted for AMI, HF, and pneumonia with SDoH was low. Across all 3 conditions, lives alone was the most common SDoH (2.2% [interquartile range (IQR), 0.7-4.7]), followed by substance use disorder (1.3% [IQR, 0.5-2.1]), and use of substance use services (1.2% [IQR, 0.6-1.8]). Using NLP, the proportion of hospitalized veterans with lives alone was higher for HF (14.4% vs 2.0%, P < .01), pneumonia (11% vs 1.9%, P < .01), and AMI (10.2% vs 1.4%, P < .01) compared with International Classification of Diseases, Ninth Edition codes. Interfacility variability was noted with both administrative and NLP extraction methods. Conclusions The presence of SDoH in administrative data among patients hospitalized for common medical issues is low and variable across VHA facilities. 
Significant facility-level variation of 5 SDoH was present regardless of extraction method.
Affiliation(s)
- Charlie M Wray
- Marzieh Vali
- Louise C Walter
- Lee Christensen
- Samir Abdelrahman
- Wendy Chapman
- Salomeh Keyhani
- Charlie Wray is an Internist in the Division of Hospital Medicine; Marzieh Vali is a Statistician in the Northern California Institute for Research and Education; Louise Walter is a Geriatrician in the Division of Geriatrics; and Salomeh Keyhani is an Internist in the Division of General Internal Medicine; all at the San Francisco Veterans Affairs Medical Center. Lee Christensen is a Project Manager and Samir Abdelrahman is an Assistant Professor, both in the Department of Biomedical Informatics, University of Utah in Salt Lake City. Wendy Chapman is the Associate Dean of Digital Health and Informatics in the Centre for Digital Transformation of Health, University of Melbourne, Victoria, Australia. Charlie Wray is an Assistant Professor of Medicine, Louise Walter and Salomeh Keyhani are Professors of Medicine; all in the Department of Medicine, University of California, San Francisco
17
|
Kharrazi H, Ma X, Chang HY, Richards TM, Jung C. Comparing the Predictive Effects of Patient Medication Adherence Indices in Electronic Health Record and Claims-Based Risk Stratification Models. Popul Health Manag 2021; 24:601-609. [PMID: 33544044 DOI: 10.1089/pop.2020.0306] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
Multiple indices are available to measure medication adherence behaviors. Medication adherence measures, however, have rarely been extracted from electronic health records (EHRs) for population-level risk predictions. This study assessed the value of medication adherence indices in improving predictive models of cost and hospitalization. This study included a 2-year retrospective cohort of patients younger than age 65 years with linked EHR and insurance claims data. Three medication adherence measures were calculated: medication regimen complexity index (MRCI), medication possession ratio (MPR), and prescription fill rate (PFR). The authors examined the effects of adding these measures to 3 predictive models of utilization: a demographics model, a conventional model (Charlson index), and an advanced diagnosis-based model. Models were trained using EHR and claims data. The study population had an overall MRCI, MPR, and PFR of 14.6 ± 17.8, .624 ± .310, and .810 ± .270, respectively. Adding MRCI and MPR to the demographic and the morbidity models using claims data improved forecasting of next-year hospitalization substantially (eg, AUC of the demographic model increased from .605 to .656 using MRCI). Nonetheless, such boosting effects were attenuated for the advanced diagnosis-based models. Although EHR models performed inferior to claims models, adding adherence indices improved EHR model performances at a larger scale (eg, adding MRCI increased AUC by 4.4% for the Charlson model using EHR data compared to 3.8% using claims). This study shows that medication adherence measures can modestly improve EHR- and claims-derived predictive models of cost and hospitalization in non-elderly patients; however, the improvements are minimal for advanced diagnosis-based models.
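One of the adherence indices named in this abstract, the medication possession ratio (MPR), can be illustrated with a short sketch. It assumes the widely used definition (total days' supply dispensed divided by the number of days in the observation period, optionally capped at 1); the abstract does not give the study's exact formula, so the function name, the `cap` parameter, and the fill data are hypothetical.

```python
from typing import Sequence


def medication_possession_ratio(
    days_supplied: Sequence[int], period_days: int, cap: bool = True
) -> float:
    """Assumed MPR definition: total days' supply / days in the observation period.

    With cap=True the ratio is limited to 1.0, a common convention for
    handling early refills; the source abstract does not say whether the
    study applied a cap.
    """
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    if any(days < 0 for days in days_supplied):
        raise ValueError("days_supplied values must be non-negative")
    ratio = sum(days_supplied) / period_days
    return min(ratio, 1.0) if cap else ratio


if __name__ == "__main__":
    # Hypothetical patient: three 30-day fills over a 180-day observation window.
    print(medication_possession_ratio([30, 30, 30], period_days=180))
```

A value near the cohort mean reported above (.624) would correspond, under this assumed definition, to a patient holding medication for roughly five-eighths of the observation window.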
Affiliation(s)
- Hadi Kharrazi
  - Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA; Division of Health Sciences Informatics, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Xiaomeng Ma
  - Dalla Lana School of Public Health, Institute of Health Policy Management and Evaluations, University of Toronto, Toronto, Canada
- Hsien-Yen Chang
  - Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Thomas M Richards
  - Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Changmi Jung
  - Carey Business School, Johns Hopkins University, Baltimore, Maryland, USA
|
18
|
Veinot TC, Ancker JS, Bakken S. Health informatics and health equity: improving our reach and impact. J Am Med Inform Assoc 2019; 26:689-695. [PMID: 31411692 DOI: 10.1093/jamia/ocz132] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Indexed: 02/07/2023]
Abstract
Health informatics studies the use of information technology to improve human health. As informaticists, we seek to reduce the gaps between current healthcare practices and our societal goals for better health and healthcare quality, safety, or cost. It is time to recognize health equity as one of these societal goals, a point underscored by this Journal of the American Medical Informatics Association Special Focus Issue, "Health Informatics and Health Equity: Improving our Reach and Impact." This Special Issue highlights health informatics research that focuses on marginalized and underserved groups, health disparities, and health equity. In particular, this Special Issue intentionally showcases high-quality research and professional experiences that encompass a broad range of subdisciplines, methods, marginalized populations, and approaches to disparities. Building on this variety of submissions and other recent developments, we highlight the contents of the Special Issue and offer an assessment of the state of research at the intersection of health informatics and health equity.
Affiliation(s)
- Tiffany C Veinot
  - School of Information, University of Michigan, Ann Arbor, Michigan, USA; Department of Health Behavior and Health Education, School of Public Health, University of Michigan, Ann Arbor, Michigan, USA
- Jessica S Ancker
  - Division of Health Informatics, Department of Healthcare Policy & Research, Weill Cornell Medical College, New York, New York, USA
- Suzanne Bakken
  - School of Nursing, Columbia University, New York, New York, USA; Department of Biomedical Informatics, Columbia University, New York, New York, USA; Data Science Institute, Columbia University, New York, New York, USA
|
19
|
Bery AK, Anzaldi LJ, Boyd CM, Leff B, Kharrazi H. Potential value of electronic health records in capturing data on geriatric frailty for population health. Arch Gerontol Geriatr 2020; 91:104224. [PMID: 32829083 DOI: 10.1016/j.archger.2020.104224] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2020] [Revised: 07/19/2020] [Accepted: 08/04/2020] [Indexed: 10/23/2022]
Abstract
OBJECTIVES Despite the availability of many frailty measures to identify older adults at risk, frailty instruments are not routinely used for risk assessment in population health management. Here, we assessed the potential value of electronic health records (EHRs) and administrative claims in providing the necessary data for variables used across various frailty instruments. SETTING AND PARTICIPANTS The review focused on studies conducted worldwide. Participants included adults aged 50 and older. DESIGN We identified frailty instruments published between 2011 and 2018. Frailty variables used in each of the frailty instruments were extracted, grouped, and categorized across health determinants and various clinical factors. MEASURES The availability of the extracted frailty variables across various data sources (e.g., EHRs, administrative claims, and surveys) was evaluated by experts. RESULTS We identified 135 frailty instruments, which contained 593 unique variables. Clinical determinants of health were the best-represented variables across frailty instruments (n = 516; 87%), unlike social and health services factors (n = 33; ∼5% and n = 32; ∼5%, respectively). Most frailty instruments require at least one variable that is not routinely available in EHRs or claims (n = 113; ∼83%). Only 22 frailty instruments have the potential to rely completely on EHR (structured or free-text data) and/or claims data, and possibly be operationalized at a population level. CONCLUSIONS AND IMPLICATIONS Frailty instruments continue to be highly survey-based. More research is therefore needed to develop EHR-based frailty instruments for population health management. This will permit organizations and societies to stratify risk and better allocate resources among different older adult populations.
Affiliation(s)
- Anand K Bery
  - Division of Neurology, Department of Medicine, The Ottawa Hospital, 1053 Carling Avenue, Ottawa, ON, K1Y 4E9, Canada; Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, 624 N Broadway, Baltimore, MD, 21205, United States.
- Laura J Anzaldi
  - Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, 624 N Broadway, Baltimore, MD, 21205, United States.
- Cynthia M Boyd
  - Center for Transformative Geriatric Research, Division of Geriatric Medicine and Gerontology, Johns Hopkins University School of Medicine, 200 Eastern Avenue, Baltimore, MD, 21224, United States.
- Bruce Leff
  - Center for Transformative Geriatric Research, Division of Geriatric Medicine and Gerontology, Johns Hopkins University School of Medicine, 200 Eastern Avenue, Baltimore, MD, 21224, United States.
- Hadi Kharrazi
  - Center for Population Health IT, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, 624 N Broadway, Baltimore, MD, 21205, United States; Division of Health Sciences and Informatics, Department of General Internal Medicine, Johns Hopkins University School of Medicine, 2024 East Monument St. S 1-200, Baltimore, MD, 21205, United States.
|