1
Farcas AM, Crowe RP, Kennel J, Little N, Haamid A, Camacho MA, Pleasant T, Owusu-Ansah S, Joiner AP, Tripp R, Kimbrell J, Grover JM, Ashford S, Burton B, Uribe J, Innes JC, Page DI, Taigman M, Dorsett M. Achieving Equity in EMS Care and Patient Outcomes Through Quality Management Systems: A Position Statement. Prehosp Emerg Care 2024:1-11. [PMID: 38727731 DOI: 10.1080/10903127.2024.2352582] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 04/29/2024] [Indexed: 05/18/2024]
Abstract
Improving health and safety in our communities requires deliberate focus and commitment to equity. Inequities are differences in access, treatment, and outcomes between individuals and across populations that are systemic, avoidable, and unjust. Within health care in general, and Emergency Medical Services (EMS) in particular, there are demonstrated inequities in the quality of care provided to patients based on a number of characteristics linked to discrimination, exclusion, or bias. Given the critical role that EMS plays within the health care system, it is imperative that EMS systems reduce inequities by delivering evidence-based, high-quality care for the communities and patients we serve. To achieve equity in EMS care delivery and patient outcomes, the National Association of EMS Physicians recommends that EMS systems and agencies:
- make health equity a strategic priority and commit to improving equity at all levels.
- assess and monitor clinical and safety quality measures through the lens of inequities as an integrated part of the quality management process.
- ensure that data elements are structured to enable equity analysis at every level and routinely evaluate data for limitations hindering equity analysis and improvement.
- involve patients and community stakeholders in determining data ownership and stewardship to ensure its ongoing evolution and fitness for use for measuring care inequities.
- address biases as they translate into the quality of care and standards of respect for patients.
- pursue equity through a framework rooted in the principles of improvement science.
Affiliation(s)
- Andra M Farcas
  - Department of Emergency Medicine, School of Medicine, University of Colorado, Aurora, Colorado
- Jamie Kennel
  - Oregon Health & Science University and Oregon Institute of Technology, Portland, Oregon
- Ameera Haamid
  - Section of Emergency Medicine, University of Chicago Medicine, Chicago, Illinois
- Mario Andres Camacho
  - Department of Emergency Medicine, Denver Health Medical Center, School of Medicine, University of Colorado, Denver, Colorado
- Sylvia Owusu-Ansah
  - Division of Pediatric Emergency Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania
- Anjni P Joiner
  - Department of Emergency Medicine, School of Medicine, Duke University, Durham, North Carolina
- Rickquel Tripp
  - Department of Emergency Medicine, University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania
- Joshua Kimbrell
  - Department of Pre-Hospital Care, Jamaica Hospital Medical Center, Jamaica, New York
- Joseph M Grover
  - UNC Department of Emergency Medicine, Chapel Hill, North Carolina
- Brooke Burton
  - Unified Fire Authority in Salt Lake County, Salt Lake City, Utah
- Jeffrey Uribe
  - Department of Emergency Medicine, Medstar Health, Columbia, Maryland
- Johanna C Innes
  - Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, New York
- David I Page
  - Center for Prehospital Care, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California
- Maia Dorsett
  - Department of Emergency Medicine, University of Rochester Medical Center, Rochester, New York
2
Ralevski A, Taiyab N, Nossal M, Mico L, Piekos SN, Hadlock J. Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records. medRxiv 2024:2024.04.25.24306380. [PMID: 38712224 PMCID: PMC11071574 DOI: 10.1101/2024.04.25.24306380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Social Determinants of Health (SDoH) are an important part of the exposome and are known to have a large impact on variation in health outcomes. In particular, housing stability is known to be intricately linked to a patient's health status, and pregnant women experiencing housing instability (HI) are known to have worse health outcomes. Most SDoH information is stored in electronic health records (EHRs) as free text (unstructured) clinical notes, which traditionally required natural language processing (NLP) for automatic identification of relevant text or keywords. A patient's housing status can be ambiguous or subjective, and can change from note to note or within the same note, making it difficult to use existing NLP solutions. New developments in NLP allow researchers to prompt large language models (LLMs) to perform complex, subjective annotation tasks that require reasoning that previously could only be attempted by human annotators. For example, LLMs such as GPT (Generative Pre-trained Transformer) enable researchers to analyze complex, unstructured data using simple prompts. We used a secure platform within a large healthcare system to compare the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, from 25,217 notes from 795 pregnant women. Results from these LLMs were compared with results from manual annotation, a named entity recognition (NER) model, and regular expressions (RegEx). We developed a chain-of-thought prompt requiring evidence and justification for each note from the LLMs, to help maximize the chances of finding relevant text related to HI while minimizing hallucinations and false positives.
Compared with GPT-3.5 and the NER model, GPT-4 had the highest performance and had a much higher recall (0.924) than human annotators (0.702) in identifying patients experiencing current or past housing instability, although precision was lower (0.850) compared with human annotators (0.971). In most cases, the evidence output by GPT-4 was similar or identical to that of human annotators, and there was no evidence of hallucinations in any of the outputs from GPT-4. Most cases where the annotators and GPT-4 differed were ambiguous or subjective, such as "living in an apartment with too many people". We also looked at GPT-4 performance on de-identified versions of the same notes and found that precision improved slightly (0.936 original, 0.939 de-identified), while recall dropped (0.781 original, 0.704 de-identified). This work demonstrates that, while manual annotation is likely to yield slightly more accurate results overall, LLMs, when compared with manual annotation, provide a scalable, cost-effective solution with the advantage of greater recall. At the same time, further evaluation is needed to address the risk of missed cases and bias in the initial selection of housing-related notes. Additionally, while it was possible to reduce confabulation, signs of unusual justifications remained. Given these factors, together with changes in both LLMs and charting over time, this approach is not yet appropriate for use as a fully-automated process. However, these results demonstrate the potential for using LLMs for computer-assisted annotation with human review, reducing cost and increasing recall. More efficient methods for obtaining structured SDoH data can help accelerate inclusion of exposome variables in biomedical research, and support healthcare systems in identifying patients who could benefit from proactive outreach.
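The precision and recall trade-off reported above can be illustrated with a small keyword baseline. The following is a minimal Python sketch under stated assumptions, not the study's method: the pattern `HI_PATTERN`, the helper names, the toy notes, and their labels are all invented for illustration, and the study's actual RegEx and data are not given in the abstract.

```python
import re

# Hypothetical keyword pattern for housing instability (HI); an assumption,
# not the RegEx used in the study.
HI_PATTERN = re.compile(
    r"\b(homeless|evicted|eviction|shelter|couch[- ]surfing)\b", re.IGNORECASE
)


def flag_note(text: str) -> bool:
    """Return True if the note contains any HI keyword."""
    return HI_PATTERN.search(text) is not None


def precision_recall(predicted, actual):
    """Compute precision and recall from parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented notes with invented manual labels (True = housing instability).
notes = [
    ("Patient reports she was evicted last month.", True),
    ("Currently staying at a shelter with her children.", True),
    ("Living in an apartment with too many people.", True),   # ambiguous, no keyword
    ("Lives with partner in stable housing.", False),
    ("Volunteers at an animal shelter on weekends.", False),  # keyword, but not HI
]

predicted = [flag_note(text) for text, _ in notes]
actual = [label for _, label in notes]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy set the keyword rule misses the ambiguous crowding note (a false negative) and flags the animal shelter note (a false positive), the two kinds of error that precision and recall separate.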
Affiliation(s)
- Nadaa Taiyab
  - Tegria, 1255 Fourier Dr Ste 101, Madison, WI, 53717, USA
- Michael Nossal
  - Providence St Joseph Health, 1801 Lind Ave SW Renton, WA, 98057, USA
- Lindsay Mico
  - Providence St Joseph Health, 1801 Lind Ave SW Renton, WA, 98057, USA
- Jennifer Hadlock
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
  - University of Washington, Biomedical Informatics and Medical Education, Seattle, WA, USA
3
Scherbakov D, Mollalo A, Lenert L. Stressful life events in electronic health records: a scoping review. J Am Med Inform Assoc 2024; 31:1025-1035. [PMID: 38349862 PMCID: PMC10990522 DOI: 10.1093/jamia/ocae023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 01/19/2024] [Accepted: 01/27/2024] [Indexed: 02/15/2024] Open
Abstract
OBJECTIVES: Stressful life events, such as going through divorce, can have an important impact on human health. However, there are challenges in capturing these events in electronic health records (EHR). We conducted a scoping review aimed at answering 2 major questions: how stressful life events are documented in EHR and how they are utilized in research and clinical care. MATERIALS AND METHODS: Three online databases (EBSCOhost platform, PubMed, and Scopus) were searched to identify papers that included information on stressful life events in EHR; paper titles and abstracts were reviewed for relevance by 2 independent reviewers. RESULTS: Five hundred fifty-seven unique papers were retrieved, and of these 70 were eligible for data extraction. Most articles (n = 36, 51.4%) were focused on the statistical association between one or several stressful life events and health outcomes, followed by clinical utility (n = 15, 21.4%), extraction of events from free-text notes (n = 12, 17.1%), discussing privacy and other issues of storing life events (n = 5, 7.1%), and new EHR features related to life events (n = 4, 5.7%). The most frequently mentioned stressful life events in the publications were child abuse/neglect, arrest/legal issues, and divorce/relationship breakup. Almost half of the papers (n = 7, 46.7%) that analyzed clinical utility of stressful events were focused on decision support systems for child abuse, while others (n = 7, 46.7%) were discussing interventions related to social determinants of health in general. DISCUSSION AND CONCLUSIONS: Few citations are available on the prevalence and use of stressful life events in EHR, reflecting challenges in screening and storing of stressful life events.
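The percentages in the results above can be reproduced from the reported counts. This is a minimal Python sketch of that arithmetic; the category labels are shortened paraphrases introduced here for illustration, and only the counts come from the abstract.

```python
# Counts reported in the abstract; labels are shortened paraphrases.
total_eligible = 70
categories = {
    "statistical association": 36,
    "clinical utility": 15,
    "extraction from free text": 12,
    "privacy and storage issues": 5,
    "new EHR features": 4,
}

# Share of the 70 eligible papers in each category, rounded to one decimal.
percentages = {
    name: round(100 * n / total_eligible, 1) for name, n in categories.items()
}

# The category counts add up to more than the number of eligible papers.
category_sum = sum(categories.values())

# Share of the clinical utility papers focused on child abuse decision support.
clinical_utility_share = round(100 * 7 / categories["clinical utility"], 1)

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print("sum of category counts:", category_sum)
print("7 of 15 clinical utility papers:", f"{clinical_utility_share}%")
```

The category counts sum to 72 rather than 70, which would be consistent with a few papers being counted under more than one category; the abstract does not state this explicitly.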
Affiliation(s)
- Dmitry Scherbakov
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC 29403, United States
- Abolfazl Mollalo
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC 29403, United States
- Leslie Lenert
  - Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC 29403, United States
4
Sun S, Zack T, Williams CYK, Sushil M, Butte AJ. Topic modeling on clinical social work notes for exploring social determinants of health factors. JAMIA Open 2024; 7:ooad112. [PMID: 38223407 PMCID: PMC10788143 DOI: 10.1093/jamiaopen/ooad112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 12/17/2023] [Accepted: 12/23/2023] [Indexed: 01/16/2024] Open
Abstract
Objective: Existing research on social determinants of health (SDoH) predominantly focuses on physician notes and structured data within electronic medical records. This study posits that social work notes are an untapped, potentially rich source for SDoH information. We hypothesize that clinical notes recorded by social workers, whose role is to ameliorate social and economic factors, might provide a complementary source of information on SDoH compared to physician notes, which primarily concentrate on medical diagnoses and treatments. We aimed to use word frequency analysis and topic modeling to identify prevalent terms and robust topics of discussion within a large cohort of social work notes including both outpatient and in-patient consultations. Materials and methods: We retrieved a diverse, deidentified corpus of 0.95 million clinical social work notes from 181 644 patients at the University of California, San Francisco. We conducted word frequency analysis related to ICD-10 chapters to identify prevalent terms within the notes. We then applied Latent Dirichlet Allocation (LDA) topic modeling analysis to characterize this corpus and identify potential topics of discussion, which was further stratified by note types and disease groups. Results: Word frequency analysis primarily identified medical-related terms associated with specific ICD-10 chapters, though it also detected some subtle SDoH terms. In contrast, the LDA topic modeling analysis extracted 11 topics explicitly related to social determinants of health risk factors, such as financial status, abuse history, social support, risk of death, and mental health. The topic modeling approach effectively demonstrated variations between different types of social work notes and across patients with different types of diseases or conditions.
Discussion: Our findings highlight LDA topic modeling's effectiveness in extracting SDoH-related themes and capturing variations in social work notes, demonstrating its potential for informing targeted interventions for at-risk populations. Conclusion: Social work notes offer a wealth of unique and valuable information on an individual's SDoH. These notes present consistent and meaningful topics of discussion that can be effectively analyzed and utilized to improve patient care and inform targeted interventions for at-risk populations.
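The word frequency step described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions: the stopword list, the tokenization rule, and the three toy notes are invented here, and the study's actual preprocessing (and its LDA topic modeling step) is not reproduced.

```python
from collections import Counter
import re

# Tiny illustrative stopword list; an assumption, not the study's list.
STOPWORDS = {"the", "a", "and", "with", "to", "of", "is", "in", "for", "her", "his", "has"}


def term_frequencies(notes):
    """Count lowercase alphabetic tokens across notes, skipping stopwords."""
    counts = Counter()
    for note in notes:
        tokens = re.findall(r"[a-z]+", note.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts


# Invented example notes, not taken from any clinical corpus.
notes = [
    "Patient has financial stress and limited social support.",
    "Discussed housing and financial assistance with the patient.",
    "Family provides social support; patient denies abuse.",
]

freq = term_frequencies(notes)
for term, count in freq.most_common(4):
    print(term, count)
```

Raw counts like these surface the most common terms but say nothing about which terms co-occur, which is the gap that topic models such as LDA are meant to fill.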
Affiliation(s)
- Shenghuan Sun
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Travis Zack
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
  - Division of Hematology/Oncology, Department of Medicine, UCSF, San Francisco, CA 94143, United States
- Christopher Y K Williams
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Madhumita Sushil
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
- Atul J Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA 94158, United States
  - Center for Data-driven Insights and Innovation, University of California, Office of the President, Oakland, CA 94607, United States
  - Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, United States
5
Hatef E, Chang HY, Richards TM, Kitchen C, Budaraju J, Foroughmand I, Lasser EC, Weiner JP. Development of a Social Risk Score in the Electronic Health Record to Identify Social Needs Among Underserved Populations: Retrospective Study. JMIR Form Res 2024; 8:e54732. [PMID: 38470477 DOI: 10.2196/54732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 02/02/2024] [Accepted: 02/08/2024] [Indexed: 03/13/2024] Open
Abstract
BACKGROUND: Patients with unmet social needs and social determinants of health (SDOH) challenges continue to face a disproportionate risk of increased prevalence of disease, health care use, higher health care costs, and worse outcomes. Some existing predictive models have used the available data on social needs and SDOH challenges to predict health-related social needs or the need for various social service referrals. Despite these one-off efforts, the work to date suggests that many technical and organizational challenges must be surmounted before SDOH-integrated solutions can be implemented on an ongoing, wide-scale basis within most US-based health care organizations. OBJECTIVE: We aimed to retrieve available information in the electronic health record (EHR) relevant to the identification of persons with social needs and to develop a social risk score for use within clinical practice to better identify patients at risk of having future social needs. METHODS: We conducted a retrospective study using EHR data (2016-2021) and data from the US Census American Community Survey. We developed a prospective model using current year-1 risk factors to predict future year-2 outcomes within four 2-year cohorts. Predictors of interest included demographics, previous health care use, comorbidity, previously identified social needs, and neighborhood characteristics as reflected by the area deprivation index. The outcome variable was a binary indicator reflecting the likelihood of the presence of a patient with social needs. We applied a generalized estimating equation approach, adjusting for patient-level risk factors, the possible effect of geographically clustered data, and the effect of multiple visits for each patient.
RESULTS: The study population of 1,852,228 patients included middle-aged (mean age range 53.76-55.95 years), White (range 324,279/510,770, 63.49% to 290,688/448,666, 64.79%), and female (range 314,741/510,770, 61.62% to 278,488/448,666, 62.07%) patients from neighborhoods with high socioeconomic status (mean area deprivation index percentile range 28.76-30.31). Between 8.28% (37,137/448,666) and 11.55% (52,037/450,426) of patients across the study cohorts had at least 1 social need documented in their EHR, with safety issues and economic challenges (ie, financial resource strain, employment, and food insecurity) being the most common documented social needs (87,152/1,852,228, 4.71% and 58,242/1,852,228, 3.14% of overall patients, respectively). The model had an area under the curve of 0.702 (95% CI 0.699-0.705) in predicting prospective social needs in the overall study population. Previous social needs (odds ratio 3.285, 95% CI 3.237-3.335) and emergency department visits (odds ratio 1.659, 95% CI 1.634-1.684) were the strongest predictors of future social needs. CONCLUSIONS: Our model provides an opportunity to make use of available EHR data to help identify patients with high social needs. Our proposed social risk score could help identify the subset of patients who would most benefit from further social needs screening and data collection to avoid potentially more burdensome primary data collection on all patients in a target population of interest.
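The area under the curve quoted above has a direct interpretation: the probability that a randomly chosen patient with the outcome receives a higher risk score than a randomly chosen patient without it. The following is a minimal Python sketch of that definition on invented scores; the function name and data are illustrative assumptions, and it does not reproduce the study's generalized estimating equation model.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Each (positive, negative) pair counts 1 if the positive has the higher
    score, 0.5 on a tie, and 0 otherwise. Quadratic in sample size, so this is
    suitable only for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented risk scores and outcomes (1 = social need documented in year 2).
scores = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 1, 0]

toy_auc = auc(scores, labels)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which places the reported 0.702 between the two.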
Affiliation(s)
- Elham Hatef
  - Division of General Internal Medicine, Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, United States
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Hsien-Yen Chang
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Thomas M Richards
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Christopher Kitchen
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Janya Budaraju
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Iman Foroughmand
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Elyse C Lasser
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Jonathan P Weiner
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
6
Li C, Mowery DL, Ma X, Yang R, Vurgun U, Hwang S, Donnelly HK, Bandhey H, Akhtar Z, Senathirajah Y, Sadhu EM, Getzen E, Freda PJ, Long Q, Becich MJ. Realizing the Potential of Social Determinants Data: A Scoping Review of Approaches for Screening, Linkage, Extraction, Analysis and Interventions. medRxiv 2024:2024.02.04.24302242. [PMID: 38370703 PMCID: PMC10871446 DOI: 10.1101/2024.02.04.24302242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Background: Social determinants of health (SDoH) like socioeconomics and neighborhoods strongly influence outcomes, yet standardized SDoH data is lacking in electronic health records (EHR), limiting research and care quality. Methods: We searched PubMed using the keywords "SDOH" and "EHR", then performed title/abstract and full-text screening. Included records were analyzed under five domains: 1) SDoH screening and assessment approaches, 2) SDoH data collection and documentation, 3) Use of natural language processing (NLP) for extracting SDoH, 4) SDoH data and health outcomes, and 5) SDoH-driven interventions. Results: We identified 685 articles, of which 324 underwent full review. Key findings include tailored screening instruments implemented across settings, census and claims data linkage providing contextual SDoH profiles, rule-based and neural network systems extracting SDoH from notes using NLP, connections found between SDoH data and healthcare utilization/chronic disease control, and integrated care management programs executed. However, considerable variability persists across data sources, tools, and outcomes. Discussion: Despite progress identifying patient social needs, further development of standards, predictive models, and coordinated interventions is critical to fulfill the potential of SDoH-EHR integration. Additional database searches could strengthen this scoping review. Ultimately widespread capture, analysis, and translation of multidimensional SDoH data into clinical care is essential for promoting health equity.
Affiliation(s)
- Chenyu Li
  - University of Pittsburgh School of Medicine Department of Biomedical Informatics
- Danielle L. Mowery
  - University of Pennsylvania, Institute for Biomedical Informatics
  - University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics
- Xiaomeng Ma
  - University of Toronto, Institute of Health Policy Management and Evaluations
- Rui Yang
  - Duke-NUS Medical School, Centre for Quantitative Medicine
- Ugurcan Vurgun
  - University of Pennsylvania, Institute for Biomedical Informatics
- Sy Hwang
  - University of Pennsylvania, Institute for Biomedical Informatics
- Harsh Bandhey
  - Cedars-Sinai Medical Center, Department of Computational Biomedicine
- Zohaib Akhtar
  - Northwestern University, Kellogg School of Management
- Yalini Senathirajah
  - University of Pittsburgh School of Medicine Department of Biomedical Informatics
- Eugene Mathew Sadhu
  - University of Pittsburgh School of Medicine Department of Biomedical Informatics
- Emily Getzen
  - University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics
- Philip J Freda
  - Cedars-Sinai Medical Center, Department of Computational Biomedicine
- Qi Long
  - University of Pennsylvania, Institute for Biomedical Informatics
  - University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics
- Michael J. Becich
  - University of Pittsburgh School of Medicine Department of Biomedical Informatics
7
Guevara M, Chen S, Thomas S, Chaunzwa TL, Franco I, Kann BH, Moningi S, Qian JM, Goldstein M, Harper S, Aerts HJWL, Catalano PJ, Savova GK, Mak RH, Bitterman DS. Large language models to identify social determinants of health in electronic health records. NPJ Digit Med 2024; 7:6. [PMID: 38200151 PMCID: PMC10781957 DOI: 10.1038/s41746-023-00970-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 11/15/2023] [Indexed: 01/12/2024] Open
Abstract
Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71), and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). The effect of adding LLM-generated synthetic data to training varied across models and architectures, but it improved the performance of smaller Flan-T5 models (delta F1 +0.12 to +0.23). Our best fine-tuned models outperformed ChatGPT-family models in the zero- and few-shot setting, except GPT4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.
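Macro-F1, the metric reported above, averages per-class F1 scores with equal weight, so rare classes count as much as common ones. This matters under the class imbalance the abstract mentions. The following is a minimal Python sketch on invented labels; it uses a single-label simplification with hypothetical class names and does not reproduce the study's evaluation setup.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; defined as 0.0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)


# Invented sentence-level labels; class names are illustrative only.
labels = ["housing", "employment", "none"]
y_true = ["housing", "housing", "employment", "none", "none", "none"]
y_pred = ["housing", "none", "employment", "none", "none", "employment"]

score = macro_f1(y_true, y_pred, labels)
print(f"macro-F1 = {score:.3f}")
```

Because each class contributes equally, a model that ignores a rare category is penalized even when overall accuracy looks high.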
Affiliation(s)
- Marco Guevara
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Shan Chen
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Spencer Thomas
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Tafadzwa L Chaunzwa
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Idalid Franco
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Benjamin H Kann
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Shalini Moningi
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Jack M Qian
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Susan Harper
  - Adult Resource Office, Dana-Farber Cancer Institute, Boston, MA, USA
- Hugo J W L Aerts
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
  - Radiology and Nuclear Medicine, GROW & CARIM, Maastricht University, Maastricht, The Netherlands
- Paul J Catalano
  - Department of Data Science, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Guergana K Savova
  - Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- Raymond H Mak
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Danielle S Bitterman
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
8
Scherbakov D, Mollalo A, Lenert L. Stressful life events in electronic health records: a scoping review. Research Square 2023:rs.3.rs-3458708. [PMID: 37886439 PMCID: PMC10602151 DOI: 10.21203/rs.3.rs-3458708/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
Objective: Stressful life events, such as going through divorce, can have an important impact on human health. However, there are challenges in capturing these events in electronic health records (EHR). We conducted a scoping review aimed at answering two major questions: how stressful life events are documented in EHR and how they are utilized in research and clinical care. Materials and Methods: Three online databases (EBSCOhost platform, PubMed, and Scopus) were searched to identify papers that included information on stressful life events in EHR; paper titles and abstracts were reviewed for relevance by two independent reviewers. Results: 557 unique papers were retrieved, and of these 70 were eligible for data extraction. Most articles (n=36, 51.4%) were focused on the statistical association between one or several stressful life events and health outcomes, followed by clinical utility (n=15, 21.4%), extraction of events from free-text notes (n=12, 17.1%), discussing privacy and other issues of storing life events (n=5, 7.1%), and new EHR features related to life events (n=4, 5.7%). The most frequently mentioned stressful life events in the publications were child abuse/neglect, arrest/legal issues, and divorce/relationship breakup. Almost half of the papers (n=7, 46.7%) that analyzed clinical utility of stressful events were focused on decision support systems for child abuse, while others (n=7, 46.7%) were discussing interventions related to social determinants of health in general. Discussion and Conclusions: Few citations are available on the prevalence and use of stressful life events in EHR, reflecting challenges in screening and storing of stressful life events.
Affiliation(s)
- Dmitry Scherbakov
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina
- Abolfazl Mollalo
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina
- Leslie Lenert
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina
9
|
Scherbakov D, Mollalo A, Lenert L. Stressful life events in electronic health records: a scoping review. RESEARCH SQUARE 2023:rs.3.rs-3458708. [PMID: 37886439 PMCID: PMC10602151 DOI: 10.21203/rs.3.rs-3458708/v2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Affiliation(s)
- Dmitry Scherbakov
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina
- Abolfazl Mollalo
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina
- Leslie Lenert
- Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina
|
10
|
Shafer PR, Davis A, Clark JA. Finding social need-les in a haystack: ascertaining social needs of Medicare patients recorded in the notes of care managers. BMC Health Serv Res 2023; 23:1400. [PMID: 38087286 PMCID: PMC10717654 DOI: 10.1186/s12913-023-10446-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Accepted: 12/06/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Unmet social needs may impair health and access to health care, and intervening on these holds particular promise in high-risk patient populations, such as those with multiple chronic conditions. Our objective was to identify social needs in a patient population at significant risk (Medicare enrollees with multiple chronic illnesses enrolled in care management services) and measure their prevalence prior to any systematic screening. METHODS We partnered with Renova Health, an independent Medicare Chronic Care Management (CCM) provider with patients in 10 states during our study period (January 2017 through August 2020). Our data included over 3,000 Medicare CCM patients, representing nearly 20,000 encounters. We used a dictionary-based natural language processing approach to ascertain the prevalence of six domains of barriers to care (health care affordability, need for supportive services, transportation) and unmet social needs (food insecurity, housing instability, utility hardship) in notes taken during telephonic Medicare CCM patient encounters. RESULTS Barriers to care, specifically need for supportive services (2.4%) and health care affordability (0.8%), were the most prevalent domains identified. Transportation as a barrier to care came up relatively less frequently in CCM encounters (0.1%). Unmet social needs were identified at a comparatively lower rate, with potential housing instability (0.3%) flagged most, followed by potential utility hardship (0.2%) and food insecurity (0.1%). CONCLUSIONS There is substantial untapped opportunity to systematically screen for social determinants of health and unmet social needs in care management.
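The dictionary-based approach named in the Methods can be illustrated with a minimal Python sketch. The term lists, domain keys, and function names below are hypothetical and chosen only for illustration; the abstract does not give the study's actual dictionaries or matching rules.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary (assumption): the study's real term lists
# are not stated in the abstract.
DOMAIN_TERMS = {
    "transportation": ["no ride", "bus pass", "transportation"],
    "food_insecurity": ["food stamps", "food pantry", "skipping meals"],
    "health_care_affordability": ["can't afford", "cannot afford", "copay"],
}


def compile_patterns(domain_terms):
    """Build one case-insensitive, word-bounded regex per domain."""
    patterns = {}
    for domain, terms in domain_terms.items():
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[domain] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def domain_prevalence(notes, patterns):
    """Return, per domain, the fraction of notes with at least one match."""
    hits = Counter()
    for note in notes:
        for domain, pattern in patterns.items():
            if pattern.search(note):
                hits[domain] += 1
    total = len(notes)
    return {d: (hits[d] / total if total else 0.0) for d in patterns}
```

Counting each note at most once per domain yields an encounter-level prevalence, which is the kind of figure the Results report; a real system would also need to handle negation and misspellings, which this sketch ignores.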
Affiliation(s)
- Paul R Shafer
- Department of Health Law, Policy, and Management, School of Public Health, Boston University, 715 Albany Street, Boston, MA, 02118, United States of America
- Amanda Davis
- Department of Health Law, Policy, and Management, School of Public Health, Boston University, 715 Albany Street, Boston, MA, 02118, United States of America
- Jack A Clark
- Department of Health Law, Policy, and Management, School of Public Health, Boston University, 715 Albany Street, Boston, MA, 02118, United States of America
|
11
|
Harris DR, Anthony N, Quesinberry D, Delcher C. Evidence of housing instability identified by addresses, clinical notes, and diagnostic codes in a real-world population with substance use disorders. J Clin Transl Sci 2023; 7:e196. [PMID: 37771412 PMCID: PMC10523293 DOI: 10.1017/cts.2023.626] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/31/2023] [Revised: 08/23/2023] [Accepted: 08/25/2023] [Indexed: 09/30/2023] Open
Abstract
Introduction Housing instability is a social determinant of health associated with multiple negative health outcomes including substance use disorders (SUDs). Real-world evidence of housing instability is needed to improve translational research on populations with SUDs. Methods We identified evidence of housing instability by leveraging structured diagnosis codes and unstructured clinical data from electronic health records of 20,556 patients from 2017 to 2021. We applied natural language processing with named-entity recognition and pattern matching to unstructured clinical notes with free-text documentation. Additionally, we analyzed semi-structured addresses containing explicit or implicit housing-related labels. We assessed agreement on identification methods by having three experts review 300 records. Results Diagnostic codes only identified 58.5% of the population identifiable as having housing instability, whereas 41.5% are identifiable from addresses only (7.1%), clinical notes only (30.4%), or both (4.0%). Reviewers unanimously agreed on 79.7% of cases reviewed; a Fleiss' Kappa score of 0.35 suggested fair agreement yet emphasized the difficulty of analyzing patients having ambiguous housing situations. Among those with poisoning episodes related to stimulants or opioids, diagnosis codes were only able to identify 63.9% of those with housing instability. Conclusions All three data sources yield valid evidence of housing instability; each has its own inherent practical use and limitations. Translational researchers requiring comprehensive real-world evidence of housing instability should optimize and implement use of structured and unstructured data. Understanding the role of housing instability and temporary housing facilities is salient in populations with SUDs.
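The Results report high unanimous agreement (79.7%) alongside a modest Fleiss' kappa (0.35). A minimal Python sketch of the standard Fleiss' kappa formula shows how the two can coexist. The function name and the example table are illustrative assumptions, not the study's data or code.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table where ratings[i][j] is the number of raters
    who assigned subject i to category j (same rater count for every subject)."""
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the marginal category proportions.
    n_categories = len(ratings[0])
    p_j = [
        sum(row[j] for row in ratings) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)

    if p_e == 1.0:
        return float("nan")  # every rating in one category: kappa undefined
    return (p_bar - p_e) / (1.0 - p_e)


# Hypothetical table: 3 raters, 2 categories, 10 records, 8 rated unanimously.
example = [[3, 0]] * 8 + [[2, 1], [1, 2]]
kappa = fleiss_kappa(example)
```

In this made-up table 80% of records are rated unanimously, yet kappa is only about 0.26, because one category dominates and chance agreement is therefore high. The same mechanism may contribute to the gap between the two figures in the abstract, although the abstract itself does not say so.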
Affiliation(s)
- Daniel R. Harris
- Department of Pharmacy Practice and Science, Institute for Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Kentucky, Lexington, KY, USA
- Kentucky Injury Prevention and Research Center, University of Kentucky, Lexington, KY, USA
- Nicholas Anthony
- Department of Pharmacy Practice and Science, Institute for Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Kentucky, Lexington, KY, USA
- Dana Quesinberry
- Kentucky Injury Prevention and Research Center, University of Kentucky, Lexington, KY, USA
- Department of Health Management and Policy, College of Public Health, University of Kentucky, Lexington, KY, USA
- Chris Delcher
- Department of Pharmacy Practice and Science, Institute for Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Kentucky, Lexington, KY, USA
|
12
|
Edgcomb JB, Tseng CH, Pan M, Klomhaus A, Zima BT. Assessing Detection of Children With Suicide-Related Emergencies: Evaluation and Development of Computable Phenotyping Approaches. JMIR Ment Health 2023; 10:e47084. [PMID: 37477974 PMCID: PMC10403798 DOI: 10.2196/47084] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/09/2023] [Revised: 05/11/2023] [Accepted: 05/29/2023] [Indexed: 07/22/2023] Open
Abstract
BACKGROUND Although suicide is a leading cause of death among children, the optimal approach for using health care data sets to detect suicide-related emergencies among children is not known. OBJECTIVE This study aimed to assess the performance of suicide-related International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) codes and suicide-related chief complaint in detecting self-injurious thoughts and behaviors (SITB) among children compared with clinician chart review. The study also aimed to examine variations in performance by child sociodemographics and type of self-injury, as well as develop machine learning models trained on codified health record data (features) and clinician chart review (gold standard) and test model detection performance. METHODS A gold standard classification of suicide-related emergencies was determined through clinician manual review of clinical notes from 600 emergency department visits between 2015 and 2019 by children aged 10 to 17 years. Visits classified with nonfatal suicide attempt or intentional self-harm using the Centers for Disease Control and Prevention surveillance case definition list of ICD-10-CM codes and suicide-related chief complaint were compared with the gold standard classification. Machine learning classifiers (least absolute shrinkage and selection operator-penalized logistic regression and random forest) were then trained and tested using codified health record data (eg, child sociodemographics, medications, disposition, and laboratory testing) and the gold standard classification. The accuracy, sensitivity, and specificity of each detection approach and relative importance of features were examined. RESULTS SITB accounted for 47.3% (284/600) of the visits. Suicide-related diagnostic codes missed nearly one-third (82/284, 28.9%) and suicide-related chief complaints missed more than half (153/284, 53.9%) of the children presenting to emergency departments with SITB. Sensitivity was significantly lower for male children than for female children (0.69, 95% CI 0.61-0.77 vs 0.84, 95% CI 0.78-0.90, respectively) and for preteens compared with adolescents (0.66, 95% CI 0.54-0.78 vs 0.86, 95% CI 0.80-0.92, respectively). Specificity was significantly lower for detecting preparatory acts (0.68, 95% CI 0.64-0.72) and attempts (0.67, 95% CI 0.63-0.71) than for detecting ideation (0.79, 95% CI 0.75-0.82). Machine learning-based models significantly improved the sensitivity of detection compared with suicide-related codes and chief complaint alone. Models considering all 84 features performed similarly to models considering only mental health-related ICD-10-CM codes and chief complaints (34 features) and models considering non-ICD-10-CM code indicators and mental health-related chief complaints (53 features). CONCLUSIONS The capacity to detect children with SITB may be strengthened by applying a machine learning-based approach to codified health record data. To improve integration between clinical research informatics and child mental health care, future research is needed to evaluate the potential benefits of implementing detection approaches at the point of care and identifying precise targets for suicide prevention interventions in children.
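The miss counts in the Results imply an overall sensitivity for each detection approach. The short Python sketch below works through that arithmetic. The helper name is an illustrative assumption, and the derived values are simple consequences of the reported counts rather than figures quoted from the abstract.

```python
def sensitivity_from_misses(total_positive, missed):
    """Sensitivity = detected gold-standard positives / all gold-standard positives."""
    if total_positive <= 0 or not 0 <= missed <= total_positive:
        raise ValueError("need total_positive > 0 and 0 <= missed <= total_positive")
    return (total_positive - missed) / total_positive


sitb_visits = 284  # gold-standard SITB visits reported in the abstract
codes = sensitivity_from_misses(sitb_visits, 82)    # missed by ICD-10-CM codes
chief = sensitivity_from_misses(sitb_visits, 153)   # missed by chief complaint

print(f"ICD-10-CM codes: {codes:.3f}")   # 0.711
print(f"Chief complaint: {chief:.3f}")   # 0.461
```

These overall values are separate from the subgroup sensitivities and their confidence intervals, which the abstract reports without the underlying counts.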
Affiliation(s)
- Juliet Beni Edgcomb
- Mental Health Informatics and Data Science (MINDS) Hub, Center for Community Health, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, United States
- Department of Psychiatry, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Chi-Hong Tseng
- Department of Medicine Statistics Core, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Mengtong Pan
- Department of Medicine Statistics Core, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Alexandra Klomhaus
- Department of Medicine Statistics Core, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
- Bonnie T Zima
- Mental Health Informatics and Data Science (MINDS) Hub, Center for Community Health, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA, United States
- Department of Psychiatry, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, United States
|
13
|
Romanowski B, Ben Abacha A, Fan Y. Extracting social determinants of health from clinical note text with classification and sequence-to-sequence approaches. J Am Med Inform Assoc 2023; 30:1448-1455. [PMID: 37100768 PMCID: PMC10354779 DOI: 10.1093/jamia/ocad071] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/05/2022] [Revised: 03/07/2023] [Accepted: 04/18/2023] [Indexed: 04/28/2023] Open
Abstract
OBJECTIVE Social determinants of health (SDOH) are nonmedical factors that can influence health outcomes. This paper seeks to extract SDOH from clinical texts in the context of the National NLP Clinical Challenges (n2c2) 2022 Track 2 Task. MATERIALS AND METHODS Annotated and unannotated data from the Medical Information Mart for Intensive Care III (MIMIC-III) corpus, the Social History Annotation Corpus, and an in-house corpus were used to develop 2 deep learning models that used classification and sequence-to-sequence (seq2seq) approaches. RESULTS The seq2seq approach had the highest overall F1 scores in the challenge's 3 subtasks: 0.901 on the extraction subtask, 0.774 on the generalizability subtask, and 0.889 on the learning transfer subtask. DISCUSSION Both approaches rely on SDOH event representations that were designed to be compatible with transformer-based pretrained models, with the seq2seq representation supporting an arbitrary number of overlapping and sentence-spanning events. Models with adequate performance could be produced quickly, and the remaining mismatch between representation and task requirements was then addressed in postprocessing. The classification approach used rules to generate entity relationships from its sequence of token labels, while the seq2seq approach used constrained decoding and a constraint solver to recover entity text spans from its sequence of potentially ambiguous tokens. CONCLUSION We proposed 2 different approaches to extract SDOH from clinical texts with high accuracy. However, accuracy suffers on text from new healthcare institutions not present in the training data, and thus generalization remains an important topic for future study.
Affiliation(s)
- Yadan Fan
- Nuance Communications, Burlington, Massachusetts, USA
|
14
|
Allen KS, Hood DR, Cummins J, Kasturi S, Mendonca EA, Vest JR. Natural language processing-driven state machines to extract social factors from unstructured clinical documentation. JAMIA Open 2023; 6:ooad024. [PMID: 37081945 PMCID: PMC10112959 DOI: 10.1093/jamiaopen/ooad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 03/08/2023] [Accepted: 03/28/2023] [Indexed: 04/22/2023] Open
Abstract
Objective This study sought to create natural language processing algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. For generalizability, finalized models were validated on data from a separate health system. Materials and Methods Notes from 2 healthcare systems, representing a variety of note types, were utilized. To train models, the study utilized n-grams to identify keywords and implemented natural language processing (NLP) state machines across all note types. Manual review was conducted to determine performance. Sampling was based on a set percentage of notes, based on the prevalence of social need. Models were optimized over multiple training and evaluation cycles. Performance metrics were calculated using positive predictive value (PPV), negative predictive value, sensitivity, and specificity. Results PPV for housing rose from 0.71 to 0.95 over 3 training runs. PPV for financial rose from 0.83 to 0.89 over 2 training iterations, while PPV for unemployment rose from 0.78 to 0.88 over 3 iterations. The test data resulted in PPVs of 0.94, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Final specificity scores were 0.95, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Discussion We developed 3 rule-based NLP algorithms, trained across health systems. While this is a less sophisticated approach, the algorithms demonstrated a high degree of generalizability, maintaining >0.85 across all predictive performance metrics. Conclusion The rule-based NLP algorithms demonstrated consistent performance in identifying 3 social factors within clinical text. These methods may be a part of a strategy to measure social factors within an institution.
Affiliation(s)
- Katie S Allen
- Corresponding Author: Katie S. Allen, BS, Center for Biomedical Informatics, Regenstrief Institute, Inc., 1101 W. 10th Street, Indianapolis, IN 46202, USA
- Dan R Hood
- Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Jonathan Cummins
- Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Suranga Kasturi
- Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Eneida A Mendonca
- Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Department of Pediatrics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Joshua R Vest
- Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Department of Health Policy and Management, Richard M. Fairbanks School of Public Health, IUPUI, Indianapolis, Indiana, USA
|
15
|
van Baar JM, Shields-Zeeman L, Stronks K, Hagenaars LL. Lifestyle versus social determinants of health in the Dutch parliament: An automated analysis of debate transcripts. SSM Popul Health 2023; 22:101399. [PMID: 37114238 PMCID: PMC10127107 DOI: 10.1016/j.ssmph.2023.101399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 03/12/2023] [Accepted: 04/06/2023] [Indexed: 04/29/2023] Open
Abstract
Although public health scholars increasingly recognize the importance of the social determinants of health (SDOH), health policy outputs tend to emphasize downstream lifestyle factors instead. We use an automated corpus research approach to analyse fourteen years of health policy debate in the Dutch House of Representatives' Health Committee, testing three potential causes of the lack of attention for SDOH: political ideology, by which members of parliament (MPs) from some political orientations may prioritize lifestyle factors over SDOH; lifestyle drift, by which early attention for SDOH during problem analysis is replaced by a lifestyle focus in the development of solutions as the challenges in addressing SDOH become clear; and focusing events, by which political or societal chance events, known to the public and political elites simultaneously, bolster the lifestyle perspective on health. Our analysis shows that overall, the committee spent most of its time discussing neither SDOH nor lifestyle: healthcare financing and service delivery dominated instead. When SDOH or lifestyle were referenced, left-leaning MPs referred significantly more to SDOH and right-leaning MPs significantly more to lifestyle. Temporal effects related to election cycles yielded inconsistent evidence. Finally, peak attention for both lifestyle and SDOH coincided with ongoing political debate instead of exogenous, unforeseen focusing events, and these peaks were rendered relatively insignificant by the larger and more consistent attention for health care. This paper provides a first step toward automated analysis of policy debates at scale, opening up new avenues for the empirical study of health political discourse.
Affiliation(s)
- Jeroen M. van Baar
- Trimbos Institute, Netherlands Institute for Mental Health and Addiction, Utrecht, the Netherlands
- Corresponding author. Da Costakade 45, 3521 VS, Utrecht, the Netherlands.
- Laura Shields-Zeeman
- Trimbos Institute, Netherlands Institute for Mental Health and Addiction, Utrecht, the Netherlands
- Utrecht University, Faculty of Social and Behavioral Sciences, Utrecht, the Netherlands
- Karien Stronks
- Amsterdam University Medical Centers, Department of Public and Occupational Health, University of Amsterdam, Amsterdam, the Netherlands
- Luc L. Hagenaars
- Amsterdam University Medical Centers, Department of Public and Occupational Health, University of Amsterdam, Amsterdam, the Netherlands
- Philip R. Lee Institute for Health Policy Studies, University of California, San Francisco, United States
|
16
|
Wang X, Gupta D, Killian M, He Z. Benchmarking Transformer-Based Models for Identifying Social Determinants of Health in Clinical Notes. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS. IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS 2023; 2023:570-574. [PMID: 38239824 PMCID: PMC10795706 DOI: 10.1109/ichi57859.2023.00102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/22/2024]
Abstract
Electronic health records (EHR) have been widely used in building machine learning models for health outcomes prediction. However, many EHR-based models are inherently biased due to a lack of risk factors on social determinants of health (SDoH), which are responsible for up to 40% of preventable deaths. As SDoH information is often captured in clinical notes, recent efforts have been made to extract such information from notes with natural language processing and append it to other structured data. In this work, we benchmark 7 pre-trained transformer-based models, including BERT, ALBERT, BioBERT, BioClinicalBERT, RoBERTa, ELECTRA, and RoBERTa-MIMIC-Trial, for recognizing SDoH terms using a previously annotated corpus of MIMIC-III clinical notes. Our study shows that the BioClinicalBERT model performs best on F1 scores (0.911, 0.923) under both strict and relaxed criteria. This work shows the promise of using transformer-based models for recognizing SDoH information from clinical notes.
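The abstract reports F1 under "strict" and "relaxed" criteria without defining them. The Python sketch below assumes one common convention for entity recognition: strict requires identical span boundaries and label, while relaxed accepts any overlapping span with the same label. Both definitions, the tuple layout, and the function name are assumptions for illustration and may differ from the paper's scoring.

```python
def f1_score(gold, predicted, relaxed=False):
    """F1 over entity spans given as (start, end, label), with end exclusive.

    Assumed criteria: strict = exact boundaries and label;
    relaxed = overlapping spans with the same label.
    """
    def matches(a, b):
        if a[2] != b[2]:
            return False
        if relaxed:
            return a[0] < b[1] and b[0] < a[1]  # spans overlap
        return a[0] == b[0] and a[1] == b[1]    # exact boundaries

    # Precision counts predicted spans that match some gold span;
    # recall counts gold spans that match some predicted span.
    matched_pred = sum(any(matches(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(matches(g, p) for p in predicted) for g in gold)
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical spans: the second prediction has a shifted start offset.
gold = [(0, 8, "HOUSING"), (20, 30, "EMPLOYMENT")]
pred = [(0, 8, "HOUSING"), (22, 30, "EMPLOYMENT")]
strict_f1 = f1_score(gold, pred)
relaxed_f1 = f1_score(gold, pred, relaxed=True)
```

Under these assumed definitions a boundary error counts as a miss only in strict scoring, so relaxed F1 is never lower than strict F1, which is consistent with the ordering of the two reported scores.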
Affiliation(s)
- Xiaoyu Wang
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Dipankar Gupta
- College of Medicine, University of Florida, Gainesville, FL, USA
- Michael Killian
- College of Social Work, Florida State University, Tallahassee, FL, USA
- Zhe He
- School of Information, Florida State University, Tallahassee, FL, USA
|
17
|
Derton A, Guevara M, Chen S, Moningi S, Kozono DE, Liu D, Miller TA, Savova GK, Mak RH, Bitterman DS. Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes. JCO Clin Cancer Inform 2023; 7:e2200196. [PMID: 37235847 DOI: 10.1200/cci.22.00196] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/19/2022] [Revised: 02/22/2023] [Accepted: 03/23/2023] [Indexed: 05/28/2023] Open
Abstract
PURPOSE There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities. METHODS This was a retrospective analysis of 230,325 clinical notes from 5,285 patients treated with radiotherapy from 2007 to 2019. We compared linguistic features among White versus non-White, low-income insurance versus other insurance, and male versus female patients' notes. Log odds ratios with an informative Dirichlet prior were calculated to compare words over-represented in each group. A variational autoencoder topic model was applied, and topic probability was compared between groups. The presence of machine-learnable bias was explored by developing statistical and neural demographic group classifiers. RESULTS Terms associated with varied social contexts and needs were identified for all demographic group comparisons. For example, notes of non-White and low-income insurance patients were over-represented with terms associated with housing and transportation, whereas notes of White and other insurance patients were over-represented with terms related to physical activity. Topic models identified a social history topic, and topic probability varied significantly between the demographic group comparisons. Classification models performed poorly at classifying notes of non-White and low-income insurance patients (F1 of 0.30 and 0.23, respectively). CONCLUSION Exploration of linguistic differences in clinical notes between patients of different race/ethnicity, insurance status, and sex identified social contexts and needs in patients with cancer and revealed high-level differences in notes. Future work is needed to validate whether these findings may play a role in cancer disparities.
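The Methods mention log odds ratios with an informative Dirichlet prior for finding over-represented words. The Python sketch below shows one commonly used formulation of that statistic, in which smoothed log odds are compared between two groups and scaled by an approximate standard error. The function name, the choice of prior, and the toy counts are assumptions; the abstract does not specify the authors' implementation.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_a, counts_b, prior):
    """Return {word: z-score}; positive z means over-represented in group A.

    `prior` maps each word to a pseudo-count (for example, background corpus
    counts). Assumes a vocabulary of at least two words.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = sum(prior.values())
    z_scores = {}
    for word, alpha in prior.items():
        if alpha <= 0:
            continue  # a word with no prior mass cannot be smoothed
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Smoothed log odds of the word within each group.
        log_odds_a = math.log((y_a + alpha) / (n_a + a0 - y_a - alpha))
        log_odds_b = math.log((y_b + alpha) / (n_b + a0 - y_b - alpha))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the difference in log odds.
        variance = 1.0 / (y_a + alpha) + 1.0 / (y_b + alpha)
        z_scores[word] = delta / math.sqrt(variance)
    return z_scores


# Toy counts (hypothetical), using the pooled counts as the prior.
group_a = Counter({"housing": 10, "exercise": 1})
group_b = Counter({"housing": 1, "exercise": 10})
z = log_odds_dirichlet(group_a, group_b, prior=group_a + group_b)
```

Dividing by the standard error keeps rare words from dominating the ranking, which is the usual motivation for preferring this statistic over raw frequency ratios.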
Affiliation(s)
- Abigail Derton
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Marco Guevara
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Shan Chen
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Shalini Moningi
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- David E Kozono
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Dianbo Liu
- Mila-Quebec AI Institute, Montreal, QC, Canada
- Timothy A Miller
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Raymond H Mak
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Danielle S Bitterman
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
|
18
|
Lituiev DS, Lacar B, Pak S, Abramowitsch PL, De Marchis EH, Peterson TA. Automatic extraction of social determinants of health from medical notes of chronic lower back pain patients. J Am Med Inform Assoc 2023:7133957. [PMID: 37080559 DOI: 10.1093/jamia/ocad054] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/05/2022] [Revised: 02/15/2023] [Accepted: 03/18/2023] [Indexed: 04/22/2023] Open
Abstract
OBJECTIVE We applied natural language processing and inference methods to extract social determinants of health (SDoH) information from clinical notes of patients with chronic low back pain (cLBP) to enhance future analyses of the associations between SDoH disparities and cLBP outcomes. MATERIALS AND METHODS Clinical notes for patients with cLBP were annotated for 7 SDoH domains, as well as depression, anxiety, and pain scores, resulting in 626 notes with at least one annotated entity for 364 patients. We used a 2-tier taxonomy with these 10 first-level classes (domains) and 52 second-level classes. We developed and validated named entity recognition (NER) systems based on both rule-based and machine learning approaches and validated an entailment model. RESULTS Annotators achieved a high interrater agreement (Cohen's kappa of 95.3% at document level). A rule-based system (cTAKES), RoBERTa NER, and a hybrid model (combining rules and logistic regression) achieved performance of F1 = 47.1%, 84.4%, and 80.3%, respectively, for first-level classes. DISCUSSION While the hybrid model had a lower F1 performance, it matched or outperformed RoBERTa NER model in terms of recall and had lower computational requirements. Applying an untuned RoBERTa entailment model, we detected many challenging wordings missed by NER systems. Still, the entailment model may be sensitive to hypothesis wording. CONCLUSION This study developed a corpus of annotated clinical notes covering a broad spectrum of SDoH classes. This corpus provides a basis for training machine learning models and serves as a benchmark for predictive models for NER for SDoH and knowledge extraction from clinical texts.
Collapse
Affiliation(s)
- Dmytro S Lituiev
  - Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
- Benjamin Lacar
  - Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
  - Berkeley Institute for Data Science, University of California, Berkeley, California, USA
- Sang Pak
  - Department of Physical Therapy and Rehabilitation Science, University of California San Francisco, San Francisco, California, USA
- Peter L Abramowitsch
  - Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
- Emilia H De Marchis
  - Department of Family & Community Medicine, University of California San Francisco, San Francisco, California, USA
- Thomas A Peterson
  - Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
  - Department of Orthopaedic Surgery, University of California San Francisco, San Francisco, California, USA
|
19
|
Lee RY, Kross EK, Torrence J, Li KS, Sibley J, Cohen T, Lober WB, Engelberg RA, Curtis JR. Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome. JAMA Netw Open 2023; 6:e231204. [PMID: 36862411 PMCID: PMC9982698 DOI: 10.1001/jamanetworkopen.2023.1204] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 03/03/2023] Open
Abstract
IMPORTANCE Many clinical trial outcomes are documented in free-text electronic health records (EHRs), making manual data collection costly and infeasible at scale. Natural language processing (NLP) is a promising approach for measuring such outcomes efficiently, but ignoring NLP-related misclassification may lead to underpowered studies. OBJECTIVE To evaluate the performance, feasibility, and power implications of using NLP to measure the primary outcome of EHR-documented goals-of-care discussions in a pragmatic randomized clinical trial of a communication intervention. DESIGN, SETTING, AND PARTICIPANTS This diagnostic study compared the performance, feasibility, and power implications of measuring EHR-documented goals-of-care discussions using 3 approaches: (1) deep-learning NLP, (2) NLP-screened human abstraction (manual verification of NLP-positive records), and (3) conventional manual abstraction. The study included hospitalized patients aged 55 years or older with serious illness enrolled between April 23, 2020, and March 26, 2021, in a pragmatic randomized clinical trial of a communication intervention in a multihospital US academic health system. MAIN OUTCOMES AND MEASURES Main outcomes were natural language processing performance characteristics, human abstractor-hours, and misclassification-adjusted statistical power of methods of measuring clinician-documented goals-of-care discussions. Performance of NLP was evaluated with receiver operating characteristic (ROC) curves and precision-recall (PR) analyses, and the effects of misclassification on power were examined using mathematical substitution and Monte Carlo simulation. RESULTS A total of 2512 trial participants (mean [SD] age, 71.7 [10.8] years; 1456 [58%] female) amassed 44 324 clinical notes during 30-day follow-up. In a validation sample of 159 participants, deep-learning NLP trained on a separate training data set identified patients with documented goals-of-care discussions with moderate accuracy (maximal F1 score, 0.82; area under the ROC curve, 0.924; area under the PR curve, 0.879). Manual abstraction of the outcome from the trial data set would require an estimated 2000 abstractor-hours and would power the trial to detect a risk difference of 5.4% (assuming 33.5% control-arm prevalence, 80% power, and 2-sided α = .05). Measuring the outcome by NLP alone would power the trial to detect a risk difference of 7.6%. Measuring the outcome by NLP-screened human abstraction would require 34.3 abstractor-hours to achieve estimated sensitivity of 92.6% and would power the trial to detect a risk difference of 5.7%. Monte Carlo simulations corroborated misclassification-adjusted power calculations. CONCLUSIONS AND RELEVANCE In this diagnostic study, deep-learning NLP and NLP-screened human abstraction had favorable characteristics for measuring an EHR outcome at scale. Adjusted power calculations accurately quantified power loss from NLP-related misclassification, suggesting that incorporation of this approach into the design of studies using NLP would be beneficial.
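The power loss described above can be illustrated with the standard result for nondifferential outcome misclassification: an instrument with sensitivity Se and specificity Sp shrinks a true risk difference by the factor (Se + Sp - 1). The sketch below is a minimal illustration, not the authors' calculation. The control-arm prevalence (33.5%) and the 5.4% risk difference come from the abstract; the sensitivity and specificity values are hypothetical placeholders, and the function names are invented for this example.

```python
def observed_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Prevalence reported by an imperfect classifier.

    True positives are kept with probability `sensitivity`; true negatives
    are wrongly flagged with probability (1 - specificity).
    """
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def observed_risk_difference(control_prev: float, true_rd: float,
                             sensitivity: float, specificity: float) -> float:
    """Risk difference that would be measured when both arms share the same error rates."""
    intervention = observed_prevalence(control_prev + true_rd, sensitivity, specificity)
    control = observed_prevalence(control_prev, sensitivity, specificity)
    return intervention - control


# Values taken from the abstract.
CONTROL_PREV = 0.335
TRUE_RD = 0.054

# Hypothetical error rates, chosen only for illustration (not reported in the abstract).
SENSITIVITY = 0.90
SPECIFICITY = 0.95

attenuated_rd = observed_risk_difference(CONTROL_PREV, TRUE_RD, SENSITIVITY, SPECIFICITY)
attenuation_factor = SENSITIVITY + SPECIFICITY - 1.0
print(f"observed risk difference: {attenuated_rd:.4f}")
print(f"attenuation factor:       {attenuation_factor:.2f}")
```

Because the measured effect is smaller than the true effect, a trial sized for the true effect loses power, which is why a misclassification-prone measure can only detect a larger minimum risk difference at the same sample size.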
Affiliation(s)
- Robert Y. Lee
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
- Erin K. Kross
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
- Janaki Torrence
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
- Kevin S. Li
  - Division of Biomedical and Health Informatics, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
- James Sibley
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle
- Trevor Cohen
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Biomedical and Health Informatics, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
- William B. Lober
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Biomedical and Health Informatics, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle
  - Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle
  - Department of Global Health, University of Washington, Seattle
- Ruth A. Engelberg
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
- J. Randall Curtis
  - Cambia Palliative Care Center of Excellence at UW Medicine, University of Washington, Seattle
  - Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, University of Washington, Seattle
  - Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle
  - Department of Health Systems and Population Health, University of Washington, Seattle
|
20
|
Stewart de Ramirez S, Shallat J, McClure K, Foulger R, Barenblat L. Screening for Social Determinants of Health: Active and Passive Information Retrieval Methods. Popul Health Manag 2022; 25:781-788. [PMID: 36454231 DOI: 10.1089/pop.2022.0228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/03/2022] Open
Abstract
Screening for social determinants of health (SDOH) is recommended, but numerous barriers exist to implementing SDOH screening in clinical spaces. In this study, the authors identified how both active and passive information retrieval methods may be used in clinical spaces to screen for SDOH and meet patient needs. The authors conducted a retrospective sequential cohort analysis comparing the active identification of SDOH through a patient-led digital manual screening process completed in primary care offices from September 2019 to January 2020 and passive identification of SDOH through natural language processing (NLP) from September 2016 to August 2018, among 1735 patients at a large midwestern tertiary referral hospital system and its associated outlying primary care and outpatient facilities. The percent of patients identified by both the passive and active identification methods as experiencing SDOH varied from 0.3% to 4.7%. The active identification method identified social integration, domestic safety, financial resources, food insecurity, transportation, housing, and stress in proportions ranging from 5% to 36%. The passive method contributed to the identification of financial resource issues and stress, identifying 9.6% and 3% of patients to be experiencing these issues, respectively. SDOH documentation varied by provider type. The combination of passive and active SDOH screening methods can provide a more comprehensive picture by leveraging historic patient interactions, while also eliciting current patient needs. Using passive, NLP-based methods to screen for SDOH will also help providers overcome barriers that have historically prevented screening.
Affiliation(s)
- Sarah Stewart de Ramirez
  - Department of Population Health Services, OSF HealthCare System, Peoria, Illinois, USA
  - Department of Emergency Medicine, University of Illinois College of Medicine at Peoria, Peoria, Illinois, USA
- Jaclyn Shallat
  - Department of Epidemiology and Biostatistics, University of Illinois at Chicago, Chicago, Illinois, USA
- Keaton McClure
  - University of Illinois College of Medicine at Peoria, Peoria, Illinois, USA
- Roopa Foulger
  - Department of Health Care Analytics, OSF HealthCare System, Peoria, Illinois, USA
  - Department of OSF OnCall, OSF Healthcare System, Peoria, Illinois, USA
|
21
|
Yang R, Zhu D, Howard LE, De Hoedt A, Williams SB, Freedland SJ, Klaassen Z. Identification of Patients With Metastatic Prostate Cancer With Natural Language Processing and Machine Learning. JCO Clin Cancer Inform 2022; 6:e2100071. [PMID: 36215673 DOI: 10.1200/cci.21.00071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
PURPOSE Understanding treatment patterns and effectiveness for patients with metastatic prostate cancer (mPCa) is dependent on accurate assessment of metastatic status. The objective was to develop a natural language processing (NLP) model for identifying patients with mPCa and evaluate the model's performance against chart-reviewed data and an International Classification of Diseases (ICD) 9/10 code-based method. METHODS In total, 139,057 radiology reports on 6,211 unique patients from the Department of Veterans Affairs were used. The gold standard was metastases by detailed chart review of radiology reports. NLP performance was assessed by sensitivity, specificity, positive predictive value, negative predictive value, and date of metastases detection. Receiver operating characteristic curves were used to assess model performance. RESULTS When compared with chart review, the NLP model had high sensitivity and specificity (85% and 96%, respectively). The NLP model was able to predict patient-level metastasis status with a sensitivity of 91% and specificity of 81%, whereas sensitivity and specificity using ICD9/10 billing codes were 73% and 86%, respectively. For the NLP model, date of metastases detection was exactly concordant and within < 1 week in 55% and 58% of patients, compared with 8% and 17%, respectively, using the ICD9/10 billing codes method. The area under the curve for the NLP model was 0.911. A limitation is that the NLP model was developed on the basis of a subset of patients with mPCa and may not be generalizable to all patients with mPCa. CONCLUSION This population-level NLP model for identifying patients with mPCa was more accurate than using ICD9/10 billing codes when compared with chart-reviewed data. Upon further validation, this model may allow for efficient population-level identification of patients with mPCa.
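The four performance measures named in the methods above all derive from a 2x2 comparison of model output against chart review. The following is a minimal sketch of those definitions. The counts are hypothetical, chosen so that sensitivity and specificity reproduce the reported 85% and 96%; the study's actual counts are not given in the abstract, and the resulting predictive values are therefore illustrative only, since they depend on prevalence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing a classifier against a gold standard."""
    tp: int  # model positive, gold standard positive
    fp: int  # model positive, gold standard negative
    fn: int  # model negative, gold standard positive
    tn: int  # model negative, gold standard negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, and NPV as fractions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true cases found
        "specificity": c.tn / (c.tn + c.fp),  # share of non-cases correctly cleared
        "ppv": c.tp / (c.tp + c.fp),          # share of model positives that are real
        "npv": c.tn / (c.tn + c.fn),          # share of model negatives that are real
    }


# Hypothetical counts: 100 gold-standard positives and 100 negatives.
counts = ConfusionCounts(tp=85, fp=4, fn=15, tn=96)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```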
Affiliation(s)
- Ruixin Yang
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Di Zhu
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Lauren E Howard
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Duke Cancer Institute, Duke University School of Medicine, Durham, NC
- Amanda De Hoedt
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
- Stephen B Williams
  - Division of Urology, Department of Surgery, The University of Texas Medical Branch, Galveston, TX
- Stephen J Freedland
  - Urology Section, Department of Surgery, Veterans Affairs Health Care System, Durham, NC
  - Division of Urology, Department of Surgery, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
  - Center for Integrated Research in Cancer and Lifestyle, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA
- Zachary Klaassen
  - Division of Urology, Medical College of Georgia at Augusta University, Augusta, GA
  - Georgia Cancer Center, Augusta, GA
|
22
|
Improving ascertainment of suicidal ideation and suicide attempt with natural language processing. Sci Rep 2022; 12:15146. [PMID: 36071081 PMCID: PMC9452591 DOI: 10.1038/s41598-022-19358-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/15/2022] [Accepted: 08/29/2022] [Indexed: 12/03/2022] Open
Abstract
Methods relying on diagnostic codes to identify suicidal ideation and suicide attempt in Electronic Health Records (EHRs) at scale are suboptimal because suicide-related outcomes are heavily under-coded. We propose to improve the ascertainment of suicidal outcomes using natural language processing (NLP). We developed information retrieval methodologies to search over 200 million notes from the Vanderbilt EHR. Suicide query terms were extracted using word2vec. A weakly supervised approach was designed to label cases of suicidal outcomes. The NLP validation of the top 200 retrieved patients showed high performance for suicidal ideation (area under the receiver operator curve [AUROC]: 98.6, 95% confidence interval [CI] 97.1–99.5) and suicide attempt (AUROC: 97.3, 95% CI 95.2–98.7). Case extraction produced the best performance when combining NLP and diagnostic codes and when accounting for negated suicide expressions in notes. Overall, we demonstrated that scalable and accurate NLP methods can be developed to identify suicidal behavior in EHRs to enhance prevention efforts, predictive models, and precision medicine.
|
23
|
Dorr DA, Quiñones AR, King T, Wei MY, White K, Bejan CA. Prediction of Future Health Care Utilization Through Note-extracted Psychosocial Factors. Med Care 2022; 60:570-578. [PMID: 35658116 PMCID: PMC9262845 DOI: 10.1097/mlr.0000000000001742] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/22/2022]
Abstract
BACKGROUND Persons with multimorbidity (≥2 chronic conditions) face an increased risk of poor health outcomes, especially as they age. Psychosocial factors such as social isolation, chronic stress, housing insecurity, and financial insecurity have been shown to exacerbate these outcomes, but are not routinely assessed during the clinical encounter. Our objective was to extract these concepts from chart notes using natural language processing and predict their impact on health care utilization for patients with multimorbidity. METHODS A cohort study to predict the 1-year likelihood of hospitalizations and emergency department visits for patients 65+ with multimorbidity with and without psychosocial factors. Psychosocial factors were extracted from narrative notes; all other covariates were extracted from electronic health record data from a large academic medical center using validated algorithms and concept sets. Logistic regression was performed to predict the likelihood of hospitalization and emergency department visit in the next year. RESULTS In all, 76,479 patients were eligible; the majority were White (89%), 54% were female, with mean age 73. Those with psychosocial factors were older, had higher baseline utilization, and more chronic illnesses. The 4 psychosocial factors all independently predicted future utilization (odds ratio=1.27-2.77, C-statistic=0.63). Accounting for demographics, specific conditions, and previous utilization, 3 of 4 of the extracted factors remained predictive (odds ratio=1.13-1.86) for future utilization. Compared with models with no psychosocial factors, they had improved discrimination. Individual predictions were mixed, with social isolation predicting depression and morbidity; stress predicting atherosclerotic cardiovascular disease onset; and housing insecurity predicting substance use disorder morbidity. DISCUSSION Psychosocial factors are known to have adverse health impacts, but are rarely measured; using natural language processing, we extracted factors that identified a higher risk segment of older adults with multimorbidity. Combining these extraction techniques with other measures of social determinants may help catalyze population health efforts to address psychosocial factors to mitigate their health impacts.
Affiliation(s)
- David A. Dorr
  - Department of Medical Informatics & Clinical Epidemiology; Oregon Health & Science University; Portland, OR
- Ana R. Quiñones
  - Department of Family Medicine; Oregon Health & Science University; Portland, OR
- Taylor King
  - Department of Medical Informatics & Clinical Epidemiology; Oregon Health & Science University; Portland, OR
- Kellee White
  - Department of Health Policy and Management; University of Maryland; College Park, MD
- Cosmin A. Bejan
  - Department of Biomedical Informatics; Vanderbilt University Medical Center; Nashville, TN, USA
|
24
|
Shah-Mohammadi F, Cui W, Bachi K, Hurd Y, Finkelstein J. Using Natural Language Processing of Clinical Notes to Predict Outcomes of Opioid Treatment Program. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2022; 2022:4415-4420. [PMID: 36085896 PMCID: PMC9472807 DOI: 10.1109/embc48229.2022.9871960] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/15/2023]
Abstract
The potential of natural language processing (NLP) for extracting patient information from clinical notes of opioid treatment programs (OTP) and leveraging it in the development of predictive models has not been fully explored. The goal of this study was to assess the potential of NLP in identifying legal, social, mental, medical and family environment-based determinants of distress from clinical narratives of patients with opioid addiction, and then using this information in predicting OTP outcomes. Around 63% of patients reported improvements after completing OTP. We compared the results of logistic regression and random forest for predictive modeling. Random forest model performed slightly better than logistic regression (75% F1 score) with 74% accuracy. Clinical Relevance: Psychiatric and medical disorders, social, legal and family-based distress are important determinants of distress in patients enrolled in OTP. This information is often recorded in clinical notes. Extraction of this information and its utilization as features in machine learning models will lead to the enhancement of the performance of the OTP outcome predictive models.
|
25
|
A case for developing domain-specific vocabularies for extracting suicide factors from healthcare notes. J Psychiatr Res 2022; 151:328-338. [PMID: 35533516 DOI: 10.1016/j.jpsychires.2022.04.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/04/2022] [Revised: 04/09/2022] [Accepted: 04/18/2022] [Indexed: 11/23/2022]
Abstract
The onset and persistence of life events (LE) such as housing instability, job instability, and reduced social connection have been shown to increase risk of suicide. Predictive models for suicide risk have low sensitivity to many of these factors due to under-reporting in structured electronic health records (EHR) data. In this study, we show how natural language processing (NLP) can help identify LE in clinical notes at higher rates than reported medical codes. We compare domain-specific lexicons formulated from Unified Medical Language System (UMLS) selection, content analysis by subject matter experts (SME) and the Gravity Project, to data-driven expansion through contextual word embedding using Word2Vec. Our analysis covers EHR from the Veterans Affairs (VA) Corporate Data Warehouse (CDW) and measures the prevalence of LE across time for patients with known underlying cause of death in the National Death Index (NDI). We found that NLP methods had higher sensitivity of detecting LE relative to structured EHR (S-EHR) variables. We observed that, on average, suicide cases had higher rates of LE over time when compared to patients who died of non-suicide related causes with no previous history of diagnosed mental illness. When used to discriminate these outcomes, the inclusion of NLP derived variables increased the concentration of LE along the top 0.1%, 0.5% and 1% of predicted risk. LE were less informative when discriminating suicide death from non-suicide related death for patients with diagnosed mental illness.
|
26
|
Boch S, Hussain SA, Bambach S, DeShetler C, Chisolm D, Linwood S. Locating Youth Exposed to Parental Justice Involvement in the Electronic Health Record: Development of a Natural Language Processing Model. JMIR Pediatr Parent 2022; 5:e33614. [PMID: 35311681 PMCID: PMC8981008 DOI: 10.2196/33614] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2021] [Revised: 01/16/2022] [Accepted: 01/25/2022] [Indexed: 12/29/2022] Open
Abstract
BACKGROUND Parental justice involvement (eg, prison, jail, parole, or probation) is an unfortunately common and disruptive household adversity for many US youths, disproportionately affecting families of color and rural families. Data on this adversity has not been captured routinely in pediatric health care settings, and when it is captured, it is neither discrete nor readily analyzable for research purposes. OBJECTIVE In this study, we outline our process for training a state-of-the-art natural language processing model using unstructured clinician notes of one large pediatric health system to identify patients who have experienced a justice-involved parent. METHODS Using the electronic health record database of a large Midwestern pediatric hospital-based institution from 2011-2019, we located clinician notes (of any type and written by any type of provider) that were likely to contain such evidence of family justice involvement via a justice-keyword search (eg, prison and jail). To train and validate the model, we used a labeled data set of 7500 clinician notes identifying whether the patient was ever exposed to parental justice involvement. We calculated the precision and recall of the model and compared those rates to the keyword search. RESULTS The development of the machine learning model increased the precision (positive predictive value) of locating children affected by parental justice involvement in the electronic health record from 61% (a simple keyword search) to 92%. CONCLUSIONS The use of machine learning may be a feasible approach to addressing the gaps in our understanding of the health and health services of underrepresented youth who encounter childhood adversities not routinely captured, particularly for children of justice-involved parents.
Affiliation(s)
- Samantha Boch
  - College of Nursing, University of Cincinnati, Cincinnati, OH, United States
  - James M Anderson Center for Health Systems Excellence, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, United States
- Syed-Amad Hussain
  - IT Research and Innovation, Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, United States
- Sven Bambach
  - IT Research and Innovation, Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, United States
- Cameron DeShetler
  - Biomedical Engineering Undergraduate Department, Notre Dame University, Notre Dame, IN, United States
- Deena Chisolm
  - IT Research and Innovation, Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, United States
  - College of Medicine and Public Health, College of Nursing, The Ohio State University, Columbus, OH, United States
- Simon Linwood
  - Nationwide Children's Hospital, Columbus, OH, United States
  - School of Medicine, University of California, Riverside, CA, United States
|
27
|
Patel SB, Nguyen NT. Creation of a Mapped, Machine-Readable Taxonomy to Facilitate Extraction of Social Determinants of Health Data from Electronic Health Records. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2021:959-968. [PMID: 35308929 PMCID: PMC8861691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
A comprehensive, mapped social determinants of health (SDH) taxonomy in machine-readable format was developed. The framework is intended to facilitate the extraction of social risk factors (SRFs) from electronic health record (EHR) data and categorize them by domain and determinant to facilitate interpretation. Where other SDH frameworks have been focused on data input, this framework is designed from a data extraction point of view using EHR data in conjunction with published literature, public health policy documents, and official crosswalk maps. Frameworks developed by leading public health organizations were reviewed and synthesized to create an SDH framework comprising 97 distinct SRFs organized under 16 domains. 2,329 medical codes across three standardized medical vocabularies, 10,896 free-text diagnosis descriptors, and 25 health insurance keywords were mapped to individual SRFs in the SDH framework. The framework is available as an open-source resource in Python dictionary or JSON format.
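To make the idea of a domain-to-SRF mapping held in a Python dictionary concrete, here is a minimal sketch. The nesting, key names, and keyword lists are assumptions made for this example and do not reproduce the published framework; the three ICD-10 codes (Z59.0, Z59.1, Z56.0) are real category codes used purely as illustrative entries.

```python
import json

# Hypothetical layout: domain -> social risk factor -> mapped codes and free-text keywords.
TAXONOMY = {
    "Housing": {
        "Homelessness": {
            "codes": ["Z59.0"],
            "keywords": ["homeless", "living in shelter"],
        },
        "Inadequate housing": {
            "codes": ["Z59.1"],
            "keywords": ["inadequate housing"],
        },
    },
    "Employment": {
        "Unemployment": {
            "codes": ["Z56.0"],
            "keywords": ["unemployed"],
        },
    },
}


def build_code_index(taxonomy: dict) -> dict:
    """Invert the taxonomy so that a medical code resolves to (domain, factor)."""
    index = {}
    for domain, factors in taxonomy.items():
        for factor, mapping in factors.items():
            for code in mapping["codes"]:
                index[code] = (domain, factor)
    return index


CODE_INDEX = build_code_index(TAXONOMY)

# The same structure serializes directly to JSON for use outside Python.
as_json = json.dumps(TAXONOMY, indent=2)

print(CODE_INDEX["Z59.0"])
```

Inverting the mapping once up front keeps each lookup from an EHR code to its domain and factor a single dictionary access.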
|
28
|
Park Y, Mulligan N, Gleize M, Kristiansen M, Bettencourt-Silva JH. Discovering Associations between Social Determinants and Health Outcomes: Merging Knowledge Graphs from Literature and Electronic Health Data. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2021:940-949. [PMID: 35308956 PMCID: PMC8861749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Social Determinants of Health (SDoH) are an increasingly important part of the broader research and public health efforts in understanding individuals' physical and mental well-being. Despite this, non-clinical factors affecting health are poorly recorded in electronic health databases and techniques to study how SDoH might relate to population outcomes are lacking. This paper proposes an approach to systematically identify and quantify associations between SDoH and health-related outcomes in a specific cohort of people by (1) leveraging published evidence from literature to build a knowledge graph of health and social factor associations and (2) analysing a large dataset of claims and medical records where those associations may be found. This work demonstrates how the proposed approach could be used to generate hypotheses and inform further research on SDoH in a data-driven manner.
|
29
|
Hatef E, Rouhizadeh M, Nau C, Xie F, Rouillard C, Abu-Nasser M, Padilla A, Lyons LJ, Kharrazi H, Weiner JP, Roblin D. Development and assessment of a natural language processing model to identify residential instability in electronic health records’ unstructured data: a comparison of 3 integrated healthcare delivery systems. JAMIA Open 2022; 5:ooac006. [PMID: 35224458 PMCID: PMC8867582 DOI: 10.1093/jamiaopen/ooac006] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 09/13/2021] [Revised: 01/03/2022] [Accepted: 01/27/2022] [Indexed: 11/14/2022] Open
Abstract
Objective
To evaluate whether a natural language processing (NLP) algorithm could be adapted to extract, with acceptable validity, markers of residential instability (ie, homelessness and housing insecurity) from electronic health records (EHRs) of 3 healthcare systems.
Materials and methods
We included patients 18 years and older who received care at 1 of 3 healthcare systems from 2016 through 2020 and had at least 1 free-text note in the EHR during this period. We conducted the study independently; the NLP algorithm logic and method of validity assessment were identical across sites. The approach to the development of the gold standard for assessment of validity differed across sites. Using the EntityRuler module of spaCy 2.3 Python toolkit, we created a rule-based NLP system made up of expert-developed patterns indicating residential instability at the lead site and enriched the NLP system using insight gained from its application at the other 2 sites. We adapted the algorithm at each site then validated the algorithm using a split-sample approach. We assessed the performance of the algorithm by measures of positive predictive value (precision), sensitivity (recall), and specificity.
Results
The NLP algorithm performed with moderate precision (0.45, 0.73, and 1.0) at 3 sites. The sensitivity and specificity of the NLP algorithm varied across 3 sites (sensitivity: 0.68, 0.85, and 0.96; specificity: 0.69, 0.89, and 1.0).
Discussion
The performance of this NLP algorithm to identify residential instability in 3 different healthcare systems suggests the algorithm is generally valid and applicable in other healthcare systems with similar EHRs.
Conclusion
The NLP approach developed in this project is adaptable and can be modified to extract types of social needs other than residential instability from EHRs across different healthcare systems.
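The rule-based matching described in the methods can be sketched without the original toolkit. The study used the EntityRuler module of spaCy; the example below deliberately swaps in Python's standard `re` module as a stand-in so that it runs with no dependencies. The pattern lists and label names are hypothetical and are not the expert-developed patterns from the study.

```python
import re

# Hypothetical patterns, for illustration only.
PATTERNS = {
    "homelessness": [
        r"\bhomeless(?:ness)?\b",
        r"\bliving in (?:a |an )?(?:shelter|car)\b",
    ],
    "housing_insecurity": [
        r"\bevict(?:ed|ion)\b",
        r"\bunable to pay rent\b",
    ],
}

# Compile once, case-insensitively, so repeated notes are scanned cheaply.
COMPILED = {
    label: [re.compile(pattern, re.IGNORECASE) for pattern in patterns]
    for label, patterns in PATTERNS.items()
}


def find_markers(note: str) -> list:
    """Return (label, matched_text) pairs for every rule that fires in a note."""
    hits = []
    for label, regexes in COMPILED.items():
        for regex in regexes:
            for match in regex.finditer(note):
                hits.append((label, match.group(0)))
    return hits


example_hits = find_markers("Patient is currently homeless and was evicted in May.")
print(example_hits)
```

This sketch does no negation or context handling, so a phrase such as "denies being homeless" would still fire; a production rule set would need to address that, and site-specific wording is one plausible reason such rules need adapting across healthcare systems.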
Affiliation(s)
- Elham Hatef
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Masoud Rouhizadeh
  - Institute for Clinical and Translational Research, Johns Hopkins Medical Institute, Baltimore, Maryland, USA
- Claudia Nau
  - Kaiser Permanente Southern California, Pasadena, California, USA
- Fagen Xie
  - Kaiser Permanente Southern California, Pasadena, California, USA
- Ariadna Padilla
  - Kaiser Permanente Southern California, Pasadena, California, USA
- Hadi Kharrazi
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
  - Department of Medicine Division of Health Sciences Informatics, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Jonathan P Weiner
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Douglas Roblin
  - Kaiser Permanente Mid-Atlantic States, Rockville, Maryland, USA
Collapse
|
30
|
Shah-Mohammadi F, Cui W, Bachi K, Hurd Y, Finkelstein J. Comparative Analysis of Patient Distress in Opioid Treatment Programs using Natural Language Processing. BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES, INTERNATIONAL JOINT CONFERENCE, BIOSTEC ... REVISED SELECTED PAPERS. BIOSTEC (CONFERENCE) 2022; 2022:319-326. [PMID: 35265945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Psychiatric and medical disorders, social and family environment, and legal distress are important determinants of distress that affect the effectiveness of treatment in opioid treatment programs (OTPs). This information is not routinely captured in the electronic health record but may be found in clinical notes. This study aims to explore the feasibility and effectiveness of a natural language processing (NLP) strategy for identifying legal, social, mental, and medical determinants of distress, along with emotional pain rooted in family environment, from clinical narratives of patients with opioid addiction, and then to use this information to assess its impact on OTP outcomes. The analysis showed that mental and legal distress significantly affect treatment outcomes in OTPs.
Affiliation(s)
- Wanting Cui
  - Icahn School of Medicine at Mount Sinai, New York, NY, U.S.A
- Keren Bachi
  - Icahn School of Medicine at Mount Sinai, New York, NY, U.S.A
- Yasmin Hurd
  - Icahn School of Medicine at Mount Sinai, New York, NY, U.S.A
31
Edgcomb J, Coverdale J, Aggarwal R, Guerrero APS, Brenner AM. Applications of Clinical Informatics to Child Mental Health Care: a Call to Action to Bridge Practice and Training. ACADEMIC PSYCHIATRY : THE JOURNAL OF THE AMERICAN ASSOCIATION OF DIRECTORS OF PSYCHIATRIC RESIDENCY TRAINING AND THE ASSOCIATION FOR ACADEMIC PSYCHIATRY 2022; 46:11-17. [PMID: 35175570 PMCID: PMC8852995 DOI: 10.1007/s40596-022-01595-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Affiliation(s)
- Juliet Edgcomb
  - University of California Los Angeles Semel Institute for Neuroscience and Human Behavior, Los Angeles, CA, USA
- Adam M Brenner
  - University of Texas Southwestern Medical Center, Dallas, TX, USA
32
Patra BG, Sharma MM, Vekaria V, Adekkanattu P, Patterson OV, Glicksberg B, Lepow LA, Ryu E, Biernacka JM, Furmanchuk A, George TJ, Hogan W, Wu Y, Yang X, Bian J, Weissman M, Wickramaratne P, Mann JJ, Olfson M, Campion TR, Weiner M, Pathak J. Extracting social determinants of health from electronic health records using natural language processing: a systematic review. J Am Med Inform Assoc 2021; 28:2716-2727. [PMID: 34613399 PMCID: PMC8633615 DOI: 10.1093/jamia/ocab170] [Citation(s) in RCA: 61] [Impact Index Per Article: 20.3] [Received: 04/29/2021] [Revised: 07/09/2021] [Accepted: 08/04/2021] [Indexed: 11/27/2022]
Abstract
OBJECTIVE Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs. MATERIALS AND METHODS A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review. RESULTS Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9). CONCLUSION NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.
Affiliation(s)
- Braja G Patra
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Mohit M Sharma
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Veer Vekaria
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Prakash Adekkanattu
  - Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Olga V Patterson
  - Department of Internal Medicine, Division of Epidemiology, University of Utah, Salt Lake City, Utah, USA
  - US Department of Veterans Affairs, Salt Lake City, Utah, USA
- Lauren A Lepow
  - Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Euijung Ryu
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Joanna M Biernacka
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Thomas J George
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- William Hogan
  - Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, Florida, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Xi Yang
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Myrna Weissman
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Priya Wickramaratne
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- J John Mann
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Mark Olfson
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Thomas R Campion
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
  - Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Mark Weiner
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Jyotishman Pathak
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
33
Bompelli A, Wang Y, Wan R, Singh E, Zhou Y, Xu L, Oniani D, Kshatriya BSA, Balls-Berry JE, Zhang R. Social and Behavioral Determinants of Health in the Era of Artificial Intelligence with Electronic Health Records: A Scoping Review. HEALTH DATA SCIENCE 2021; 2021:9759016. [PMID: 38487504 PMCID: PMC10880156 DOI: 10.34133/2021/9759016] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/22/2021] [Accepted: 06/28/2021] [Indexed: 03/17/2024]
Abstract
Background. There is growing evidence that social and behavioral determinants of health (SBDH) have a substantial effect on a wide range of health outcomes. Electronic health records (EHRs) have been widely employed to conduct observational studies in the age of artificial intelligence (AI). However, there has been limited review of how to make the most of SBDH information from EHRs using AI approaches. Methods. A systematic search was conducted in six databases to find relevant, recently published peer-reviewed publications. Relevance was determined by screening and evaluating the articles. Based on the selected studies, a methodological analysis of AI algorithms leveraging SBDH information in EHR data was provided. Results. Our synthesis was driven by an analysis of SBDH categories, the relationship between SBDH and healthcare-related statuses, natural language processing (NLP) approaches for extracting SBDH from clinical notes, and predictive models using SBDH for health outcomes. Discussion. The associations between SBDH and health outcomes are complicated and diverse; several pathways may be involved. Using NLP technology to support the extraction of SBDH and other clinical ideas simplifies the identification and extraction of essential concepts from clinical data, efficiently unlocks unstructured data, and aids in the resolution of unstructured data-related issues. Conclusion. Despite known associations between SBDH and diseases, SBDH factors are rarely investigated as interventions to improve patient outcomes. Gaining knowledge about SBDH and how SBDH data can be collected from EHRs using NLP approaches and predictive models improves the chances of influencing health policy change for patient wellness, ultimately promoting health and health equity.
Affiliation(s)
- Anusha Bompelli
  - Department of Pharmaceutical Care & Health Systems, University of Minnesota, USA
- Yanshan Wang
  - Department of Health Information Management, University of Pittsburgh, USA
- Ruyuan Wan
  - Department of Computer Science, University of Minnesota, USA
- Esha Singh
  - Department of Computer Science, University of Minnesota, USA
- Yuqi Zhou
  - Institute for Health Informatics and College of Pharmacy, University of Minnesota, USA
- Lin Xu
  - Carlson School of Business, University of Minnesota, USA
- David Oniani
  - Department of Computer Science and Mathematics, Luther College, USA
- Rui Zhang
  - Institute for Health Informatics, Department of Pharmaceutical Care & Health Systems, University of Minnesota, USA
34
Stemerman R, Arguello J, Brice J, Krishnamurthy A, Houston M, Kitzmiller R. Identification of social determinants of health using multi-label classification of electronic health record clinical notes. JAMIA Open 2021; 4:ooaa069. [PMID: 34514351 PMCID: PMC8423426 DOI: 10.1093/jamiaopen/ooaa069] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 09/16/2020] [Revised: 11/16/2020] [Accepted: 11/20/2020] [Indexed: 11/13/2022]
Abstract
OBJECTIVES Social determinants of health (SDH), key contributors to health, are rarely systematically measured and collected in the electronic health record (EHR). We investigate how to leverage clinical notes using novel applications of multi-label learning (MLL) to classify SDH in mental health and substance use disorder patients who frequent the emergency department. METHODS AND MATERIALS We labeled a gold-standard corpus of EHR clinical note sentences (N = 4063) with 6 identified SDH-related domains recommended by the Institute of Medicine for inclusion in the EHR. We then trained 5 classification models: linear Support Vector Machine, K-Nearest Neighbors, Random Forest, XGBoost, and bidirectional Long Short-Term Memory (BI-LSTM). We adopted 5 common evaluation measures: accuracy, average precision-recall (AP), area under the curve receiver operating characteristic (AUC-ROC), Hamming loss, and log loss to compare the performance of different methods for MLL classification using the F1 score as the primary evaluation metric. RESULTS Our results suggested that, overall, BI-LSTM outperformed the other classification models in terms of AUC-ROC (93.9), AP (0.76), and Hamming loss (0.12). The AUC-ROC values of MLL models of SDH-related domains ranged from 0.59 to 1.0. We found that 44.6% of our study population (N = 1119) had at least one positive documentation of SDH. DISCUSSION AND CONCLUSION The proposed approach of training an MLL model on an SDH-rich data source can produce a high-performing classifier using only unstructured clinical notes. We also provide evidence that model performance is associated with lexical diversity by health professionals and the auto-generation of clinical note sentences to document SDH.
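Hamming loss, one of the measures listed in this abstract, is the fraction of individual label slots that a multi-label classifier gets wrong. A minimal sketch follows; the 6-column layout mirrors the 6 SDH domains mentioned above, but the function name and the label values are made up for illustration and do not come from the study.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of (sentence, label) slots where prediction and truth differ."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each row must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    if total == 0:
        raise ValueError("no labels to compare")
    return wrong / total


# Two hypothetical sentences, each with 6 binary domain labels.
gold = [
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
predicted = [
    [1, 0, 0, 0, 0, 0],  # misses the fourth domain
    [0, 1, 1, 0, 0, 0],  # adds a spurious second domain
]
loss = hamming_loss(gold, predicted)  # 2 wrong slots out of 12
```

Lower is better: a loss of 0 means every label slot matched, so a value such as the 0.12 reported above would correspond to roughly 1 in 8 slots being wrong.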
Affiliation(s)
- Rachel Stemerman
  - Carolina Health Informatics Program, The University of North Carolina, Chapel Hill, North Carolina, USA
- Jaime Arguello
  - School of Information and Library Sciences, The University of North Carolina, Chapel Hill, North Carolina, USA
- Jane Brice
  - Department of Emergency Medicine, The University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Ashok Krishnamurthy
  - Department of Computer Science, The University of North Carolina, Chapel Hill, North Carolina, USA
- Mary Houston
  - Department of Emergency Medicine, The University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Rebecca Kitzmiller
  - School of Nursing, The University of North Carolina, Chapel Hill, North Carolina, USA
35
Reeves RM, Christensen L, Brown JR, Conway M, Levis M, Gobbel GT, Shah RU, Goodrich C, Ricket I, Minter F, Bohm A, Bray BE, Matheny ME, Chapman W. Adaptation of an NLP system to a new healthcare environment to identify social determinants of health. J Biomed Inform 2021; 120:103851. [PMID: 34174396 DOI: 10.1016/j.jbi.2021.103851] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/22/2021] [Revised: 06/16/2021] [Accepted: 06/21/2021] [Indexed: 11/18/2022]
Abstract
Social determinants of health (SDoH) are increasingly important factors for population health, healthcare outcomes, and care delivery. However, many of these factors are not reliably captured within structured electronic health record (EHR) data. In this work, we evaluated and adapted a previously published NLP tool to include additional social risk factors for deployment at Vanderbilt University Medical Center in an Acute Myocardial Infarction cohort. We developed a transformation of the SDoH outputs of the tool into the OMOP common data model (CDM) for re-use across many potential use cases, yielding performance measures across 8 SDoH classes of precision 0.83 recall 0.74 and F-measure of 0.78.
Affiliation(s)
- Ruth M Reeves
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN, United States
- Lee Christensen
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States
- Jeremiah R Brown
  - Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Michael Conway
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States
- Maxwell Levis
  - Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Glenn T Gobbel
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Division of General Internal Medicine, Vanderbilt University Medical Center, Nashville, TN, United States; Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN, United States
- Rashmee U Shah
  - Division of Cardiovascular Medicine, University of Utah School of Medicine, Salt Lake City, UT, United States
- Christine Goodrich
  - Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Iben Ricket
  - Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Freneka Minter
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Andrew Bohm
  - Department of Epidemiology and Biomedical Data Science, Dartmouth Geisel School of Medicine, Hanover, NH, United States
- Bruce E Bray
  - Division of Cardiovascular Medicine, University of Utah School of Medicine, Salt Lake City, UT, United States; Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States
- Michael E Matheny
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States; Division of General Internal Medicine, Vanderbilt University Medical Center, Nashville, TN, United States; Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, Nashville, TN, United States
- Wendy Chapman
  - Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, United States; Centre for Clinical and Public Health Informatics, University of Melbourne, Melbourne, Australia
36
Makridis CA, Strebel T, Marconi V, Alterovitz G. Designing COVID-19 mortality predictions to advance clinical outcomes: Evidence from the Department of Veterans Affairs. BMJ Health Care Inform 2021; 28:bmjhci-2020-100312. [PMID: 34108143 PMCID: PMC8190987 DOI: 10.1136/bmjhci-2020-100312] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/23/2020] [Revised: 03/17/2021] [Accepted: 03/31/2021] [Indexed: 12/21/2022]
Abstract
Using administrative data on all Veterans who enter Department of Veterans Affairs (VA) medical centres throughout the USA, this paper uses artificial intelligence (AI) to predict mortality rates for patients with COVID-19 between March and August 2020. First, using comprehensive data on over 10 000 Veterans' medical history, demographics and lab results, we estimate five AI models. Our XGBoost model performs the best, producing an area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve of 0.87 and 0.41, respectively. We show how focusing on the performance of the AUROC alone can lead to unreliable models. Second, through a unique collaboration with the Washington D.C. VA medical centre, we develop a dashboard that incorporates these risk factors and the contributing sources of risk, which we deploy across local VA medical centres throughout the country. Our results provide a concrete example of how AI recommendations can be made explainable and practical for clinicians and their interactions with patients.
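The point that AUROC alone can look reassuring while precision-recall performance is weak is easiest to see on a small imbalanced example. The sketch below uses made-up scores for 20 hypothetical patients, 2 of whom have the outcome; it computes AUROC by pairwise comparison and summarizes the precision-recall curve with average precision, a common step-wise estimate. It is not the paper's model or data.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    running = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            running += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive")
    return running / hits


# Hypothetical risk scores, highest first; positives sit at ranks 3 and 5.
scores = list(range(20, 0, -1))
labels = [0, 0, 1, 0, 1] + [0] * 15

roc = auroc(labels, scores)             # 31/36, about 0.86
ap = average_precision(labels, scores)  # (1/3 + 2/5) / 2, about 0.37
```

In this toy case most negatives score below both positives, so AUROC is high, yet 3 of the top 5 flagged patients are false alarms, which is what the lower average precision reflects.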
Affiliation(s)
- Christos A Makridis
  - National Artificial Intelligence Institute at the Department of Veterans Affairs, US Department of Veterans Affairs, Washington, District of Columbia, USA; Digital Economy Lab, Stanford University, Stanford, California, USA
- Tim Strebel
  - Washington D.C. VA Medical Center, Department of Veterans Affairs, Washington, District of Columbia, USA
- Vincent Marconi
  - Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
- Gil Alterovitz
  - National Artificial Intelligence Institute at the Department of Veterans Affairs, US Department of Veterans Affairs, Washington, District of Columbia, USA; Harvard Medical School, Boston, Massachusetts, USA
37
Bear Don't Walk IV OJ, Sun T, Perotte A, Elhadad N. Clinically relevant pretraining is all you need. J Am Med Inform Assoc 2021; 28:1970-1976. [PMID: 34151966 DOI: 10.1093/jamia/ocab086] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 10/23/2020] [Revised: 04/19/2021] [Accepted: 05/03/2021] [Indexed: 11/14/2022]
Abstract
Clinical notes present a wealth of information for applications in the clinical domain, but heterogeneity across clinical institutions and settings presents challenges for their processing. The clinical natural language processing field has made strides in overcoming domain heterogeneity, while pretrained deep learning models present opportunities to transfer knowledge from one task to another. Pretrained models have performed well when transferred to new tasks; however, it is not well understood if these models generalize across differences in institutions and settings within the clinical domain. We explore if institution or setting specific pretraining is necessary for pretrained models to perform well when transferred to new tasks. We find no significant performance difference between models pretrained across institutions and settings, indicating that clinically pretrained models transfer well across such boundaries. Given a clinically pretrained model, clinical natural language processing researchers may forgo the time-consuming pretraining step without a significant performance drop.
Affiliation(s)
- Tony Sun
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Adler Perotte
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Noémie Elhadad
  - Department of Biomedical Informatics, Columbia University, New York, New York, USA
38
Chen M, Tan X, Padman R. Social determinants of health in electronic health records and their impact on analysis and risk prediction: A systematic review. J Am Med Inform Assoc 2021; 27:1764-1773. [PMID: 33202021 DOI: 10.1093/jamia/ocaa143] [Citation(s) in RCA: 99] [Impact Index Per Article: 33.0] [Received: 02/15/2020] [Revised: 06/10/2020] [Accepted: 06/20/2020] [Indexed: 11/13/2022]
Abstract
OBJECTIVE This integrative review identifies and analyzes the extant literature to examine the integration of social determinants of health (SDoH) domains into electronic health records (EHRs), their impact on risk prediction, and the specific outcomes and SDoH domains that have been tracked. MATERIALS AND METHODS In accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, we conducted a literature search in the PubMed, CINAHL, Cochrane, EMBASE, and PsycINFO databases for English language studies published until March 2020 that examined SDoH domains in the context of EHRs. RESULTS Our search strategy identified 71 unique studies that are directly related to the research questions. 75% of the included studies were published since 2017, and 68% were U.S.-based. 79% of the reviewed articles integrated SDoH information from external data sources into EHRs, and the rest of them extracted SDoH information from unstructured clinical notes in the EHRs. We found that all but 1 study using external area-level SDoH data reported minimum contribution to performance improvement in the predictive models. In contrast, studies that incorporated individual-level SDoH data reported improved predictive performance of various outcomes such as service referrals, medication adherence, and risk of 30-day readmission. We also found little consensus on the SDoH measures used in the literature and current screening tools. CONCLUSIONS The literature provides early and rapidly growing evidence that integrating individual-level SDoH into EHRs can assist in risk assessment and predicting healthcare utilization and health outcomes, which further motivates efforts to collect and standardize patient-level SDoH information.
Affiliation(s)
- Min Chen
  - Department of Information Systems and Business Analytics, College of Business, Florida International University, Miami, Florida, USA
- Xuan Tan
  - Department of Information Systems and Business Analytics, College of Business, Florida International University, Miami, Florida, USA
- Rema Padman
  - The H. John Heinz III College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
39
Makridis CA, Zhao DY, Bejan CA, Alterovitz G. Leveraging machine learning to characterize the role of socio-economic determinants on physical health and well-being among veterans. Comput Biol Med 2021; 133:104354. [PMID: 33845269 DOI: 10.1016/j.compbiomed.2021.104354] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/26/2021] [Revised: 03/07/2021] [Accepted: 03/20/2021] [Indexed: 02/07/2023]
Abstract
INTRODUCTION We investigate the contribution of demographic, socio-economic, and geographic characteristics as determinants of physical health and well-being to guide public health policies and preventative behavior interventions (e.g., countering coronavirus). METHODS We use machine learning to build predictive models of overall well-being and physical health among veterans as a function of these three sets of characteristics. We link Gallup's U.S. Daily Poll between 2014 and 2017 over a range of demographic and socio-economic characteristics with zipcode characteristics from the Census Bureau to build predictive models of overall and physical well-being. RESULTS Although the predictive models of overall well-being have weak performance, our classification of low levels of physical well-being performed better. Gradient boosting delivered the best results (80.2% precision, 82.4% recall, and 80.4% AUROC) with perceptions of purpose in the workplace and financial anxiety as the most predictive features. Our results suggest that additional measures of socio-economic characteristics are required to better predict physical well-being, particularly among vulnerable groups, like veterans. CONCLUSION Socio-economic characteristics explain large differences in physical and overall well-being. Effective predictive models that incorporate socio-economic data will provide opportunities to create real-time and personalized feedback to help individuals improve their quality of life.
Affiliation(s)
- Christos A Makridis
  - Stanford University Digital Economy Lab, and National Artificial Intelligence Institute at the Department of Veterans Affairs, 810 Vermont Ave NW, Washington, DC 20420, USA
- David Y Zhao
  - Department of Computer Science at Stanford University, Gates Computer Science Building, 353 Jane Stanford Way, Stanford, CA 94305, USA
- Cosmin A Bejan
  - Department Biomedical Informatics at Vanderbilt University Medical Center, 2525 West End Avenue, Nashville, TN, 37203, USA
- Gil Alterovitz
  - Harvard Medical School, Boston Children's Hospital, National Artificial Intelligence Institute at the Department of Veterans Affairs, 810 Vermont Ave NW, Washington, DC 20420, USA
40
Unertl KM, Walsh CG, Clayton EW. Combatting human trafficking in the United States: how can medical informatics help? J Am Med Inform Assoc 2021; 28:384-388. [PMID: 33120418 DOI: 10.1093/jamia/ocaa142] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/11/2020] [Revised: 05/11/2020] [Accepted: 06/15/2020] [Indexed: 11/14/2022]
Abstract
OBJECTIVE Human trafficking is a global problem taking many forms, including sex and labor exploitation. Trafficking victims can be any age, although most trafficking begins when victims are adolescents. Many trafficking victims have contact with health-care providers across various health-care contexts, both for emergency and routine care. MATERIALS AND METHODS We propose 4 specific areas where medical informatics can assist with combatting trafficking: screening, clinical decision support, community-facing tools, and analytics that are both descriptive and predictive. Efforts to implement health information technology interventions focused on trafficking must be carefully integrated into existing clinical work and connected to community resources to move beyond identification to provide assistance and to support trauma-informed care. RESULTS We lay forth a research and implementation agenda to integrate human trafficking identification and intervention into routine clinical practice, supported by health information technology. CONCLUSIONS A sociotechnical systems approach is recommended to ensure interventions address the complex issues involved in assisting victims of human trafficking.
Affiliation(s)
- Kim M Unertl
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Colin G Walsh
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Ellen Wright Clayton
  - Department of Pediatrics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, Tennessee, USA; School of Law, Vanderbilt University, Nashville, Tennessee, USA; Department of Health Policy, Vanderbilt University Medical Center, Nashville, Tennessee, USA
41
Stemerman R, Bunning T, Grover J, Kitzmiller R, Patel MD. Identifying Patient Phenotype Cohorts Using Prehospital Electronic Health Record Data. PREHOSP EMERG CARE 2021:1-14. [PMID: 33315497 DOI: 10.1080/10903127.2020.1859658] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/19/2020] [Accepted: 12/01/2020] [Indexed: 10/22/2022]
Abstract
Objective: Emergency medical services (EMS) provide critical interventions for patients with acute illness and injury and are important in implementing prehospital emergency care research. Retrospective, manual patient record review, the current reference standard for identifying patient cohorts, requires significant time and financial investment. We developed automated classification models to identify eligible patients for prehospital clinical trials using EMS clinical notes and compared model performance to manual review. Methods: With eligibility criteria for an ongoing prehospital study of chest pain patients, we used EMS clinical notes (n = 1208) to manually classify patients as eligible, ineligible, and indeterminate. We randomly split these same records into training and test sets to develop and evaluate machine-learning (ML) algorithms using natural language processing (NLP) for feature (variable) selection. We compared models to the manual classification to calculate sensitivity, specificity, accuracy, positive predictive value, and F1 measure. We measured clinical expert time to perform review for manual and automated methods. Results: ML models' sensitivity, specificity, accuracy, positive predictive value, and F1 measure ranged from 0.93 to 0.98. Compared to manual classification (N = 363 records), the automated method excluded 90.9% of records as ineligible, leaving only 33 records for manual review. Conclusions: Our ML-derived approach demonstrates the feasibility of developing a high-performing, automated classification system using EMS clinical notes to streamline the identification of a specific cardiac patient cohort. This efficient approach can be leveraged to facilitate prehospital patient-trial matching and patient phenotyping (e.g., influenza-like illness), and to create prehospital patient registries.
Affiliation(s)
- Rachel Stemerman
- Carolina Health Informatics Program, University of North Carolina, Chapel Hill, North Carolina
| | - Thomas Bunning
- Department of Anesthesiology, Duke University Medical Center, Durham, North Carolina
| | - Joseph Grover
- Department of Emergency Medicine, University of North Carolina, Chapel Hill, North Carolina
| | - Rebecca Kitzmiller
- Carolina Health Informatics Program, University of North Carolina, Chapel Hill, North Carolina
| | - Mehul D Patel
- Department of Emergency Medicine, University of North Carolina, Chapel Hill, North Carolina
| |
|
42
|
Decker BM, Hill CE, Baldassano SN, Khankhanian P. Can antiepileptic efficacy and epilepsy variables be studied from electronic health records? A review of current approaches. Seizure 2021; 85:138-144. [PMID: 33461032 DOI: 10.1016/j.seizure.2020.11.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/30/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 12/16/2022] Open
Abstract
As automated data extraction and natural language processing (NLP) are rapidly evolving, improving healthcare delivery by harnessing large data is garnering great interest. Assessing antiepileptic drug (AED) efficacy and other epilepsy variables pertinent to healthcare delivery remains a critical barrier to improving patient care. In this systematic review, we examined automatic electronic health record (EHR) extraction methodologies pertinent to epilepsy. We also reviewed more generalizable NLP pipelines to extract other critical patient variables. Our review found varying reports of performance measures. Whereas automated data extraction pipelines are a crucial advancement, this review calls attention to standardizing NLP methodology and accuracy reporting for greater generalizability. Moreover, the use of crowdsourcing competitions to spur innovative NLP pipelines would further advance this field.
Affiliation(s)
- Barbara M Decker
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States.
| | - Chloé E Hill
- Department of Neurology, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI, 48109, United States
| | - Steven N Baldassano
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
| | - Pouya Khankhanian
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
| |
|
43
|
Bensken WP, Krieger NI, Berg KA, Einstadter D, Dalton JE, Perzynski AT. Health Status and Chronic Disease Burden of the Homeless Population: An Analysis of Two Decades of Multi-Institutional Electronic Medical Records. J Health Care Poor Underserved 2021; 32:1619-1634. [PMID: 34421052 PMCID: PMC8477616 DOI: 10.1353/hpu.2021.0153] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 11/11/2022]
Abstract
Using a multi-institutional EMR registry, we extracted housing status and evaluated the presence of several important comorbidities in order to describe the demographics and comorbidity burden of persons experiencing homelessness in northeast Ohio and compare this to non-homeless individuals of varying socioeconomic position. Of 1,974,766 patients in the EMR registry, we identified 15,920 (0.8%) as homeless, 351,279 (17.8%) as non-homeless and in the top quintile of area deprivation index (ADI), and 1,607,567 (81.4%) as non-homeless and in the lower four quintiles of area deprivation. The comorbidity burden was highest in the homeless population with depression (48.1%), anxiety (45.8%), hypertension (44.2%), cardiovascular disease (18.4%), and hepatitis (18.1%) among the most prevalent conditions. We conclude that it is possible to identify homeless individuals and document their comorbidity burden using a multi-institutional EMR registry, in order to guide future interventions to address the health of the homeless at the health-system and community level.
|
44
|
Lee RY, Brumback LC, Lober WB, Sibley J, Nielsen EL, Treece PD, Kross EK, Loggers ET, Fausto JA, Lindvall C, Engelberg RA, Curtis JR. Identifying Goals of Care Conversations in the Electronic Health Record Using Natural Language Processing and Machine Learning. J Pain Symptom Manage 2021; 61:136-142.e2. [PMID: 32858164 PMCID: PMC7769906 DOI: 10.1016/j.jpainsymman.2020.08.024] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 06/24/2020] [Revised: 08/14/2020] [Accepted: 08/20/2020] [Indexed: 11/19/2022]
Abstract
CONTEXT Goals-of-care discussions are an important quality metric in palliative care. However, goals-of-care discussions are often documented as free text in diverse locations. It is difficult to identify these discussions in the electronic health record (EHR) efficiently. OBJECTIVES To develop, train, and test an automated approach to identifying goals-of-care discussions in the EHR, using natural language processing (NLP) and machine learning (ML). METHODS From the electronic health records of an academic health system, we collected a purposive sample of 3183 EHR notes (1435 inpatient notes and 1748 outpatient notes) from 1426 patients with serious illness over 2008-2016, and manually reviewed each note for documentation of goals-of-care discussions. Separately, we developed a program to identify notes containing documentation of goals-of-care discussions using NLP and supervised ML. We estimated the performance characteristics of the NLP/ML program across 100 pairs of randomly partitioned training and test sets. We repeated these methods for inpatient-only and outpatient-only subsets. RESULTS Of 3183 notes, 689 contained documentation of goals-of-care discussions. The mean sensitivity of the NLP/ML program was 82.3% (SD 3.2%), and the mean specificity was 97.4% (SD 0.7%). NLP/ML results had a median positive likelihood ratio of 32.2 (IQR 27.5-39.2) and a median negative likelihood ratio of 0.18 (IQR 0.16-0.20). Performance was better in inpatient-only samples than outpatient-only samples. CONCLUSION Using NLP and ML techniques, we developed a novel approach to identifying goals-of-care discussions in the EHR. NLP and ML represent a potential approach toward measuring goals-of-care discussions as a research outcome and quality metric.
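The likelihood ratios in the Results are standard functions of sensitivity and specificity: LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. As a rough check, the sketch below plugs in the reported mean sensitivity and specificity; the helper name `likelihood_ratios` is an assumption for illustration. The paper reports medians across 100 train/test partitions, so the plug-in values are expected to be near, not equal to, the published 32.2 and 0.18.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Mean sensitivity and specificity as reported in the abstract.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.823, specificity=0.974)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```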
Affiliation(s)
- Robert Y Lee
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, Harborview Medical Center, University of Washington, Seattle, Washington, USA
| | - Lyndia C Brumback
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Department of Biostatistics, University of Washington, Seattle, Washington, USA
| | - William B Lober
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle, Washington, USA; Department of Bioinformatics and Medical Education, University of Washington, Seattle, Washington, USA
| | - James Sibley
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle, Washington, USA; Department of Bioinformatics and Medical Education, University of Washington, Seattle, Washington, USA
| | - Elizabeth L Nielsen
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, Harborview Medical Center, University of Washington, Seattle, Washington, USA
| | - Patsy D Treece
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, Harborview Medical Center, University of Washington, Seattle, Washington, USA; Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle, Washington, USA
| | - Erin K Kross
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, Harborview Medical Center, University of Washington, Seattle, Washington, USA
| | - Elizabeth T Loggers
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA; Seattle Cancer Care Alliance, Seattle, Washington, USA
| | - James A Fausto
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Department of Family Medicine, University of Washington, Seattle, Washington, USA
| | - Charlotta Lindvall
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
| | - Ruth A Engelberg
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, Harborview Medical Center, University of Washington, Seattle, Washington, USA
| | - J Randall Curtis
- Cambia Palliative Care Center of Excellence, University of Washington, Seattle, Washington, USA; Division of Pulmonary, Critical Care, and Sleep Medicine, Department of Medicine, Harborview Medical Center, University of Washington, Seattle, Washington, USA; Department of Biobehavioral Nursing and Health Informatics, University of Washington, Seattle, Washington, USA; Department of Bioethics and Humanities, University of Washington, Seattle, Washington, USA.
| |
|
45
|
Montgomery AE, Tsai J, Blosnich JR. Demographic Correlates of Veterans' Adverse Social Determinants of Health. Am J Prev Med 2020; 59:828-836. [PMID: 33220754 DOI: 10.1016/j.amepre.2020.05.024] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/19/2020] [Revised: 04/28/2020] [Accepted: 05/14/2020] [Indexed: 10/23/2022]
Abstract
INTRODUCTION Identifying patient populations most affected by adverse social determinants of health can direct epidemiologic investigation, guide development of tailored interventions, and improve clinical care and outcomes. This study explores how demographic characteristics are associated with specific types, and cumulative burden, of adverse social determinants of health among Veterans seeking Veterans Health Administration health care. METHODS Data included electronic health records for 293,872 patients of Veterans Health Administration facilities in one region of the country between October 1, 2015 and September 30, 2016. A series of multiple logistic regressions conducted between August and December 2019 examined how demographic variables are associated with 7 adverse social determinants of health. A negative binomial regression examined the association between demographic characteristics and cumulative burden of social determinants of health. RESULTS Demographic characteristics were associated with increased odds of each type of adverse social determinant of health: minority race, unmarried status, and Veterans' service-connected disability status. Conversely, living in a rural area and being aged >40 years were associated with decreased odds of most of the adverse social determinants of health studied here. Hispanic ethnicity and female sex were inconsistently associated with increased odds of some adverse social determinants of health and decreased odds of others. These results are mirrored in the analysis of predictors of cumulative burden of adverse social determinants of health. CONCLUSIONS There is increasing and ongoing interest in ways to identify and respond to patients' experiences of or exposures to adverse social determinants of health. Demographic characteristics may signal the need to assess for adverse social determinants of health. Analyses exploring latent factors among these social determinants (e.g., poverty) may inform strategies to identify patients experiencing adverse social determinants of health and provide responsive interventions.
Affiliation(s)
- Ann Elizabeth Montgomery
- Birmingham Veterans Affairs Medical Center, U.S. Department of Veterans Affairs, Birmingham, Alabama; School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama.
| | - Jack Tsai
- National Center on Homelessness Among Veterans, U.S. Department of Veterans Affairs, Tampa, Florida
| | - John R Blosnich
- University of Southern California Suzanne Dworak-Peck School of Social Work, Los Angeles, California
| |
|
46
|
Lynch KE, Gatsby E, Viernes B, Schliep KC, Whitcomb BW, Alba PR, DuVall SL, Blosnich JR. Evaluation of Suicide Mortality Among Sexual Minority US Veterans From 2000 to 2017. JAMA Netw Open 2020; 3:e2031357. [PMID: 33369662 PMCID: PMC7770555 DOI: 10.1001/jamanetworkopen.2020.31357] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 01/02/2023] Open
Abstract
IMPORTANCE Identification of subgroups at greatest risk for suicide mortality is essential for prevention efforts and targeting interventions. Sexual minority individuals may have an increased risk for suicide compared with heterosexual individuals, but a lack of sufficiently powered studies with rigorous methods for determining sexual orientation has limited the knowledge on this potential health disparity. OBJECTIVE To investigate suicide mortality among sexual minority veterans using Veterans Health Administration (VHA) electronic health record data. DESIGN, SETTING, AND PARTICIPANTS This retrospective population-based cohort study used data on 8.1 million US veterans enrolled in the VHA after fiscal year 1999 that were obtained from VHA electronic health records from October 1, 1999 to September 30, 2017. Data analysis was carried out from March 1, 2020 to October 31, 2020. EXPOSURE Veterans with documentation of a minority sexual orientation. Documentation of sexual minority status was obtained through natural language processing of clinical notes and extraction of structured administrative data for sexual orientation in VHA electronic health records. MAIN OUTCOMES AND MEASURES Suicide mortality rate using data on the underlying cause of death obtained from the National Death Index. Crude and age-adjusted mortality rates were calculated for all-cause death and death from suicide among sexual minority veterans compared with the general US population and the general population of veterans. RESULTS Among the 96 893 veterans with at least 1 sexual minority documentation in the electronic health record, the mean (SD) age was 46 (16) years, 68% were male, and 70% were White. Of the 12 591 total deaths, 3.5% were from suicide. Veterans had a significantly higher rate of mortality from suicide (standardized mortality ratio, 4.50; 95% CI, 4.13-4.99) compared with the general US population. 
Suicide was the fifth leading cause of death in 2017 among sexual minority veterans (3.8% of deaths) and the tenth leading cause of death in the general US population (1.7% of deaths). The crude suicide rate among sexual minority veterans (82.5 per 100 000 person-years) was higher than the rate in the general veteran population (37.7 per 100 000 person-years). CONCLUSIONS AND RELEVANCE The results of this population-based cohort study suggest that sexual minority veterans have a greater risk for suicide than the general US population and the general veteran population. Further research is needed to determine whether and how suicide prevention efforts reach sexual minority veterans.
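The quantities in this abstract rest on two simple definitions: a crude rate is events divided by person-years of follow-up (scaled to 100 000), and a standardized mortality ratio (SMR) is observed deaths divided by the deaths expected if reference-population rates applied. The sketch below uses hypothetical helper names and hypothetical counts for those two functions, since the abstract gives neither person-years nor expected deaths; only the final rate ratio uses the two crude rates actually reported.

```python
def crude_rate_per_100k(events, person_years):
    """Crude event rate per 100,000 person-years."""
    return events * 100_000 / person_years


def standardized_mortality_ratio(observed, expected):
    """SMR: observed deaths over deaths expected under reference rates."""
    return observed / expected


# Hypothetical inputs, NOT from the study, to show the definitions.
example_rate = crude_rate_per_100k(events=50, person_years=100_000)
example_smr = standardized_mortality_ratio(observed=45, expected=10)
print(example_rate, example_smr)

# Crude suicide rates reported in the abstract (per 100,000 person-years).
rate_sexual_minority = 82.5
rate_general_veterans = 37.7
rate_ratio = rate_sexual_minority / rate_general_veterans
print(f"crude rate ratio: {rate_ratio:.2f}")
```

The crude rate ratio of roughly 2.2 is unadjusted; the abstract's SMR of 4.50 is a different comparison (against the general US population, with standardization), so the two numbers are not expected to agree.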
Affiliation(s)
- Kristine E. Lynch
- Veterans Affairs (VA) Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah
- Division of Epidemiology, Department of Internal Medicine, The University of Utah, Salt Lake City
| | - Elise Gatsby
- Veterans Affairs (VA) Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah
| | - Benjamin Viernes
- Veterans Affairs (VA) Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah
- Division of Epidemiology, Department of Internal Medicine, The University of Utah, Salt Lake City
| | - Karen C. Schliep
- Veterans Affairs (VA) Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah
- Department of Family and Preventive Medicine, The University of Utah, Salt Lake City
| | - Brian W. Whitcomb
- Department of Public Health and Health Sciences, University of Massachusetts, Amherst
| | - Patrick R. Alba
- Veterans Affairs (VA) Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah
- Division of Epidemiology, Department of Internal Medicine, The University of Utah, Salt Lake City
| | - Scott L. DuVall
- Veterans Affairs (VA) Informatics and Computing Infrastructure, VA Salt Lake City Health Care System, Salt Lake City, Utah
- Division of Epidemiology, Department of Internal Medicine, The University of Utah, Salt Lake City
| | - John R. Blosnich
- Suzanne Dworak-Peck School of Social Work, University of Southern California, Los Angeles
- Center for Health Equity Research and Promotion, VA Pittsburgh Healthcare System, Pittsburgh, Pennsylvania
| |
|
47
|
Byrne T, Baggett T, Land T, Bernson D, Hood ME, Kennedy-Perez C, Monterrey R, Smelson D, Dones M, Bharel M. A classification model of homelessness using integrated administrative data: Implications for targeting interventions to improve the housing status, health and well-being of a highly vulnerable population. PLoS One 2020; 15:e0237905. [PMID: 32817717 PMCID: PMC7446866 DOI: 10.1371/journal.pone.0237905] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/03/2019] [Accepted: 08/06/2020] [Indexed: 11/19/2022] Open
Abstract
Homelessness is poorly captured in most administrative data sets, making it difficult to understand how, when, and where this population can be better served. This study sought to develop and validate a classification model of homelessness. Our sample included 5,050,639 individuals aged 11 years and older who were included in a linked dataset of administrative records from multiple state-maintained databases in Massachusetts for the period from 2011-2015. We used logistic regression to develop a classification model with 94 predictors and subsequently tested its performance. The model had high specificity (95.4%), moderate sensitivity (77.8%) for predicting known cases of homelessness, and excellent classification properties (area under the receiver operating characteristic curve 0.94; balanced accuracy 86.4%). To demonstrate the potential opportunity that exists for using such a modeling approach to target interventions to mitigate the risk of an adverse health outcome, we also estimated the association between model-predicted homeless status and fatal opioid overdoses, finding that model-predicted homeless status was associated with a nearly 23-fold increase in the risk of fatal opioid overdose. This study provides a novel approach for identifying homelessness using integrated administrative data. The strong performance of our model underscores the potential value of linking data from multiple service systems to improve the identification of housing instability and to assist government in developing programs that seek to improve health and other outcomes for homeless individuals.
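Balanced accuracy is conventionally the unweighted mean of sensitivity and specificity, which makes it less sensitive than plain accuracy to a rare positive class such as homelessness. A minimal sketch, assuming that conventional definition (the abstract does not state how its figure was computed) and using a hypothetical helper name:

```python
def balanced_accuracy(sensitivity, specificity):
    """Unweighted mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# Sensitivity and specificity as reported in the abstract.
value = balanced_accuracy(sensitivity=0.778, specificity=0.954)
print(f"balanced accuracy: {100 * value:.1f}%")
```

Plugging in the rounded figures from the abstract gives about 86.6%, close to but not identical to the reported 86.4%; the abstract does not give enough detail to account for the difference.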
Affiliation(s)
- Thomas Byrne
- Boston University School of Social Work, Boston, Massachusetts, United States of America
| | - Travis Baggett
- Boston Health Care for the Homeless Program, Boston, Massachusetts, United States of America
- Division of General Internal Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
| | - Thomas Land
- University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
| | - Dana Bernson
- Massachusetts Department of Public Health, Boston, Massachusetts, United States of America
| | - Maria-Elena Hood
- Massachusetts Department of Public Health, Boston, Massachusetts, United States of America
| | - Cheryl Kennedy-Perez
- Massachusetts Department of Public Health, Boston, Massachusetts, United States of America
| | - Rodrigo Monterrey
- Massachusetts Department of Public Health, Boston, Massachusetts, United States of America
| | - David Smelson
- University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
| | - Marc Dones
- National Innovation Service, United States of America
| | - Monica Bharel
- Massachusetts Department of Public Health, Boston, Massachusetts, United States of America
| |
|
48
|
Cohen DJ, Wyte-Lake T, Dorr DA, Gold R, Holden RJ, Koopman RJ, Colasurdo J, Warren N. Unmet information needs of clinical teams delivering care to complex patients and design strategies to address those needs. J Am Med Inform Assoc 2020; 27:690-699. [PMID: 32134456 PMCID: PMC7647291 DOI: 10.1093/jamia/ocaa010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 09/08/2019] [Revised: 01/06/2020] [Accepted: 01/16/2020] [Indexed: 12/31/2022] Open
Abstract
OBJECTIVES To identify the unmet information needs of clinical teams delivering care to patients with complex medical, social, and economic needs; and to propose principles for redesigning electronic health records (EHR) to address these needs. MATERIALS AND METHODS In this observational study, we interviewed and observed care teams in 9 community health centers in Oregon and Washington to understand their use of the EHR when caring for patients with complex medical and socioeconomic needs. Data were analyzed using a comparative approach to identify EHR users' information needs, which were then used to produce EHR design principles. RESULTS Analyses of > 300 hours of observations and 51 interviews identified 4 major categories of information needs related to: consistency of social determinants of health (SDH) documentation; SDH information prioritization and changes to this prioritization; initiation and follow-up of community resource referrals; and timely communication of SDH information. Within these categories were 10 unmet information needs to be addressed by EHR designers. We propose the following EHR design principles to address these needs: enhance the flexibility of EHR documentation workflows; expand the ability to exchange information within teams and between systems; balance innovation and standardization of health information technology systems; organize and simplify information displays; and prioritize and reduce information. CONCLUSION Developing EHR tools that are simple, accessible, easy to use, and able to be updated by a range of professionals is critical. The identified information needs and design principles should inform developers and implementers working in community health centers and other settings where complex patients receive care.
Affiliation(s)
- Deborah J Cohen
- Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| | - Tamar Wyte-Lake
- Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
| | - David A Dorr
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| | - Rachel Gold
- Center for Health Research, Kaiser Permanente, Portland, Oregon, USA
- Department of Research, OCHIN Inc, Portland, Oregon, USA
| | - Richard J Holden
- Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Richelle J Koopman
- Department of Family and Community Medicine, University of Missouri, Columbia, Missouri, USA
| | - Joshua Colasurdo
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
| | | |
|
49
|
Feller DJ, Zucker J, Walk OBD, Yin MT, Gordon P, Elhadad N. Longitudinal analysis of social and behavioral determinants of health in the EHR: exploring the impact of patient trajectories and documentation practices. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2020; 2019:399-407. [PMID: 32308833 PMCID: PMC7153098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Social and behavioral determinants of health (SBDH) are environmental and behavioral factors that impede disease self-management and can exacerbate clinical conditions. While recent research in the informatics community has focused on building systems that can automatically infer SBDH from the patient record, it is unclear how such determinants change over time. This study analyzes the longitudinal characteristics of 4 common SBDH as expressed in the patient record and compares the rates of change among distinct SBDH. In addition, manual review of patient notes was undertaken to establish whether changes in patient SBDH status reflected legitimate changes in patient status or rather potential data quality issues. Our findings suggest that a patient's SBDH status is liable to change over time and that some changes reflect poor social history taking by clinicians.
Affiliation(s)
- Daniel J Feller
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| | - Jason Zucker
- Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY USA
| | | | - Michael T Yin
- Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY USA
| | - Peter Gordon
- Division of Infectious Diseases, Department of Medicine, Columbia University, New York, NY USA
| | - Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
| |
|
50
|
Feller DJ, Bear Don't Walk Iv OJ, Zucker J, Yin MT, Gordon P, Elhadad N. Detecting Social and Behavioral Determinants of Health with Structured and Free-Text Clinical Data. Appl Clin Inform 2020; 11:172-181. [PMID: 32131117 DOI: 10.1055/s-0040-1702214] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Indexed: 10/24/2022] Open
Abstract
BACKGROUND Social and behavioral determinants of health (SBDH) are environmental and behavioral factors that often impede disease management and result in sexually transmitted infections. Despite their importance, SBDH are inconsistently documented in electronic health records (EHRs) and typically collected only in an unstructured format. Evidence suggests that structured data elements present in EHRs can contribute further to identify SBDH in the patient record. OBJECTIVE Explore the automated inference of both the presence of SBDH documentation and individual SBDH risk factors in patient records. Compare the relative ability of clinical notes and structured EHR data, such as laboratory measurements and diagnoses, to support inference. METHODS We attempt to infer the presence of SBDH documentation in patient records, as well as patient status of 11 SBDH, including alcohol abuse, homelessness, and sexual orientation. We compare classification performance when considering clinical notes only, structured data only, and notes and structured data together. We perform an error analysis across several SBDH risk factors. RESULTS Classification models inferring the presence of SBDH documentation achieved good performance (F1 score: 92.7-78.7; F1 considered as the primary evaluation metric). Performance was variable for models inferring patient SBDH risk status; results ranged from F1 = 82.7 for LGBT (lesbian, gay, bisexual, and transgender) status to F1 = 28.5 for intravenous drug use. Error analysis demonstrated that lexical diversity and documentation of historical SBDH status challenge inference of patient SBDH status. Three of five classifiers inferring topic-specific SBDH documentation and 10 of 11 patient SBDH status classifiers achieved highest performance when trained using both clinical notes and structured data. CONCLUSION Our findings suggest that combining clinical free-text notes and structured data provides the best approach in classifying patient SBDH status. Inferring patient SBDH status is most challenging among SBDH with low prevalence and high lexical diversity.
Affiliation(s)
- Daniel J Feller
- Department of Biomedical Informatics, Columbia University, New York, New York, United States
| | | | - Jason Zucker
- Division of Infectious Diseases, Department of Internal Medicine, Columbia University Irving Medical Center, New York, New York, United States
| | - Michael T Yin
- Division of Infectious Diseases, Department of Internal Medicine, Columbia University Irving Medical Center, New York, New York, United States
| | - Peter Gordon
- Division of Infectious Diseases, Department of Internal Medicine, Columbia University Irving Medical Center, New York, New York, United States
| | - Noémie Elhadad
- Department of Biomedical Informatics, Columbia University, New York, New York, United States
| |
|