1
Kim MH, Miramontes S, Mehta S, Schwartz GL, Kim YJ, Yang Y, Hill-Jarrett TG, Cevallos N, Chen R, Glymour MM, Ferguson EL, Zimmerman SC, Choi M, Sims KD. Extracting Housing and Food Insecurity Information From Clinical Notes Using cTAKES. Health Serv Res 2025:e14440. [PMID: 39871689 DOI: 10.1111/1475-6773.14440] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2024] [Revised: 10/20/2024] [Accepted: 01/02/2025] [Indexed: 01/29/2025]
Abstract
OBJECTIVE To assess the utility and challenges of using natural language processing (NLP) in electronic health records (EHRs) to ascertain health-related social needs (HRSNs) among older adults. STUDY SETTING AND DESIGN We extracted HRSN information using the NLP system Clinical Text Analysis and Knowledge Extraction System (cTAKES), combined with Concept Unique Identifiers and Systematized Nomenclature for Medicine codes. We validated cTAKES performance, via manual chart review, on two HRSNs: food insecurity, which was included in the healthcare system's HRSN screening tool, and housing insecurity, which was not. DATA SOURCES AND ANALYTIC SAMPLE De-identified EHRs in a large California healthcare system (January 2013 through October 2022) from 119,127 patients aged 55+ in primary and emergency care settings (n = 1,385,259 clinical notes). PRINCIPAL FINDINGS Although cTAKES had a moderate positive predictive value (77.5%) for housing insecurity, housing challenges among older adults frequently did not align with the concepts the algorithm recognized. cTAKES performed poorly for food insecurity (positive predictive value: 18.5%) because this NLP system incorrectly flagged structured fields from the screening tool. CONCLUSION Unstandardized terminology and poor integration of HRSN screeners in EHR remain important barriers to identifying older adults' food and housing insecurity using cTAKES.
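The positive predictive values quoted in this abstract come from comparing NLP flags against manual chart review. A minimal sketch of that calculation follows; the function name and the tallies are hypothetical (the counts were chosen only so the result matches the reported 77.5%, and are not the study's actual numbers).

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """Share of NLP-flagged notes that chart review confirmed as true cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no flagged notes to evaluate")
    return true_positives / flagged


# Hypothetical chart-review tallies for illustration only (not from the paper):
# 31 flagged notes confirmed by reviewers, 9 flagged notes rejected.
housing_ppv = positive_predictive_value(true_positives=31, false_positives=9)
print(f"Housing insecurity PPV: {housing_ppv:.1%}")
```

PPV depends only on the notes the system flagged, so it says nothing about cases the system missed; that gap is why the abstract separately notes that many housing challenges did not match the concepts cTAKES recognizes.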
Affiliation(s)
- Min Hee Kim
  - Institute for Health, Health Care Policy, Aging Research & School of Nursing, Rutgers, The State University of New Jersey, New Brunswick, New Jersey, USA
  - Philip R. Lee Institute for Health Policy Studies, University of California San Francisco, San Francisco, California, USA
- Silvia Miramontes
  - Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, California, USA
- Shivani Mehta
  - Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, California, USA
- Gabriel L Schwartz
  - Department of Health Management & Policy, Drexel University Dornsife School of Public Health, Philadelphia, Pennsylvania, USA
- Ye Ji Kim
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
- Yulin Yang
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, USA
- Tanisha G Hill-Jarrett
  - Memory and Aging Center, Department of Neurology, University of California San Francisco, San Francisco, California, USA
- Nicolas Cevallos
  - School of Medicine, University of California San Francisco, San Francisco, California, USA
- Ruijia Chen
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
- M Maria Glymour
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
- Erin L Ferguson
  - Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, California, USA
- Scott C Zimmerman
  - Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, California, USA
- Minhyuk Choi
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, USA
- Kendra D Sims
  - Department of Epidemiology, Boston University School of Public Health, Boston, Massachusetts, USA
  - Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, USA
2
Fereydooni S, Valdez C, Williams LC, Verma A, Judson B. Racial Disparities in Perioperative Outcomes for Patients With Head and Neck Cancer. Head Neck 2024. [PMID: 39713894 DOI: 10.1002/hed.28034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Revised: 10/24/2024] [Accepted: 12/03/2024] [Indexed: 12/24/2024]
Abstract
OBJECTIVE To characterize the perioperative complications after ablative and reconstructive surgery in patients with head and neck cancer (HNC) based on race. METHODS We conducted a retrospective study of the 2015-2020 National Surgical Quality Improvement Program Database. We compared the perioperative outcomes between White, Asian, Black, Native Hawaiian or Pacific Islander, and American Indian or Alaskan Native patients with bivariate analysis. Multivariate logistic regression assessed the independent association of race with perioperative complications. RESULTS Black patients experienced longer surgeries (aβ, 43 [95% CI, 33-53]), longer hospital stays (aβ, 1.6 [95% CI, 1.1-2.1]), and were less likely to be discharged home (aOR, 0.64 [95% CI, 0.54-0.76]). Black patients also had a higher risk of major complications (aOR, 1.38 [95% CI, 1.13-1.67]), the most common being reintubation/ventilation (Black, 4.4% vs. White, 2.7%; p = 0.003) and sepsis/septic shock (Black, 3.4% vs. White, 1.8%; p < 0.001). Black patients had higher reoperation rates (aOR, 1.33 [95% CI, 1.12-1.56]), with incision and drainage of abscess and hematoma, exploration of postoperative hemorrhage, thrombosis or infection, or surgical debridement being the top reasons for reoperation. Concordantly, they were at higher risk of postoperative transfusion (Black, 18% vs. White, 7.2%; p < 0.001) and wound dehiscence (Black, 4.1% vs. White, 2.1%; p < 0.001). CONCLUSION There is evidence of racial disparities in perioperative outcomes of HNC surgery. Black patients face an increased risk of major complications, reoperation, extended hospital stay, and non-home discharge. Developing a comprehensive surgical database with more social determinants of health variables and using a socioecological framework of health can help us identify contributors to these disparities and design high-leverage solutions.
Affiliation(s)
- Avanti Verma
  - Yale School of Medicine, New Haven, Connecticut, USA
  - Otolaryngology-Head and Neck Surgery, New Haven, Connecticut, USA
- Benjamin Judson
  - Yale School of Medicine, New Haven, Connecticut, USA
  - Otolaryngology-Head and Neck Surgery, New Haven, Connecticut, USA
3
Mun M, Kim A, Woo K. Natural Language Processing Application in Nursing Research: A Study Using Text Network Analysis and Topic Modeling. Comput Inform Nurs 2024; 42:889-897. [PMID: 38913983 DOI: 10.1097/cin.0000000000001158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Although the potential of natural language processing and an increase in its application in nursing research is evident, there is a lack of understanding of the research trends. This study conducts text network analysis and topic modeling to uncover the underlying knowledge structures, research trends, and emergent research themes within nursing literature related to natural language processing. In addition, this study aims to provide a foundation for future scholarly inquiries and enhance the integration of natural language processing in the analysis of nursing research. We analyzed 443 literature abstracts and performed core keyword analysis and topic modeling based on frequency and centrality. The following topics emerged: (1) Term Identification and Communication; (2) Application of Machine Learning; (3) Exploration of Health Outcome Factors; (4) Intervention and Participant Experience; and (5) Disease-Related Algorithms. Nursing meta-paradigm elements were identified within the core keyword analysis, which led to understanding and expanding the meta-paradigm. Although still in its infancy in nursing research with limited topics and research volumes, natural language processing can potentially enhance research efficiency and nursing quality. The findings emphasize the possibility of integrating natural language processing in nursing-related subjects, validating nursing value, and fostering the exploration of essential paradigms in nursing science.
Affiliation(s)
- Minji Mun
  - Author Affiliations: College of Nursing (Mrs Mun, Mrs Kim, and Dr Woo), and The Research Institute of Nursing Science, College of Nursing (Dr Woo), Seoul National University, South Korea
4
Ralevski A, Taiyab N, Nossal M, Mico L, Piekos S, Hadlock J. Using Large Language Models to Abstract Complex Social Determinants of Health From Original and Deidentified Medical Notes: Development and Validation Study. J Med Internet Res 2024; 26:e63445. [PMID: 39561354 PMCID: PMC11615547 DOI: 10.2196/63445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2024] [Revised: 08/31/2024] [Accepted: 09/25/2024] [Indexed: 11/21/2024]
Abstract
BACKGROUND Social determinants of health (SDoH) such as housing insecurity are known to be intricately linked to patients' health status. More efficient methods for abstracting structured data on SDoH can help accelerate the inclusion of exposome variables in biomedical research and support health care systems in identifying patients who could benefit from proactive outreach. Large language models (LLMs) developed from Generative Pre-trained Transformers (GPTs) have shown potential for performing complex abstraction tasks on unstructured clinical notes. OBJECTIVE Here, we assess the performance of GPTs on identifying temporal aspects of housing insecurity and compare results between both original and deidentified notes. METHODS We compared the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, from 25,217 notes from 795 pregnant women. Results were compared with manual abstraction, a named entity recognition model, and regular expressions. RESULTS Compared with GPT-3.5 and the named entity recognition model, GPT-4 had the highest performance and had a much higher recall (0.924) than human abstractors (0.702) in identifying patients experiencing current or past housing instability, although precision was lower (0.850) compared with human abstractors (0.971). GPT-4's precision improved slightly (0.936 original, 0.939 deidentified) on deidentified versions of the same notes, while recall dropped (0.781 original, 0.704 deidentified). CONCLUSIONS This work demonstrates that while manual abstraction is likely to yield slightly more accurate results overall, LLMs can provide a scalable, cost-effective solution with the advantage of greater recall. This could support semiautomated abstraction, but given the potential risk for harm, human review would be essential before using results for any patient engagement or care decisions. 
Furthermore, recall was lower when notes were deidentified prior to LLM abstraction.
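The trade-off described here (GPT-4 with higher recall, human abstractors with higher precision) can be summarized with a single harmonic-mean score. The sketch below derives F1 from the precision and recall values quoted in the abstract; the abstract itself does not report F1, so the resulting numbers are an illustration rather than a finding of the study, and the function name is an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract for current or past
# housing instability; F1 is derived here and is not reported by the authors.
reported = {
    "GPT-4": (0.850, 0.924),
    "human abstractors": (0.971, 0.702),
}

for name, (precision, recall) in reported.items():
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} "
          f"F1={f1_score(precision, recall):.3f}")
```

F1 weights precision and recall equally, which may not match the clinical cost of a missed patient versus a false flag; a weighted variant would be a reasonable alternative if those costs were known.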
Affiliation(s)
- Alexandra Ralevski
  - Institute for Systems Biology, Seattle, WA, United States
  - Providence Health & Services, Renton, WA, United States
- Lindsay Mico
  - Providence Health & Services, Renton, WA, United States
- Jennifer Hadlock
  - Institute for Systems Biology, Seattle, WA, United States
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
5
Li C, Mowery DL, Ma X, Yang R, Vurgun U, Hwang S, Donnelly HK, Bandhey H, Senathirajah Y, Visweswaran S, Sadhu EM, Akhtar Z, Getzen E, Freda PJ, Long Q, Becich MJ. Realizing the potential of social determinants data in EHR systems: A scoping review of approaches for screening, linkage, extraction, analysis, and interventions. J Clin Transl Sci 2024; 8:e147. [PMID: 39478779 PMCID: PMC11523026 DOI: 10.1017/cts.2024.571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Revised: 07/08/2024] [Accepted: 07/29/2024] [Indexed: 11/02/2024]
Abstract
Background Social determinants of health (SDoH), such as socioeconomics and neighborhoods, strongly influence health outcomes. However, the current state of standardized SDoH data in electronic health records (EHRs) is lacking, a significant barrier to research and care quality. Methods We conducted a PubMed search using "SDOH" and "EHR" Medical Subject Headings terms, analyzing included articles across five domains: 1) SDoH screening and assessment approaches, 2) SDoH data collection and documentation, 3) Use of natural language processing (NLP) for extracting SDoH, 4) SDoH data and health outcomes, and 5) SDoH-driven interventions. Results Of 685 articles identified, 324 underwent full review. Key findings include implementation of tailored screening instruments, census and claims data linkage for contextual SDoH profiles, NLP systems extracting SDoH from notes, associations between SDoH and healthcare utilization and chronic disease control, and integrated care management programs. However, variability across data sources, tools, and outcomes underscores the need for standardization. Discussion Despite progress in identifying patient social needs, further development of standards, predictive models, and coordinated interventions is critical for SDoH-EHR integration. Additional database searches could strengthen this scoping review. Ultimately, widespread capture, analysis, and translation of multidimensional SDoH data into clinical care is essential for promoting health equity.
Affiliation(s)
- Chenyu Li
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Danielle L. Mowery
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Xiaomeng Ma
  - Institute of Health Policy Management and Evaluations, University of Toronto, Toronto, ON, Canada
- Rui Yang
  - Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore, Singapore
- Ugurcan Vurgun
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Sy Hwang
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Harsh Bandhey
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Yalini Senathirajah
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Shyam Visweswaran
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Eugene M. Sadhu
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
- Zohaib Akhtar
  - Kellogg School of Management, Northwestern University, Evanston, IL, USA
- Emily Getzen
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Philip J. Freda
  - Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Qi Long
  - Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Michael J. Becich
  - Department of Biomedical Informatics, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
6
Chang E, Sung S. Use of SNOMED CT in Large Language Models: Scoping Review. JMIR Med Inform 2024; 12:e62924. [PMID: 39374057 PMCID: PMC11494256 DOI: 10.2196/62924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 07/22/2024] [Accepted: 09/15/2024] [Indexed: 10/08/2024]
Abstract
BACKGROUND Large language models (LLMs) have substantially advanced natural language processing (NLP) capabilities but often struggle with knowledge-driven tasks in specialized domains such as biomedicine. Integrating biomedical knowledge sources such as SNOMED CT into LLMs may enhance their performance on biomedical tasks. However, the methodologies and effectiveness of incorporating SNOMED CT into LLMs have not been systematically reviewed. OBJECTIVE This scoping review aims to examine how SNOMED CT is integrated into LLMs, focusing on (1) the types and components of LLMs being integrated with SNOMED CT, (2) which contents of SNOMED CT are being integrated, and (3) whether this integration improves LLM performance on NLP tasks. METHODS Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, we searched ACM Digital Library, ACL Anthology, IEEE Xplore, PubMed, and Embase for relevant studies published from 2018 to 2023. Studies were included if they incorporated SNOMED CT into LLM pipelines for natural language understanding or generation tasks. Data on LLM types, SNOMED CT integration methods, end tasks, and performance metrics were extracted and synthesized. RESULTS The review included 37 studies. Bidirectional Encoder Representations from Transformers and its biomedical variants were the most commonly used LLMs. Three main approaches for integrating SNOMED CT were identified: (1) incorporating SNOMED CT into LLM inputs (28/37, 76%), primarily using concept descriptions to expand training corpora; (2) integrating SNOMED CT into additional fusion modules (5/37, 14%); and (3) using SNOMED CT as an external knowledge retriever during inference (5/37, 14%). The most frequent end task was medical concept normalization (15/37, 41%), followed by entity extraction or typing and classification. 
While most studies (17/19, 89%) reported performance improvements after SNOMED CT integration, only a small fraction (19/37, 51%) provided direct comparisons. The reported gains varied widely across different metrics and tasks, ranging from 0.87% to 131.66%. However, some studies showed either no improvement or a decline in certain performance metrics. CONCLUSIONS This review demonstrates diverse approaches for integrating SNOMED CT into LLMs, with a focus on using concept descriptions to enhance biomedical language understanding and generation. While the results suggest potential benefits of SNOMED CT integration, the lack of standardized evaluation methods and comprehensive performance reporting hinders definitive conclusions about its effectiveness. Future research should prioritize consistent reporting of performance comparisons and explore more sophisticated methods for incorporating SNOMED CT's relational structure into LLMs. In addition, the biomedical NLP community should develop standardized evaluation frameworks to better assess the impact of ontology integration on LLM performance.
Affiliation(s)
- Eunsuk Chang
  - Republic of Korea Air Force Aerospace Medical Center, Cheongju, Republic of Korea
- Sumi Sung
  - Department of Nursing Science, Research Institute of Nursing Science, Chungbuk National University, Cheongju, Republic of Korea
7
Gabriel RA, Litake O, Simpson S, Burton BN, Waterman RS, Macias AA. On the development and validation of large language model-based classifiers for identifying social determinants of health. Proc Natl Acad Sci U S A 2024; 121:e2320716121. [PMID: 39284061 PMCID: PMC11441499 DOI: 10.1073/pnas.2320716121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 08/08/2024] [Indexed: 10/02/2024]
Abstract
The assessment of social determinants of health (SDoH) within healthcare systems is crucial for comprehensive patient care and addressing health disparities. Current challenges arise from the limited inclusion of structured SDoH information within electronic health record (EHR) systems, often due to the lack of standardized diagnosis codes. This study delves into the transformative potential of large language models (LLM) to overcome these challenges. LLM-based classifiers-using Bidirectional Encoder Representations from Transformers (BERT) and A Robustly Optimized BERT Pretraining Approach (RoBERTa)-were developed for SDoH concepts, including homelessness, food insecurity, and domestic violence, using synthetic training datasets generated by generative pre-trained transformers combined with authentic clinical notes. Models were then validated on separate datasets: Medical Information Mart for Intensive Care-III and our institutional EHR data. When training the model with a combination of synthetic and authentic notes, validation on our institutional dataset yielded an area under the receiver operating characteristics curve of 0.78 for detecting homelessness, 0.72 for detecting food insecurity, and 0.83 for detecting domestic violence. This study underscores the potential of LLMs in extracting SDoH information from clinical text. Automated detection of SDoH may be instrumental for healthcare providers in identifying at-risk patients, guiding targeted interventions, and contributing to population health initiatives aimed at mitigating disparities.
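The validation results above are reported as area under the receiver operating characteristic curve. As a reminder of what that number measures, here is a minimal pure-Python sketch that computes it as the probability that a randomly chosen positive note is scored above a randomly chosen negative one (ties count half). The labels and scores are made-up toy values, not data from the study, and the function name is an assumption.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    # Each positive/negative pair contributes 1 if ranked correctly, 0.5 if tied.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = note documents homelessness, 0 = it does not (hypothetical).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
print(f"AUROC on toy data: {auroc(toy_labels, toy_scores):.2f}")
```

The pairwise form is quadratic in the number of notes, so for real validation sets a rank-based or library implementation would be the more practical choice.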
Affiliation(s)
- Rodney A Gabriel
  - Division of Perioperative Informatics, Department of Anesthesiology, University of California, San Diego, La Jolla, CA 92037
  - Department of Biomedical Informatics, University of California, San Diego Health, La Jolla, CA 92037
- Onkar Litake
  - Division of Perioperative Informatics, Department of Anesthesiology, University of California, San Diego, La Jolla, CA 92037
- Sierra Simpson
  - Division of Perioperative Informatics, Department of Anesthesiology, University of California, San Diego, La Jolla, CA 92037
- Brittany N Burton
  - Department of Anesthesiology, University of California, Los Angeles, CA 90095
- Ruth S Waterman
  - Division of Perioperative Informatics, Department of Anesthesiology, University of California, San Diego, La Jolla, CA 92037
- Alvaro A Macias
  - Division of Perioperative Informatics, Department of Anesthesiology, University of California, San Diego, La Jolla, CA 92037
8
Field C, Wang XY, Costantine MM, Landon MB, Grobman WA, Venkatesh KK. Social Determinants of Health and Diabetes in Pregnancy. Am J Perinatol 2024. [PMID: 39209304 DOI: 10.1055/a-2405-2409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/04/2024]
Abstract
Social determinants of health (SDOH) are the conditions in which people are born, grow, work, live, and age. SDOH are systemic factors that may explain, perpetuate, and exacerbate disparities in health outcomes for different populations and can be measured at both an individual and neighborhood or community level (iSDOH, nSDOH). In pregnancy, increasing evidence shows that adverse iSDOH and/or nSDOH are associated with a greater likelihood that diabetes develops, and that when it develops, there is worse glycemic control and a greater frequency of adverse pregnancy outcomes. Future research should not only continue to examine the relationships between SDOH and adverse pregnancy outcomes with diabetes but should determine whether multi-level interventions that seek to mitigate adverse SDOH result in equitable maternal care and improved patient health outcomes for pregnant individuals living with diabetes.
KEY POINTS:
- SDOH are conditions in which people are born, grow, work, live, and age.
- SDOH are systemic factors that may explain, perpetuate, and exacerbate disparities in health outcomes.
- SDOH can be measured at the individual and neighborhood level.
- Adverse SDOH are associated with worse outcomes for pregnant individuals living with diabetes.
- Interventions that mitigate adverse SDOH to improve maternal health equity and outcomes are needed.
Affiliation(s)
- Christine Field
  - Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio
- Xiao-Yu Wang
  - Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio
- Maged M Costantine
  - Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio
- Mark B Landon
  - Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio
- William A Grobman
  - Department of Obstetrics and Gynecology, Brown University, Providence, Rhode Island
- Kartik K Venkatesh
  - Department of Obstetrics and Gynecology, The Ohio State University, Columbus, Ohio
9
Raza S, Ding C. Improving Clinical Decision Making With a Two-Stage Recommender System. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1180-1190. [PMID: 37738190 DOI: 10.1109/tcbb.2023.3318209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/24/2023]
Abstract
Clinical decision-making is complex and time-intensive. To help in this effort, clinical recommender systems (RS) have been designed to facilitate healthcare practitioners with personalized advice. However, designing an effective clinical RS poses challenges due to the multifaceted nature of clinical data and the demand for tailored recommendations. In this article, we introduce a 2-Stage Recommendation framework for clinical decision-making, which leverages a publicly accessible dataset of electronic health records. In the first stage, a deep neural network-based model is employed to extract a set of candidate items, such as diagnoses, medications, and prescriptions, from a patient's electronic health records. Subsequently, the second stage utilizes a deep learning model to rank and pinpoint the most relevant items for healthcare providers. Both retriever and ranker are based on pre-trained transformer models that are stacked together as a pipeline. To validate our model, we compared its performance against several baseline models using different evaluation metrics. The results reveal that our proposed model attains a performance gain of approximately 12.3% macro-average F1 compared to the second best performing baseline. Qualitative analysis across various dimensions also confirms the model's high performance. Furthermore, we discuss challenges like data availability, privacy concerns, and shed light on future exploration in this domain.
10
Volkmer S, Meyer-Lindenberg A, Schwarz E. Large language models in psychiatry: Opportunities and challenges. Psychiatry Res 2024; 339:116026. [PMID: 38909412 DOI: 10.1016/j.psychres.2024.116026] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/31/2023] [Revised: 05/17/2024] [Accepted: 06/10/2024] [Indexed: 06/25/2024]
Abstract
The ability of Large Language Models (LLMs) to analyze and respond to freely written text is causing increasing excitement in the field of psychiatry; the application of such models presents unique opportunities and challenges for psychiatric applications. This review article seeks to offer a comprehensive overview of LLMs in psychiatry, their model architecture, potential use cases, and clinical considerations. LLM frameworks such as ChatGPT/GPT-4 are trained on huge amounts of text data that are sometimes fine-tuned for specific tasks. This opens up a wide range of possible psychiatric applications, such as accurately predicting individual patient risk factors for specific disorders, engaging in therapeutic intervention, and analyzing therapeutic material, to name a few. However, adoption in the psychiatric setting presents many challenges, including inherent limitations and biases in LLMs, concerns about explainability and privacy, and the potential damage resulting from produced misinformation. This review covers potential opportunities and limitations and highlights potential considerations when these models are applied in a real-world psychiatric context.
Affiliation(s)
- Sebastian Volkmer
  - Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany; Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Andreas Meyer-Lindenberg
  - Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
- Emanuel Schwarz
  - Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany; Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
11
Zainal NH, Bossarte RM, Gildea SM, Hwang I, Kennedy CJ, Liu H, Luedtke A, Marx BP, Petukhova MV, Post EP, Ross EL, Sampson NA, Sverdrup E, Turner B, Wager S, Kessler RC. Developing an individualized treatment rule for Veterans with major depressive disorder using electronic health records. Mol Psychiatry 2024; 29:2335-2345. [PMID: 38486050 PMCID: PMC11399319 DOI: 10.1038/s41380-024-02500-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Revised: 02/23/2024] [Accepted: 02/27/2024] [Indexed: 09/16/2024]
Abstract
Efforts to develop an individualized treatment rule (ITR) to optimize major depressive disorder (MDD) treatment with antidepressant medication (ADM), psychotherapy, or combined ADM-psychotherapy have been hampered by small samples, small predictor sets, and suboptimal analysis methods. Analyses of large administrative databases designed to approximate experiments followed iteratively by pragmatic trials hold promise for resolving these problems. The current report presents a proof-of-concept study using electronic health records (EHR) of n = 43,470 outpatients beginning MDD treatment in Veterans Health Administration Primary Care Mental Health Integration (PC-MHI) clinics, which offer access not only to ADMs but also psychotherapy and combined ADM-psychotherapy. EHR and geospatial databases were used to generate an extensive baseline predictor set (5,865 variables). The outcome was a composite measure of at least one serious negative event (suicide attempt, psychiatric emergency department visit, psychiatric hospitalization, suicide death) over the next 12 months. Best-practices methods were used to adjust for nonrandom treatment assignment and to estimate a preliminary ITR in a 70% training sample and to evaluate the ITR in the 30% test sample. Statistically significant aggregate variation was found in overall probability of the outcome related to baseline predictors (AU-ROC = 0.68, S.E. = 0.01), with test sample outcome prevalence of 32.6% among the 5% of patients having highest predicted risk compared to 7.1% in the remainder of the test sample. The ITR found that psychotherapy-only was the optimal treatment for 56.0% of patients (roughly 20% lower risk of the outcome than if receiving one of the other treatments) and that treatment type was unrelated to outcome risk among other patients. Change in aggregate treatment costs of implementing this ITR would be negligible, as 16.1% fewer patients would be prescribed ADMs and 2.9% more would receive psychotherapy. 
A pragmatic trial would be needed to confirm the accuracy of the ITR.
Affiliation(s)
- Nur Hani Zainal
  - Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
- Robert M Bossarte
  - Department of Psychiatry and Behavioral Neurosciences, University of South Florida, Tampa, FL, USA
- Sarah M Gildea
  - Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
- Irving Hwang
  - Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
- Chris J Kennedy
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Howard Liu
  - Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
  - Center of Excellence for Suicide Prevention, Canandaigua VA Medical Center, Canandaigua, NY, USA
- Alex Luedtke
  - Department of Statistics, University of Washington, Seattle, WA, USA
  - Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Brian P Marx
  - National Center for PTSD, VA Boston Healthcare System, Boston, MA, USA
  - Department of Psychiatry, Boston University School of Medicine, Boston, MA, USA
- Maria V Petukhova
  - Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
- Edward P Post
  - Center for Clinical Management Research, VA Ann Arbor Health Care System, Ann Arbor, MI, USA
  - Department of Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
- Eric L Ross
  - Department of Psychiatry, Larner College of Medicine, University of Vermont, Burlington, VT, USA
- Nancy A Sampson
  - Department of Health Care Policy, Harvard Medical School, Boston, MA, USA
- Erik Sverdrup
  - Graduate School of Business, Stanford University, Stanford, CA, USA
- Brett Turner
  - Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Stefan Wager
  - Graduate School of Business, Stanford University, Stanford, CA, USA
- Ronald C Kessler
  - Department of Health Care Policy, Harvard Medical School, Boston, MA, USA.
|
12
|
Sikström S, Valavičiūtė I, Kuusela I, Evors N. Question-based computational language approach outperforms rating scales in quantifying emotional states. COMMUNICATIONS PSYCHOLOGY 2024; 2:45. [PMID: 39242812 PMCID: PMC11332055 DOI: 10.1038/s44271-024-00097-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 05/03/2024] [Indexed: 09/09/2024]
Abstract
Psychological constructs are commonly quantified with closed-ended rating scales. However, recent advancements in natural language processing (NLP) enable the quantification of open-ended language responses. Here we demonstrate that descriptive word responses analyzed using NLP show higher accuracy in categorizing emotional states compared to traditional rating scales. One group of participants (N = 297) generated narratives related to depression, anxiety, satisfaction, or harmony, summarized them with five descriptive words, and rated them using rating scales. Another group (N = 434) evaluated these narratives (with descriptive words and rating scales) from the author's perspective. The descriptive words were quantified using NLP, and machine learning was used to categorize the responses into the corresponding emotional states. The results showed a significantly higher number of accurate categorizations of the narratives based on descriptive words (64%) than on rating scales (44%), questioning the notion that rating scales are more precise in measuring emotional states than language-based measures.
Affiliation(s)
- Sverker Sikström
  - Department of Psychology, Lund University, Lund, SE-221 00, Sweden.
- Ieva Valavičiūtė
  - Department of Psychology, Lund University, Lund, SE-221 00, Sweden
- Inari Kuusela
  - Department of Psychology, Lund University, Lund, SE-221 00, Sweden
- Nicole Evors
  - Department of Psychology, Lund University, Lund, SE-221 00, Sweden
|
13
|
Keloth VK, Selek S, Chen Q, Gilman C, Fu S, Dang Y, Chen X, Hu X, Zhou Y, He H, Fan JW, Wang K, Brandt C, Tao C, Liu H, Xu H. Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes - A Generalizable Approach across Institutions. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.05.21.24307726. [PMID: 38826441 PMCID: PMC11142292 DOI: 10.1101/2024.05.21.24307726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The consistent and persuasive evidence illustrating the influence of social determinants on health has prompted a growing realization throughout the health care sector that enhancing health and health equity will likely depend, at least to some extent, on addressing detrimental social determinants. However, detailed social determinants of health (SDoH) information is often buried within clinical narrative text in electronic health records (EHRs), necessitating natural language processing (NLP) methods to automatically extract these details. Most current NLP efforts for SDoH extraction have been limited, investigating on limited types of SDoH elements, deriving data from a single institution, focusing on specific patient cohorts or note types, with reduced focus on generalizability. This study aims to address these issues by creating cross-institutional corpora spanning different note types and healthcare systems, and developing and evaluating the generalizability of classification models, including novel large language models (LLMs), for detecting SDoH factors from diverse types of notes from four institutions: Harris County Psychiatric Center, University of Texas Physician Practice, Beth Israel Deaconess Medical Center, and Mayo Clinic. Four corpora of deidentified clinical notes were annotated with 21 SDoH factors at two levels: level 1 with SDoH factor types only and level 2 with SDoH factors along with associated values. Three traditional classification algorithms (XGBoost, TextCNN, Sentence BERT) and an instruction tuned LLM-based approach (LLaMA) were developed to identify multiple SDoH factors. Substantial variation was noted in SDoH documentation practices and label distributions based on patient cohorts, note types, and hospitals. The LLM achieved top performance with micro-averaged F1 scores over 0.9 on level 1 annotated corpora and an F1 over 0.84 on level 2 annotated corpora. 
While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. To foster collaboration, access to partial annotated corpora and models trained by merging all annotated datasets will be made available on the PhysioNet repository.
Affiliation(s)
- Vipina K. Keloth
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Salih Selek
  - Department of Psychiatry and Behavioral Sciences, UTHealth McGovern Medical School, Houston, TX, USA
- Qingyu Chen
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Christopher Gilman
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Sunyang Fu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Yifang Dang
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Xinghan Chen
  - School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Xinyue Hu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
- Yujia Zhou
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Huan He
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Jungwei W. Fan
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Karen Wang
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
  - Equity Research and Innovation Center, Yale School of Medicine, New Haven, CT, USA
- Cynthia Brandt
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Hua Xu
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
|
14
|
Yu Z, Peng C, Yang X, Dang C, Adekkanattu P, Gopal Patra B, Peng Y, Pathak J, Wilson DL, Chang CY, Lo-Ciganic WH, George TJ, Hogan WR, Guo Y, Bian J, Wu Y. Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias. J Biomed Inform 2024; 153:104642. [PMID: 38621641 PMCID: PMC11141428 DOI: 10.1016/j.jbi.2024.104642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Revised: 04/09/2024] [Accepted: 04/12/2024] [Indexed: 04/17/2024]
Abstract
OBJECTIVE To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine the bias among race and gender groups, test the generalizability of extracting SDoH for different disease groups, and examine population-level extraction ratio. METHODS We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package - SODA (i.e., SOcial DeterminAnts) to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains including cancer and opioid use, and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess patient-level extraction ratio and examine the differences among race and gender groups. RESULTS We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and another cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models and the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There is a small performance gap (∼4%) between Males and Females, but a large performance gap (>16 %) among race groups. The performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning using a smaller opioid SDoH corpus improved the performance. 
The extraction ratio varied in the three cancer cohorts, in which 10 SDoH could be extracted from over 70 % of cancer patients, but 9 SDoH could be extracted from less than 70 % of cancer patients. Individuals from the White and Black groups have a higher extraction ratio than other minority race groups. CONCLUSIONS Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.
Affiliation(s)
- Zehao Yu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Cheng Peng
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Xi Yang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Chong Dang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Prakash Adekkanattu
  - Information Technologies and Services, Weill Cornell Medicine, New York, NY, USA
- Braja Gopal Patra
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Jyotishman Pathak
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Debbie L Wilson
  - Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Ching-Yuan Chang
  - Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Wei-Hsuan Lo-Ciganic
  - Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Thomas J George
  - Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Yi Guo
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA.
|
15
|
Ralevski A, Taiyab N, Nossal M, Mico L, Piekos SN, Hadlock J. Using Large Language Models to Annotate Complex Cases of Social Determinants of Health in Longitudinal Clinical Records. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.25.24306380. [PMID: 38712224 PMCID: PMC11071574 DOI: 10.1101/2024.04.25.24306380] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Social Determinants of Health (SDoH) are an important part of the exposome and are known to have a large impact on variation in health outcomes. In particular, housing stability is known to be intricately linked to a patient's health status, and pregnant women experiencing housing instability (HI) are known to have worse health outcomes. Most SDoH information is stored in electronic health records (EHRs) as free text (unstructured) clinical notes, which traditionally required natural language processing (NLP) for automatic identification of relevant text or keywords. A patient's housing status can be ambiguous or subjective, and can change from note to note or within the same note, making it difficult to use existing NLP solutions. New developments in NLP allow researchers to prompt LLMs to perform complex, subjective annotation tasks that require reasoning that previously could only be attempted by human annotators. For example, large language models (LLMs) such as GPT (Generative Pre-trained Transformer) enable researchers to analyze complex, unstructured data using simple prompts. We used a secure platform within a large healthcare system to compare the ability of GPT-3.5 and GPT-4 to identify instances of both current and past housing instability, as well as general housing status, from 25,217 notes from 795 pregnant women. Results from these LLMs were compared with results from manual annotation, a named entity recognition (NER) model, and regular expressions (RegEx). We developed a chain-of-thought prompt requiring evidence and justification for each note from the LLMs, to help maximize the chances of finding relevant text related to HI while minimizing hallucinations and false positives. 
Compared with GPT-3.5 and the NER model, GPT-4 had the highest performance and had a much higher recall (0.924) than human annotators (0.702) in identifying patients experiencing current or past housing instability, although precision was lower (0.850) compared with human annotators (0.971). In most cases, the evidence output by GPT-4 was similar or identical to that of human annotators, and there was no evidence of hallucinations in any of the outputs from GPT-4. Most cases where the annotators and GPT-4 differed were ambiguous or subjective, such as "living in an apartment with too many people". We also looked at GPT-4 performance on de-identified versions of the same notes and found that precision improved slightly (0.936 original, 0.939 de-identified), while recall dropped (0.781 original, 0.704 de-identified). This work demonstrates that, while manual annotation is likely to yield slightly more accurate results overall, LLMs, when compared with manual annotation, provide a scalable, cost-effective solution with the advantage of greater recall. At the same time, further evaluation is needed to address the risk of missed cases and bias in the initial selection of housing-related notes. Additionally, while it was possible to reduce confabulation, signs of unusual justifications remained. Given these factors, together with changes in both LLMs and charting over time, this approach is not yet appropriate for use as a fully-automated process. However, these results demonstrate the potential for using LLMs for computer-assisted annotation with human review, reducing cost and increasing recall. More efficient methods for obtaining structured SDoH data can help accelerate inclusion of exposome variables in biomedical research, and support healthcare systems in identifying patients who could benefit from proactive outreach.
Affiliation(s)
- Nadaa Taiyab
  - Tegria, 1255 Fourier Dr Ste 101, Madison, WI, 53717, USA
- Michael Nossal
  - Providence St Joseph Health, 1801 Lind Ave SW Renton, WA, 98057, USA
- Lindsay Mico
  - Providence St Joseph Health, 1801 Lind Ave SW Renton, WA, 98057, USA
- Jennifer Hadlock
  - Institute for Systems Biology, 401 Terry Ave N, Seattle, WA, 98109, USA
  - University of Washington, Biomedical Informatics and Medical Education, Seattle, WA, USA
|
16
|
Mahbub M, Goethert I, Danciu I, Knight K, Srinivasan S, Tamang S, Rozenberg-Ben-Dror K, Solares H, Martins S, Trafton J, Begoli E, Peterson GD. Question-answering system extracts information on injection drug use from clinical notes. COMMUNICATIONS MEDICINE 2024; 4:61. [PMID: 38570620 PMCID: PMC10991373 DOI: 10.1038/s43856-024-00470-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 02/29/2024] [Indexed: 04/05/2024] Open
Abstract
BACKGROUND Injection drug use (IDU) can increase mortality and morbidity. Therefore, identifying IDU early and initiating harm reduction interventions can benefit individuals at risk. However, extracting IDU behaviors from patients' electronic health records (EHR) is difficult because there is no other structured data available, such as International Classification of Disease (ICD) codes, and IDU is most often documented in unstructured free-text clinical notes. Although natural language processing can efficiently extract this information from unstructured data, there are no validated tools. METHODS To address this gap in clinical information, we design a question-answering (QA) framework to extract information on IDU from clinical notes for use in clinical operations. Our framework involves two main steps: (1) generating a gold-standard QA dataset and (2) developing and testing the QA model. We use 2323 clinical notes of 1145 patients curated from the US Department of Veterans Affairs (VA) Corporate Data Warehouse to construct the gold-standard dataset for developing and evaluating the QA model. We also demonstrate the QA model's ability to extract IDU-related information from temporally out-of-distribution data. RESULTS Here, we show that for a strict match between gold-standard and predicted answers, the QA model achieves a 51.65% F1 score. For a relaxed match between the gold-standard and predicted answers, the QA model obtains a 78.03% F1 score, along with 85.38% Precision and 79.02% Recall scores. Moreover, the QA model demonstrates consistent performance when subjected to temporally out-of-distribution data. CONCLUSIONS Our study introduces a QA framework designed to extract IDU information from clinical notes, aiming to enhance the accurate and efficient detection of people who inject drugs, extract relevant information, and ultimately facilitate informed patient care.
Affiliation(s)
- Maria Mahbub
  - Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA.
- Ian Goethert
  - Information Technology Services Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Ioana Danciu
  - Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Kathryn Knight
  - Information Technology Services Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Sudarshan Srinivasan
  - Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Suzanne Tamang
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Hugo Solares
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
- Susana Martins
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
- Jodie Trafton
  - Program Evaluation and Resource Center, Office of Mental Health and Suicide Prevention, Department of Veterans Affairs, Menlo Park, CA, USA
- Edmon Begoli
  - Cyber Resilience and Intelligence Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Gregory D Peterson
  - Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Knoxville, TN, USA
|
17
|
Bala J, Newson JJ, Thiagarajan TC. Hierarchy of demographic and social determinants of mental health: analysis of cross-sectional survey data from the Global Mind Project. BMJ Open 2024; 14:e075095. [PMID: 38490653 PMCID: PMC10946366 DOI: 10.1136/bmjopen-2023-075095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 02/16/2024] [Indexed: 03/17/2024] Open
Abstract
OBJECTIVES To understand the extent to which various demographic and social determinants predict mental health status and their relative hierarchy of predictive power in order to prioritise and develop population-based preventative approaches. DESIGN Cross-sectional analysis of survey data. SETTING Internet-based survey from 32 countries across North America, Europe, Latin America, Middle East and North Africa, Sub-Saharan Africa, South Asia and Australia, collected between April 2020 and December 2021. PARTICIPANTS 270 000 adults aged 18-85+ years who participated in the Global Mind Project. OUTCOME MEASURES We used 120+ demographic and social determinants to predict aggregate mental health status and scores of individuals (mental health quotient (MHQ)) and determine their relative predictive influence using various machine learning models including gradient boosting and random forest classification for various demographic stratifications by age, gender, geographical region and language. Outcomes reported include model performance metrics of accuracy, precision, recall, F1 scores and importance of individual factors determined by reduction in the squared error attributable to that factor. RESULTS Across all demographic classification models, 80% of those with negative MHQs were correctly identified, while regression models predicted specific MHQ scores within ±15% of the position on the scale. Predictions were higher for older ages (0.9+ accuracy, 0.9+ F1 Score; 65+ years) and poorer for younger ages (0.68 accuracy, 0.68 F1 Score; 18-24 years). Across all age groups, genders, regions and language groups, lack of social interaction and sufficient sleep were several times more important than all other factors. For younger ages (18-24 years), other highly predictive factors included cyberbullying and sexual abuse while not being able to work was high for ages 45-54 years. 
CONCLUSION Social determinants of traumas, adversities and lifestyle can account for 60%-90% of mental health challenges. However, additional factors are at play, particularly for younger ages, that are not included in these data and need further investigation.
|
18
|
Hatef E, Chang HY, Richards TM, Kitchen C, Budaraju J, Foroughmand I, Lasser EC, Weiner JP. Development of a Social Risk Score in the Electronic Health Record to Identify Social Needs Among Underserved Populations: Retrospective Study. JMIR Form Res 2024; 8:e54732. [PMID: 38470477 DOI: 10.2196/54732] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 02/02/2024] [Accepted: 02/08/2024] [Indexed: 03/13/2024] Open
Abstract
BACKGROUND Patients with unmet social needs and social determinants of health (SDOH) challenges continue to face a disproportionate risk of increased prevalence of disease, health care use, higher health care costs, and worse outcomes. Some existing predictive models have used the available data on social needs and SDOH challenges to predict health-related social needs or the need for various social service referrals. Despite these one-off efforts, the work to date suggests that many technical and organizational challenges must be surmounted before SDOH-integrated solutions can be implemented on an ongoing, wide-scale basis within most US-based health care organizations. OBJECTIVE We aimed to retrieve available information in the electronic health record (EHR) relevant to the identification of persons with social needs and to develop a social risk score for use within clinical practice to better identify patients at risk of having future social needs. METHODS We conducted a retrospective study using EHR data (2016-2021) and data from the US Census American Community Survey. We developed a prospective model using current year-1 risk factors to predict future year-2 outcomes within four 2-year cohorts. Predictors of interest included demographics, previous health care use, comorbidity, previously identified social needs, and neighborhood characteristics as reflected by the area deprivation index. The outcome variable was a binary indicator reflecting the likelihood of the presence of a patient with social needs. We applied a generalized estimating equation approach, adjusting for patient-level risk factors, the possible effect of geographically clustered data, and the effect of multiple visits for each patient. 
RESULTS The study population of 1,852,228 patients included middle-aged (mean age range 53.76-55.95 years), White (range 324,279/510,770, 63.49% to 290,688/448,666, 64.79%), and female (range 314,741/510,770, 61.62% to 278,488/448,666, 62.07%) patients from neighborhoods with high socioeconomic status (mean area deprivation index percentile range 28.76-30.31). Between 8.28% (37,137/448,666) and 11.55% (52,037/450,426) of patients across the study cohorts had at least 1 social need documented in their EHR, with safety issues and economic challenges (ie, financial resource strain, employment, and food insecurity) being the most common documented social needs (87,152/1,852,228, 4.71% and 58,242/1,852,228, 3.14% of overall patients, respectively). The model had an area under the curve of 0.702 (95% CI 0.699-0.705) in predicting prospective social needs in the overall study population. Previous social needs (odds ratio 3.285, 95% CI 3.237-3.335) and emergency department visits (odds ratio 1.659, 95% CI 1.634-1.684) were the strongest predictors of future social needs. CONCLUSIONS Our model provides an opportunity to make use of available EHR data to help identify patients with high social needs. Our proposed social risk score could help identify the subset of patients who would most benefit from further social needs screening and data collection to avoid potentially more burdensome primary data collection on all patients in a target population of interest.
Affiliation(s)
- Elham Hatef
  - Division of General Internal Medicine, Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, United States
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Hsien-Yen Chang
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Thomas M Richards
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Christopher Kitchen
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Janya Budaraju
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Iman Foroughmand
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Elyse C Lasser
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Jonathan P Weiner
  - Center for Population Health Information Technology, Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
|
19
|
Iott BE, Rivas S, Gottlieb LM, Adler-Milstein J, Pantell MS. Structured and unstructured social risk factor documentation in the electronic health record underestimates patients' self-reported risks. J Am Med Inform Assoc 2024; 31:714-719. [PMID: 38216127 PMCID: PMC10873825 DOI: 10.1093/jamia/ocad261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 12/17/2023] [Accepted: 12/28/2023] [Indexed: 01/14/2024] Open
Abstract
OBJECTIVES National attention has focused on increasing clinicians' responsiveness to the social determinants of health, for example, food security. A key step toward designing responsive interventions includes ensuring that information about patients' social circumstances is captured in the electronic health record (EHR). While prior work has assessed levels of EHR "social risk" documentation, the extent to which documentation represents the true prevalence of social risk is unknown. While no gold standard exists to definitively characterize social risks in clinical populations, here we used the best available proxy: social risks reported by patient survey. MATERIALS AND METHODS We compared survey results to respondents' EHR social risk documentation (clinical free-text notes and International Statistical Classification of Diseases and Related Health Problems [ICD-10] codes). RESULTS Surveys indicated much higher rates of social risk (8.2%-40.9%) than found in structured (0%-2.0%) or unstructured (0%-0.2%) documentation. DISCUSSION Ideally, new care standards that include incentives to screen for social risk will increase the use of documentation tools and clinical teams' awareness of and interventions related to social adversity, while balancing potential screening and documentation burden on clinicians and patients. CONCLUSION EHR documentation of social risk factors currently underestimates their prevalence.
Affiliation(s)
- Bradley E Iott
- Center for Clinical Informatics and Improvement Research, University of California, San Francisco, San Francisco, CA, United States
- Social Interventions Research and Evaluation Network, University of California, San Francisco, San Francisco, CA, United States
| | - Samantha Rivas
- Social Interventions Research and Evaluation Network, University of California, San Francisco, San Francisco, CA, United States
| | - Laura M Gottlieb
- Social Interventions Research and Evaluation Network, University of California, San Francisco, San Francisco, CA, United States
- Center for Health and Community, University of California, San Francisco, San Francisco, CA, United States
- Department of Family and Community Medicine, University of California, San Francisco, San Francisco, CA, United States
| | - Julia Adler-Milstein
- Center for Clinical Informatics and Improvement Research, University of California, San Francisco, San Francisco, CA, United States
- Department of Medicine, University of California, San Francisco, San Francisco, CA, United States
| | - Matthew S Pantell
- Center for Health and Community, University of California, San Francisco, San Francisco, CA, United States
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA, United States
|
20
|
Li C, Mowery DL, Ma X, Yang R, Vurgun U, Hwang S, Donnelly HK, Bandhey H, Akhtar Z, Senathirajah Y, Sadhu EM, Getzen E, Freda PJ, Long Q, Becich MJ. Realizing the Potential of Social Determinants Data: A Scoping Review of Approaches for Screening, Linkage, Extraction, Analysis and Interventions. medRxiv 2024:2024.02.04.24302242. [PMID: 38370703 PMCID: PMC10871446 DOI: 10.1101/2024.02.04.24302242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Background Social determinants of health (SDoH) like socioeconomics and neighborhoods strongly influence outcomes, yet standardized SDoH data is lacking in electronic health records (EHR), limiting research and care quality. Methods We searched PubMed using the keywords "SDOH" and "EHR", then performed title/abstract and full-text screening. Included records were analyzed under five domains: 1) SDoH screening and assessment approaches, 2) SDoH data collection and documentation, 3) Use of natural language processing (NLP) for extracting SDoH, 4) SDoH data and health outcomes, and 5) SDoH-driven interventions. Results We identified 685 articles, of which 324 underwent full review. Key findings include tailored screening instruments implemented across settings, census and claims data linkage providing contextual SDoH profiles, rule-based and neural network systems extracting SDoH from notes using NLP, connections found between SDoH data and healthcare utilization/chronic disease control, and integrated care management programs executed. However, considerable variability persists across data sources, tools, and outcomes. Discussion Despite progress in identifying patient social needs, further development of standards, predictive models, and coordinated interventions is critical to fulfill the potential of SDoH-EHR integration. Additional database searches could strengthen this scoping review. Ultimately, widespread capture, analysis, and translation of multidimensional SDoH data into clinical care is essential for promoting health equity.
Affiliation(s)
- Chenyu Li
- University of Pittsburgh School of Medicine Department of Biomedical Informatics
| | - Danielle L. Mowery
- University of Pennsylvania, Institute for Biomedical Informatics
- University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics
| | - Xiaomeng Ma
- University of Toronto, Institute of Health Policy Management and Evaluations
| | - Rui Yang
- Duke-NUS Medical School, Centre for Quantitative Medicine
| | - Ugurcan Vurgun
- University of Pennsylvania, Institute for Biomedical Informatics
| | - Sy Hwang
- University of Pennsylvania, Institute for Biomedical Informatics
| | | | - Harsh Bandhey
- Cedars-Sinai Medical Center, Department of Computational Biomedicine
| | - Zohaib Akhtar
- Northwestern University, Kellogg School of Management
| | - Yalini Senathirajah
- University of Pittsburgh School of Medicine Department of Biomedical Informatics
| | - Eugene Mathew Sadhu
- University of Pittsburgh School of Medicine Department of Biomedical Informatics
| | - Emily Getzen
- University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics
| | - Philip J Freda
- Cedars-Sinai Medical Center, Department of Computational Biomedicine
| | - Qi Long
- University of Pennsylvania, Institute for Biomedical Informatics
- University of Pennsylvania, Department of Biostatistics, Epidemiology and Informatics
| | - Michael J. Becich
- University of Pittsburgh School of Medicine Department of Biomedical Informatics
|
21
|
Guevara M, Chen S, Thomas S, Chaunzwa TL, Franco I, Kann BH, Moningi S, Qian JM, Goldstein M, Harper S, Aerts HJWL, Catalano PJ, Savova GK, Mak RH, Bitterman DS. Large language models to identify social determinants of health in electronic health records. NPJ Digit Med 2024; 7:6. [PMID: 38200151 PMCID: PMC10781957 DOI: 10.1038/s41746-023-00970-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 47.0] [Received: 08/14/2023] [Accepted: 11/15/2023] [Indexed: 01/12/2024] Open
Abstract
Social determinants of health (SDoH) play a critical role in patient outcomes, yet their documentation is often missing or incomplete in the structured data of electronic health records (EHRs). Large language models (LLMs) could enable high-throughput extraction of SDoH from the EHR to support research and clinical care. However, class imbalance and data limitations present challenges for this sparsely documented yet critical information. Here, we investigated the optimal methods for using LLMs to extract six SDoH categories from narrative text in the EHR: employment, housing, transportation, parental status, relationship, and social support. The best-performing models were fine-tuned Flan-T5 XL for any SDoH mentions (macro-F1 0.71), and Flan-T5 XXL for adverse SDoH mentions (macro-F1 0.70). The effect of adding LLM-generated synthetic data to training varied across models and architectures, but it improved the performance of smaller Flan-T5 models (delta F1 +0.12 to +0.23). Our best fine-tuned models outperformed ChatGPT-family models in the zero- and few-shot setting, except GPT4 with 10-shot prompting for adverse SDoH. Fine-tuned models were less likely than ChatGPT to change their prediction when race/ethnicity and gender descriptors were added to the text, suggesting less algorithmic bias (p < 0.05). Our models identified 93.8% of patients with adverse SDoH, while ICD-10 codes captured 2.0%. These results demonstrate the potential of LLMs in improving real-world evidence on SDoH and assisting in identifying patients who could benefit from resource support.
Affiliation(s)
- Marco Guevara
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
| | - Shan Chen
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
| | - Spencer Thomas
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Tafadzwa L Chaunzwa
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
| | - Idalid Franco
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
| | - Benjamin H Kann
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
| | - Shalini Moningi
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
| | - Jack M Qian
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
| | | | - Susan Harper
- Adult Resource Office, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Hugo J W L Aerts
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
- Radiology and Nuclear Medicine, GROW & CARIM, Maastricht University, Maastricht, The Netherlands
| | - Paul J Catalano
- Department of Data Science, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
| | - Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
| | - Raymond H Mak
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
| | - Danielle S Bitterman
- Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA, USA
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, MA, USA
|
22
|
Ratnayake I, Pepper S, Anderson A, Alsup A, Mudaranthakam DP. An R Shiny Application (SDOH) for Predictive Modeling Using Regional Social Determinants of Health Survey Responses. International Journal of Social Determinants of Health and Health Services 2024; 54:21-27. [PMID: 37697462 PMCID: PMC10797831 DOI: 10.1177/27551938231201011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 07/06/2023] [Accepted: 08/01/2023] [Indexed: 09/13/2023]
Abstract
Social determinants of health (SDoH) surveys are data sets that provide useful health-related information about individuals and communities. This study aims to develop a user-friendly web application that allows clinicians to get a predictive insight into the social needs of their patients before their in-patient visits using SDoH survey data to provide an improved and personalized service. The study used a longitudinal survey that consisted of 108,563 patient responses to 12 questions. Questions were designed to have a binary outcome as the response and the patient's most recent responses for each of these questions were modeled independently by incorporating explanatory variables. Multiple classification and regression techniques were used, including logistic regression, Bayesian generalized linear model, extreme gradient boosting, gradient boosting, neural networks, and random forests. Based on the area under the curve values, gradient boosting models provided the highest precision values. Finally, the models were incorporated into an R Shiny application, enabling users to predict and compare the impact of SDoH on patients' lives. The tool is freely hosted online by the University of Kansas Medical Center's Department of Biostatistics and Data Science. The supporting materials for the application are publicly accessible on GitHub.
Affiliation(s)
- Isuru Ratnayake
- Department of Biostatistics & Data Science, The University of Kansas Medical Center, Kansas City, KS, USA
| | - Sam Pepper
- Department of Biostatistics & Data Science, The University of Kansas Medical Center, Kansas City, KS, USA
| | - Aliyah Anderson
- Department of Biostatistics & Data Science, The University of Kansas Medical Center, Kansas City, KS, USA
| | - Alexander Alsup
- PULM Pulmonary and Critical Care Medicine, The University of Kansas Medical Center, Kansas City, KS, USA
|
23
|
van de Kamp E, Ma J, Monangi N, Tsui FR, Jani SG, Kim JH, Kahn RS, Wang CJ. Addressing Health-Related Social Needs and Mental Health Needs in the Neonatal Intensive Care Unit: Exploring Challenges and the Potential of Technology. Int J Environ Res Public Health 2023; 20:7161. [PMID: 38131713 PMCID: PMC10742453 DOI: 10.3390/ijerph20247161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2023] [Revised: 11/21/2023] [Accepted: 12/06/2023] [Indexed: 12/23/2023]
Abstract
Unaddressed health-related social needs (HRSNs) and parental mental health needs in an infant's environment can negatively affect their health outcomes. This study examines the challenges and potential technological solutions for addressing these needs in the neonatal intensive care unit (NICU) setting and beyond. In all, 22 semistructured interviews were conducted with members of the NICU care team and other relevant stakeholders, based on an interpretive description approach. The participants were selected from three safety net hospitals in the U.S. with level IV NICUs. The challenges identified include navigating the multitude of burdens families in the NICU experience, resource constraints within and beyond the health system, a lack of streamlined or consistent processes, no closed-loop referrals to track status and outcomes, and gaps in support postdischarge. Opportunities for leveraging technology to facilitate screening and referral include automating screening, initiating risk-based referrals, using remote check-ins, facilitating resource navigation, tracking referrals, and providing language support. However, technological implementations should avoid perpetuating disparities and consider potential privacy or data-sharing concerns. Although advances in technological health tools alone cannot address all the challenges, they have the potential to offer dynamic tools to support the healthcare setting in identifying and addressing the unique needs and circumstances of each family in the NICU.
Affiliation(s)
- Eline van de Kamp
- Athena Institute, Faculty of Science, Vrije Universiteit Amsterdam, 1081 HV Amsterdam, The Netherlands
| | - Jasmin Ma
- Center for Policy, Outcomes, and Prevention, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Nagendra Monangi
- Division of Neonatology, Perinatal Institute, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
| | - Fuchiang Rich Tsui
- Tsui Laboratory, Department of Biomedical and Health Informatics, Children’s Hospital of Philadelphia, Philadelphia, PA 19146, USA
- Department of Anesthesiology and Critical Care Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Shilpa G. Jani
- Center for Policy, Outcomes, and Prevention, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Jae H. Kim
- Division of Neonatology, Perinatal Institute, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
| | - Robert S. Kahn
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- Michael Fisher Child Health Equity Center, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, USA
| | - C. Jason Wang
- Center for Policy, Outcomes, and Prevention, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Pediatrics and Department of Health Policy, Stanford University School of Medicine, Stanford, CA 94305, USA
|
24
|
Espinoza JC, Sehgal S, Phuong J, Bahroos N, Starren J, Wilcox A, Meeker D. Development of a social and environmental determinants of health informatics maturity model. J Clin Transl Sci 2023; 7:e266. [PMID: 38380394 PMCID: PMC10877515 DOI: 10.1017/cts.2023.691] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 11/04/2023] [Accepted: 11/29/2023] [Indexed: 02/22/2024] Open
Abstract
Introduction Integrating social and environmental determinants of health (SEDoH) into enterprise-wide clinical workflows and decision-making is one of the most important and challenging aspects of improving health equity. We engaged domain experts to develop a SEDoH informatics maturity model (SIMM) to help guide organizations to address technical, operational, and policy gaps. Methods We established a core expert group consisting of developers, informaticists, and subject matter experts to identify different SIMM domains and define maturity levels. The candidate model (v0.9) was evaluated by 15 informaticists at a Center for Data to Health community meeting. After incorporating feedback, a second evaluation round for v1.0 collected feedback and self-assessments from 35 respondents from the National COVID Cohort Collaborative, the Center for Leading Innovation and Collaboration's Informatics Enterprise Committee, and a publicly available online self-assessment tool. Results We developed a SIMM comprising seven maturity levels across five domains: data collection policies, data collection methods and technologies, technology platforms for analysis and visualization, analytics capacity, and operational and strategic impact. The evaluation demonstrated relatively high maturity in analytics and technological capacity, but more moderate maturity in operational and strategic impact among academic medical centers. Changes made to the tool in between rounds improved its ability to discriminate between intermediate maturity levels. Conclusion The SIMM can help organizations identify current gaps and next steps in improving SEDoH informatics. Improving the collection and use of SEDoH data is one important component of addressing health inequities.
Affiliation(s)
- Juan C. Espinoza
- Stanley Manne Children’s Research Institute, Ann & Robert H. Lurie Children’s Hospital of Chicago, Chicago, IL, USA
- Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Shruti Sehgal
- Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Jimmy Phuong
- Division of Biomedical and Health Informatics, University of Washington, Seattle, WA, USA
- Harborview Injury Prevention Research Center, University of Washington, Seattle, WA, USA
| | - Neil Bahroos
- University of Southern California Keck School of Medicine, Los Angeles, CA, USA
| | - Justin Starren
- Northwestern University Feinberg School of Medicine, Chicago, IL, USA
| | - Adam Wilcox
- Institute for Informatics, Department of Medicine, Washington University in St. Louis, St. Louis, MO, USA
| | - Daniella Meeker
- Department of Biomedical Informatics & Data Science, Yale University School of Medicine, New Haven, CT, USA
|
25
|
Hobensack M, Song J, Oh S, Evans L, Davoudi A, Bowles KH, McDonald MV, Barrón Y, Sridharan S, Wallace AS, Topaz M. Social Risk Factors are Associated with Risk for Hospitalization in Home Health Care: A Natural Language Processing Study. J Am Med Dir Assoc 2023; 24:1874-1880.e4. [PMID: 37553081 PMCID: PMC10839109 DOI: 10.1016/j.jamda.2023.06.031] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Revised: 06/23/2023] [Accepted: 06/25/2023] [Indexed: 08/10/2023]
Abstract
OBJECTIVE This study aimed to develop a natural language processing (NLP) system that identified social risk factors in home health care (HHC) clinical notes and to examine the association between social risk factors and hospitalization or an emergency department (ED) visit. DESIGN Retrospective cohort study. SETTING AND PARTICIPANTS We used standardized assessments and clinical notes from one HHC agency located in the northeastern United States. This included 86,866 episodes of care for 65,593 unique patients. Patients received HHC services between 2015 and 2017. METHODS Guided by HHC experts, we created a vocabulary of social risk factors that influence hospitalization or ED visit risk in the HHC setting. We then developed an NLP system to automatically identify social risk factors documented in clinical notes. We used an adjusted logistic regression model to examine the association between the NLP-based social risk factors and hospitalization or an ED visit. RESULTS On the basis of expert consensus, the following social risk factors emerged: Social Environment, Physical Environment, Education and Literacy, Food Insecurity, Access to Care, and Housing and Economic Circumstances. Our NLP system performed "very good" with an F score of 0.91. Approximately 4% of clinical notes (33% episodes of care) documented a social risk factor. The most frequently documented social risk factors were Physical Environment and Social Environment. Except for Housing and Economic Circumstances, all NLP-based social risk factors were associated with higher odds of hospitalization and ED visits. CONCLUSIONS AND IMPLICATIONS HHC clinicians assess and document social risk factors associated with hospitalizations and ED visits in their clinical notes. Future studies can explore the social risk factors documented in HHC to improve communication across the health care system and to predict patients at risk for being hospitalized or visiting the ED.
Affiliation(s)
| | - Jiyoun Song
- Columbia University School of Nursing, New York City, NY, USA
| | - Sungho Oh
- University of Pennsylvania School of Nursing, Philadelphia, PA, USA
| | - Lauren Evans
- Center for Home Care Policy & Research, VNS Health, New York, NY, USA
| | - Anahita Davoudi
- Center for Home Care Policy & Research, VNS Health, New York, NY, USA
| | - Kathryn H Bowles
- Center for Home Care Policy & Research, VNS Health, New York, NY, USA; Department of Biobehavioral Health Sciences, NewCourtland Center for Transitions and Health, University of Pennsylvania School of Nursing, Philadelphia, PA, USA
| | | | - Yolanda Barrón
- Center for Home Care Policy & Research, VNS Health, New York, NY, USA
| | - Sridevi Sridharan
- Center for Home Care Policy & Research, VNS Health, New York, NY, USA
| | - Andrea S Wallace
- The University of Utah College of Nursing, Salt Lake City, UT, USA
| | - Maxim Topaz
- Columbia University School of Nursing, New York City, NY, USA; Center for Home Care Policy & Research, VNS Health, New York, NY, USA; Data Science Institute, Columbia University, New York City, NY, USA
|
26
|
Wu W, Holkeboer KJ, Kolawole TO, Carbone L, Mahmoudi E. Natural language processing to identify social determinants of health in Alzheimer's disease and related dementia from electronic health records. Health Serv Res 2023; 58:1292-1302. [PMID: 37534741 PMCID: PMC10622277 DOI: 10.1111/1475-6773.14210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2023] Open
Abstract
OBJECTIVE To develop a natural language processing (NLP) algorithm that identifies social determinants of health (SDoH), including housing, transportation, food, and medication insecurities, social isolation, abuse, neglect, or exploitation, and financial difficulties for patients with Alzheimer's disease and related dementias (ADRD) from unstructured electronic health records (EHRs). DATA SOURCES AND STUDY SETTING We leveraged 1000 medical notes randomly selected from 7401 emergency department and inpatient social worker notes generated between 2015 and 2019 for 231 unique patients diagnosed with ADRD at Michigan Medicine. STUDY DESIGN We developed a rule-based NLP algorithm for the identification of seven domains of SDoH noted above. We also compared the rule-based algorithm with deep learning and regularized logistic regression approaches. These models were compared using accuracy, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic curve (AUC). All notes were split into 700 notes for training NLP algorithms, and 300 notes for validation. DATA COLLECTION/EXTRACTION METHODS Social worker notes used in this study were extracted from the Michigan Medicine EHR database. PRINCIPAL FINDINGS Of the 700 notes for training, F1 and AUC for the rule-based algorithm were at least 0.94 and 0.95, respectively, for all SDoH categories. Of the 300 notes for validation, F1 and AUC were at least 0.80 and 0.97, respectively, for all SDoH except housing and medication insecurities. The deep learning and regularized logistic regression algorithms had unsatisfactory performance. CONCLUSIONS The rule-based algorithm can accurately extract SDoH information in all seven domains of SDoH except housing and medication insecurities. Findings from the algorithm can be used by clinicians and social workers to proactively address social needs of patients with ADRD and other vulnerable patient populations.
Affiliation(s)
- Wenbo Wu
- Departments of Population Health and Medicine, Grossman School of Medicine, New York University, New York City, New York, USA
- Center for Data Science, New York University, New York City, New York, USA
- Department of Family Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Kaes J. Holkeboer
- Department of Family Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
- College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, Michigan, USA
| | - Temidun O. Kolawole
- Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, Maryland, USA
| | - Lorrie Carbone
- Department of Family Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
| | - Elham Mahmoudi
- Department of Family Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, Michigan, USA
|
27
|
Javed Z, Kundi H, Chang R, Titus A, Arshad H. Polysocial Risk Scores: Implications for Cardiovascular Disease Risk Assessment and Management. Curr Atheroscler Rep 2023; 25:1059-1068. [PMID: 38048008 DOI: 10.1007/s11883-023-01173-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Accepted: 11/10/2023] [Indexed: 12/05/2023]
Abstract
PURPOSE OF REVIEW To review current evidence, discuss key knowledge gaps and identify opportunities for development, validation and application of polysocial risk scores (pSRS) for cardiovascular disease (CVD) risk prediction and population cardiovascular health management. RECENT FINDINGS Limited existing evidence suggests that pSRS are promising tools to capture cumulative social determinants of health (SDOH) burden and improve CVD risk prediction beyond traditional risk factors. However, available tools lack generalizability, are cross-sectional in nature or do not assess social risk holistically across SDOH domains. Available SDOH and clinical risk factor data in large population-based databases are under-utilized for pSRS development. Recent advances in machine learning and artificial intelligence present unprecedented opportunities for SDOH integration and assessment in real-world data, with implications for pSRS development and validation for both clinical and healthcare utilization outcomes. pSRS presents unique opportunities to potentially improve traditional "clinical" models of CVD risk prediction. Future efforts should focus on fully utilizing available SDOH data in large epidemiological databases, testing pSRS efficacy in diverse population subgroups, and integrating pSRS into real-world clinical decision support systems to inform clinical care and advance cardiovascular health equity.
Affiliation(s)
- Zulqarnain Javed
- Center for Cardiovascular Computational Health and Precision Medicine (C3-PH), Houston Methodist, Houston, TX, 77030, USA
- Division of Cardiovascular Prevention and Wellness, Houston Methodist DeBakey Heart and Vascular Center, Houston, TX, 77030, USA
- Houston Methodist Academic Institute, Houston, TX, 77030, USA
| | - Harun Kundi
- Center for Cardiovascular Computational Health and Precision Medicine (C3-PH), Houston Methodist, Houston, TX, 77030, USA
- Division of Cardiovascular Prevention and Wellness, Houston Methodist DeBakey Heart and Vascular Center, Houston, TX, 77030, USA
| | - Ryan Chang
- Baylor College of Medicine, Houston, TX, USA
| | - Anoop Titus
- Division of Cardiovascular Prevention and Wellness, Houston Methodist DeBakey Heart and Vascular Center, Houston, TX, 77030, USA
| | - Hassaan Arshad
- Division of Cardiovascular Prevention and Wellness, Houston Methodist DeBakey Heart and Vascular Center, Houston, TX, 77030, USA
|
28
|
Rice BM. Using nursing science to advance policy and practice in the context of social and structural determinants of health. Nurs Outlook 2023; 71:102060. [PMID: 37852871 PMCID: PMC10843015 DOI: 10.1016/j.outlook.2023.102060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 09/17/2023] [Indexed: 10/20/2023]
Abstract
BACKGROUND Social and structural determinants of health play a large role in health inequities. PURPOSE To highlight how nursing science can be used to advance policy and practice in the context of social and structural determinants of health. METHODS This paper reports on the author's keynote presentation from the 2022 State of The Science Conference on Social and Structural Determinants of Health presented by the Council for the Advancement of Nursing Science. Key concepts are overviewed and defined, followed by examples of two community-engaged research projects with findings that inform practice and policy. The author concludes with individual-, social- and structural-level recommendations as a clarion call for nurses to use research to eliminate health inequities and promote justice for all. CONCLUSION What we know is, in part, only as good as what we do with that knowledge. When lives are at stake, gone are the days of knowing something and failing to act on that knowledge.
Affiliation(s)
- Bridgette M Rice
  - M. Louise Fitzpatrick College of Nursing, Villanova University, Villanova, PA.
29
Johnson TR, Berner ES, Feldman SS, Jones J, Valenta AL, Borbolla D, Deckard G, Manos L. Mapping the delineation of practice to the AMIA foundational domains for applied health informatics. J Am Med Inform Assoc 2023; 30:1593-1598. [PMID: 37500598 PMCID: PMC10531098 DOI: 10.1093/jamia/ocad146] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/23/2023] [Accepted: 07/25/2023] [Indexed: 07/29/2023] Open
Abstract
OBJECTIVE This article reports on the alignment between the foundational domains and the delineation of practice (DoP) for health informatics, both developed by the American Medical Informatics Association (AMIA). Whereas the foundational domains guide graduate-level curriculum development and accreditation assessment, providing an educational pathway to the minimum competencies needed as a health informatician, the DoP defines the domains, tasks, knowledge, and skills that a professional needs to competently perform in the discipline of health informatics. The purpose of this article is to determine whether the foundational domains need modification to better reflect applied practice. MATERIALS AND METHODS Using an iterative process and through individual and collective approaches, the foundational domains and the DoP statements were analyzed for alignment and eventual harmonization. Tables and Sankey plot diagrams were used to detail and illustrate the resulting alignment. RESULTS We were able to map all the individual DoP knowledge statements and tasks to the AMIA foundational domains, but the statements within a single DoP domain did not all map to the same foundational domain. Even though the AMIA foundational domains and DoP domains are not in perfect alignment, the DoP provides good examples of specific health informatics competencies for most of the foundational domains. There are, however, limited DoP knowledge statements and tasks mapping to foundational domain 6-Social and Behavioral Aspects of Health. DISCUSSION Both the foundational domains and the DoP were developed independently, several years apart, and for different purposes. The mapping analyses reveal similarities and differences between the practice experience and the curricular needs of health informaticians. CONCLUSIONS The overall alignment of both domains may be explained by the fact that both describe the current and/or future health informatics professional. 
One can think of the foundational domains as representing the broad foci for educational programs for health informaticians and, hence, they are appropriately the focus of organizations that accredit these programs.
Affiliation(s)
- Todd R Johnson
  - D. Bradley McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Eta S Berner
  - Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Sue S Feldman
  - Department of Health Services Administration, University of Alabama at Birmingham, Birmingham, Alabama, USA
- Josette Jones
  - Luddy School of Informatics, Computing, and Engineering, Indiana University, Regenstrief Institute, Indianapolis, Indiana, USA
- Annette L Valenta
  - Department of Biomedical and Health Information Sciences, University of Illinois Chicago, Chicago, Illinois, USA
- Damian Borbolla
  - Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA
- Gloria Deckard
  - Information Systems and Business Analytics, Florida International University, Miami, Florida, USA
- LaVerne Manos
  - KU Center for Health Informatics, The University of Kansas KU Medical Center, Kansas City, Kansas, USA
30
Lybarger K, Dobbins NJ, Long R, Singh A, Wedgeworth P, Uzuner Ö, Yetisgen M. Leveraging natural language processing to augment structured social determinants of health data in the electronic health record. J Am Med Inform Assoc 2023; 30:1389-1397. [PMID: 37130345 PMCID: PMC10354760 DOI: 10.1093/jamia/ocad073] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/04/2022] [Revised: 04/06/2023] [Accepted: 04/12/2023] [Indexed: 05/04/2023] Open
Abstract
OBJECTIVE Social determinants of health (SDOH) impact health outcomes and are documented in the electronic health record (EHR) through structured data and unstructured clinical notes. However, clinical notes often contain more comprehensive SDOH information, detailing aspects such as status, severity, and temporality. This work has two primary objectives: (1) develop a natural language processing information extraction model to capture detailed SDOH information and (2) evaluate the information gain achieved by applying the SDOH extractor to clinical narratives and combining the extracted representations with existing structured data. MATERIALS AND METHODS We developed a novel SDOH extractor using a deep learning entity and relation extraction architecture to characterize SDOH across various dimensions. In an EHR case study, we applied the SDOH extractor to a large clinical data set with 225 089 patients and 430 406 notes with social history sections and compared the extracted SDOH information with existing structured data. RESULTS The SDOH extractor achieved 0.86 F1 on a withheld test set. In the EHR case study, we found extracted SDOH information complements existing structured data with 32% of homeless patients, 19% of current tobacco users, and 10% of drug users only having these health risk factors documented in the clinical narrative. CONCLUSIONS Utilizing EHR data to identify SDOH health risk factors and social needs may improve patient care and outcomes. Semantic representations of text-encoded SDOH information can augment existing structured data, and this more comprehensive SDOH representation can assist health systems in identifying and addressing these social needs.
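The comparison between extracted and structured SDOH data described in this abstract can be illustrated with simple set arithmetic. The following is a minimal sketch, not the authors' code: the function name `narrative_only_share` and the patient identifiers are hypothetical, and it assumes each data source has already been reduced to a set of patient IDs flagged for one risk factor, with the denominator taken as all patients flagged by either source.

```python
def narrative_only_share(structured_ids, extracted_ids):
    """Fraction of all flagged patients who are flagged only by the NLP extractor.

    Assumption: the denominator is every patient flagged by either source.
    """
    structured = set(structured_ids)
    extracted = set(extracted_ids)
    all_flagged = structured | extracted
    if not all_flagged:
        return 0.0
    # Patients whose risk factor appears in the narrative but not in structured fields.
    narrative_only = extracted - structured
    return len(narrative_only) / len(all_flagged)


# Hypothetical toy data: patient IDs flagged as homeless by each source.
structured_homeless = {"p01", "p02", "p03", "p04"}
extracted_homeless = {"p03", "p04", "p05", "p06"}

share = narrative_only_share(structured_homeless, extracted_homeless)
print(f"Documented only in the narrative: {share:.1%}")
```

A different denominator (for example, only patients flagged by the extractor) would give a different share, so the definition should be stated alongside any reported percentage.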
Affiliation(s)
- Kevin Lybarger
  - Department of Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA
- Nicholas J Dobbins
  - Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, Washington, USA
  - Department of Research IT, UW Medicine, University of Washington, Seattle, Washington, USA
- Ritche Long
  - Department of Research IT, UW Medicine, University of Washington, Seattle, Washington, USA
- Angad Singh
  - Department of Medicine, University of Washington, Seattle, Washington, USA
- Patrick Wedgeworth
  - Department of Medicine, University of Washington, Seattle, Washington, USA
- Özlem Uzuner
  - Department of Information Sciences and Technology, George Mason University, Fairfax, Virginia, USA
- Meliha Yetisgen
  - Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, Washington, USA
31
Romanowski B, Ben Abacha A, Fan Y. Extracting social determinants of health from clinical note text with classification and sequence-to-sequence approaches. J Am Med Inform Assoc 2023; 30:1448-1455. [PMID: 37100768 PMCID: PMC10354779 DOI: 10.1093/jamia/ocad071] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/05/2022] [Revised: 03/07/2023] [Accepted: 04/18/2023] [Indexed: 04/28/2023] Open
Abstract
OBJECTIVE Social determinants of health (SDOH) are nonmedical factors that can influence health outcomes. This paper seeks to extract SDOH from clinical texts in the context of the National NLP Clinical Challenges (n2c2) 2022 Track 2 Task. MATERIALS AND METHODS Annotated and unannotated data from the Medical Information Mart for Intensive Care III (MIMIC-III) corpus, the Social History Annotation Corpus, and an in-house corpus were used to develop 2 deep learning models that used classification and sequence-to-sequence (seq2seq) approaches. RESULTS The seq2seq approach had the highest overall F1 scores in the challenge's 3 subtasks: 0.901 on the extraction subtask, 0.774 on the generalizability subtask, and 0.889 on the learning transfer subtask. DISCUSSION Both approaches rely on SDOH event representations that were designed to be compatible with transformer-based pretrained models, with the seq2seq representation supporting an arbitrary number of overlapping and sentence-spanning events. Models with adequate performance could be produced quickly, and the remaining mismatch between representation and task requirements was then addressed in postprocessing. The classification approach used rules to generate entity relationships from its sequence of token labels, while the seq2seq approach used constrained decoding and a constraint solver to recover entity text spans from its sequence of potentially ambiguous tokens. CONCLUSION We proposed 2 different approaches to extract SDOH from clinical texts with high accuracy. However, accuracy suffers on text from new healthcare institutions not present in the training data, and thus generalization remains an important topic for future study.
Affiliation(s)
- Yadan Fan
  - Nuance Communications, Burlington, Massachusetts, USA
32
Allen KS, Hood DR, Cummins J, Kasturi S, Mendonca EA, Vest JR. Natural language processing-driven state machines to extract social factors from unstructured clinical documentation. JAMIA Open 2023; 6:ooad024. [PMID: 37081945 PMCID: PMC10112959 DOI: 10.1093/jamiaopen/ooad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 03/08/2023] [Accepted: 03/28/2023] [Indexed: 04/22/2023] Open
Abstract
Objective This study sought to create natural language processing algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. To assess generalizability, finalized models were validated on data from a separate health system. Materials and Methods Notes from 2 healthcare systems, representing a variety of note types, were utilized. To train models, the study utilized n-grams to identify keywords and implemented natural language processing (NLP) state machines across all note types. Manual review was conducted to determine performance. Sampling used a set percentage of notes, determined by the prevalence of the social need. Models were optimized over multiple training and evaluation cycles. Performance metrics were calculated using positive predictive value (PPV), negative predictive value, sensitivity, and specificity. Results PPV for housing rose from 0.71 to 0.95 over 3 training runs. PPV for financial rose from 0.83 to 0.89 over 2 training iterations, while PPV for unemployment rose from 0.78 to 0.88 over 3 iterations. The test data resulted in PPVs of 0.94, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Final specificity scores were 0.95, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Discussion We developed 3 rule-based NLP algorithms, trained across health systems. While this is a less sophisticated approach, the algorithms demonstrated a high degree of generalizability, maintaining >0.85 across all predictive performance metrics. Conclusion The rule-based NLP algorithms demonstrated consistent performance in identifying 3 social factors within clinical text. These methods may be a part of a strategy to measure social factors within an institution.
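The four performance metrics named in this abstract all derive from a manual-review confusion matrix. The sketch below shows the standard definitions; the function name `validation_metrics` and the counts are hypothetical and are not taken from the study.

```python
def validation_metrics(tp, fp, tn, fn):
    """Standard chart-review metrics from confusion-matrix counts.

    tp/fp: notes flagged by the algorithm that reviewers confirmed/rejected.
    tn/fn: notes not flagged that reviewers judged negative/positive.
    """
    def ratio(numerator, denominator):
        # Return NaN rather than raising when a denominator is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "sensitivity": ratio(tp, tp + fn),  # recall among true positives
        "specificity": ratio(tn, tn + fp),  # correct rejections among true negatives
    }


# Hypothetical review counts for one social factor.
metrics = validation_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

PPV and NPV depend on how common the social factor is in the reviewed sample, so values from a prevalence-based sample may not transfer directly to a population with a different prevalence.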
Affiliation(s)
- Katie S Allen
  - Corresponding Author: Katie S. Allen, BS, Center for Biomedical Informatics, Regenstrief Institute, Inc., 1101 W. 10th Street, Indianapolis, IN 46202, USA
- Dan R Hood
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Jonathan Cummins
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Suranga Kasturi
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Eneida A Mendonca
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
  - Department of Pediatrics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Joshua R Vest
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
  - Department of Health Policy and Management, Richard M. Fairbanks School of Public Health, IUPUI, Indianapolis, Indiana, USA
33
Derton A, Guevara M, Chen S, Moningi S, Kozono DE, Liu D, Miller TA, Savova GK, Mak RH, Bitterman DS. Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes. JCO Clin Cancer Inform 2023; 7:e2200196. [PMID: 37235847 DOI: 10.1200/cci.22.00196] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/19/2022] [Revised: 02/22/2023] [Accepted: 03/23/2023] [Indexed: 05/28/2023] Open
Abstract
PURPOSE There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities. METHODS This was a retrospective analysis of 230,325 clinical notes from 5,285 patients treated with radiotherapy from 2007 to 2019. We compared linguistic features among White versus non-White, low-income insurance versus other insurance, and male versus female patients' notes. Log odds ratios with an informative Dirichlet prior were calculated to compare words over-represented in each group. A variational autoencoder topic model was applied, and topic probability was compared between groups. The presence of machine-learnable bias was explored by developing statistical and neural demographic group classifiers. RESULTS Terms associated with varied social contexts and needs were identified for all demographic group comparisons. For example, notes of non-White and low-income insurance patients were over-represented with terms associated with housing and transportation, whereas notes of White and other insurance patients were over-represented with terms related to physical activity. Topic models identified a social history topic, and topic probability varied significantly between the demographic group comparisons. Classification models performed poorly at classifying notes of non-White and low-income insurance patients (F1 of 0.30 and 0.23, respectively). CONCLUSION Exploration of linguistic differences in clinical notes between patients of different race/ethnicity, insurance status, and sex identified social contexts and needs in patients with cancer and revealed high-level differences in notes. Future work is needed to validate whether these findings may play a role in cancer disparities.
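The abstract does not give the exact formula used for the log odds ratios, so the following is a sketch of one widely used formulation of the log-odds ratio with an informative Dirichlet prior, in which prior pseudo-counts smooth each group's word counts and the result is z-scored by an approximate variance. The function name, the toy word counts, and the choice of the pooled counts as the prior are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_dirichlet(counts_a, counts_b, prior_counts):
    """z-scored log-odds ratio per word; positive means over-represented in group A.

    prior_counts supplies the Dirichlet pseudo-count for every word scored
    (each must be > 0 and the vocabulary must contain at least two words).
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    a0 = sum(prior_counts.values())
    scores = {}
    for word, a_w in prior_counts.items():
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Smoothed log-odds of the word within each group.
        log_odds_a = math.log((y_a + a_w) / (n_a + a0 - y_a - a_w))
        log_odds_b = math.log((y_b + a_w) / (n_b + a0 - y_b - a_w))
        delta = log_odds_a - log_odds_b
        # Approximate variance of the difference.
        variance = 1.0 / (y_a + a_w) + 1.0 / (y_b + a_w)
        scores[word] = delta / math.sqrt(variance)
    return scores


# Hypothetical word counts from two groups of notes.
group_a = Counter({"housing": 8, "shelter": 4, "exercise": 1, "patient": 20})
group_b = Counter({"housing": 1, "shelter": 1, "exercise": 9, "patient": 22})
prior = group_a + group_b  # assumption: pooled counts serve as the prior

scores = log_odds_dirichlet(group_a, group_b, prior)
for word, z in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word}: {z:+.2f}")
```

In this toy example a common word such as "patient" scores near zero because the prior pulls frequent, evenly distributed words toward no difference, while "housing" and "exercise" separate in opposite directions.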
Affiliation(s)
- Abigail Derton
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
- Marco Guevara
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Shan Chen
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Shalini Moningi
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- David E Kozono
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Dianbo Liu
  - Mila-Quebec AI Institute, Montreal, QC, Canada
- Timothy A Miller
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Guergana K Savova
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA
- Raymond H Mak
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
- Danielle S Bitterman
  - Artificial Intelligence in Medicine (AIM) Program, Mass General Brigham, Harvard Medical School, Boston, MA
  - Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
34
Ahmad PN, Shah AM, Lee K. A Review on Electronic Health Record Text-Mining for Biomedical Name Entity Recognition in Healthcare Domain. Healthcare (Basel) 2023; 11:1268. [PMID: 37174810 PMCID: PMC10178605 DOI: 10.3390/healthcare11091268] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/28/2023] [Revised: 04/24/2023] [Accepted: 04/26/2023] [Indexed: 05/15/2023] Open
Abstract
Biomedical-named entity recognition (bNER) is critical in biomedical informatics. It identifies biomedical entities with special meanings, such as people, places, and organizations, as predefined semantic types in electronic health records (EHR). bNER is essential for discovering novel knowledge using computational methods and Information Technology. Early bNER systems were configured manually to include domain-specific features and rules. However, these systems were limited in handling the complexity of the biomedical text. Recent advances in deep learning (DL) have led to the development of more powerful bNER systems. DL-based bNER systems can learn the patterns of biomedical text automatically, making them more robust and efficient than traditional rule-based systems. This paper reviews the healthcare domain of bNER, using DL techniques and artificial intelligence in clinical records, for mining treatment prediction. bNER-based tools are categorized systematically and represent the distribution of input, context, and tag (encoder/decoder). Furthermore, to create a labeled dataset for our machine learning sentiment analyzer to analyze the sentiment of a set of tweets, we used a manual coding approach and the multi-task learning method to bias the training signals with domain knowledge inductively. To conclude, we discuss the challenges facing bNER systems and future directions in the healthcare field.
Affiliation(s)
- Pir Noman Ahmad
  - School of Computer Science, Harbin Institute of Technology, Harbin 150001, China
- Adnan Muhammad Shah
  - Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea
- KangYoon Lee
  - Department of Computer Engineering, Gachon University, Seongnam 13120, Republic of Korea
35
Leightley D, Palmer L, Williamson C, Leal R, Chandran D, Murphy D, Fear NT, Stevelink SAM. Identifying Military Service Status in Electronic Healthcare Records from Psychiatric Secondary Healthcare Services: A Validation Exercise Using the Military Service Identification Tool. Healthcare (Basel) 2023; 11:524. [PMID: 36833058 PMCID: PMC9957026 DOI: 10.3390/healthcare11040524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2022] [Revised: 02/03/2023] [Accepted: 02/07/2023] [Indexed: 02/12/2023] Open
Abstract
Electronic healthcare records (EHRs) are a rich source of information with a range of uses in secondary research. In the United Kingdom, there is no pan-national or nationally accepted marker indicating veteran status across all healthcare services. This presents significant obstacles to determining the healthcare needs of veterans using EHRs. To address this issue, we developed the Military Service Identification Tool (MSIT), using an iterative two-staged approach. In the first stage, a Structured Query Language approach was developed to identify veterans using a keyword rule-based approach. This informed the second stage, which was the development of the MSIT using machine learning, which, when tested, obtained an accuracy of 0.97, a positive predictive value of 0.90, a sensitivity of 0.91, and a negative predictive value of 0.98. To further validate the performance of the MSIT, the present study sought to verify the accuracy of the EHRs that trained the MSIT models. To achieve this, we surveyed 902 patients of a local specialist mental healthcare service, with 146 (16.2%) being asked if they had or had not served in the Armed Forces. In total 112 (76.7%) reported that they had not served, and 34 (23.3%) reported that they had served in the Armed Forces (accuracy: 0.84, sensitivity: 0.82, specificity: 0.91). The MSIT has the potential to be used for identifying veterans in the UK from free-text clinical documents and future use should be explored.
Affiliation(s)
- Daniel Leightley
  - King’s Centre for Military Health Research, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
  - Correspondence:
- Laura Palmer
  - King’s Centre for Military Health Research, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
- Charlotte Williamson
  - King’s Centre for Military Health Research, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
- Ray Leal
  - King’s Centre for Military Health Research, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
- Dave Chandran
  - Biomedical Research Centre (BRC), Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE58AB, UK
- Dominic Murphy
  - King’s Centre for Military Health Research, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
  - Combat Stress, Tyrwhitt House, Oaklawn Road, Leatherhead, London KT22 0BX, UK
- Nicola T. Fear
  - King’s Centre for Military Health Research, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
  - Academic Department of Military Mental Health, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
- Sharon A. M. Stevelink
  - King’s Centre for Military Health Research, King’s College London, Weston Education Centre, Cutcombe Road, London SE5 9RJ, UK
  - Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London SE58AB, UK
36
Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023; 30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVE Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. MATERIALS AND METHODS We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. RESULTS Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. DISCUSSION Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. CONCLUSION Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Affiliation(s)
- Siyue Yang
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Ellen Stephenson
  - Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Karen Tu
  - Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Jessica Gronsbell
  - Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
  - Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
  - Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
37
Thompson HM, Sharma B, Smith DL, Bhalla S, Erondu I, Hazra A, Ilyas Y, Pachwicewicz P, Sheth NK, Chhabra N, Karnik NS, Afshar M. Machine Learning Techniques to Explore Clinical Presentations of COVID-19 Severity and to Test the Association With Unhealthy Opioid Use: Retrospective Cross-sectional Cohort Study. JMIR Public Health Surveill 2022; 8:e38158. [PMID: 36265163 PMCID: PMC9746674 DOI: 10.2196/38158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2022] [Revised: 05/23/2022] [Accepted: 10/18/2022] [Indexed: 11/07/2022] Open
Abstract
BACKGROUND The COVID-19 pandemic has exacerbated health inequities in the United States. People with unhealthy opioid use (UOU) may face disproportionate challenges with COVID-19 precautions, and the pandemic has disrupted access to opioids and UOU treatments. UOU impairs the immunological, cardiovascular, pulmonary, renal, and neurological systems and may increase severity of outcomes for COVID-19. OBJECTIVE We applied machine learning techniques to explore clinical presentations of hospitalized patients with UOU and COVID-19 and to test the association between UOU and COVID-19 disease severity. METHODS This retrospective, cross-sectional cohort study was conducted based on data from 4110 electronic health record patient encounters at an academic health center in Chicago between January 1, 2020, and December 31, 2020. The inclusion criterion was an unplanned admission of a patient aged ≥18 years; encounters were counted as COVID-19-positive if there was a positive test for COVID-19 or 2 COVID-19 International Classification of Disease, Tenth Revision codes. Using a predefined cutoff with optimal sensitivity and specificity to identify UOU, we ran a machine learning UOU classifier on the data for patients with COVID-19 to estimate the subcohort of patients with UOU. Topic modeling was used to explore and compare the clinical presentations documented for 2 subgroups: encounters with UOU and COVID-19 and those with no UOU and COVID-19. Mixed effects logistic regression accounted for multiple encounters for some patients and tested the association between UOU and COVID-19 outcome severity. Severity was measured with 3 utilization metrics: low-severity unplanned admission, medium-severity unplanned admission and receiving mechanical ventilation, and high-severity unplanned admission with in-hospital death. All models controlled for age, sex, race/ethnicity, insurance status, and BMI. 
RESULTS Topic modeling yielded 10 topics per subgroup and highlighted unique comorbidities associated with UOU and COVID-19 (eg, HIV) and no UOU and COVID-19 (eg, diabetes). In the regression analysis, each incremental increase in the classifier's predicted probability of UOU was associated with 1.16 higher odds of COVID-19 outcome severity (odds ratio 1.16, 95% CI 1.04-1.29; P=.009). CONCLUSIONS Among patients hospitalized with COVID-19, UOU is an independent risk factor associated with greater outcome severity, including in-hospital death. Social determinants of health and opioid-related overdose are unique comorbidities in the clinical presentation of the UOU patient subgroup. Additional research is needed on the role of COVID-19 therapeutics and inpatient management of acute COVID-19 pneumonia for patients with UOU. Further research is needed to test associations between expanded evidence-based harm reduction strategies for UOU and vaccination rates, hospitalizations, and risks for overdose and death among people with UOU and COVID-19. Machine learning techniques may offer more exhaustive means for cohort discovery and a novel mixed methods approach to population health.
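The reported odds ratio and confidence interval relate to a logistic regression coefficient through exponentiation. The sketch below shows that conversion under a normal approximation; the helper name `odds_ratio_with_ci` is hypothetical, and the standard error is back-calculated from the published interval purely for illustration, since the abstract does not report the coefficient or its standard error.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error to an OR and 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Back out an approximate coefficient and standard error from the abstract's
# reported values (odds ratio 1.16, 95% CI 1.04-1.29). These are illustrative
# reconstructions, not figures reported by the authors.
beta = math.log(1.16)
se = (math.log(1.29) - math.log(1.04)) / (2 * 1.96)

odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
print(f"OR {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is symmetric on the log scale rather than the odds scale, the point estimate sits closer to the lower bound than to the upper bound.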
Collapse
Affiliation(s)
- Hale M Thompson
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Center for Education, Research, and Advocacy, Department of Social and Behavioral Research, Howard Brown Health, Chicago, IL, United States
| | - Brihat Sharma
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Dale L Smith
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Sameer Bhalla
- Department of Internal Medicine, Rush University Medical Center, Chicago, IL, United States
- Ihuoma Erondu
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Aniruddha Hazra
- Section of Infectious Diseases and Global Health, Department of Medicine, University of Chicago, Chicago, IL, United States
- Yousaf Ilyas
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Paul Pachwicewicz
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Neeral K Sheth
- Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Neeraj Chhabra
- Department of Emergency Medicine, Rush University Medical College, Rush University Medical Center, Chicago, IL, United States
- Niranjan S Karnik
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Majid Afshar
- Division of Pulmonary and Critical Care, Department of Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, WI, United States
|
38
|
Anetta K, Horak A, Wojakowski W, Wita K, Jadczyk T. Deep Learning Analysis of Polish Electronic Health Records for Diagnosis Prediction in Patients with Cardiovascular Diseases. J Pers Med 2022; 12:jpm12060869. [PMID: 35743653 PMCID: PMC9225281 DOI: 10.3390/jpm12060869] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/17/2022] [Revised: 05/12/2022] [Accepted: 05/23/2022] [Indexed: 02/05/2023]
Abstract
Electronic health records naturally contain most of the medical information in the form of doctor’s notes as unstructured or semi-structured texts. Current deep learning text analysis approaches allow researchers to reveal the inner semantics of text information and even identify hidden consequences that can offer extra decision support to doctors. In the presented article, we offer a new automated analysis of Polish summary texts of patient hospitalizations. The presented models were found to be able to predict the final diagnosis with almost 70% accuracy based just on the patient’s medical history (only 132 words on average), with possible accuracy increases when adding further sentences from hospitalization results; even one sentence was found to improve the results by 4%, and the best accuracy of 78% was achieved with five extra sentences. In addition to detailed descriptions of the data and methodology, we present an evaluation of the analysis using more than 50,000 Polish cardiology patient texts and dive into a detailed error analysis of the approach. The results indicate that the deep analysis of just the medical history summary can suggest the direction of diagnosis with a high probability that can be further increased just by supplementing the records with further examination results.
Affiliation(s)
- Kristof Anetta
- Natural Language Processing Centre, Faculty of Informatics, Masaryk University, 602 00 Brno, Czech Republic
- Ales Horak
- Natural Language Processing Centre, Faculty of Informatics, Masaryk University, 602 00 Brno, Czech Republic
- Correspondence: (A.H.); (T.J.)
- Wojciech Wojakowski
- Department of Cardiology and Structural Heart Diseases, School of Medicine in Katowice, Medical University of Silesia, 40-055 Katowice, Poland
- Krystian Wita
- First Department of Cardiology, Medical University of Silesia, 40-055 Katowice, Poland
- Tomasz Jadczyk
- Department of Cardiology and Structural Heart Diseases, School of Medicine in Katowice, Medical University of Silesia, 40-055 Katowice, Poland
- Interventional Cardiac Electrophysiology Group, International Clinical Research Center, St. Anne’s University Hospital Brno, 656 91 Brno, Czech Republic
- Correspondence: (A.H.); (T.J.)
|