1. Abulibdeh R, Tu K, Butt DA, Train A, Crampton N, Sejdić E. Assessing the capture of sociodemographic information in electronic medical records to inform clinical decision making. PLoS One 2025; 20:e0317599. [PMID: 39823404 PMCID: PMC11741650 DOI: 10.1371/journal.pone.0317599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Accepted: 01/01/2025] [Indexed: 01/19/2025]
Abstract
There is a growing need to document sociodemographic factors in electronic medical records to produce representative cohorts for medical research and to perform focused research for potentially vulnerable populations. The objective of this work was to assess the content of family physicians' electronic medical records and characterize the quality of the documentation of sociodemographic characteristics. Descriptive statistics were reported for each sociodemographic characteristic. The association between the completeness rates of the sociodemographic data and the various clinics, electronic medical record vendors, and physician characteristics was analyzed. Supervised machine learning models were used to determine the absence or presence of each characteristic for all adult patients over the age of 18 in the database. Documentation rates for marital status (51.0%) and occupation (47.2%) were significantly higher than those for the other variables. Race (1.4%), sexual orientation (2.5%), and gender identity (0.8%) had the lowest documentation rates, with a missingness rate of 97.5% or higher. The correlation analysis for vendor type demonstrated that there was significant variation in the availability of marital and occupation information between vendors (χ² > 6.0, P < 0.05). Variability in documentation between clinics indicated that the majority of characteristics exhibited high variation in completeness rates, with the highest variation for occupation (median: 47.2%, interquartile range: 60.6%) and marital status (median: 45.6%, interquartile range: 59.7%). Finally, physician sex, years since a physician graduated, and whether a physician was a foreign vs a Canadian medical graduate were significantly associated with documentation rates of place of birth, citizenship status, occupation, and education in the electronic medical records. Our findings suggest a crucial need to implement better documentation strategies for sociodemographic information in the healthcare setting. To improve completeness rates, healthcare systems should monitor, encourage, enforce, or incentivize sociodemographic data collection standards.
Affiliation(s)
- Rawan Abulibdeh
  - Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
- Karen Tu
  - Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
  - North York General Hospital, Toronto, Ontario, Canada
  - Toronto Western Hospital Family Health Team, University Health Network, Toronto, Ontario, Canada
- Debra A. Butt
  - Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
  - Department of Family and Community Medicine, Scarborough Health Network, Scarborough, Ontario, Canada
- Anthony Train
  - Department of Family Medicine, Queen’s University, Kingston, Ontario, Canada
- Noah Crampton
  - Department of Family and Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Ervin Sejdić
  - Department of Electrical and Computer Engineering, University of Toronto, Toronto, Ontario, Canada
  - North York General Hospital, Toronto, Ontario, Canada
2. Hobensack M, Scharp D, Song J, Topaz M. Documentation of social determinants of health across individuals from different racial and ethnic groups in home healthcare. J Nurs Scholarsh 2025; 57:39-46. [PMID: 38739091 DOI: 10.1111/jnu.12980] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 04/23/2024] [Accepted: 04/24/2024] [Indexed: 05/14/2024]
Abstract
INTRODUCTION Home healthcare (HHC) enables patients to receive healthcare services within their homes to manage chronic conditions and recover from illnesses. Recent research has identified disparities in HHC based on race or ethnicity. Social determinants of health (SDOH) describe the external factors influencing a patient's health, such as access to care and social support. Individuals from racially or ethnically minoritized communities are known to be disproportionately affected by SDOH. Existing evidence suggests that SDOH are documented in clinical notes. However, no prior study has investigated the documentation of SDOH across individuals from different racial or ethnic backgrounds in the HHC setting. This study aimed to (1) describe frequencies of SDOH documented in clinical notes by race or ethnicity and (2) determine associations between race or ethnicity and SDOH documentation. DESIGN Retrospective data analysis. METHODS We conducted a cross-sectional secondary data analysis of 86,866 HHC episodes representing 65,693 unique patients from one large HHC agency in New York collected between January 1, 2015, and December 31, 2017. We reported the frequency of six SDOH (physical environment, social environment, housing and economic circumstances, food insecurity, access to care, and education and literacy) documented in clinical notes across individuals reported as Asian/Pacific Islander, Black, Hispanic, multi-racial, Native American, or White. We analyzed differences in SDOH documentation by race or ethnicity using logistic regression models. RESULTS Compared to patients reported as White, patients across other racial or ethnic groups had higher frequencies of SDOH documented in their clinical notes. Our results suggest that race or ethnicity is associated with SDOH documentation in HHC. CONCLUSION As the study of SDOH in HHC continues to evolve, our results provide a foundation to evaluate social information in the HHC setting and understand how it influences the quality of care provided. CLINICAL RELEVANCE The results of this exploratory study can help clinicians understand the differences in SDOH across individuals from different racial and ethnic groups and serve as a foundation for future research aimed at fostering more inclusive HHC documentation practices.
Affiliation(s)
- Mollie Hobensack
  - Department of Geriatrics and Palliative Care, Icahn School of Medicine at Mount Sinai, New York City, New York, USA
- Danielle Scharp
  - Columbia University School of Nursing, New York City, New York, USA
- Jiyoun Song
  - University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, USA
- Maxim Topaz
  - Columbia University School of Nursing, New York City, New York, USA
  - Data Science Institute, Columbia University, New York City, New York, USA
  - Center for Home Care Policy & Research, VNS Health, New York City, New York, USA
3. Kim BY, Anthopolos R, Do H, Zhong J. Model-based estimation of individual-level social determinants of health and its applications in All of Us. J Am Med Inform Assoc 2024; 31:2880-2889. [PMID: 39003521 PMCID: PMC11631124 DOI: 10.1093/jamia/ocae168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 06/11/2024] [Accepted: 07/07/2024] [Indexed: 07/15/2024]
Abstract
OBJECTIVES We introduce a widely applicable model-based approach for estimating individual-level Social Determinants of Health (SDoH) and evaluate its effectiveness using the All of Us Research Program. MATERIALS AND METHODS Our approach utilizes aggregated SDoH datasets to estimate individual-level SDoH, demonstrated with examples of no high school diploma (NOHSDP) and no health insurance (UNINSUR) variables. Models are estimated using American Community Survey data and applied to derive individual-level estimates for All of Us participants. We assess concordance between model-based SDoH estimates and self-reported SDoHs in All of Us and examine associations with undiagnosed hypertension and diabetes. RESULTS Compared to self-reported SDoHs, the area under the curve for NOHSDP is 0.727 (95% CI, 0.724-0.730) and for UNINSUR is 0.730 (95% CI, 0.727-0.733) among the 329 074 All of Us participants, both significantly higher than aggregated SDoHs. The association between model-based NOHSDP and undiagnosed hypertension is concordant with those estimated using self-reported NOHSDP, with a correlation coefficient of 0.649. Similarly, the association between model-based NOHSDP and undiagnosed diabetes is concordant with those estimated using self-reported NOHSDP, with a correlation coefficient of 0.900. DISCUSSION AND CONCLUSION The model-based SDoH estimation method offers a scalable and easily standardized approach for estimating individual-level SDoHs. Using the All of Us dataset, we demonstrate reasonable concordance between model-based SDoH estimates and self-reported SDoHs, along with consistent associations with health outcomes. Our findings also underscore the critical role of geographic contexts in SDoH estimation and in evaluating the association between SDoHs and health outcomes.
Affiliation(s)
- Bo Young Kim
  - Division of Biostatistics, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
- Rebecca Anthopolos
  - Division of Biostatistics, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
- Hyungrok Do
  - Division of Biostatistics, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
- Judy Zhong
  - Division of Biostatistics, Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
4. Grant RW, McCloskey JK, Uratsu CS, Ranatunga D, Ralston JD, Bayliss EA, Sofrygin O. Predicting Self-Reported Social Risk in Medically Complex Adults Using Electronic Health Data. Med Care 2024; 62:590-598. [PMID: 38833715 DOI: 10.1097/mlr.0000000000002021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2024]
Abstract
BACKGROUND Social barriers to health care, such as food insecurity, financial distress, and housing instability, may impede effective clinical management for individuals with chronic illness. Systematic strategies are needed to more efficiently identify at-risk individuals who may benefit from proactive outreach by health care systems for screening and referral to available social resources. OBJECTIVE To create a predictive model to identify a higher likelihood of food insecurity, financial distress, and/or housing instability among adults with multiple chronic medical conditions. RESEARCH DESIGN AND SUBJECTS We developed and validated a predictive model in adults with 2 or more chronic conditions who were receiving care within Kaiser Permanente Northern California (KPNC) between January 2017 and February 2020. The model was developed to predict the likelihood of a "yes" response to any of 3 validated self-reported survey questions related to current concerns about food insecurity, financial distress, and/or housing instability. External model validation was conducted in a separate cohort of adult non-Medicaid KPNC members aged 35-85 who completed a survey administered to a random sample of health plan members between April and June 2021 (n = 2820). MEASURES We examined the performance of multiple model iterations by comparing areas under the receiver operating characteristic curves (AUCs). We also assessed algorithmic bias related to race/ethnicity and calculated model performance at defined risk thresholds for screening implementation. RESULTS Patients in the primary modeling cohort (n = 11,999) had a mean age of 53.8 (±19.3) years, 64.7% were women, and 63.9% were of non-White race/ethnicity. The final, simplified model with 30 predictors (including utilization, diagnosis, behavior, insurance, neighborhood, and pharmacy-based variables) had an AUC of 0.68. The model remained robust within different race/ethnic strata. CONCLUSIONS Our results demonstrated that a predictive model developed using information gleaned from the medical record and from public census tract data can be used to identify patients who may benefit from proactive social needs assessment. Depending on the prevalence of social needs in the target population, different risk output thresholds could be set to optimize positive predictive value for successful outreach. This predictive model-based strategy provides a pathway for prioritizing more intensive social risk outreach and screening efforts to the patients who may be in greatest need.
Affiliation(s)
- Richard W Grant
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA
- Jodi K McCloskey
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA
- Connie S Uratsu
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA
- Dilrini Ranatunga
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA
- James D Ralston
  - Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA
- Oleg Sofrygin
  - Division of Research, Kaiser Permanente Northern California, Oakland, CA
5. Schmidt J, Schutte NM, Buttigieg S, Novillo-Ortiz D, Sutherland E, Anderson M, de Witte B, Peolsson M, Unim B, Pavlova M, Stern AD, Mossialos E, van Kessel R. Mapping the regulatory landscape for artificial intelligence in health within the European Union. NPJ Digit Med 2024; 7:229. [PMID: 39191937 PMCID: PMC11350181 DOI: 10.1038/s41746-024-01221-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 08/11/2024] [Indexed: 08/29/2024]
Abstract
Regulatory frameworks for artificial intelligence (AI) are needed to mitigate risks while ensuring the ethical, secure, and effective implementation of AI technology in healthcare and population health. In this article, we present a synthesis of 141 binding policies applicable to AI in healthcare and population health in the EU and 10 European countries. The EU AI Act sets the overall regulatory framework for AI, while other legislations set social, health, and human rights standards, address the safety of technologies and the implementation of innovation, and ensure the protection and safe use of data. Regulation specifically pertaining to AI is still nascent and scarce, though a combination of data, technology, innovation, and health and human rights policy has already formed a baseline regulatory framework for AI in health. Future work should explore specific regulatory challenges, especially with respect to AI medical devices, data protection, and data enablement.
Affiliation(s)
- Jelena Schmidt
  - Department of International Health, Care and Public Health Research Institute (CAPHRI), Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, Netherlands
- Nienke M Schutte
  - Innovation in Health Information Systems Unit, SD Data Governance, Sciensano, Brussels, Belgium
- Stefan Buttigieg
  - Ministry for Health and Active Ageing, Valletta, Malta
  - Faculty of Health Sciences, University of Malta, Msida, Malta
- David Novillo-Ortiz
  - Division of Country Health Policies and Systems, World Health Organization Regional Office for Europe, Copenhagen, Denmark
- Michael Anderson
  - LSE Health, Department of Health Policy, London School of Economics and Political Science, London, United Kingdom
- Brigid Unim
  - Department of Cardiovascular, Endocrine-Metabolic Diseases and Aging, National Institute of Health, Rome, Italy
- Milena Pavlova
  - Department of Health Services Research, Care and Public Health Research Institute (CAPHRI), Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, Netherlands
- Ariel Dora Stern
  - Harvard Business School Technology and Operations Management, Boston, MA, USA
  - Harvard-MIT Center for Regulatory Science, Boston, MA, USA
  - Digital Health Cluster, Hasso-Plattner Institute, University of Potsdam, Potsdam, Germany
- Elias Mossialos
  - LSE Health, Department of Health Policy, London School of Economics and Political Science, London, United Kingdom
  - Institute of Global Health Innovation, Imperial College London, London, United Kingdom
- Robin van Kessel
  - Department of International Health, Care and Public Health Research Institute (CAPHRI), Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, Netherlands
  - LSE Health, Department of Health Policy, London School of Economics and Political Science, London, United Kingdom
  - Department of Psychiatry, University of Cambridge, Cambridge, United Kingdom
6. Garies S, Liang S, Weyman K, Durant S, Ramji N, Alhaj M, Pinto A. Artificial intelligence in primary care practice: Qualitative study to understand perspectives on using AI to derive patient social data. Can Fam Physician 2024; 70:e102-e109. [PMID: 39122422 PMCID: PMC11328713 DOI: 10.46747/cfp.700708e102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2024]
Abstract
OBJECTIVE To understand the perspectives of primary care clinicians and health system leaders on the use of artificial intelligence (AI) to derive information about patients' social determinants of health. DESIGN Qualitative study. SETTING Ontario, Canada. METHODS Semistructured, 30-minute virtual interviews were conducted with eligible participants across Ontario wherein they were asked about their perceptions of using AI to derive social data for patients. A descriptive content analysis was used to elicit themes from the data. MAIN FINDINGS A total of 12 interviews were conducted with 7 family physicians, 3 clinical team members of various health professions, and 2 health system leaders. Five main themes described the current state of social determinants of health information, perceived benefits of and concerns with using AI to derive social data, how participants would want to see and use AI-derived social data, and suggestions for ethical principles that should underpin the development of this AI tool. CONCLUSION Most participants were enthusiastic about the possibility of using AI to derive social data for patients in primary care but noted concerns that should be addressed first. These findings can guide the development of AI-based tools for use in primary care settings.
Affiliation(s)
- Stephanie Garies
  - Postdoctoral fellow (at the time of writing) affiliated with the Department of Family and Community Medicine through St Michael's Hospital at Unity Health Toronto in Ontario, and with the Upstream Lab in the MAP Centre for Urban Health Solutions
- Simon Liang
  - Family medicine resident (at the time of writing) in the Department of Family & Community Medicine at St Michael's Hospital through Unity Health Toronto
- Karen Weyman
  - Associate Professor in the Department of Family & Community Medicine at the University of Toronto in Ontario and a family physician at St Michael's Hospital
- Steve Durant
  - Research coordinator (at the time of writing) of the Upstream Lab
- Noor Ramji
  - Family physician and Practice Improvement Program Director in the Department of Family and Community Medicine at the University of Toronto
- Mo Alhaj
  - Quality Improvement Specialist at St Michael's Hospital
- Andrew Pinto
  - Director of the Upstream Lab, a public health and preventive medicine specialist and family physician at St Michael's Hospital, and Associate Professor at the University of Toronto
7. Keloth VK, Selek S, Chen Q, Gilman C, Fu S, Dang Y, Chen X, Hu X, Zhou Y, He H, Fan JW, Wang K, Brandt C, Tao C, Liu H, Xu H. Large Language Models for Social Determinants of Health Information Extraction from Clinical Notes - A Generalizable Approach across Institutions. medRxiv 2024:2024.05.21.24307726. [PMID: 38826441 PMCID: PMC11142292 DOI: 10.1101/2024.05.21.24307726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
The consistent and persuasive evidence illustrating the influence of social determinants on health has prompted a growing realization throughout the health care sector that enhancing health and health equity will likely depend, at least to some extent, on addressing detrimental social determinants. However, detailed social determinants of health (SDoH) information is often buried within clinical narrative text in electronic health records (EHRs), necessitating natural language processing (NLP) methods to automatically extract these details. Most current NLP efforts for SDoH extraction have been limited: they investigate only a few types of SDoH elements, derive data from a single institution, or focus on specific patient cohorts or note types, with reduced focus on generalizability. This study aims to address these issues by creating cross-institutional corpora spanning different note types and healthcare systems, and developing and evaluating the generalizability of classification models, including novel large language models (LLMs), for detecting SDoH factors from diverse types of notes from four institutions: Harris County Psychiatric Center, University of Texas Physician Practice, Beth Israel Deaconess Medical Center, and Mayo Clinic. Four corpora of deidentified clinical notes were annotated with 21 SDoH factors at two levels: level 1 with SDoH factor types only and level 2 with SDoH factors along with associated values. Three traditional classification algorithms (XGBoost, TextCNN, Sentence BERT) and an instruction-tuned LLM-based approach (LLaMA) were developed to identify multiple SDoH factors. Substantial variation was noted in SDoH documentation practices and label distributions based on patient cohorts, note types, and hospitals. The LLM achieved top performance with micro-averaged F1 scores over 0.9 on level 1 annotated corpora and an F1 over 0.84 on level 2 annotated corpora. While models performed well when trained and tested on individual datasets, cross-dataset generalization highlighted remaining obstacles. To foster collaboration, access to partially annotated corpora and models trained by merging all annotated datasets will be made available on the PhysioNet repository.
Affiliation(s)
- Vipina K. Keloth
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Salih Selek
  - Department of Psychiatry and Behavioral Sciences, UTHealth McGovern Medical School, Houston, TX, USA
- Qingyu Chen
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Christopher Gilman
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Sunyang Fu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Yifang Dang
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Xinghan Chen
  - School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
- Xinyue Hu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
- Yujia Zhou
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Huan He
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Jungwei W. Fan
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Karen Wang
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
  - Equity Research and Innovation Center, Yale School of Medicine, New Haven, CT, USA
- Cynthia Brandt
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
- Cui Tao
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Jacksonville, FL, USA
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
- Hongfang Liu
  - McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Hua Xu
  - Department of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, CT, USA
8. Sushil M, Butte AJ, Schuit E, van Smeden M, Leeuwenberg AM. Cross-institution natural language processing for reliable clinical association studies: a methodological exploration. J Clin Epidemiol 2024; 167:111258. [PMID: 38219811 DOI: 10.1016/j.jclinepi.2024.111258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 12/21/2023] [Accepted: 01/08/2024] [Indexed: 01/16/2024]
Abstract
OBJECTIVES Natural language processing (NLP) of clinical notes in electronic medical records is increasingly used to extract otherwise sparsely available patient characteristics, to assess their association with relevant health outcomes. Manual data curation is resource intensive and NLP methods make these studies more feasible. However, the methodology of using NLP methods reliably in clinical research is understudied. The objective of this study is to investigate how NLP models could be used to extract study variables (specifically exposures) to reliably conduct exposure-outcome association studies. STUDY DESIGN AND SETTING In a convenience sample of patients admitted to the intensive care unit of a US academic health system, multiple association studies are conducted, comparing the association estimates based on NLP-extracted vs. manually extracted exposure variables. The association studies varied in NLP model architecture (Bidirectional Encoder Representations from Transformers, Long Short-Term Memory), training paradigm (training a new model, fine-tuning an existing external model), extracted exposures (employment status, living status, and substance use), health outcomes (having a do-not-resuscitate/intubate code, length of stay, and in-hospital mortality), missing data handling (multiple imputation vs. complete case analysis), and the application of measurement error correction (via regression calibration). RESULTS The study was conducted on 1,174 participants (median [interquartile range] age, 61 [50, 73] years; 60.6% male). Additionally, up to 500 discharge reports of participants from the same health system and 2,528 reports of participants from an external health system were used to train the NLP models. Substantial differences were found between the associations based on NLP-extracted and manually extracted exposures under all settings. The error in association was only weakly correlated with the overall F1 score of the NLP models. CONCLUSION Associations estimated using NLP-extracted exposures should be interpreted with caution. Further research is needed to set conditions for reliable use of NLP in medical association studies.
Affiliation(s)
- Madhumita Sushil
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, USA
- Atul J Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, USA
- Ewoud Schuit
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Maarten van Smeden
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Artuur M Leeuwenberg
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
9. Ong JCL, Seng BJJ, Law JZF, Low LL, Kwa ALH, Giacomini KM, Ting DSW. Artificial intelligence, ChatGPT, and other large language models for social determinants of health: Current state and future directions. Cell Rep Med 2024; 5:101356. [PMID: 38232690 PMCID: PMC10829781 DOI: 10.1016/j.xcrm.2023.101356] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 05/03/2023] [Revised: 10/12/2023] [Accepted: 12/10/2023] [Indexed: 01/19/2024]
Abstract
This perspective highlights the importance of addressing social determinants of health (SDOH) in patient health outcomes and health inequity, a global problem exacerbated by the COVID-19 pandemic. We provide a broad discussion on current developments in digital health and artificial intelligence (AI), including large language models (LLMs), as transformative tools in addressing SDOH factors, offering new capabilities for disease surveillance and patient care. Simultaneously, we bring attention to challenges, such as data standardization, infrastructure limitations, digital literacy, and algorithmic bias, that could hinder equitable access to AI benefits. For LLMs, we highlight potential unique challenges and risks including environmental impact, unfair labor practices, inadvertent disinformation or "hallucinations," proliferation of bias, and infringement of copyrights. We propose the need for a multitiered approach to digital inclusion as an SDOH and the development of ethical and responsible AI practice frameworks globally and provide suggestions on bridging the gap from development to implementation of equitable AI technologies.
Affiliation(s)
- Jasmine Chiat Ling Ong
  - Division of Pharmacy, Singapore General Hospital, Singapore, Singapore; SingHealth Duke-NUS Medicine Academic Clinical Programme, Singapore, Singapore
- Benjamin Jun Jie Seng
  - MOHH Holdings (Singapore) Pte., Ltd., Singapore, Singapore; SingHealth Duke-NUS Family Medicine Academic Clinical Programme, Singapore, Singapore
- Lian Leng Low
  - SingHealth Duke-NUS Family Medicine Academic Clinical Programme, Singapore, Singapore; Population Health and Integrated Care Office, Singapore General Hospital, Singapore, Singapore; Centre for Population Health Research and Implementation, SingHealth Regional Health System, Singapore, Singapore; Outram Community Hospital, SingHealth Community Hospitals, Singapore, Singapore
- Andrea Lay Hoon Kwa
  - Division of Pharmacy, Singapore General Hospital, Singapore, Singapore; SingHealth Duke-NUS Medicine Academic Clinical Programme, Singapore, Singapore; Emerging Infectious Diseases, Duke-NUS Medical School, Singapore, Singapore
- Kathleen M Giacomini
  - Department of Bioengineering and Therapeutic Sciences, Schools of Pharmacy and Medicine, University of California, San Francisco, San Francisco, CA, USA
- Daniel Shu Wei Ting
  - Artificial Intelligence and Digital Innovation Research Group, Singapore Eye Research, Singapore, Singapore; Duke-NUS Medical School, National University of Singapore, Singapore, Singapore; Byers Eye Institute, Stanford University, Stanford, CA, USA
Collapse
|
10
|
Wu W, Holkeboer KJ, Kolawole TO, Carbone L, Mahmoudi E. Natural language processing to identify social determinants of health in Alzheimer's disease and related dementia from electronic health records. Health Serv Res 2023; 58:1292-1302. [PMID: 37534741 PMCID: PMC10622277 DOI: 10.1111/1475-6773.14210] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/04/2023] Open
Abstract
OBJECTIVE: To develop a natural language processing (NLP) algorithm that identifies social determinants of health (SDoH), including housing, transportation, food, and medication insecurities, social isolation, abuse, neglect, or exploitation, and financial difficulties for patients with Alzheimer's disease and related dementias (ADRD) from unstructured electronic health records (EHRs).
DATA SOURCES AND STUDY SETTING: We leveraged 1000 medical notes randomly selected from 7401 emergency department and inpatient social worker notes generated between 2015 and 2019 for 231 unique patients diagnosed with ADRD at Michigan Medicine.
STUDY DESIGN: We developed a rule-based NLP algorithm for the identification of seven domains of SDoH noted above. We also compared the rule-based algorithm with deep learning and regularized logistic regression approaches. These models were compared using accuracy, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic curve (AUC). All notes were split into 700 notes for training NLP algorithms, and 300 notes for validation.
DATA COLLECTION/EXTRACTION METHODS: Social worker notes used in this study were extracted from the Michigan Medicine EHR database.
PRINCIPAL FINDINGS: Of the 700 notes for training, F1 and AUC for the rule-based algorithm were at least 0.94 and 0.95, respectively, for all SDoH categories. Of the 300 notes for validation, F1 and AUC were at least 0.80 and 0.97, respectively, for all SDoH except housing and medication insecurities. The deep learning and regularized logistic regression algorithms had unsatisfactory performance.
CONCLUSIONS: The rule-based algorithm can accurately extract SDoH information in all seven domains of SDoH except housing and medication insecurities. Findings from the algorithm can be used by clinicians and social workers to proactively address social needs of patients with ADRD and other vulnerable patient populations.
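To make the idea of a rule-based SDoH tagger concrete, here is a minimal sketch in Python. It is not the algorithm from the study above: the domain names, the keyword patterns, the `tag_note` function, and the example note are all hypothetical and chosen only to illustrate how pattern rules can map free-text notes to SDoH domains.

```python
import re

# Hypothetical keyword rules per SDoH domain (illustrative only; the study's
# actual rules are not given in the abstract).
SDOH_PATTERNS = {
    "housing": re.compile(r"\b(homeless|eviction|evicted|unstable housing)\b", re.I),
    "transportation": re.compile(r"\b(no transportation|unable to drive)\b", re.I),
    "food": re.compile(r"\b(food insecurity|food pantry|skipping meals)\b", re.I),
    "social_isolation": re.compile(r"\b(lives alone|socially isolated)\b", re.I),
}


def tag_note(note):
    """Return the set of SDoH domains whose pattern matches the note text."""
    return {domain for domain, pattern in SDOH_PATTERNS.items() if pattern.search(note)}


# Hypothetical example note, not taken from any real record.
example = "Patient lives alone and was evicted last month; relies on a food pantry."
print(sorted(tag_note(example)))
```

A real system would likely also need negation handling (for example, "denies food insecurity") and validation against annotated notes, which this sketch omits.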
Affiliation(s)
- Wenbo Wu
  - Departments of Population Health and Medicine, Grossman School of Medicine, New York University, New York City, New York, USA
  - Center for Data Science, New York University, New York City, New York, USA
  - Department of Family Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Kaes J. Holkeboer
  - Department of Family Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
  - College of Literature, Science, and the Arts, University of Michigan, Ann Arbor, Michigan, USA
- Temidun O. Kolawole
  - Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, Maryland, USA
- Lorrie Carbone
  - Department of Family Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
- Elham Mahmoudi
  - Department of Family Medicine, University of Michigan Medical School, Ann Arbor, Michigan, USA
  - Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor, Michigan, USA
11
Dang Y, Li F, Hu X, Keloth VK, Zhang M, Fu S, Amith MF, Fan JW, Du J, Yu E, Liu H, Jiang X, Xu H, Tao C. Systematic design and data-driven evaluation of social determinants of health ontology (SDoHO). J Am Med Inform Assoc 2023; 30:1465-1473. [PMID: 37301740 PMCID: PMC10436148 DOI: 10.1093/jamia/ocad096] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/28/2023] [Revised: 05/23/2023] [Accepted: 06/02/2023] [Indexed: 06/12/2023] Open
Abstract
OBJECTIVE: Social determinants of health (SDoH) play critical roles in health outcomes and well-being. Understanding the interplay of SDoH and health outcomes is critical to reducing healthcare inequalities and transforming a "sick care" system into a "health-promoting" system. To address the SDoH terminology gap and better embed relevant elements in advanced biomedical informatics, we propose an SDoH ontology (SDoHO), which represents fundamental SDoH factors and their relationships in a standardized and measurable way.
MATERIAL AND METHODS: Drawing on the content of existing ontologies relevant to certain aspects of SDoH, we used a top-down approach to formally model classes, relationships, and constraints based on multiple SDoH-related resources. Expert review and coverage evaluation, using a bottom-up approach employing clinical notes data and a national survey, were performed.
RESULTS: We constructed the SDoHO with 708 classes, 106 object properties, and 20 data properties, with 1,561 logical axioms and 976 declaration axioms in the current version. Three experts achieved 0.967 agreement in the semantic evaluation of the ontology. A comparison between the coverage of the ontology and SDoH concepts in 2 sets of clinical notes and a national survey instrument also showed satisfactory results.
DISCUSSION: SDoHO could potentially play an essential role in providing a foundation for a comprehensive understanding of the associations between SDoH and health outcomes and paving the way for health equity across populations.
CONCLUSION: SDoHO has well-designed hierarchies, practical objective properties, and versatile functionalities, and the comprehensive semantic and coverage evaluation achieved promising performance compared to the existing ontologies relevant to SDoH.
Affiliation(s)
- Yifang Dang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Fang Li
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Xinyue Hu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Vipina K Keloth
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut, USA
- Meng Zhang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Sunyang Fu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Muhammad F Amith
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Department of Information Science, University of North Texas, Denton, Texas, USA
  - Department of Biostatistics and Data Science, School of Population Health, University of Texas Medical Branch, Galveston, Texas, USA
  - Department of Internal Medicine, John Sealy School of Medicine, University of Texas Medical Branch, Galveston, Texas, USA
- J Wilfred Fan
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Jingcheng Du
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Evan Yu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Hongfang Liu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, Minnesota, USA
- Xiaoqian Jiang
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Hua Xu
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
  - Section of Biomedical Informatics and Data Science, Yale School of Medicine, New Haven, Connecticut, USA
- Cui Tao
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
12
Hewner S, Smith E, Sullivan SS. Identifying High-Need Primary Care Patients Using Nursing Knowledge and Machine Learning Methods. Appl Clin Inform 2023; 14:408-417. [PMID: 36882152 PMCID: PMC10208721 DOI: 10.1055/a-2048-7343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Accepted: 02/20/2023] [Indexed: 03/09/2023] Open
Abstract
BACKGROUND: Patient cohorts generated by machine learning can be enhanced with clinical knowledge to increase translational value and provide a practical approach to patient segmentation based on a mix of medical, behavioral, and social factors.
OBJECTIVES: This study aimed to generate a pragmatic example of how machine learning could be used to quickly and meaningfully cohort patients using unsupervised classification methods. It also aimed to demonstrate the increased translational value of machine learning models through the integration of nursing knowledge.
METHODS: A primary care practice dataset (N = 3,438) of high-need patients defined by practice criteria was parsed to a subset population of patients with diabetes (n = 1,233). Three expert nurses selected variables for k-means cluster analysis using knowledge of critical factors for care coordination. Nursing knowledge was again applied to describe the psychosocial phenotypes in four prominent clusters, aligned with social and medical care plans.
RESULTS: Four distinct clusters were interpreted and mapped to psychosocial need profiles, allowing for immediate translation to clinical practice through the creation of actionable social and medical care plans: (1) a large cluster of racially diverse female, non-English speakers with low medical complexity and a history of childhood illness; (2) a large cluster of English speakers with significant comorbidities (obesity and respiratory disease); (3) a small cluster of males with substance use disorder and significant comorbidities (mental health, liver and cardiovascular disease) who frequently visit the hospital; and (4) a moderate cluster of older, racially diverse patients with renal failure.
CONCLUSION: This manuscript provides a practical method for analysis of primary care practice data using machine learning in tandem with expert clinical knowledge.
Affiliation(s)
- Sharon Hewner
  - Department of Family, Community and Health Systems Science, School of Nursing, University at Buffalo, The State University of New York, Buffalo, New York, United States
- Erica Smith
  - Department of Family, Community and Health Systems Science, School of Nursing, University at Buffalo, The State University of New York, Buffalo, New York, United States
- Suzanne S. Sullivan
  - Department of Family, Community and Health Systems Science, School of Nursing, University at Buffalo, The State University of New York, Buffalo, New York, United States
13
Hobensack M, Song J, Chae S, Kennedy E, Zolnoori M, Bowles KH, McDonald MV, Evans L, Topaz M. Capturing Concerns about Patient Deterioration in Narrative Documentation in Home Healthcare. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2023; 2022:552-559. [PMID: 37128448 PMCID: PMC10148365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Home healthcare (HHC) agencies provide care to more than 3.4 million adults per year. There is value in studying HHC narrative notes to identify patients at risk for deterioration. This study aimed to build machine learning algorithms to identify "concerning" narrative notes of HHC patients and identify emerging themes. Six algorithms were applied to narrative notes (n = 4,000) from an HHC agency to classify notes as either "concerning" or "not concerning." Topic modeling using Latent Dirichlet Allocation with a bag-of-words representation was conducted to identify emerging themes from the concerning notes. Gradient Boosted Trees demonstrated the best performance, with an F-score of 0.74 and an AUC of 0.96. Emerging themes were related to patient-clinician communication, HHC services provided, gait challenges, mobility concerns, wounds, and caregivers. Most themes have been cited in previous literature as increasing the risk for adverse events. In the future, such algorithms can support early identification of patients at risk for deterioration.
Affiliation(s)
- Jiyoun Song
  - Columbia University School of Nursing, New York, NY, USA
- Sena Chae
  - University of Iowa College of Nursing, Iowa City, IA, USA
- Erin Kennedy
  - University of Pennsylvania School of Nursing, Philadelphia, PA, USA
- Kathryn H Bowles
  - Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, USA
  - University of Pennsylvania School of Nursing, Philadelphia, PA, USA
- Margaret V McDonald
  - Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, USA
- Lauren Evans
  - Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, USA
- Maxim Topaz
  - Columbia University School of Nursing, New York, NY, USA
  - Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, USA
14
Pethani F, Dunn AG. Natural language processing for clinical notes in dentistry: A systematic review. J Biomed Inform 2023; 138:104282. [PMID: 36623780 DOI: 10.1016/j.jbi.2023.104282] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/29/2022] [Revised: 12/01/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023]
Abstract
OBJECTIVE: To identify and synthesise research on applications of natural language processing (NLP) for information extraction and retrieval from clinical notes in dentistry.
MATERIALS AND METHODS: A predefined search strategy was applied in EMBASE, CINAHL and Medline. Studies eligible for inclusion were those that described, evaluated, or applied NLP to clinical notes containing either human or simulated patient information. Quality of the study design and reporting was independently assessed based on a set of questions derived from relevant tools including CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). A narrative synthesis was conducted to present the results.
RESULTS: Of the 17 included studies, 10 developed and evaluated NLP methods and 7 described applications of NLP-based information retrieval methods in dental records. Studies were published between 2015 and 2021, most were missing key details needed for reproducibility, and there was no consistency in design or reporting. The 10 studies developing or evaluating NLP methods used document classification or entity extraction, and 4 compared NLP methods to non-NLP methods. The quality of reporting on NLP studies in dentistry has modestly improved over time.
CONCLUSIONS: Study design heterogeneity and incomplete reporting of studies currently limit our ability to synthesise NLP applications in dental records. Standardisation of reporting and improved connections between NLP methods and applied NLP in dentistry may improve how we can make use of clinical notes from dentistry in population health or decision support systems.
PROTOCOL REGISTRATION: PROSPERO CRD42021227823.
Affiliation(s)
- Farhana Pethani
  - Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia
- Adam G Dunn
  - Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia.
15
Hobensack M, Song J, Scharp D, Bowles KH, Topaz M. Machine learning applied to electronic health record data in home healthcare: A scoping review. Int J Med Inform 2023; 170:104978. [PMID: 36592572 PMCID: PMC9869861 DOI: 10.1016/j.ijmedinf.2022.104978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Revised: 12/13/2022] [Accepted: 12/23/2022] [Indexed: 12/31/2022]
Abstract
OBJECTIVE: Despite recent calls for home healthcare (HHC) to integrate informatics, the application of machine learning in HHC is relatively unknown. Thus, this study aimed to synthesize and appraise the literature describing the application of machine learning to predict adverse outcomes (e.g., hospitalization, mortality) using electronic health record (EHR) data in the HHC setting. Our secondary aim was to evaluate the comprehensiveness of predictors used in the machine learning algorithms guided by the Biopsychosocial Model.
METHODS: During March 2022 we conducted a literature search in four databases: PubMed, Embase, CINAHL, and Scopus. Inclusion criteria were 1) describing services provided in the HHC setting, 2) applying machine learning algorithms to predict adverse outcomes, defined as outcomes related to patient deterioration, 3) using EHR data, and 4) focusing on the adult population. Predictors were mapped to the Biopsychosocial Model. A risk of bias analysis was conducted using the Prediction Model Risk Of Bias Assessment Tool.
RESULTS: The final sample included 20 studies. Eighteen studies used predictors from standardized assessments integrated in the EHR. The most common outcome of interest was hospitalization (55%), followed by mortality (25%). Psychological predictors were frequently excluded (35%). Tree-based algorithms were most frequently applied (75%). Most studies demonstrated high or unclear risk of bias (75%).
CONCLUSION: Future studies in HHC should consider incorporating machine learning algorithms into clinical decision support systems to identify patients at risk. Based on the Biopsychosocial Model, psychological and interpersonal characteristics should be used along with biological characteristics to enhance risk prediction. To facilitate the widespread adoption of machine learning, stakeholders should encourage standardization in the HHC setting.
Affiliation(s)
- Jiyoun Song
  - Columbia University School of Nursing, New York, NY, USA.
- Kathryn H Bowles
  - Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing, Philadelphia, PA, USA; Center for Home Care Policy & Research, VNS Health, New York, NY, USA.
- Maxim Topaz
  - Columbia University School of Nursing, New York, NY, USA; Center for Home Care Policy & Research, VNS Health, New York, NY, USA; Data Science Institute, Columbia University, New York, NY, USA.
16
Boch S, Sezgin E, Lin Linwood S. Ethical artificial intelligence in paediatrics. THE LANCET. CHILD & ADOLESCENT HEALTH 2022; 6:833-835. [PMID: 36084667 DOI: 10.1016/s2352-4642(22)00243-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/04/2022] [Revised: 08/12/2022] [Accepted: 08/15/2022] [Indexed: 06/15/2023]
Affiliation(s)
- Samantha Boch
  - College of Nursing, University of Cincinnati, Cincinnati, OH, USA; James M Anderson Center for Health Systems Excellence, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Emre Sezgin
  - Center for Biobehavioral Health, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH 43205, USA; Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA.
17
Istasy P, Lee WS, Iansavichene A, Upshur R, Gyawali B, Burkell J, Sadikovic B, Lazo-Langner A, Chin-Yee B. The Impact of Artificial Intelligence on Health Equity in Oncology: Scoping Review. J Med Internet Res 2022; 24:e39748. [PMID: 36005841 PMCID: PMC9667381 DOI: 10.2196/39748] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/21/2022] [Revised: 08/11/2022] [Accepted: 08/24/2022] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: The field of oncology is at the forefront of advances in artificial intelligence (AI) in health care, providing an opportunity to examine the early integration of these technologies in clinical research and patient care. Hope that AI will revolutionize health care delivery and improve clinical outcomes has been accompanied by concerns about the impact of these technologies on health equity.
OBJECTIVE: We aimed to conduct a scoping review of the literature to address the question, "What are the current and potential impacts of AI technologies on health equity in oncology?"
METHODS: Following PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines for scoping reviews, we systematically searched MEDLINE and Embase electronic databases from January 2000 to August 2021 for records engaging with key concepts of AI, health equity, and oncology. We included all English-language articles that engaged with the 3 key concepts. Articles were analyzed qualitatively for themes pertaining to the influence of AI on health equity in oncology.
RESULTS: Of the 14,011 records, 133 (0.95%) identified from our review were included. We identified 3 general themes in the literature: the use of AI to reduce health care disparities (58/133, 43.6%), concerns surrounding AI technologies and bias (16/133, 12.1%), and the use of AI to examine biological and social determinants of health (55/133, 41.4%). A total of 3% (4/133) of articles focused on many of these themes.
CONCLUSIONS: Our scoping review revealed 3 main themes on the impact of AI on health equity in oncology, which relate to AI's ability to help address health disparities, its potential to mitigate or exacerbate bias, and its capability to help elucidate determinants of health. Gaps in the literature included a lack of discussion of ethical challenges with the application of AI technologies in low- and middle-income countries, lack of discussion of problems of bias in AI algorithms, and a lack of justification for the use of AI technologies over traditional statistical methods to address specific research questions in oncology. Our review highlights a need to address these gaps to ensure a more equitable integration of AI in cancer research and clinical practice. The limitations of our study include its exploratory nature, its focus on oncology as opposed to all health care sectors, and its analysis of solely English-language articles.
Affiliation(s)
- Paul Istasy
  - Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
  - Rotman Institute of Philosophy, Western University, London, ON, Canada
- Wen Shen Lee
  - Department of Pathology & Laboratory Medicine, Schulich School of Medicine, Western University, London, ON, Canada
- Ross Upshur
  - Division of Clinical Public Health, Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
  - Bridgepoint Collaboratory for Research and Innovation, Lunenfeld Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Bishal Gyawali
  - Division of Cancer Care and Epidemiology, Department of Oncology, Queen's University, Kingston, ON, Canada
  - Division of Cancer Care and Epidemiology, Department of Public Health Sciences, Queen's University, Kingston, ON, Canada
- Jacquelyn Burkell
  - Faculty of Information and Media Studies, Western University, London, ON, Canada
- Bekim Sadikovic
  - Department of Pathology & Laboratory Medicine, Schulich School of Medicine, Western University, London, ON, Canada
- Alejandro Lazo-Langner
  - Division of Hematology, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
- Benjamin Chin-Yee
  - Rotman Institute of Philosophy, Western University, London, ON, Canada
  - Division of Hematology, Schulich School of Medicine and Dentistry, Western University, London, ON, Canada
  - Division of Hematology, Department of Medicine, London Health Sciences Centre, London, ON, Canada
18
Han S, Zhang RF, Shi L, Richie R, Liu H, Tseng A, Quan W, Ryan N, Brent D, Tsui FR. Classifying Social Determinants of Health from Unstructured Electronic Health Records Using Deep Learning-based Natural Language Processing. J Biomed Inform 2022; 127:103984. [PMID: 35007754 DOI: 10.1016/j.jbi.2021.103984] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 09/16/2021] [Revised: 12/28/2021] [Accepted: 12/29/2021] [Indexed: 12/23/2022]
Abstract
OBJECTIVE: Social determinants of health (SDOH) are non-medical factors that can profoundly impact patient health outcomes. However, SDOH are rarely available in structured electronic health record (EHR) data such as diagnosis codes, and are more commonly found in unstructured narrative clinical notes. Hence, identifying social context from unstructured EHR data has become increasingly important. Yet, previous work on using natural language processing to automate extraction of SDOH from text (a) usually focuses on an ad hoc selection of SDOH, and (b) does not use the latest advances in deep learning. Our objective was to advance automatic extraction of SDOH from clinical text by (a) systematically creating a set of SDOH based on standard biomedical and psychiatric ontologies, and (b) training state-of-the-art deep neural networks to extract mentions of these SDOH from clinical notes.
DESIGN: A retrospective cohort study.
SETTING AND PARTICIPANTS: Data were extracted from the Medical Information Mart for Intensive Care (MIMIC-III) database. The corpus comprised 3,504 social-related sentences from 2,670 clinical notes.
METHODS: We developed a framework for automated classification of multiple SDOH categories. Our dataset comprised narrative clinical notes under the "Social Work" category in the MIMIC-III Clinical Database. Using standard terminologies, SNOMED-CT and DSM-IV, we systematically curated a set of 13 SDOH categories and created annotation guidelines for these. After manually annotating the 3,504 sentences, we developed and tested three deep neural network (DNN) architectures, namely a convolutional neural network (CNN), a long short-term memory (LSTM) network, and Bidirectional Encoder Representations from Transformers (BERT), for automated detection of eight SDOH categories. We also compared these DNNs to three baseline models: (1) cTAKES, (2) L2-regularized logistic regression, and (3) random forests, the latter two trained on bags-of-words. Model evaluation metrics included micro- and macro-F1, and area under the receiver operating characteristic curve (AUC).
RESULTS: All three DNN models accurately classified all SDOH categories (minimum micro-F1 = .632, minimum macro-AUC = .854). Compared to the CNN and LSTM, BERT performed best in most key metrics (micro-F1 = 0.690, macro-AUC = 0.907). The BERT model most effectively identified the "occupational" category (F1 = .774, AUC = .965) and least effectively identified the "non-SDOH" category (F1 = .491, AUC = .788). BERT outperformed cTAKES in distinguishing social vs non-social sentences (BERT F1 = .87 vs. cTAKES F1 = .06), and outperformed logistic regression (micro-F1 = 0.649, macro-AUC = 0.696) and random forest (micro-F1 = 0.502, macro-AUC = 0.523) trained on bag-of-words.
CONCLUSIONS: Our study framework with DNN models demonstrated improved performance for efficiently identifying a systematic range of SDOH categories from clinical notes in the EHR. Improved identification of patient SDOH may further improve healthcare outcomes.
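Since the abstract above reports both micro- and macro-averaged F1, a short sketch may help show how the two differ. The function names and the toy labels below are hypothetical and are not drawn from the study's data; the sketch assumes single-label classification, where micro-F1 pools counts across classes and macro-F1 averages the per-class scores.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    labels = sorted(set(y_true) | set(y_pred))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Hypothetical toy labels: a frequent class predicted well, a rare class missed.
y_true = ["occupational"] * 4 + ["housing", "non-SDOH"]
y_pred = ["occupational"] * 5 + ["non-SDOH"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy case the missed rare class pulls macro-F1 well below micro-F1, which is one reason studies with imbalanced categories tend to report both.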
Affiliation(s)
- Sifei Han
  - Tsui Laboratory, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Robert F Zhang
  - Tsui Laboratory, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Perelman School of Medicine, University of Pennsylvania, PA, USA
- Lingyun Shi
  - Tsui Laboratory, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Russell Richie
  - Tsui Laboratory, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Haixia Liu
  - Central South University, Changsha, Hunan, CN
- Wei Quan
  - New York University Abu Dhabi, Abu Dhabi, AE
- Neal Ryan
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- David Brent
  - Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- Fuchiang R Tsui
  - Tsui Laboratory, Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA; Perelman School of Medicine, University of Pennsylvania, PA, USA.