1
Allen KS, Hood DR, Cummins J, Kasturi S, Mendonca EA, Vest JR. Natural language processing-driven state machines to extract social factors from unstructured clinical documentation. JAMIA Open 2023; 6:ooad024. [PMID: 37081945 PMCID: PMC10112959 DOI: 10.1093/jamiaopen/ooad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2022] [Revised: 03/08/2023] [Accepted: 03/28/2023] [Indexed: 04/22/2023] Open
Abstract
Objective This study sought to create natural language processing algorithms to extract the presence of social factors from clinical text in 3 areas: (1) housing, (2) financial, and (3) unemployment. For generalizability, finalized models were validated on data from a separate health system. Materials and Methods Notes from 2 healthcare systems, representing a variety of note types, were utilized. To train models, the study utilized n-grams to identify keywords and implemented natural language processing (NLP) state machines across all note types. Manual review was conducted to determine performance. Sampling was based on a set percentage of notes, based on the prevalence of social need. Models were optimized over multiple training and evaluation cycles. Performance metrics were calculated using positive predictive value (PPV), negative predictive value, sensitivity, and specificity. Results PPV for housing rose from 0.71 to 0.95 over 3 training runs. PPV for financial rose from 0.83 to 0.89 over 2 training iterations, while PPV for unemployment rose from 0.78 to 0.88 over 3 iterations. The test data resulted in PPVs of 0.94, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Final specificity scores were 0.95, 0.97, and 0.95 for housing, financial, and unemployment, respectively. Discussion We developed 3 rule-based NLP algorithms, trained across health systems. While this is a less sophisticated approach, the algorithms demonstrated a high degree of generalizability, maintaining >0.85 across all predictive performance metrics. Conclusion The rule-based NLP algorithms demonstrated consistent performance in identifying 3 social factors within clinical text. These methods may be a part of a strategy to measure social factors within an institution.
Affiliation(s)
- Katie S Allen
  - Corresponding Author: Katie S. Allen, BS, Center for Biomedical Informatics, Regenstrief Institute, Inc., 1101 W. 10th Street, Indianapolis, IN 46202, USA
- Dan R Hood
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Jonathan Cummins
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Suranga Kasturi
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
- Eneida A Mendonca
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
  - Department of Pediatrics, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Joshua R Vest
  - Center for Biomedical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA
  - Department of Health Policy and Management, Richard M. Fairbanks School of Public Health, IUPUI, Indianapolis, Indiana, USA
2
Masukawa K, Aoyama M, Yokota S, Nakamura J, Ishida R, Nakayama M, Miyashita M. Machine learning models to detect social distress, spiritual pain, and severe physical psychological symptoms in terminally ill patients with cancer from unstructured text data in electronic medical records. Palliat Med 2022; 36:1207-1216. [PMID: 35773973 DOI: 10.1177/02692163221105595] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 01/21/2023]
Abstract
BACKGROUND Few studies have developed automatic systems for identifying social distress, spiritual pain, and severe physical and psychological symptoms from text data in electronic medical records. AIM To develop models to detect social distress, spiritual pain, and severe physical and psychological symptoms in terminally ill patients with cancer from unstructured text data contained in electronic medical records. DESIGN In this retrospective study, 1,554,736 narrative clinical records from the 1 month before patients died were analyzed. Supervised machine learning models were trained to detect comprehensive symptoms, and the performance of the models was tested using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). SETTING/PARTICIPANTS A total of 808 patients were included in the study using records obtained from a university hospital in Japan between January 1, 2018 and December 31, 2019. As training data, we used medical records labeled for detecting social distress (n = 10,000) and spiritual pain (n = 10,000), and records that could be combined with the Support Team Assessment Schedule (based on date) for detecting severe physical/psychological symptoms (n = 5409). RESULTS Machine learning models for detecting social distress had AUROC and AUPRC values of 0.98 and 0.61, respectively; values for spiritual pain were 0.90 and 0.58, respectively. The machine learning models accurately identified severe symptoms (pain, dyspnea, nausea, insomnia, and anxiety) with a high level of discrimination (AUROC > 0.8). CONCLUSION The machine learning models could detect social distress, spiritual pain, and severe symptoms in terminally ill patients with cancer from text data contained in electronic medical records.
Affiliation(s)
- Kento Masukawa
  - Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Maho Aoyama
  - Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Shinichiroh Yokota
  - Faculty of Medicine, The University of Tokyo, Hongo, Tokyo, Japan
  - Department of Healthcare Information Management, The University of Tokyo Hospital, Hongo, Tokyo, Japan
- Jyunya Nakamura
  - Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Ryoka Ishida
  - Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Masaharu Nakayama
  - Department of Medical Informatics, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Mitsunori Miyashita
  - Department of Palliative Nursing, Health Sciences, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
3
Patra BG, Sharma MM, Vekaria V, Adekkanattu P, Patterson OV, Glicksberg B, Lepow LA, Ryu E, Biernacka JM, Furmanchuk A, George TJ, Hogan W, Wu Y, Yang X, Bian J, Weissman M, Wickramaratne P, Mann JJ, Olfson M, Campion TR, Weiner M, Pathak J. Extracting social determinants of health from electronic health records using natural language processing: a systematic review. J Am Med Inform Assoc 2021; 28:2716-2727. [PMID: 34613399 PMCID: PMC8633615 DOI: 10.1093/jamia/ocab170] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Received: 04/29/2021] [Revised: 07/09/2021] [Accepted: 08/04/2021] [Indexed: 11/27/2022] Open
Abstract
OBJECTIVE Social determinants of health (SDoH) are nonclinical dispositions that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs. MATERIALS AND METHODS A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review. RESULTS Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9). CONCLUSION NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.
Affiliation(s)
- Braja G Patra
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Mohit M Sharma
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Veer Vekaria
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Prakash Adekkanattu
  - Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Olga V Patterson
  - Department of Internal Medicine, Division of Epidemiology, University of Utah, Salt Lake City, Utah, USA
  - US Department of Veterans Affairs, Salt Lake City, Utah, USA
- Lauren A Lepow
  - Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Euijung Ryu
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Joanna M Biernacka
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, Minnesota, USA
- Thomas J George
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- William Hogan
  - Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, Florida, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Xi Yang
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, Florida, USA
- Myrna Weissman
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Priya Wickramaratne
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- J John Mann
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Mark Olfson
  - Vagelos College of Physicians and Surgeons, Columbia University, New York, New York, USA
- Thomas R Campion
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
  - Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA
- Mark Weiner
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
- Jyotishman Pathak
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA
4
Skaljic M, Patel IH, Pellegrini AM, Castro VM, Perlis RH, Gordon DD. Prevalence of Financial Considerations Documented in Primary Care Encounters as Identified by Natural Language Processing Methods. JAMA Netw Open 2019; 2:e1910399. [PMID: 31469397 PMCID: PMC6724154 DOI: 10.1001/jamanetworkopen.2019.10399] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022] Open
Abstract
IMPORTANCE Quantifying patient-physician cost conversations is challenging but important as out-of-pocket spending by US patients increases and patients are increasingly interested in discussing costs with their physicians. OBJECTIVE To characterize the prevalence of financial considerations documented in narrative clinical records of primary care encounters and their association with patient-level features. DESIGN, SETTING, AND PARTICIPANTS This cohort study applied natural language processing to narrative clinical notes obtained from electronic health records for adult primary care visits. Participants included patients aged 18 years and older with at least 1 primary care visit for an annual preventive examination at outpatient clinics at a US academic health system between January 2, 2008, and July 30, 2013. Data were analyzed in March 2019. MAIN OUTCOMES AND MEASURES Presence of financial content documented in narrative clinical notes. RESULTS The data set included 222 457 primary care visits for 46 244 individuals aged 18 years and older; 30 556 patients (60.1%) were female, 27 869 patients (60.3%) were white, and the mean (SD) age was 51.3 (17.7) years. In total, 6058 patients (13.1%) had at least 1 narrative clinical note indicating a financial conversation with their physician. In fully adjusted regression models, the odds of having a financial note were greater among patients with Medicare (odds ratio [OR], 1.27; 95% CI, 1.15-1.41; P < .001) or Medicaid (OR, 1.43; 95% CI, 1.25-1.64; P < .001) insurance, those residing in zip codes with lower median income (OR, 0.97; 95% CI, 0.96-0.98; P < .001), black individuals (OR, 1.40; 95% CI, 1.28-1.53; P < .001), Hispanic individuals (OR, 1.10; 95% CI, 1.01-1.20; P = .03), and those who were unmarried (OR, 1.23; 95% CI, 1.15-1.33; P < .001). 
CONCLUSIONS AND RELEVANCE Cost considerations were more likely to be noted in annual preventive examinations than previously observed in intensive care unit admissions, but still infrequently. Associations with particular patient subgroups may indicate differential financial burden or willingness to discuss financial concerns.
Affiliation(s)
- Meliha Skaljic
  - Perelman School of Medicine, University of Pennsylvania, Philadelphia
- Ihsaan H. Patel
  - Mossavar-Rahmani Center for Business and Government, Harvard Kennedy School, Cambridge, Massachusetts
- Amelia M. Pellegrini
  - Center for Quantitative Health, Division of Clinical Research, Massachusetts General Hospital, Harvard Medical School, Boston
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston
- Victor M. Castro
  - Center for Quantitative Health, Division of Clinical Research, Massachusetts General Hospital, Harvard Medical School, Boston
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston
- Roy H. Perlis
  - Center for Quantitative Health, Division of Clinical Research, Massachusetts General Hospital, Harvard Medical School, Boston
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston
- Deborah D. Gordon
  - Mossavar-Rahmani Center for Business and Government, Harvard Kennedy School, Cambridge, Massachusetts
5
Sherrod BA, Gamboa NT, Wilkerson C, Wilde H, Azab MA, Karsy M, Jensen RL, Menacho ST. Effect of patient age on glioblastoma perioperative treatment costs: a value driven outcome database analysis. J Neurooncol 2019; 143:465-473. [DOI: 10.1007/s11060-019-03178-z] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/15/2019] [Accepted: 04/25/2019] [Indexed: 12/14/2022]