1. Cardamone NC, Olfson M, Schmutte T, Ungar L, Liu T, Cullen SW, Williams NJ, Marcus SC. Classifying Unstructured Text in Electronic Health Records for Mental Health Prediction Models: Large Language Model Evaluation Study. JMIR Med Inform 2025;13:e65454. PMID: 39864953; DOI: 10.2196/65454.
Abstract
Background Prediction models have demonstrated a range of applications across medicine, including using electronic health record (EHR) data to identify hospital readmission and mortality risk. Large language models (LLMs) can transform unstructured EHR text into structured features, which can then be integrated into statistical prediction models, ensuring that the results are both clinically meaningful and interpretable. Objective This study aims to compare the classification decisions made by clinical experts with those generated by a state-of-the-art LLM, using terms extracted from a large EHR dataset of individuals with mental health disorders seen in emergency departments (EDs). Methods Using a dataset from the EHR systems of more than 50 health care provider organizations in the United States from 2016 to 2021, we extracted all clinical terms that appeared in at least 1000 records of individuals admitted to the ED for a mental health-related problem from a source population of over 6 million ED episodes. Two experienced mental health clinicians (one medically trained psychiatrist and one clinical psychologist) reached consensus on the classification of EHR terms and diagnostic codes into categories. We evaluated an LLM's agreement with clinical judgment across three classification tasks as follows: (1) classify terms into "mental health" or "physical health", (2) classify mental health terms into 1 of 42 prespecified categories, and (3) classify physical health terms into 1 of 19 prespecified broad categories. Results There was high agreement between the LLM and clinical experts when categorizing 4553 terms as "mental health" or "physical health" (κ=0.77, 95% CI 0.75-0.80). However, there was still considerable variability in LLM-clinician agreement on the classification of mental health terms (κ=0.62, 95% CI 0.59-0.66) and physical health terms (κ=0.69, 95% CI 0.67-0.70). Conclusions The LLM displayed high agreement with clinical experts when classifying EHR terms into certain mental health or physical health term categories. However, agreement with clinical experts varied considerably within both sets of mental and physical health term categories. Importantly, the use of LLMs offers an alternative to manual human coding, with great potential to create interpretable features for prediction models.
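A minimal sketch of the kind of LLM-clinician agreement analysis the abstract reports: Cohen's kappa with a percentile-bootstrap 95% CI over term-level labels. This is not the authors' code; the labels are simulated and the roughly 90% agreement rate is an arbitrary placeholder.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

def kappa_with_ci(y_ref, y_pred, n_boot=2000, seed=0):
    """Cohen's kappa with a percentile bootstrap 95% CI."""
    rng = np.random.default_rng(seed)
    y_ref, y_pred = np.asarray(y_ref), np.asarray(y_pred)
    point = cohen_kappa_score(y_ref, y_pred)
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_ref), len(y_ref))
        boots.append(cohen_kappa_score(y_ref[idx], y_pred[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return point, (lo, hi)

# Simulated stand-in for the extracted terms: clinician consensus labels and
# LLM labels that agree about 90% of the time.
rng = np.random.default_rng(1)
clinician = rng.choice(["mental health", "physical health"], size=500)
flip = rng.random(500) < 0.10
llm = np.where(flip,
               np.where(clinician == "mental health", "physical health", "mental health"),
               clinician)
kappa, (lo, hi) = kappa_with_ci(clinician, llm, n_boot=1000)
print(f"kappa = {kappa:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```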
Affiliation(s)
- Nicholas C Cardamone
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Mark Olfson
- Department of Psychiatry, the New York State Psychiatric Institute, New York, NY, United States
- Timothy Schmutte
- Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Lyle Ungar
- Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States
- Tony Liu
- Computer and Information Science, University of Pennsylvania, Philadelphia, PA, United States
- Sara W Cullen
- School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA, United States
- Steven C Marcus
- School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA, United States
2. Trujeque J, Dudley RA, Mesfin N, Ingraham NE, Ortiz I, Bangerter A, Chakraborty A, Schutte D, Yeung J, Liu Y, Woodward-Abel A, Bromley E, Zhang R, Brenner LA, Simonetti JA. Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records. J Am Med Inform Assoc 2025;32:113-118. PMID: 39530748; DOI: 10.1093/jamia/ocae169.
Abstract
OBJECTIVE Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records. MATERIALS AND METHODS We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023 and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets into three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT) for classifying firearm access as "definite access", "definitely no access", or "other". RESULTS Firearm terms were identified in 36 685 patient records (41.3%), 33.7% of snippets were categorized as definite access, 9.0% as definitely no access, and 57.2% as "other". Among models classifying firearm access, five of six had acceptable performance, with BioBERT and Bio-ClinicalBERT performing best, with F1s of 0.876 (95% confidence interval, 0.874-0.879) and 0.896 (95% confidence interval, 0.894-0.899), respectively. DISCUSSION AND CONCLUSION Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients' firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.
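The comparison of non-neural classifiers described here can be illustrated with a small scikit-learn sketch: TF-IDF features over snippets and the four non-neural models named in the abstract, scored by macro-F1. This is not the study's pipeline; the snippets, labels, and hyperparameters are invented, and the BERT variants are omitted.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Invented stand-ins for annotated 250-character snippets around firearm terms.
snippets = [
    "patient keeps a handgun locked in a safe at home",
    "denies any access to firearms or other weapons",
    "discussed firearm safety brochure during visit",
    "owns several rifles for hunting season",
    "no guns in the household per spouse",
    "veteran reports firearm stored at relative's house",
] * 4  # repeated only so cross-validation has enough rows
labels = ["access", "no_access", "other", "access", "no_access", "other"] * 4

models = {
    "ridge_logreg": LogisticRegression(max_iter=1000),   # L2 (ridge) penalty by default
    "random_forest": RandomForestClassifier(n_estimators=200),
    "bagging": BaggingClassifier(n_estimators=100),
    "grad_boost": GradientBoostingClassifier(),
}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    f1 = cross_val_score(pipe, snippets, labels, cv=3, scoring="f1_macro")
    print(f"{name}: macro-F1 = {f1.mean():.3f}")
```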
Affiliation(s)
- Joshua Trujeque
- Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN 55455, United States
- R Adams Dudley
- Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN 55455, United States
- Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- School of Public Health, University of Minnesota, Minneapolis, MN 55455, United States
- Nathan Mesfin
- Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN 55455, United States
- Nicholas E Ingraham
- Division of Pulmonary, Allergy, Critical Care and Sleep Medicine, Department of Medicine, University of Minnesota Medical School, Minneapolis, MN 55455, United States
- Isai Ortiz
- Medical School, University of Minnesota, Minneapolis, MN 55455, United States
- Ann Bangerter
- Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States
- Anjan Chakraborty
- Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States
- Dalton Schutte
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Jeremy Yeung
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN 55455, United States
- Ying Liu
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Alicia Woodward-Abel
- Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States
- Emma Bromley
- Center for Care Delivery and Outcomes Research (CCDOR), Veterans Affairs (VA) Minneapolis Healthcare System, Minneapolis, MN 55417, United States
- Rui Zhang
- Department of Surgery, University of Minnesota, Minneapolis, MN 55455, United States
- Lisa A Brenner
- Rocky Mountain Mental Illness Research, Education and Clinical Center for Suicide Prevention, Rocky Mountain Regional VAMC (RMR VAMC), Aurora, CO 80045, United States
- Department of Physical Medicine and Rehabilitation, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, United States
- Department of Neurology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, United States
- Department of Psychiatry, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, United States
- Joseph A Simonetti
- Rocky Mountain Mental Illness Research, Education and Clinical Center for Suicide Prevention, Rocky Mountain Regional VAMC (RMR VAMC), Aurora, CO 80045, United States
- Firearm Injury Prevention Initiative, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, United States
3. Kuhn D, Harrison NE, Musey PI, Crandall DJ, Pang PS, Welch JL, Harle CA. Preliminary findings regarding the association between patient demographics and ED experience scores across a regional health system: A cross sectional study using natural language processing of patient comments. Int J Med Inform 2024;195:105748. PMID: 39671851; DOI: 10.1016/j.ijmedinf.2024.105748.
Abstract
OBJECTIVE Existing literature shows associations between patient demographics and reported experiences of care, but this relationship is poorly understood. Our objective was to use natural language processing of patient comments to gain insight into associations between patient demographics and experiences of care. METHODS This is a cross-sectional study of 14,848 unique emergency department (ED) patient visits from 1/1/2020 to 12/31/2020. Patients discharged from one of 16 ED sites in a regional health system who filled out a patient experience survey with comments were included. This study had two outcome variables: (1) positive vs. non-positive (negative/neutral) comment sentiment, and (2) promoter vs. non-promoter status (based on NRCHealth's Net Promoter Score; likelihood to recommend of 9 or 10 are considered "promoters", while scores of 8 or below are "non-promoters"). We used natural language processing to sort patient comments into topics and sentiments. Logistic regression with mediation analysis was used to estimate the associations between patient demographics and the following: (1) comments about compassion vs. other topics, (2) positive comments, and (3) patient experience, defined as likelihood to recommend. RESULTS Comments about care and compassion (51 % of total comments) had highly positive sentiment (97 %), compared to mixed sentiment for other topics. Older, male, and Asian patients were more likely to comment on compassion and most likely to make positive comments. Our mediation analysis suggests that the demographic association with positive patient comments and net promoter scores was mediated by their focus on care and compassion as a primary comment theme for their visit. Notably, the overall percentage of patients providing comments was only 1.8 %, raising concerns about whether data currently used for hospital and physician feedback has adequate validity to yield meaningful insights. CONCLUSIONS The increased likelihood of specific patient sub-groups to comment on compassionate care may explain previously reported differences in experience by patient demographics.
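For readers unfamiliar with the "difference in coefficients" intuition behind this kind of mediation analysis, the sketch below fits a mediator model and outcome models with and without the mediator using statsmodels. It is a simplified stand-in for the paper's analysis; the variable names, simulated data, and effect sizes are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
age = rng.normal(55, 18, n)
male = rng.integers(0, 2, n)
# Toy effect: older and male respondents comment on compassion more often
p_comp = 1 / (1 + np.exp(-(-1.0 + 0.02 * (age - 55) + 0.3 * male)))
compassion = rng.binomial(1, p_comp)
# Toy effect: compassion comments strongly predict promoter status
p_prom = 1 / (1 + np.exp(-(-0.5 + 1.2 * compassion + 0.005 * (age - 55))))
promoter = rng.binomial(1, p_prom)
df = pd.DataFrame(dict(age=age, male=male, compassion=compassion, promoter=promoter))

mediator_model = smf.logit("compassion ~ age + male", data=df).fit(disp=False)
total_model = smf.logit("promoter ~ age + male", data=df).fit(disp=False)            # without mediator
direct_model = smf.logit("promoter ~ age + male + compassion", data=df).fit(disp=False)
print(mediator_model.params)
# Attenuation of the demographic coefficient once the mediator enters the model
# is the crude "difference in coefficients" signal of mediation.
print("age coefficient, total vs direct:",
      round(total_model.params["age"], 4), round(direct_model.params["age"], 4))
```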
Affiliation(s)
- Diane Kuhn
- Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA; Regenstrief Institute, 1101 W 10th St, Indianapolis, IN 46202, USA.
- Nicholas E Harrison
- Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA
- Paul I Musey
- Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA; Regenstrief Institute, 1101 W 10th St, Indianapolis, IN 46202, USA
- David J Crandall
- Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, 1015 E. 11th St, Bloomington, IN 47408, USA
- Peter S Pang
- Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA; Regenstrief Institute, 1101 W 10th St, Indianapolis, IN 46202, USA
- Julie L Welch
- Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA
- Christopher A Harle
- Regenstrief Institute, 1101 W 10th St, Indianapolis, IN 46202, USA; Department of Health Policy and Management, Indiana University Richard M Fairbanks School of Public Health, 1050 Wishard Blvd, Indianapolis, IN 46202, USA
4. Kim M, Kim Y, Choi M. Intensive care unit nurses' experiences of nursing concerns, activities, and documentation on patient deterioration: A focus-group study. Aust Crit Care 2024;38:101126. PMID: 39550338; DOI: 10.1016/j.aucc.2024.09.011.
Abstract
BACKGROUND Although prognosis prediction models using nursing documentation have good predictive performance, the experiences of intensive care unit nurses related to nursing activities and documentation when a patient's condition deteriorates are yet to be explored. OBJECTIVE The aim of this study was to explore nurses' experiences of nursing activities and documentation in intensive care units when a patient's condition deteriorates. METHODS This was a descriptive qualitative study using focus-group interviews with intensive care unit nurses in tertiary or university-affiliated hospitals. In total, 19 registered nurses with at least 1 year of clinical experience in the adult intensive care unit were recruited using a purposive sampling method. Five focus-group interviews were conducted, and the data were analysed through a qualitative content analysis. RESULTS Intensive care unit nurses' experiences with patient deterioration were classified into four main categories (perceived patient deterioration, endeavours to verify nurses' concerns, nursing activities to improve a patient's condition, and optimising documentation practices), which comprised 12 subcategories. Intensive care unit nurses recognise patient deterioration through nursing activities and documentation, and the two processes influence each other. However, nursing activities related to nurses' concerns were mainly handed over verbally rather than documented due to the inflexibility of the available standardised forms and the potential uncertainty of those concerns. CONCLUSIONS The findings reveal how intensive care unit nurses perceive, respond to, and document the condition of a deteriorating patient. Nurses' concerns may be the first sign of a patient's deteriorating condition and are therefore crucial for minimising patient risk. Therefore, efforts to systematically document nurses' concerns may contribute to improving patient outcomes.
Affiliation(s)
- Mihui Kim
- College of Nursing and Brain Korea 21 FOUR Project, Yonsei University, Seoul, Republic of Korea; Department of Nursing Science, Jeonju University, Jeonju, Republic of Korea
- Yesol Kim
- College of Nursing and Brain Korea 21 FOUR Project, Yonsei University, Seoul, Republic of Korea; College of Nursing, Gyeongsang National University, Jinju, Republic of Korea
- Mona Choi
- College of Nursing and Mo-Im Kim Nursing Research Institute, Yonsei University, Seoul, Republic of Korea.
5. Maghsoudi A, Azarian M, Sharafkhaneh A, Jones MB, Nozari H, Kryger M, Ramezani A, Razjouyan J. Age modulates the predictive value of self-reported sleepiness for all-cause mortality risk: insights from a comprehensive national database of veterans. J Clin Sleep Med 2024;20:1785-1792. PMID: 38935061; PMCID: PMC11530978; DOI: 10.5664/jcsm.11254.
Abstract
STUDY OBJECTIVES Excessive daytime sleepiness is prevalent and overwhelmingly stems from disturbed sleep. We hypothesized that age modulates the association between excessive daytime sleepiness and increased all-cause mortality. METHODS We utilized Veterans Health Administration data from 1999-2022. We enrolled participants with sleep-related International Classification of Diseases 9/10 codes or sleep services. A natural language processing pipeline was developed and validated to extract the Epworth Sleepiness Scale (ESS) as a self-reported tool to measure excessive daytime sleepiness from physician progress notes. The natural language processing pipeline's accuracy was assessed through manual annotation of 470 notes. Participants were categorized into normal-ESS (ESS 0-10) and high-ESS (ESS 11-24). We created 3 age groups: < 50 years, 50 to < 65 years, and ≥ 65 years. The adjusted odds ratio of mortality was calculated, adjusting for age, body mass index, sex, race, ethnicity, and the Charlson Comorbidity Index, with normal-ESS as the reference. Subsequently, we conducted age-stratified analyses. RESULTS The first ESS records were extracted from 423,087 veterans with a mean age of 54.8 (± 14.6), mean body mass index of 32.6 (± 6.2), and 90.5% male. The adjusted odds ratio across all ages was 17% higher (1.15, 1.19) in the high-ESS category. The adjusted odds ratios only became statistically significant for individuals aged ≥ 50 years in the high-ESS compared to the normal-ESS category (< 50 years: 1.02 [0.96, 1.08]; 50 to < 65 years: 1.13 [1.10, 1.16]; ≥ 65 years: 1.25 [1.21, 1.28]). CONCLUSIONS High-ESS predicted increased mortality only in participants aged 50 and older. Further research is required to identify this differential behavior in relation to age. CITATION Maghsoudi A, Azarian M, Sharafkhaneh A, et al. Age modulates the predictive value of self-reported sleepiness for all-cause mortality risk: insights from a comprehensive national database of veterans. J Clin Sleep Med. 2024;20(11):1785-1792.
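A rough sketch of the kind of rule-based extraction such a pipeline might start from: a regular expression that pulls ESS values out of note text, applies the 0-24 plausibility check, and flags the ≥ 11 high-ESS cutoff. It is not the validated VA pipeline; the pattern and the example note are invented.

```python
import re

ESS_PATTERN = re.compile(
    r"\b(?:epworth(?:\s+sleepiness)?(?:\s+scale)?|ess)\b[^0-9]{0,20}(\d{1,2})",
    flags=re.IGNORECASE,
)

def extract_ess(note: str):
    """Return all plausible ESS scores (0-24) mentioned in a progress note."""
    scores = [int(m.group(1)) for m in ESS_PATTERN.finditer(note)]
    return [s for s in scores if 0 <= s <= 24]

note = "Pt reports daytime sleepiness. Epworth Sleepiness Scale score: 14/24 today."
print(extract_ess(note))                      # -> [14]
print([s >= 11 for s in extract_ess(note)])   # high-ESS category (ESS >= 11)
```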
Affiliation(s)
- Arash Maghsoudi
- Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, Texas
- Department of Medicine, Baylor College of Medicine, Houston, Texas
- Mehrnaz Azarian
- Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, Texas
- Department of Medicine, Baylor College of Medicine, Houston, Texas
- Amir Sharafkhaneh
- Department of Medicine, Baylor College of Medicine, Houston, Texas
- Pulmonary, Critical Care and Sleep Medicine Section, Medical Care Line, Michael E. DeBakey VA Medical Center, Houston, Texas
- Melissa B. Jones
- Mental Health Care Line, Michael E. DeBakey VA Medical Center, Houston, Texas
- Menninger Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, Texas
- Hoormehr Nozari
- Children Growth Research Center, Research Institute for Prevention of Non-Communicable Diseases, Qazvin University of Medical Sciences, Qazvin, Iran
- Meir Kryger
- Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, Yale University, New Haven, Connecticut
- Amin Ramezani
- Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, Texas
- Department of Medicine, Baylor College of Medicine, Houston, Texas
- Javad Razjouyan
- Center for Innovations in Quality, Effectiveness, and Safety, Michael E. DeBakey VA Medical Center, Houston, Texas
- Department of Medicine, Baylor College of Medicine, Houston, Texas
- Big Data Scientist Training Enhancement Program (BD-STEP), VA Office of Research and Development, Washington, DC
6. Gan S, Kim C, Chang J, Lee DY, Park RW. Enhancing readmission prediction models by integrating insights from home healthcare notes: Retrospective cohort study. Int J Nurs Stud 2024;158:104850. PMID: 39024965; DOI: 10.1016/j.ijnurstu.2024.104850.
Abstract
BACKGROUND Hospital readmission is an important indicator of inpatient care quality and a significant driver of increasing medical costs. Therefore, it is important to explore the effects of postdischarge information, particularly from home healthcare notes, on enhancing readmission prediction models. Despite the use of Natural Language Processing (NLP) and machine learning in prediction model development, current studies often overlook insights from home healthcare notes. OBJECTIVE This study aimed to develop prediction models for 30-day readmissions using home healthcare notes and structured data. In addition, it explored the development of 14- and 180-day prediction models using variables in the 30-day model. DESIGN A retrospective observational cohort study. SETTING(S) This study was conducted at Ajou University School of Medicine in South Korea. PARTICIPANTS Data from electronic health records, encompassing demographic characteristics of 1819 participants, along with information on conditions, drug, and home healthcare, were utilized. METHODS Two distinct models were developed for each prediction window (30-, 14-, 180-day): the traditional model, which utilized structured variables alone, and the common data model (CDM)-NLP model, which incorporated structured and topic variables extracted from home healthcare notes. BERTopic facilitated topic generation and risk probability, representing the likelihood of documents being assigned to specific topics. Feature selection involved experimenting with various algorithms. The best-performing algorithm, determined using the area under the receiver operating characteristic curve (AUROC), was used for model development. Model performance was assessed using various learning metrics including AUROC. RESULTS Among 1819 patients, 251 (13.80 %) experienced 30-day readmission. The least absolute shrinkage and selection operator was used for feature extraction and model development. The 15 structured features were used in the traditional model. Moreover, five additional topic variables from the home healthcare notes were applied in the CDM-NLP model. The AUROC of the traditional model was 0.739 (95 % CI: 0.672-0.807). The AUROC of the CDM-NLP model was high at 0.824 (95 % CI: 0.768-0.880), which indicated an outstanding performance. The topics in the CDM-NLP model included emotional distress, daily living functions, nutrition, postoperative status, and cardiorespiratory issues. In extended prediction model development for 14- and 180-day readmissions, the CDM-NLP consistently outperformed the traditional model. CONCLUSIONS This study developed effective prediction models using both structured and unstructured data, thereby emphasizing the significance of postdischarge information from home healthcare notes in readmission prediction.
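The idea of appending note-derived topic weights to structured features can be sketched as follows. The study used BERTopic; as a lightweight, dependency-free stand-in this example derives topic weights with TF-IDF plus NMF and then fits an L1-penalized (LASSO-style) logistic regression evaluated by AUROC. All notes, variables, and outcomes are simulated placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
themes = ["patient anxious and tearful about prognosis",
          "needs assistance with bathing and dressing",
          "poor oral intake, weight loss noted",
          "surgical wound healing well post discharge",
          "shortness of breath on exertion, rales noted"]
notes = [themes[i] for i in rng.integers(0, len(themes), 600)]   # fake home healthcare notes
age = rng.normal(70, 10, 600)
n_meds = rng.poisson(6, 600)
readmit = rng.binomial(1, 0.15, 600)                             # fake 30-day readmission flag

# Topic weights from the notes (NMF here; BERTopic in the actual study)
topic_w = NMF(n_components=5, init="nndsvda", random_state=0).fit_transform(
    TfidfVectorizer().fit_transform(notes))
X = np.column_stack([age, n_meds, topic_w])                      # structured + topic features
X_tr, X_te, y_tr, y_te = train_test_split(X, readmit, test_size=0.3,
                                          random_state=0, stratify=readmit)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```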
Affiliation(s)
- Sujin Gan
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Gyeonggi-do, Republic of Korea.
- Chungsoo Kim
- Section of Cardiovascular Medicine, Department of Internal Medicine, Yale University School of Medicine, New Haven, CT, USA
- Junhyuck Chang
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
- Dong Yun Lee
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Gyeonggi-do, Republic of Korea
- Rae Woong Park
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, Gyeonggi-do, Republic of Korea; Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, Gyeonggi-do, Republic of Korea.
7. Pilowsky JK, Choi JW, Saavedra A, Daher M, Nguyen N, Williams L, Jones SL. Natural language processing in the intensive care unit: A scoping review. Crit Care Resusc 2024;26:210-216. PMID: 39355491; PMCID: PMC11440058; DOI: 10.1016/j.ccrj.2024.06.008.
Abstract
Objectives Natural language processing (NLP) is a branch of artificial intelligence focused on enabling computers to interpret and analyse text-based data. The intensive care specialty is known to generate large volumes of data, including free-text, however, NLP applications are not commonly used either in critical care clinical research or quality improvement projects. This review aims to provide an overview of how NLP has been used in the intensive care specialty and promote an understanding of NLP's potential future clinical applications. Design Scoping review. Data sources A systematic search was developed with an information specialist and deployed on the PubMed electronic journal database. Results were restricted to the last 10 years to ensure currency. Review methods Screening and data extraction were undertaken by two independent reviewers, with any disagreements resolved by a third. Given the heterogeneity of the eligible articles, a narrative synthesis was conducted. Results Eighty-seven eligible articles were included in the review. The most common type (n = 24) were studies that used NLP-derived features to predict clinical outcomes, most commonly mortality (n = 16). Next were articles that used NLP to identify a specific concept (n = 23), including sepsis, family visitation and mental health disorders. Most studies only described the development and internal validation of their algorithm (n = 79), and only one reported the implementation of an algorithm in a clinical setting. Conclusions Natural language processing has been used for a variety of purposes in the ICU context. Increasing awareness of these techniques amongst clinicians may lead to more clinically relevant algorithms being developed and implemented.
Affiliation(s)
- Julia K Pilowsky
- Agency for Clinical Innovation, NSW Health, Australia
- University of Sydney, Australia
- Royal North Shore Hospital, NSW, Australia
- Jae-Won Choi
- Agency for Clinical Innovation, NSW Health, Australia
- eHealth, NSW Health, Australia
- Aldo Saavedra
- Agency for Clinical Innovation, NSW Health, Australia
- University of Sydney, Australia
- Maysaa Daher
- Agency for Clinical Innovation, NSW Health, Australia
- Nhi Nguyen
- Agency for Clinical Innovation, NSW Health, Australia
- University of Sydney, Australia
- Nepean Hospital, NSW, Australia
8. Huang YZ, Chen YM, Lin CC, Chiu HY, Chang YC. A nursing note-aware deep neural network for predicting mortality risk after hospital discharge. Int J Nurs Stud 2024;156:104797. PMID: 38788263; DOI: 10.1016/j.ijnurstu.2024.104797.
Abstract
BACKGROUND ICU readmissions and post-discharge mortality pose significant challenges. Previous studies used EHRs and machine learning models, but mostly focused on structured data. Nursing records contain crucial unstructured information, but their utilization is challenging. Natural language processing (NLP) can extract structured features from clinical text. This study proposes the Crucial Nursing Description Extractor (CNDE) to predict post-ICU discharge mortality rates and identify high-risk patients for unplanned readmission by analyzing electronic nursing records. OBJECTIVE To develop a nursing note-aware deep neural network (NurnaNet), combined with a pre-trained biomedical-clinical language model (BioClinicalBERT), to analyze electronic health records (EHRs) in the MIMIC-III dataset and predict patients' risk of death within six months and two years of discharge. DESIGN A cohort and system development design was used. SETTING(S) Data were extracted and analyzed from MIMIC-III, a database of critically ill patients in the US between 2001 and 2012. PARTICIPANTS We calculated patients' age using admission time and date of birth information from the MIMIC dataset. Patients under 18 or over 89 years old, or who died in the hospital, were excluded. We analyzed 16,973 nursing records from patients' ICU stays. METHODS We developed a technique called the Crucial Nursing Description Extractor (CNDE), which extracts key content from text using the log-likelihood ratio to select keywords, and combined it with BioClinicalBERT. We predicted the survival of discharged patients at six months and two years and evaluated the performance of the model using precision, recall, the F1-score, the receiver operating characteristic (ROC) curve, the area under the curve (AUC), and the precision-recall (PR) curve. RESULTS The research findings indicate that NurnaNet achieved good F1-scores (0.67030 and 0.70874) for the six-month and two-year predictions, respectively. Compared to using BioClinicalBERT alone, performance improved by 2.05% and 1.08% for predictions within six months and two years, respectively. CONCLUSIONS CNDE can effectively reduce long-form records and extract key content. NurnaNet achieved a good F1-score when analyzing nursing record data, which helps identify patients' risk of death after leaving the hospital and allows regular follow-up and treatment plans to be adjusted as soon as possible.
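The "log-likelihood ratio" keyword step can be illustrated with Dunning's G2 statistic over word counts in notes of patients who died versus those who survived; the CNDE pipeline itself and the BioClinicalBERT stage are not reproduced here, and the two tiny corpora below are invented.

```python
import math
from collections import Counter

def llr_2x2(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    def h(*counts):  # sum of c * ln(c / total), zero cells skipped
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)
    return 2 * (h(k11, k12, k21, k22) - h(k11 + k12, k21 + k22) - h(k11 + k21, k12 + k22))

died = ("unresponsive comfort care comfort care family meeting "
        "morphine drip agonal breathing unresponsive").split()
survived = ("ambulating tolerating diet discharge planning stable "
            "vitals physical therapy").split()
c_died, c_surv = Counter(died), Counter(survived)
n_died, n_surv = sum(c_died.values()), sum(c_surv.values())

scores = {w: llr_2x2(c_died[w], n_died - c_died[w], c_surv[w], n_surv - c_surv[w])
          for w in set(died) | set(survived)}
for word, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{word:>14s}  LLR={score:.2f}")
```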
Affiliation(s)
- Yong-Zhen Huang
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan; Department of Nursing, National Taiwan University Cancer Center, Taipei, Taiwan.
- Yan-Ming Chen
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan.
- Chih-Cheng Lin
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan.
- Hsiao-Yean Chiu
- School of Nursing, College of Nursing, Taipei Medical University, Taipei, Taiwan; Department of Nursing, Taipei Medical University Hospital, Taipei, Taiwan; Research Center of Sleep Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan.
- Yung-Chun Chang
- Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan; Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan.
9. Iscoe M, Socrates V, Gilson A, Chi L, Li H, Huang T, Kearns T, Perkins R, Khandjian L, Taylor RA. Identifying signs and symptoms of urinary tract infection from emergency department clinical notes using large language models. Acad Emerg Med 2024;31:599-610. PMID: 38567658; DOI: 10.1111/acem.14883.
Abstract
BACKGROUND Natural language processing (NLP) tools including recently developed large language models (LLMs) have myriad potential applications in medical care and research, including the efficient labeling and classification of unstructured text such as electronic health record (EHR) notes. This opens the door to large-scale projects that rely on variables that are not typically recorded in a structured form, such as patient signs and symptoms. OBJECTIVES This study is designed to acquaint the emergency medicine research community with the foundational elements of NLP, highlighting essential terminology, annotation methodologies, and the intricacies involved in training and evaluating NLP models. Symptom characterization is critical to urinary tract infection (UTI) diagnosis, but identification of symptoms from the EHR has historically been challenging, limiting large-scale research, public health surveillance, and EHR-based clinical decision support. We therefore developed and compared two NLP models to identify UTI symptoms from unstructured emergency department (ED) notes. METHODS The study population consisted of patients aged ≥ 18 who presented to an ED in a northeastern U.S. health system between June 2013 and August 2021 and had a urinalysis performed. We annotated a random subset of 1250 ED clinician notes from these visits for a list of 17 UTI symptoms. We then developed two task-specific LLMs to perform the task of named entity recognition: a convolutional neural network-based model (SpaCy) and a transformer-based model designed to process longer documents (Clinical Longformer). Models were trained on 1000 notes and tested on a holdout set of 250 notes. We compared model performance (precision, recall, F1 measure) at identifying the presence or absence of UTI symptoms at the note level. RESULTS A total of 8135 entities were identified in 1250 notes; 83.6% of notes included at least one entity. Overall F1 measure for note-level symptom identification weighted by entity frequency was 0.84 for the SpaCy model and 0.88 for the Longformer model. F1 measure for identifying presence or absence of any UTI symptom in a clinical note was 0.96 (232/250 correctly classified) for the SpaCy model and 0.98 (240/250 correctly classified) for the Longformer model. CONCLUSIONS The study demonstrated the utility of LLMs and transformer-based models in particular for extracting UTI symptoms from unstructured ED clinical notes; models were highly accurate for detecting the presence or absence of any UTI symptom on the note level, with variable performance for individual symptoms.
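A compressed sketch of the spaCy side of such a comparison: a blank English pipeline with an NER component trained on character-offset annotations. The label set, offsets, and sentences below are invented; the study's 17-symptom scheme and the Clinical Longformer model are not reproduced, and a real model would need the full annotated note set rather than two toy examples.

```python
import random
import spacy
from spacy.training import Example

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("UTI_SYMPTOM")   # hypothetical single label standing in for 17 symptom types

TRAIN_DATA = [
    ("Patient reports dysuria and urinary frequency since yesterday.",
     {"entities": [(16, 23, "UTI_SYMPTOM"), (28, 45, "UTI_SYMPTOM")]}),
    ("Denies fever, chills, or flank pain.",
     {"entities": [(7, 12, "UTI_SYMPTOM"), (14, 20, "UTI_SYMPTOM"), (25, 35, "UTI_SYMPTOM")]}),
]

optimizer = nlp.initialize()
for epoch in range(20):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer, losses=losses)

doc = nlp("She complains of dysuria and mild flank pain.")
print([(ent.text, ent.label_) for ent in doc.ents])   # likely empty with toy data
```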
Affiliation(s)
- Mark Iscoe
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Vimig Socrates
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Aidan Gilson
- Yale School of Medicine, New Haven, Connecticut, USA
- Ling Chi
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Huan Li
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Thomas Huang
- Yale School of Medicine, New Haven, Connecticut, USA
- Thomas Kearns
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Rachelle Perkins
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Laura Khandjian
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- R Andrew Taylor
- Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Section for Biomedical Informatics and Data Science, Yale University School of Medicine, New Haven, Connecticut, USA
10. Niu Y, Li Z, Chen Z, Huang W, Tan J, Tian F, Yang T, Fan Y, Wei J, Mu J. Efficient screening of pharmacological broad-spectrum anti-cancer peptides utilizing advanced bidirectional Encoder representation from Transformers strategy. Heliyon 2024;10:e30373. PMID: 38765108; PMCID: PMC11101728; DOI: 10.1016/j.heliyon.2024.e30373.
Abstract
In the vanguard of oncological advancement, this investigation delineates the integration of deep learning paradigms to refine the screening process for Anticancer Peptides (ACPs), epitomizing a new frontier in broad-spectrum oncolytic therapeutics renowned for their targeted antitumor efficacy and specificity. Conventional methodologies for ACP identification are marred by prohibitive time and financial exigencies, representing a formidable impediment to the evolution of precision oncology. In response, our research heralds the development of a groundbreaking screening apparatus that marries Natural Language Processing (NLP) with the Pseudo Amino Acid Composition (PseAAC) technique, thereby inaugurating a comprehensive ACP compendium for the extraction of quintessential primary and secondary structural attributes. This innovative methodological approach is augmented by an optimized BERT model, meticulously calibrated for ACP detection, which conspicuously surpasses existing BERT variants and traditional machine learning algorithms in both accuracy and selectivity. Subjected to rigorous validation via five-fold cross-validation and external assessment, our model exhibited exemplary performance, boasting an average Area Under the Curve (AUC) of 0.9726 and an F1 score of 0.9385, with external validation further affirming its prowess (AUC of 0.9848 and F1 of 0.9371). These findings vividly underscore the method's unparalleled efficacy and prospective utility in the precise identification and prognostication of ACPs, significantly ameliorating the financial and temporal burdens traditionally associated with ACP research and development. Ergo, this pioneering screening paradigm promises to catalyze the discovery and clinical application of ACPs, constituting a seminal stride towards the realization of more efficacious and economically viable precision oncology interventions.
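As a much simpler stand-in for the paper's PseAAC-plus-BERT pipeline, the sketch below computes plain amino-acid composition (the first 20 components of PseAAC) and fits a random forest, just to show the overall shape of a sequence-based ACP classifier. The peptides and labels are invented, and no claim is made about their activity.

```python
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(seq: str):
    """Fraction of each of the 20 standard amino acids in a peptide."""
    counts = Counter(seq.upper())
    return [counts.get(a, 0) / len(seq) for a in AMINO_ACIDS]

peptides = ["FLPIIAKLLSGLL",
            "GIGKFLHSAKKFGKAFVGEIMNS",
            "AAAAKAAKAAKKA",
            "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK"]
labels = [1, 1, 0, 1]   # 1 = anticancer peptide (toy labels only)

X = [aac(p) for p in peptides]
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict([aac("GLFDIIKKIAESF")]))   # predicted class for a new toy peptide
```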
Affiliation(s)
- Yupeng Niu
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Zhenghao Li
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Ziao Chen
- College of Law, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Wenyuan Huang
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Jingxuan Tan
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Fa Tian
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Tao Yang
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Yamin Fan
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
- Jiangshu Wei
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Jiong Mu
- College of Information Engineering, Sichuan Agricultural University, Ya'an 625000, China
- Artificial intelligence laboratory, Sichuan Agricultural University, Ya'an 625000, China
11. Zaidat B, Shrestha N, Rosenberg AM, Ahmed W, Rajjoub R, Hoang T, Mejia MR, Duey AH, Tang JE, Kim JS, Cho SK. Performance of a Large Language Model in the Generation of Clinical Guidelines for Antibiotic Prophylaxis in Spine Surgery. Neurospine 2024;21:128-146. PMID: 38569639; PMCID: PMC10992653; DOI: 10.14245/ns.2347310.655.
Abstract
OBJECTIVE Large language models, such as chat generative pre-trained transformer (ChatGPT), have great potential for streamlining medical processes and assisting physicians in clinical decision-making. This study aimed to assess the potential of ChatGPT's 2 models (GPT-3.5 and GPT-4.0) to support clinical decision-making by comparing its responses for antibiotic prophylaxis in spine surgery to accepted clinical guidelines. METHODS ChatGPT models were prompted with questions from the North American Spine Society (NASS) Evidence-based Clinical Guidelines for Multidisciplinary Spine Care for Antibiotic Prophylaxis in Spine Surgery (2013). Its responses were then compared and assessed for accuracy. RESULTS Of the 16 NASS guideline questions concerning antibiotic prophylaxis, 10 responses (62.5%) were accurate in ChatGPT's GPT-3.5 model and 13 (81%) were accurate in GPT-4.0. Twenty-five percent of GPT-3.5 answers were deemed as overly confident while 62.5% of GPT-4.0 answers directly used the NASS guideline as evidence for its response. CONCLUSION ChatGPT demonstrated an impressive ability to accurately answer clinical questions. GPT-3.5 model's performance was limited by its tendency to give overly confident responses and its inability to identify the most significant elements in its responses. GPT-4.0 model's responses had higher accuracy and cited the NASS guideline as direct evidence many times. While GPT-4.0 is still far from perfect, it has shown an exceptional ability to extract the most relevant research available compared to GPT-3.5. Thus, while ChatGPT has shown far-reaching potential, scrutiny should still be exercised regarding its clinical use at this time.
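The prompting setup can be sketched with the current OpenAI Python client; the model name, system prompt, and question wording below are placeholders rather than the study's protocol, and an OPENAI_API_KEY must be set in the environment for the call to run.

```python
from openai import OpenAI

client = OpenAI()
question = ("Is a single preoperative dose of prophylactic antibiotics "
            "recommended for patients undergoing lumbar discectomy?")
response = client.chat.completions.create(
    model="gpt-4",  # the study compared GPT-3.5 and GPT-4.0
    messages=[
        {"role": "system",
         "content": "Answer concisely and state the strength of the supporting evidence."},
        {"role": "user", "content": question},
    ],
    temperature=0,
)
answer = response.choices[0].message.content
print(answer)  # the answer would then be graded against the published NASS guideline
```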
Affiliation(s)
- Bashar Zaidat
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nancy Shrestha
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ashley M. Rosenberg
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Wasil Ahmed
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Rami Rajjoub
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Timothy Hoang
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mateo Restrepo Mejia
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Akiro H. Duey
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Justin E. Tang
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Jun S. Kim
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Samuel K. Cho
- Department of Orthopedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA
12. Boussen S, Benard-Tertrais M, Ogéa M, Malet A, Simeone P, Antonini F, Bruder N, Velly L. Heart rate complexity helps mortality prediction in the intensive care unit: A pilot study using artificial intelligence. Comput Biol Med 2024;169:107934. PMID: 38183707; DOI: 10.1016/j.compbiomed.2024.107934.
Abstract
BACKGROUND In intensive care units (ICUs), accurate mortality prediction is crucial for effective patient management and resource allocation. The Simplified Acute Physiology Score II (SAPS-2), though commonly used, relies heavily on comprehensive clinical data and blood samples. This study sought to develop an artificial intelligence (AI) model utilizing key hemodynamic parameters to predict ICU mortality within the first 24 h and assess its performance relative to SAPS-2. METHODS We conducted an analysis of select hemodynamic parameters and the structure of heart rate curves to identify potential predictors of ICU mortality. A machine-learning model was subsequently trained and validated on distinct patient cohorts. The AI algorithm's performance was then compared to the SAPS-2, focusing on classification accuracy, calibration, and generalizability. MEASUREMENTS AND MAIN RESULTS The study included 1298 ICU admissions from March 27th, 2015, to March 27th, 2017. An additional cohort from 2022 to 2023 comprised 590 patients, resulting in a total dataset of 1888 patients. The observed mortality rate stood at 24.0%. Key determinants of mortality were the Glasgow Coma Scale score, heart rate complexity, patient age, duration of diastolic blood pressure below 50 mmHg, heart rate variability, and specific mean and systolic blood pressure thresholds. The AI model, informed by these determinants, exhibited a performance profile in predicting mortality that was comparable, if not superior, to the SAPS-2. CONCLUSIONS The AI model, which integrates heart rate and blood pressure curve analyses with basic clinical parameters, provides a methodological approach to predict in-hospital mortality in ICU patients. This model offers an alternative to existing tools that depend on extensive clinical data and laboratory inputs. Its potential integration into ICU monitoring systems may facilitate more streamlined mortality prediction processes.
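"Heart rate complexity" is commonly quantified with measures such as sample entropy; the sketch below shows one standard SampEn implementation applied to a simulated regular versus irregular RR-interval series. It is illustrative only and is not the authors' feature definition or model.

```python
import numpy as np

def sample_entropy(x, m=2, r_frac=0.2):
    """SampEn(m, r) with tolerance r = r_frac * std(x); larger values = more irregular."""
    x = np.asarray(x, dtype=float)
    r = r_frac * x.std()

    def matches(length):
        t = np.array([x[i:i + length] for i in range(len(x) - length)])
        d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)  # Chebyshev distance
        return (np.sum(d <= r) - len(t)) / 2                       # matching pairs, self-matches removed

    b, a = matches(m), matches(m + 1)
    return np.inf if a == 0 or b == 0 else -np.log(a / b)

rng = np.random.default_rng(0)
rr_regular = 800 + 10 * np.sin(np.arange(300) / 5)   # very regular rhythm (ms)
rr_complex = 800 + rng.normal(0, 40, 300)            # noisier, more complex rhythm (ms)
print(sample_entropy(rr_regular), sample_entropy(rr_complex))
```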
Affiliation(s)
- Salah Boussen
- Intensive Care and Anesthesiology Department, La Timone Teaching Hospital, Aix-Marseille Université Assistance Publique Hôpitaux de Marseille, Marseille, France; Laboratoire de Biomécanique Appliquée-Université Gustave-Eiffel, Aix-Marseille Université, UMR T24, 51 boulevard Pierre Dramard, 13015, Marseille, France.
- Manuela Benard-Tertrais
- Intensive Care and Anesthesiology Department, La Timone Teaching Hospital, Aix-Marseille Université Assistance Publique Hôpitaux de Marseille, Marseille, France
- Mathilde Ogéa
- Intensive Care and Anesthesiology Department, La Timone Teaching Hospital, Aix-Marseille Université Assistance Publique Hôpitaux de Marseille, Marseille, France
- Arthur Malet
- Intensive Care and Anesthesiology Department, La Timone Teaching Hospital, Aix-Marseille Université Assistance Publique Hôpitaux de Marseille, Marseille, France
- Pierre Simeone
- Intensive Care and Anesthesiology Department, La Timone Teaching Hospital, Aix-Marseille Université Assistance Publique Hôpitaux de Marseille, Marseille, France; Aix Marseille University, CNRS, Inst Neurosci Timone, UMR7289, Marseille, France
- François Antonini
- Intensive Care and Anesthesiology Department, Hôpital Nord Teaching Hospital, Aix-Marseille Université Assistance Publique Hôpitaux de Marseille, Marseille, France
- Nicolas Bruder
- Intensive Care and Anesthesiology Department, La Timone Teaching Hospital, Aix-Marseille Université Assistance Publique Hôpitaux de Marseille, Marseille, France
- Lionel Velly
- Intensive Care and Anesthesiology Department, La Timone Teaching Hospital, Aix-Marseille Université Assistance Publique Hôpitaux de Marseille, Marseille, France; Aix Marseille University, CNRS, Inst Neurosci Timone, UMR7289, Marseille, France
13. Jerfy A, Selden O, Balkrishnan R. The Growing Impact of Natural Language Processing in Healthcare and Public Health. Inquiry 2024;61:469580241290095. PMID: 39396164; PMCID: PMC11475376; DOI: 10.1177/00469580241290095.
Abstract
Natural Language Processing (NLP) is a subset of Artificial Intelligence, specifically focused on understanding and generating human language. NLP technologies are becoming more prevalent in healthcare and hold potential solutions to current problems. Some examples of existing and future uses include: public sentiment analysis in relation to health policies, electronic health record (EHR) screening, use of speech to text technology for extracting EHR data from point of care, patient communications, accelerated identification of eligible clinical trial candidates through automated searches and access of health data to assist in informed treatment decisions. This narrative review aims to summarize the current uses of NLP in healthcare, highlight successful implementation of computational linguistics-based approaches, and identify gaps, limitations, and emerging trends within the subfield of NLP in public health. The online databases Google Scholar and PubMed were scanned for papers published between 2018 and 2023. Keywords "Natural Language Processing, Health Policy, Large Language Models" were utilized in the initial search. Then, papers were limited to those written in English. Each of the 27 selected papers was subject to careful analysis, and their relevance in relation to NLP and healthcare respectively is utilized in this review. NLP and deep learning technologies scan large datasets, extracting valuable insights in various realms. This is especially significant in healthcare where huge amounts of data exist in the form of unstructured text. Automating labor intensive and tedious tasks with language processing algorithms, using text analytics systems and machine learning to analyze social media data and extracting insights from unstructured data allows for better public sentiment analysis, enhancement of risk prediction models, improved patient communication, and informed treatment decisions. In the recent past, some studies have applied NLP tools to social media posts to evaluate public sentiment regarding COVID-19 vaccine use. Social media data also has the capacity to be harnessed to develop pandemic prediction models based on reported symptoms. Furthermore, NLP has the potential to enhance healthcare delivery across the globe. Advanced language processing techniques such as Speech Recognition (SR) and Natural Language Understanding (NLU) tools can help overcome linguistic barriers and facilitate efficient communication between patients and healthcare providers.
Affiliation(s)
- Aadit Jerfy
- University of Virginia School of Medicine, Charlottesville, VA, USA
- Owen Selden
- University of Virginia School of Medicine, Charlottesville, VA, USA
14. Lan Z, Turchin A. Impact of possible errors in natural language processing-derived data on downstream epidemiologic analysis. JAMIA Open 2023;6:ooad111. PMID: 38152447; PMCID: PMC10752385; DOI: 10.1093/jamiaopen/ooad111.
Abstract
Objective To assess the impact of potential errors in natural language processing (NLP) on the results of epidemiologic studies. Materials and Methods We utilized data from three outcomes research studies where the primary predictor variable was generated using NLP. For each of these studies, Monte Carlo simulations were applied to generate datasets simulating potential errors in NLP-derived variables. We subsequently fit the original regression models to these partially simulated datasets and compared the distribution of coefficient estimates to the original study results. Results Among the four models evaluated, the mean change in the point estimate of the relationship between the predictor variable and the outcome ranged from -21.9% to 4.12%. In three of the four models, significance of this relationship was not eliminated in any of the 500 simulations; in the remaining model it was eliminated in 12% of simulations. Mean changes in the estimates for confounder variables ranged from 0.27% to 2.27%, and significance of the relationship was eliminated between 0% and 9.25% of the time. No variable underwent a shift in the direction of its interpretation. Discussion The impact of simulated NLP errors on the results of epidemiologic studies was modest, with only small changes in effect estimates and no changes in the interpretation of the findings (direction and significance of association with the outcome) for either the NLP-generated variables or other variables in the models. Conclusion NLP errors are unlikely to affect the results of studies that use NLP as the source of data.
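The simulation strategy can be sketched directly: flip a fraction of a binary NLP-derived predictor to mimic NLP errors, refit the logistic model on each perturbed dataset, and inspect the spread of the coefficient of interest. The data, assumed 10% error rate, and model below are placeholders, not the paper's.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 3000
nlp_flag = rng.binomial(1, 0.3, n)           # hypothetical NLP-derived binary predictor
confounder = rng.normal(0, 1, n)
logit = -1.0 + 0.8 * nlp_flag + 0.4 * confounder
outcome = rng.binomial(1, 1 / (1 + np.exp(-logit)))
X = sm.add_constant(np.column_stack([nlp_flag, confounder]))
original = sm.Logit(outcome, X).fit(disp=False).params[1]

error_rate, coefs = 0.10, []                 # assumed NLP misclassification rate
for _ in range(500):
    flipped = nlp_flag.copy()
    flip = rng.random(n) < error_rate
    flipped[flip] = 1 - flipped[flip]
    Xs = sm.add_constant(np.column_stack([flipped, confounder]))
    coefs.append(sm.Logit(outcome, Xs).fit(disp=False).params[1])

print(f"original beta={original:.3f}, simulated mean={np.mean(coefs):.3f}, "
      f"2.5-97.5 pct=({np.percentile(coefs, 2.5):.3f}, {np.percentile(coefs, 97.5):.3f})")
```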
Affiliation(s)
- Zhou Lan
- Center for Clinical Investigation, Brigham & Women’s Hospital, Boston, MA 02115, United States
- Harvard Medical School, Boston, MA 02115, United States
- Alexander Turchin
- Harvard Medical School, Boston, MA 02115, United States
- Division of Endocrinology, Brigham & Women’s Hospital, Boston, MA 02115, United States
15. Hydoub YM, Walker AP, Kirchoff RW, Alzu'bi HM, Chipi PY, Gerberi DJ, Burton MC, Murad MH, Dugani SB. Risk Prediction Models for Hospital Mortality in General Medical Patients: A Systematic Review. American Journal of Medicine Open 2023;10:100044. PMID: 38090393; PMCID: PMC10715621; DOI: 10.1016/j.ajmo.2023.100044.
Abstract
Objective To systematically review contemporary prediction models for hospital mortality developed or validated in general medical patients. Methods We screened articles in five databases, from January 1, 2010, through April 7, 2022, and the bibliography of articles selected for final inclusion. We assessed the quality for risk of bias and applicability using the Prediction Model Risk of Bias Assessment Tool (PROBAST) and extracted data using the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) checklist. Two investigators independently screened each article, assessed quality, and extracted data. Results From 20,424 unique articles, we identified 15 models in 8 studies across 10 countries. The studies included 280,793 general medical patients and 19,923 hospital deaths. Models included 7 early warning scores, 2 comorbidities indices, and 6 combination models. Ten models were studied in all general medical patients (general models) and 7 in general medical patients with infection (infection models). Of the 15 models, 13 were developed using logistic or Poisson regression and 2 using machine learning methods. Also, 4 of 15 models reported on handling of missing values. None of the infection models had high discrimination, whereas 4 of 10 general models had high discrimination (area under curve >0.8). Only 1 model appropriately assessed calibration. All models had high risk of bias; 4 of 10 general models and 5 of 7 infection models had low concern for applicability for general medical patients. Conclusion Mortality prediction models for general medical patients were sparse and differed in quality, applicability, and discrimination. These models require hospital-level validation and/or recalibration in general medical patients to guide mortality reduction interventions.
Collapse
Affiliation(s)
- Yousif M. Hydoub
- Division of Cardiology, Sheikh Shakhbout Medical City, Abu Dhabi, United Arab Emirates
| | - Andrew P. Walker
- Division of Hospital Internal Medicine, Mayo Clinic, Phoenix, Ariz
- Department of Critical Care Medicine, Mayo Clinic, Phoenix, Ariz
| | - Robert W. Kirchoff
- Division of Hospital Internal Medicine, Mayo Clinic, Phoenix, Ariz
- Division of Hospital Internal Medicine, Mayo Clinic, Rochester, Minn
| | | | - Patricia Y. Chipi
- Division of Hospital Internal Medicine, Mayo Clinic, Jacksonville, Fla
| | | | | | - M. Hassan Murad
- Evidence-Based Practice Center, Mayo Clinic, Rochester, Minn
| | - Sagar B. Dugani
- Division of Hospital Internal Medicine, Mayo Clinic, Rochester, Minn
- Division of Health Care Delivery Research, Kern Center for the Science of Health Care Delivery, Mayo Clinic, Rochester, Minn
| |
Collapse
|
16
|
Zaver HB, Patel T. Opportunities for the use of large language models in hepatology. Clin Liver Dis (Hoboken) 2023; 22:171-176. [PMID: 38026124 PMCID: PMC10653579 DOI: 10.1097/cld.0000000000000075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/12/2023] [Accepted: 06/05/2023] [Indexed: 12/01/2023] Open
Affiliation(s)
- Himesh B. Zaver
- Department of Internal Medicine, Mayo Clinic, Jacksonville, Florida, USA
| | - Tushar Patel
- Department of Transplant, Mayo Clinic, Jacksonville, Florida, USA
| |
Collapse
|
17
|
Khan SH, Perkins AJ, Fuchita M, Holler E, Ortiz D, Boustani M, Khan BA, Gao S. Development of a population-level prediction model for intensive care unit (ICU) survivorship and mortality in older adults: A population-based cohort study. Health Sci Rep 2023; 6:e1634. [PMID: 37867787 PMCID: PMC10587446 DOI: 10.1002/hsr2.1634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2023] [Revised: 09/21/2023] [Accepted: 10/05/2023] [Indexed: 10/24/2023] Open
Abstract
Background and Aims Given the growing utilization of critical care services by an aging population, development of population-level risk models that predict intensive care unit (ICU) survivorship and mortality may offer advantages for researchers and health systems. Our objective was to develop a risk model for ICU survivorship and mortality among community-dwelling older adults. Methods This was a population-based cohort study of 48,127 patients who were 50 years and older with at least one primary care visit between January 1, 2017, and December 31, 2017. We used electronic health record (EHR) data to identify variables predictive of ICU survivorship. Results ICU admission and mortality within 2 years after the index primary care visit date were used to divide patients into three groups of "alive without ICU admission," "ICU survivors," and "death." Multinomial logistic regression was used to identify EHR variables predictive of the three patient outcomes. Cross-validation by randomly splitting the data into derivation and validation data sets (60:40 split) was used to identify predictor variables and validate model performance using the area under the receiver operating characteristic (AUC) curve. In our overall sample, 92.2% of patients were alive without ICU admission, 6.2% were admitted to the ICU at least once and survived, and 1.6% died. Greater deciles of age over 50 years, diagnoses of chronic obstructive pulmonary disease or chronic heart failure, and laboratory abnormalities in alkaline phosphatase, hematocrit, and albumin contributed the highest risk score weights for mortality. Risk scores derived from the model discriminated between patients who died and those who remained alive without ICU admission (AUC = 0.858), and between ICU survivors and those alive without ICU admission (AUC = 0.765). Conclusion Our risk scores provide a feasible and scalable tool for researchers and health systems to identify patient cohorts at increased risk for ICU admission and survivorship. Further studies are needed to prospectively validate the risk scores in other patient populations.
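A minimal sketch of the modeling pattern described (three-class multinomial logistic regression with a 60:40 derivation/validation split and pairwise AUCs) is shown below on synthetic data; it is not the study's code, and the feature set and class frequencies are stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Synthetic stand-ins for EHR predictors and the three outcome classes:
# 0 = alive without ICU admission, 1 = ICU survivor, 2 = death.
X = rng.normal(size=(10000, 8))
y = rng.choice([0, 1, 2], size=10000, p=[0.92, 0.06, 0.02])

# 60:40 derivation/validation split, mirroring the cross-validation design.
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.4, random_state=2)

# With a multiclass target, the default lbfgs solver fits a multinomial model.
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_va)

# Pairwise AUCs analogous to "death vs alive" and "ICU survivor vs alive".
for cls, label in [(2, "death vs alive"), (1, "ICU survivor vs alive")]:
    mask = np.isin(y_va, [0, cls])
    auc = roc_auc_score((y_va[mask] == cls).astype(int), proba[mask, cls])
    print(label, round(auc, 3))
```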
Collapse
Affiliation(s)
- Sikandar H. Khan
- Division of Pulmonary, Critical Care, Sleep and Occupational Medicine, Indianapolis, Indiana, USA
- Regenstrief Institute, Indiana University Center for Aging Research, Indianapolis, Indiana, USA
- Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Anthony J. Perkins
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Mikita Fuchita
- Department of Anesthesiology, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
| | - Emma Holler
- Department of Epidemiology and Biostatistics, Indiana University School of Public Health, Bloomington, Indiana, USA
| | - Damaris Ortiz
- Department of Surgery, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Malaz Boustani
- Center for Health Innovation and Implementation Science, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Babar A. Khan
- Division of Pulmonary, Critical Care, Sleep and Occupational Medicine, Indianapolis, Indiana, USA
- Regenstrief Institute, Indiana University Center for Aging Research, Indianapolis, Indiana, USA
- Department of Medicine, Indiana University School of Medicine, Indianapolis, Indiana, USA
| | - Sujuan Gao
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, Indiana, USA
| |
Collapse
|
18
|
Patton MJ, Liu VX. Predictive Modeling Using Artificial Intelligence and Machine Learning Algorithms on Electronic Health Record Data: Advantages and Challenges. Crit Care Clin 2023; 39:647-673. [PMID: 37704332 DOI: 10.1016/j.ccc.2023.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/15/2023]
Abstract
The rapid adoption of electronic health record (EHR) systems in US hospitals from 2008 to 2014 produced novel data elements for analysis. Concurrent innovations in computing architecture and machine learning (ML) algorithms have made rapid consumption of health data feasible and a powerful engine for clinical innovation. In critical care research, the convergence of these trends has resulted in an exponential increase in outcome prediction research. In the following article, we explore the history of outcome prediction in the intensive care unit (ICU), the growing use of EHR data, and the rise of artificial intelligence (AI) and ML in critical care.
Collapse
Affiliation(s)
- Michael J Patton
- Medical Scientist Training Program, Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA; Hugh Kaul Precision Medicine Institute at the University of Alabama at Birmingham, 720 20th Street South, Suite 202, Birmingham, Alabama, 35233, USA.
| | - Vincent X Liu
- Kaiser Permanente Division of Research, Oakland, CA, USA.
| |
Collapse
|
19
|
Mao C, Xu J, Rasmussen L, Li Y, Adekkanattu P, Pacheco J, Bonakdarpour B, Vassar R, Shen L, Jiang G, Wang F, Pathak J, Luo Y. AD-BERT: Using pre-trained language model to predict the progression from mild cognitive impairment to Alzheimer's disease. J Biomed Inform 2023; 144:104442. [PMID: 37429512 PMCID: PMC11131134 DOI: 10.1016/j.jbi.2023.104442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2023] [Revised: 06/13/2023] [Accepted: 07/07/2023] [Indexed: 07/12/2023]
Abstract
OBJECTIVE We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). METHODS We identified 3657 patients diagnosed with MCI together with their progress notes from the Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. Progress notes written no later than the first MCI diagnosis were used for prediction. We first preprocessed the notes by deidentification, cleaning, and splitting into sections, and then pre-trained a BERT model for AD (named AD-BERT), based on the publicly available Bio+Clinical BERT, on the preprocessed notes. All sections of a patient were embedded into a vector representation by AD-BERT and then combined by global MaxPooling and a fully connected network to compute the probability of MCI-to-AD progression. For validation, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. RESULTS Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with an area under the receiver operating characteristic curve (AUC) of 0.849 and F1 score of 0.440 on the NMEDW dataset, and an AUC of 0.883 and F1 score of 0.680 on the WCM dataset. CONCLUSION The use of EHRs for AD-related research is promising, and AD-BERT shows superior predictive performance in modeling MCI-to-AD progression. Our study demonstrates the utility of pre-trained language models and clinical notes in predicting MCI-to-AD progression, which could have important implications for improving early detection and intervention for AD.
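The architecture described (section-level embeddings from a clinical BERT encoder, global max-pooling across sections, and a fully connected head) might be sketched as below. This is not the authors' released implementation; the Hugging Face checkpoint name, the frozen encoder, and the head dimensions are assumptions, and the `transformers` and `torch` packages are required.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

# Publicly available Bio+Clinical BERT checkpoint (assumed here).
CKPT = "emilyalsentzer/Bio_ClinicalBERT"
tokenizer = AutoTokenizer.from_pretrained(CKPT)
encoder = AutoModel.from_pretrained(CKPT)   # used as a frozen feature extractor in this sketch

class ProgressionClassifier(nn.Module):
    """Embed each note section, max-pool across sections, then classify."""
    def __init__(self, hidden=768):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, sections):
        embs = []
        for text in sections:                        # one patient's note sections
            toks = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
            cls = encoder(**toks).last_hidden_state[:, 0, :]    # [CLS] embedding
            embs.append(cls)
        pooled, _ = torch.max(torch.cat(embs, dim=0), dim=0)    # global max-pooling
        return torch.sigmoid(self.head(pooled))                 # P(MCI -> AD)

model = ProgressionClassifier()
prob = model(["Section 1: memory complaints ...", "Section 2: current medications ..."])
print(prob.item())
```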
Collapse
Affiliation(s)
- Chengsheng Mao
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Jie Xu
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States; Weill Cornell Medicine, New York, NY, United States
| | - Luke Rasmussen
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Yikuan Li
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | | | - Jennifer Pacheco
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Borna Bonakdarpour
- Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Robert Vassar
- Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
| | - Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, United States
| | | | - Fei Wang
- Weill Cornell Medicine, New York, NY, United States
| | | | - Yuan Luo
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States.
| |
Collapse
|
20
|
Crowson MG, Alsentzer E, Fiskio J, Bates DW. Towards Medical Billing Automation: NLP for Outpatient Clinician Note Classification. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.07.07.23292367. [PMID: 37502975 PMCID: PMC10370228 DOI: 10.1101/2023.07.07.23292367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/29/2023]
Abstract
Objectives Our primary objective was to develop a natural language processing approach that accurately predicts outpatient Evaluation and Management (E/M) level of service (LoS) codes using clinicians' notes from a health system electronic health record. A secondary objective was to investigate the impact of clinic note de-identification on document classification performance. Methods We used retrospective outpatient office clinic notes from four medical and surgical specialties. Classification models were fine-tuned on the clinic notes datasets and stratified by subspecialty. The success criteria for the classification tasks were the classification accuracy and F1-scores on internal test data. For the secondary objective, the dataset was de-identified using Named Entity Recognition (NER) to remove protected health information (PHI), and models were retrained. Results The models demonstrated similar predictive performance across different specialties, except for internal medicine, which had the lowest classification accuracy across all model architectures. The models trained on the entire note corpus achieved an E/M LoS CPT code classification accuracy of 74.8% (CI 95: 74.1-75.6). However, the de-identified note corpus showed a markedly lower classification accuracy of 48.2% (CI 95: 47.7-48.6) compared to the model trained on the identified notes. Conclusion The study demonstrates the potential of NLP-based document classifiers to accurately predict E/M LoS CPT codes using clinical notes from various medical and procedural specialties. The models' performance suggests that the classification task's complexity merits further investigation. The de-identification experiment demonstrated that de-identification may negatively impact classifier performance. Further research is needed to validate the performance of our NLP classifiers in different healthcare settings and patient populations and to investigate the potential implications of de-identification on model performance.
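A rough sketch of the two-step workflow described (NER-based removal of PHI-like entities followed by note classification) is given below; it substitutes a generic public NER model and a simple TF-IDF baseline classifier for the study's fine-tuned transformers, so every model name, entity label, and code label here is an assumption.

```python
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Step 1: NER-based de-identification (generic English NER as a stand-in for a
# clinical PHI model).
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

def deidentify(note):
    ents = sorted(ner(note), key=lambda e: e["start"], reverse=True)
    for e in ents:
        if e["entity_group"] in {"PER", "LOC", "ORG"}:      # PHI-like entities
            note = note[:e["start"]] + "[REDACTED]" + note[e["end"]:]
    return note

# Step 2: a simple bag-of-words classifier of E/M level of service as a baseline
# (the study fine-tuned transformer classifiers; this is only illustrative).
notes = ["Established patient, detailed exam, moderate complexity decision making.",
         "Brief follow-up visit, stable symptoms, low complexity decision making."]
labels = ["99214", "99213"]
clean = [deidentify(n) for n in notes]
vectorizer = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vectorizer.fit_transform(clean), labels)
```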
Collapse
|
21
|
Fanconi C, de Hond A, Peterson D, Capodici A, Hernandez-Boussard T. A Bayesian approach to predictive uncertainty in chemotherapy patients at risk of acute care utilization. EBioMedicine 2023; 92:104632. [PMID: 37269570 PMCID: PMC10250586 DOI: 10.1016/j.ebiom.2023.104632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Revised: 05/09/2023] [Accepted: 05/11/2023] [Indexed: 06/05/2023] Open
Abstract
BACKGROUND Machine learning (ML) predictions are becoming increasingly integrated into medical practice. One commonly used method, ℓ1-penalised logistic regression (LASSO), can estimate patient risk for disease outcomes but is limited by only providing point estimates. Instead, Bayesian logistic LASSO regression (BLLR) models provide distributions for risk predictions, giving clinicians a better understanding of predictive uncertainty, but they are not commonly implemented. METHODS This study evaluates the predictive performance of different BLLRs compared to standard logistic LASSO regression, using real-world, high-dimensional, structured electronic health record (EHR) data from cancer patients initiating chemotherapy at a comprehensive cancer centre. Multiple BLLR models were compared against a LASSO model using an 80-20 random split with 10-fold cross-validation to predict the risk of acute care utilization (ACU) after starting chemotherapy. FINDINGS This study included 8439 patients. The LASSO model predicted ACU with an area under the receiver operating characteristic curve (AUROC) of 0.806 (95% CI: 0.775-0.834). BLLR with a Horseshoe+ prior and a posterior approximated by Metropolis-Hastings sampling showed similar performance, 0.807 (95% CI: 0.780-0.834), and offers the advantage of uncertainty estimation for each prediction. In addition, BLLR could identify predictions too uncertain to be automatically classified. BLLR uncertainties were stratified by different patient subgroups, demonstrating that predictive uncertainties significantly differ across race, cancer type, and stage. INTERPRETATION BLLRs are a promising yet underutilised tool that increases explainability by providing risk estimates while offering a similar level of performance to standard LASSO-based models. Additionally, these models can identify patient subgroups with higher uncertainty, which can augment clinical decision-making. FUNDING This work was supported in part by the National Library of Medicine of the National Institutes of Health under Award Number R01LM013362. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
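To illustrate the Bayesian idea on a small scale, the sketch below samples a Bayesian logistic regression posterior with random-walk Metropolis-Hastings; it uses a Laplace (Bayesian-LASSO) prior rather than the Horseshoe+ prior used in the study, purely to keep the example short, and runs on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, 0.0, -1.0, 0.0, 0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true))))

def log_post(beta, lam=1.0):
    """Log posterior: Bernoulli likelihood + Laplace (Bayesian-LASSO) prior."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    logprior = -lam * np.sum(np.abs(beta))
    return loglik + logprior

# Random-walk Metropolis-Hastings over the coefficient vector.
beta = np.zeros(p)
samples, step = [], 0.05
lp = log_post(beta)
for it in range(20000):
    prop = beta + rng.normal(scale=step, size=p)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        beta, lp = prop, lp_prop
    if it >= 5000:                        # discard burn-in
        samples.append(beta.copy())

samples = np.array(samples)
# Posterior distributions give per-coefficient (and per-prediction) uncertainty,
# not just point estimates.
print("posterior mean:", samples.mean(axis=0).round(2))
print("95% interval for beta_0:", np.percentile(samples[:, 0], [2.5, 97.5]).round(2))
```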
Collapse
Affiliation(s)
- Claudio Fanconi
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland
- Department of Medicine (Biomedical Informatics), Stanford University, Stanford, USA
| | - Anne de Hond
- Department of Medicine (Biomedical Informatics), Stanford University, Stanford, USA
- Clinical AI Implementation and Research Lab, Leiden University Medical Centre, Leiden, the Netherlands
| | - Dylan Peterson
- Department of Medicine (Biomedical Informatics), Stanford University, Stanford, USA
| | - Angelo Capodici
- Department of Medicine (Biomedical Informatics), Stanford University, Stanford, USA
- Department of Biomedical and Neuromotor Science, University of Bologna, Bologna, Italy
| | | |
Collapse
|
22
|
Jin ZG, Zhang H, Tai MH, Yang Y, Yao Y, Guo YT. Natural Language Processing in a Clinical Decision Support System for the Identification of Venous Thromboembolism: Algorithm Development and Validation. J Med Internet Res 2023; 25:e43153. [PMID: 37093636 PMCID: PMC10167583 DOI: 10.2196/43153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/01/2022] [Revised: 11/20/2022] [Accepted: 03/29/2023] [Indexed: 03/31/2023] Open
Abstract
BACKGROUND It remains unknown whether capturing data from electronic health records (EHRs) using natural language processing (NLP) can improve venous thromboembolism (VTE) detection in different clinical settings. OBJECTIVE The aim of this study was to validate the NLP algorithm in a clinical decision support system for VTE risk assessment and integrated care (DeVTEcare) to identify VTEs from EHRs. METHODS All inpatients aged ≥18 years in the Sixth Medical Center of the Chinese People's Liberation Army General Hospital from January 1 to December 31, 2021, were included as the validation cohort. The sensitivity, specificity, positive and negative likelihood ratios (LR+ and LR-, respectively), area under the receiver operating characteristic curve (AUC), and F1-scores along with their 95% CIs were used to analyze the performance of the NLP tool, with manual review of medical records as the reference standard for detecting deep vein thrombosis (DVT) and pulmonary embolism (PE). The primary end point was the performance of the NLP approach embedded into the EHR for VTE identification. The secondary end points were the performances to identify VTE among different hospital departments with different VTE risks. Subgroup analyses were performed among age, sex, and the study season. RESULTS Among 30,152 patients (median age 56 [IQR 41-67] years; 14,247/30,152, 47.3% females), the prevalence of VTE, PE, and DVT was 2.1% (626/30,152), 0.6% (177/30,152), and 1.8% (532/30,152), respectively. The sensitivity, specificity, LR+, LR-, AUC, and F1-score of NLP-facilitated VTE detection were 89.9% (95% CI 87.3%-92.2%), 99.8% (95% CI 99.8%-99.9%), 483 (95% CI 370-629), 0.10 (95% CI 0.08-0.13), 0.95 (95% CI 0.94-0.96), and 0.90 (95% CI 0.90-0.91), respectively. Among departments of surgery, internal medicine, and intensive care units, the highest specificity (100% vs 99.7% vs 98.8%, respectively), LR+ (3202 vs 321 vs 77, respectively), and F1-score (0.95 vs 0.89 vs 0.92, respectively) were in the surgery department (all P<.001). Among low, intermediate, and high VTE risks in hospital departments, the low-risk department had the highest AUC (1.00 vs 0.94 vs 0.96, respectively) and F1-score (0.97 vs 0.90 vs 0.90, respectively) as well as the lowest LR- (0.00 vs 0.13 vs 0.08, respectively) (DeLong test for AUC; all P<.001). Subgroup analysis of the age, sex, and season demonstrated consistently good performance of VTE detection with >87% sensitivity and specificity and >89% AUC and F1-score. The NLP algorithm performed better among patients aged ≤65 years than among those aged >65 years (F1-score 0.93 vs 0.89, respectively; P<.001). CONCLUSIONS The NLP algorithm in our DeVTEcare identified VTE well across different clinical settings, especially in patients in surgery units, departments with low-risk VTE, and patients aged ≤65 years. This algorithm can help to inform accurate in-hospital VTE rates and enhance risk-classified VTE integrated care in future research.
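The validation metrics reported (sensitivity, specificity, likelihood ratios, F1) follow directly from the confusion matrix of NLP output against manual chart review; a toy calculation, with made-up labels, looks like this:

```python
from sklearn.metrics import confusion_matrix, f1_score

# Toy example: NLP-flagged VTE vs. manual chart review as the reference standard.
reference = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]   # manual review
nlp_flag  = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]   # NLP output

tn, fp, fn, tp = confusion_matrix(reference, nlp_flag).ravel()
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
lr_pos = sensitivity / (1 - specificity)      # positive likelihood ratio
lr_neg = (1 - sensitivity) / specificity      # negative likelihood ratio

print(f"Se={sensitivity:.2f} Sp={specificity:.2f} "
      f"LR+={lr_pos:.1f} LR-={lr_neg:.2f} F1={f1_score(reference, nlp_flag):.2f}")
```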
Collapse
Affiliation(s)
- Zhi-Geng Jin
- Department of Pulmonary Vascular and Thrombotic Disease, Sixth Medical Center of Chinese People's Liberation Army General Hospital, Beijing, China
| | - Hui Zhang
- Department of Pulmonary Vascular and Thrombotic Disease, Sixth Medical Center of Chinese People's Liberation Army General Hospital, Beijing, China
| | - Mei-Hui Tai
- Chinese People's Liberation Army Medical School, Beijing, China
| | - Ying Yang
- Quality Management Division, Sixth Medical Center of Chinese People's Liberation Army General Hospital, Beijing, China
| | - Yuan Yao
- Institute for Hospital Management Research, Chinese People's Liberation Army General Hospital, Beijing, China
| | - Yu-Tao Guo
- Department of Pulmonary Vascular and Thrombotic Disease, Sixth Medical Center of Chinese People's Liberation Army General Hospital, Beijing, China
| |
Collapse
|
23
|
Fernandes MB, Valizadeh N, Alabsi HS, Quadri SA, Tesh RA, Bucklin AA, Sun H, Jain A, Brenner LN, Ye E, Ge W, Collens SI, Lin S, Das S, Robbins GK, Zafar SF, Mukerji SS, Westover MB. Classification of neurologic outcomes from medical notes using natural language processing. EXPERT SYSTEMS WITH APPLICATIONS 2023; 214:119171. [PMID: 36865787 PMCID: PMC9974159 DOI: 10.1016/j.eswa.2022.119171] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 06/18/2023]
Abstract
Neurologic disability level at hospital discharge is an important outcome in many clinical research studies. Outside of clinical trials, neurologic outcomes must typically be extracted by labor-intensive manual review of clinical notes in the electronic health record (EHR). To overcome this challenge, we set out to develop a natural language processing (NLP) approach that automatically reads clinical notes to determine neurologic outcomes, to make it possible to conduct larger-scale neurologic outcomes studies. We obtained 7314 notes from 3632 patients hospitalized at two large Boston hospitals between January 2012 and June 2020, including discharge summaries (3485), occupational therapy (1472) and physical therapy (2357) notes. Fourteen clinical experts reviewed notes to assign scores on the Glasgow Outcome Scale (GOS) with 4 classes, namely 'good recovery', 'moderate disability', 'severe disability', and 'death', and on the Modified Rankin Scale (mRS), with 7 classes, namely 'no symptoms', 'no significant disability', 'slight disability', 'moderate disability', 'moderately severe disability', 'severe disability', and 'death'. For 428 patients' notes, 2 experts scored the cases, generating interrater reliability estimates for the GOS and mRS. After preprocessing and extracting features from the notes, we trained a multiclass logistic regression model using LASSO regularization and 5-fold cross-validation for hyperparameter tuning. The model performed well on the test set, achieving a micro-average area under the receiver operating characteristic curve and F-score of 0.94 (95% CI 0.93-0.95) and 0.77 (0.75-0.80) for the GOS, and 0.90 (0.89-0.91) and 0.59 (0.57-0.62) for the mRS, respectively. Our work demonstrates that an NLP algorithm can accurately assign neurologic outcomes based on free-text clinical notes. This algorithm increases the scale of research on neurologic outcomes that is possible with EHR data.
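A compact sketch of the modeling recipe described (bag-of-words features, multiclass logistic regression with L1 regularization, and 5-fold cross-validation for hyperparameter tuning) is shown below on toy notes; the feature extractor, hyperparameter grid, and labels are illustrative rather than the authors' configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy notes labelled with the 4 GOS classes (0 = good recovery ... 3 = death);
# real features and labels would come from discharge, OT, and PT notes.
notes = ["ambulating independently, discharged home",
         "requires assistance with ADLs, discharged to rehab",
         "unresponsive, trach and PEG in place",
         "patient expired, comfort measures only"] * 25
labels = np.array([0, 1, 2, 3] * 25)

# Multiclass logistic regression with L1 (LASSO) regularization; the
# regularization strength C is tuned by 5-fold cross-validation.
pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                     LogisticRegression(penalty="l1", solver="saga", max_iter=5000))
grid = GridSearchCV(pipe, {"logisticregression__C": [0.1, 1.0, 10.0]},
                    cv=5, scoring="roc_auc_ovr")
grid.fit(notes, labels)
print("best C:", grid.best_params_, "CV one-vs-rest AUC:", round(grid.best_score_, 3))
```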
Collapse
Affiliation(s)
- Marta B. Fernandes
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
| | - Navid Valizadeh
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Haitham S. Alabsi
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Syed A. Quadri
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
| | - Ryan A. Tesh
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
| | - Abigail A. Bucklin
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
| | - Haoqi Sun
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
| | - Aayushee Jain
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
| | - Laura N. Brenner
- Harvard Medical School, Boston, MA, United States
- Division of Pulmonary and Critical Care Medicine, MGH, Boston, MA, United States
- Division of General Internal Medicine, MGH, Boston, MA, United States
| | - Elissa Ye
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
| | - Wendong Ge
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
| | - Sarah I. Collens
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
| | - Stacie Lin
- Harvard Medical School, Boston, MA, United States
| | - Sudeshna Das
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Gregory K. Robbins
- Harvard Medical School, Boston, MA, United States
- Division of Infectious Diseases, MGH, Boston, MA, United States
| | - Sahar F. Zafar
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
| | - Shibani S. Mukerji
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Vaccine and Immunotherapy Center, Division of Infectious Diseases, MGH, Boston, MA, United States
| | - M. Brandon Westover
- Department of Neurology, Massachusetts General Hospital (MGH), Boston, MA, United States
- Harvard Medical School, Boston, MA, United States
- Clinical Data Animation Center (CDAC), MGH, Boston, MA, United States
- McCance Center for Brain Health, MGH, Boston, MA, United States
| |
Collapse
|
24
|
Pacheco MC, Hiraiwa P, Finn LS, Kapur R. Computer-Based Natural Language Search Applied to the Electronic Medical Record for Tonsil Triage. Am J Clin Pathol 2023; 159:158-163. [PMID: 36495296 DOI: 10.1093/ajcp/aqac146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Accepted: 10/21/2022] [Indexed: 12/14/2022] Open
Abstract
OBJECTIVES To determine significant histologic findings in tonsils and categorize clinical settings in which they occur, in order to identify cases benefiting from histopathologic examination using a computer-based natural language search (NLS) applied to the electronic medical record. METHODS The pathology database was queried for tonsillectomy cases accessioned between 2002 and 2018. Tonsils with microscopic examination were reviewed, and indications for examination and diagnoses were tallied. Clinical risk of malignancy was correlated with findings. An NLS was used to interrogate preoperative clinical records of the same group of patients. The search identified cases at risk of significant histologic findings and was implemented as part of standard practice. RESULTS Of the 18,733 bilateral tonsillectomies identified in the pathology database, 494 were palatine tonsils that underwent microscopic examination, 134 had indications concerning for malignancy, and 14 had significant findings on histologic examination. When the NLS was applied to the medical records of the same group, 223 cases were identified as having risk of malignancy, including all flagged by surgeons and pathologists and 89 additional cases. Clinical implementation resulted in identification of all cases benefiting from examination. CONCLUSIONS An NLS applied to the electronic medical record to select tonsils for examination was superior to relying on surgeons and pathologists.
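The abstract does not specify the search vocabulary, but a natural language search of this kind can be approximated by pattern matching over preoperative notes; the terms below are purely illustrative examples of malignancy-risk language, not the institution's actual search.

```python
import re

# Illustrative high-risk terms; the actual vocabulary used in the study is not given.
RISK_TERMS = [r"lymphoma", r"leukemia", r"malignan\w*", r"asymmetr\w*",
              r"night sweats", r"weight loss", r"immunosuppress\w*", r"transplant"]
PATTERN = re.compile("|".join(RISK_TERMS), flags=re.IGNORECASE)

def flag_for_microscopy(preop_note: str) -> bool:
    """Return True if the preoperative record suggests risk of malignancy."""
    return bool(PATTERN.search(preop_note))

print(flag_for_microscopy("Obstructive sleep apnea, asymmetric tonsillar enlargement."))  # True
print(flag_for_microscopy("Recurrent tonsillitis, otherwise healthy."))                   # False
```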
Collapse
Affiliation(s)
- M Cristina Pacheco
- Department of Laboratories, Seattle Children's Hospital, Seattle, WA, USA; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Paul Hiraiwa
- Department of Laboratories, Seattle Children's Hospital, Seattle, WA, USA
| | - Laura S Finn
- Department of Laboratories, Seattle Children's Hospital, Seattle, WA, USA; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| | - Raj Kapur
- Department of Laboratories, Seattle Children's Hospital, Seattle, WA, USA; Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
| |
Collapse
|
25
|
Kim M, Park S, Kim C, Choi M. Diagnostic accuracy of clinical outcome prediction using nursing data in intensive care patients: A systematic review. Int J Nurs Stud 2023; 138:104411. [PMID: 36495596 DOI: 10.1016/j.ijnurstu.2022.104411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2022] [Revised: 09/17/2022] [Accepted: 11/22/2022] [Indexed: 11/30/2022]
Abstract
BACKGROUND Nursing data consist of observations of patients' conditions and information on nurses' clinical judgment based on critically ill patients' behavior and physiological signs. Nursing data in electronic health records have recently been emphasized as important predictors of patient deterioration but have not been systematically reviewed. OBJECTIVE We conducted a systematic review of prediction models that use nursing data for clinical outcomes, such as prolonged hospital stay, readmission, and mortality, in intensive care patients, compared to models using physiological data only. In addition, the types of nursing data used in prediction model development were investigated. DESIGN A systematic review. METHODS PubMed, CINAHL, Cochrane CENTRAL, EMBASE, IEEE Xplore Digital Library, Web of Science, and Scopus were searched. Clinical outcome prediction models using nursing data for intensive care patients were included. Clinical outcomes were prolonged hospital stay, readmission, and mortality. Data extracted from the selected studies included study design, data source, outcome definition, sample size, predictors, reference test, model development, model performance, and evaluation. The risk of bias and applicability were assessed using the Prediction model Risk of Bias Assessment Tool checklist. Descriptive summaries were produced based on paired forest plots and summary receiver operating characteristic curves. RESULTS Sixteen studies were included in the systematic review. The data types of predictors used in prediction models were categorized as physiological data, nursing data, and clinical notes. The types of nursing data consisted of nursing notes, assessments, documentation frequency, and flowsheet comments. Studies that used physiological data as the reference showed higher predictive performance for combined data or nursing data than for physiological data alone. The overall risk-of-bias assessment indicated that most of the included studies have a high risk of bias. CONCLUSIONS This study was conducted to identify and review the diagnostic accuracy of clinical outcome prediction using nursing data in intensive care patients. Most of the included studies developed models using nursing notes, and other studies used nursing assessments, documentation frequency, and flowsheet comments. Although the findings need careful interpretation due to the high risk of bias, the area under the curve scores of nursing data and combined data were higher than those of physiological data alone. It is necessary to establish a strategy in prediction modeling that utilizes nursing data, clinical notes, and physiological data as predictors, considering the clinical context, rather than relying on physiological data alone. REGISTRATION The protocol for this study is registered with PROSPERO (registration number: CRD42021273319).
Collapse
Affiliation(s)
- Mihui Kim
- College of Nursing and Brain Korea 21 FOUR Project, Yonsei University, Seoul, Republic of Korea.
| | - Sangwoo Park
- College of Nursing and Mo-Im Kim Nursing Research Institute, Yonsei University, Seoul, Republic of Korea.
| | - Changhwan Kim
- School of Nursing, Johns Hopkins University, Baltimore, MD, United States of America.
| | - Mona Choi
- College of Nursing and Mo-Im Kim Nursing Research Institute, Yonsei University, Seoul, Republic of Korea; Yonsei Evidence Based Nursing Centre of Korea, A JBI Affiliated Group, Seoul, Republic of Korea.
| |
Collapse
|
26
|
Lin FPY, Salih OS, Scott N, Jameson MB, Epstein RJ. Development and Validation of a Machine Learning Approach Leveraging Real-World Clinical Narratives as a Predictor of Survival in Advanced Cancer. JCO Clin Cancer Inform 2022; 6:e2200064. [DOI: 10.1200/cci.22.00064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
PURPOSE Predicting short-term mortality in patients with advanced cancer remains challenging. Whether digitalized clinical text can be used to build models to enhance survival prediction in this population is unclear. MATERIALS AND METHODS We conducted a single-center retrospective cohort study in patients with advanced solid tumors. Clinical correspondence authored by oncologists at the first patient encounter was extracted from the electronic medical records. Machine learning (ML) models were trained using narratives from the derivation cohort before being tested on a temporal validation cohort at the same site. Performance was benchmarked against Eastern Cooperative Oncology Group performance status (PS), comparing ML models alone (comparison 1) or in combination with PS (comparison 2), assessed by areas under the receiver operating characteristic curves (AUCs) for predicting vital status at 11 time points from 2 to 52 weeks. RESULTS ML models were built on the derivation cohort (4,791 patients from 2001 to April 2017) and tested on the validation cohort of 726 patients (May 2017-June 2019). In the 441 patients (61%) for whom clinical narratives were available and PS was documented, ML models outperformed the predictivity of PS (mean AUC improvement, 0.039; P < .001; comparison 1). Inclusion of both clinical text and PS in ML models resulted in further improvement in prediction accuracy over PS, with a mean AUC improvement of 0.050 (P < .001, comparison 2); the AUC was >0.80 at all assessed time points for models incorporating clinical text. Exploratory analysis of oncologists' narratives revealed recurring descriptors correlating with survival, including referral patterns, mobility, physical functions, and concomitant medications. CONCLUSION Applying ML to oncologists' narratives with or without including patient PS significantly improved survival prediction to 12 months, suggesting the utility of clinical text in building prognostic support tools.
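A simplified sketch of the prediction setup (text features from first-encounter letters, optionally fused with ECOG PS, with one binary vital-status model per time horizon) is shown below on synthetic data; the letters, survival times, and model choice are assumptions, not the study's pipeline.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)

# Stand-ins for first-encounter oncology letters, ECOG PS, and survival times.
letters = ["mostly bedbound, declining oral intake, opioid rotation started"] * 100 \
        + ["working part time, walking daily, mild fatigue only"] * 100
ecog_ps = np.array([3] * 100 + [1] * 100)
survival_weeks = np.concatenate([rng.exponential(10, 100), rng.exponential(80, 100)])

text_features = TfidfVectorizer().fit_transform(letters).toarray()
X = np.hstack([text_features, ecog_ps.reshape(-1, 1)])   # clinical text + PS (comparison 2)

# One binary model per horizon: vital status at t weeks after the first encounter.
for t in [2, 12, 26, 52]:
    dead_by_t = (survival_weeks <= t).astype(int)
    if dead_by_t.min() == dead_by_t.max():
        continue                                          # skip degenerate horizons
    model = LogisticRegression(max_iter=1000).fit(X, dead_by_t)
    auc = roc_auc_score(dead_by_t, model.predict_proba(X)[:, 1])
    print(f"week {t:2d}: in-sample AUC {auc:.2f}")
```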
Collapse
Affiliation(s)
- Frank Po-Yen Lin
- Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
- NHMRC Clinical Trials Centre, Sydney University, Camperdown, Australia
- Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
- School of Clinical Medicine, University of New South Wales, Sydney, Australia
| | - Osama S.M. Salih
- Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
- Auckland City Hospital, Auckland, New Zealand
| | - Nina Scott
- Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand
| | - Michael B. Jameson
- Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
- Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand
| | - Richard J. Epstein
- School of Clinical Medicine, University of New South Wales, Sydney, Australia
- Cancer Research Division, Garvan Institute of Medical Research, Sydney, Australia
- New Hope Cancer Centre, Beijing United Hospital, Beijing, China
| |
Collapse
|
27
|
Chae S, Song J, Ojo M, Bowles KH, McDonald MV, Barrón Y, Hobensack M, Kennedy E, Sridharan S, Evans L, Topaz M. Factors associated with poor self-management documented in home health care narrative notes for patients with heart failure. Heart Lung 2022; 55:148-154. [PMID: 35597164 PMCID: PMC11021173 DOI: 10.1016/j.hrtlng.2022.05.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2021] [Revised: 05/03/2022] [Accepted: 05/07/2022] [Indexed: 11/04/2022]
Abstract
BACKGROUND Patients with heart failure (HF) who actively engage in their own self-management have better outcomes. Extracting data through natural language processing (NLP) holds great promise for identifying patients with or at risk of poor self-management. OBJECTIVE To identify home health care (HHC) patients with HF who have poor self-management using NLP of narrative notes, and to examine patient factors associated with poor self-management. METHODS An NLP algorithm was applied to extract poor self-management documentation using 353,718 HHC narrative notes of 9,710 patients with HF. Sociodemographic and structured clinical data were incorporated into multivariate logistic regression models to identify factors associated with poor self-management. RESULTS There were 758 (7.8%) patients in this sample identified as having notes with language describing poor HF self-management. Younger age (OR 0.982, 95% CI 0.976-0.987, p < .001), longer length of stay in HHC (OR 1.036, 95% CI 1.029- 1.043, p < .001), diagnosis of diabetes (OR 1.47, 95% CI 1.3-1.67, p < .001) and depression (OR 1.36, 95% CI 1.09-1.68, p < .01), impaired decision-making (OR 1.64, 95% CI 1.37-1.95, p < .001), smoking (OR 1.7, 95% CI 1.4-2.04, p < .001), and shortness of breath with exertion (OR 1.25, 95% CI 1.1-1.42, p < .01) were associated with poor self-management. CONCLUSIONS Patients with HF who have poor self-management can be identified from the narrative notes in HHC using novel NLP methods. Meaningful information about the self-management of patients with HF can support HHC clinicians in developing individualized care plans to improve self-management and clinical outcomes.
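The adjusted odds ratios with 95% CIs reported above come from a multivariate logistic regression; a minimal sketch of that style of analysis on synthetic stand-in data, using a reduced set of illustrative covariates, could look like this:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 2000

# Synthetic stand-ins for a few of the factors examined in the abstract.
df = pd.DataFrame({
    "age": rng.normal(78, 9, n),
    "los_days": rng.normal(40, 15, n),
    "diabetes": rng.binomial(1, 0.4, n),
    "depression": rng.binomial(1, 0.2, n),
})
lin = -3 - 0.02 * (df.age - 78) + 0.03 * (df.los_days - 40) + 0.4 * df.diabetes + 0.3 * df.depression
df["poor_self_mgmt"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

model = smf.logit("poor_self_mgmt ~ age + los_days + diabetes + depression", data=df).fit(disp=0)

# Report adjusted odds ratios with 95% confidence intervals.
or_table = np.exp(pd.concat([model.params, model.conf_int()], axis=1))
or_table.columns = ["OR", "2.5%", "97.5%"]
print(or_table.round(3))
```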
Collapse
Affiliation(s)
- Sena Chae
- College of Nursing, University of Iowa, 50 Newton Rd, Iowa City, IA 52242, United States.
| | - Jiyoun Song
- Columbia University School of Nursing, New York, NY, United States
| | - Marietta Ojo
- Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, United States
| | - Kathryn H Bowles
- Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing, Philadelphia, PA, United States; Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, United States
| | - Margaret V McDonald
- Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, United States
| | - Yolanda Barrón
- Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, United States
| | - Mollie Hobensack
- Columbia University School of Nursing, New York, NY, United States
| | - Erin Kennedy
- Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing, Philadelphia, PA, United States
| | - Sridevi Sridharan
- Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, United States
| | - Lauren Evans
- Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, United States
| | - Maxim Topaz
- Center for Home Care Policy & Research, Columbia University School of Nursing, Data Science Institute, Columbia University, Visiting Nurse Service of New York, New York, NY, United States
| |
Collapse
|
28
|
Peng X, Zhu T, Chen G, Wang Y, Hao X. A multicenter prospective study on postoperative pulmonary complications prediction in geriatric patients with deep neural network model. Front Surg 2022; 9:976536. [PMID: 36017511 PMCID: PMC9395933 DOI: 10.3389/fsurg.2022.976536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Accepted: 07/26/2022] [Indexed: 11/13/2022] Open
Abstract
Aim Postoperative pulmonary complications (PPCs) can increase the risk of postoperative mortality, and the geriatric population has high incidence of PPCs. Early identification of high-risk geriatric patients is of great value for clinical decision making and prognosis improvement. Existing prediction models are based purely on structured data, and they lack predictive accuracy in geriatric patients. We aimed to develop and validate a deep neural network model based on combined natural language data and structured data for improving the prediction of PPCs in geriatric patients. Methods We consecutively enrolled patients aged ≥65 years who underwent surgery under general anesthesia at seven hospitals in China. Data from the West China Hospital of Sichuan University were used as the derivation dataset, and a deep neural network model was developed based on combined natural language data and structured data. Data from the six other hospitals were combined for external validation. Results The derivation dataset included 12,240 geriatric patients, and 1949 (15.9%) patients developed PPCs. Our deep neural network model outperformed other machine learning models with an area under the precision-recall curve (AUPRC) of 0.657 (95% confidence interval [CI], 0.655–0.658) and an area under the receiver operating characteristic curve (AUROC) of 0.884 (95% CI, 0.883–0.885). The external dataset included 7579 patients, and 776 (10.2%) patients developed PPCs. In external validation, the AUPRC was 0.632 (95% CI, 0.632–0.633) and the AUROC was 0.889 (95% CI, 0.888–0.889). Conclusions This study indicated that the deep neural network model based on combined natural language data and structured data could improve the prediction of PPCs in geriatric patients.
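For readers less familiar with the two headline metrics, AUROC and AUPRC can be computed from held-out predicted probabilities as sketched below; the probabilities here are simulated and do not come from the study's model.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, precision_recall_curve, auc

rng = np.random.default_rng(6)

# Simulated predicted probabilities for an imbalanced outcome (~16% PPC rate).
y_true = rng.binomial(1, 0.16, 5000)
y_prob = np.clip(0.16 + 0.3 * (y_true - 0.16) + rng.normal(0, 0.15, 5000), 0, 1)

auroc = roc_auc_score(y_true, y_prob)
precision, recall, _ = precision_recall_curve(y_true, y_prob)
auprc = auc(recall, precision)

# With rare outcomes, AUPRC is often the more informative summary,
# which is why the study reports it alongside AUROC.
print(f"AUROC={auroc:.3f}  AUPRC={auprc:.3f}")
```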
Collapse
Affiliation(s)
- Xiran Peng
- Department of Anesthesiology, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, China
- The Research Units of West China (2018RU012) - Chinese Academy of Medical Sciences, West China Hospital, Sichuan University, Chengdu, China
| | - Tao Zhu
- Department of Anesthesiology, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, China
- The Research Units of West China (2018RU012) - Chinese Academy of Medical Sciences, West China Hospital, Sichuan University, Chengdu, China
| | - Guo Chen
- Department of Anesthesiology, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, China
- The Research Units of West China (2018RU012) - Chinese Academy of Medical Sciences, West China Hospital, Sichuan University, Chengdu, China
| | - Yaqiang Wang
- College of Software Engineering, Chengdu University of Information Technology, Chengdu, China
| | - Xuechao Hao
- Department of Anesthesiology, National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, China
- The Research Units of West China (2018RU012) - Chinese Academy of Medical Sciences, West China Hospital, Sichuan University, Chengdu, China
- Correspondence: Xuechao Hao
| |
Collapse
|
29
|
Risk Management In Intensive Care Units With Artificial Intelligence Technologies: Systematic Review of Prediction Models Using Electronic Health Records. JOURNAL OF BASIC AND CLINICAL HEALTH SCIENCES 2022. [DOI: 10.30621/jbachs.993798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Background and aim: Clinical risk assessments should be made to protect patients from negative outcomes, and the definition, frequency, and severity of the risk should be determined. The information contained in electronic health records (EHRs) can be used in different areas such as risk prediction and estimation of treatment effects. Many prediction models using artificial intelligence (AI) technologies that can be used in risk assessment have been developed. The aim of this study is to bring together research on prediction models developed with AI technologies using the EHRs of patients hospitalized in the intensive care unit (ICU) and to evaluate them in terms of risk management in healthcare.
Methods: The search was restricted to the Web of Science, Pubmed, Science Direct, and Medline databases to retrieve research articles published in English in 2010 and after. Studies with a prediction model using data obtained from EHRs in the ICU were included. The review focused solely on research conducted in the ICU that used AI technologies to predict a health condition posing a significant risk to patient safety.
Results: Recognized prediction subcategories were mortality (n=6), sepsis (n=4), pressure ulcer (n=4), acute kidney injury (n=3), and other areas (n=10). EHR-based prediction models were found to be good risk management and decision support tools, and adoption of such models in ICUs may reduce the prevalence of adverse conditions.
Conclusions: The results indicate that the developed models had higher performance and better selectivity than previously developed risk models, so they are better at predicting risks and serious adverse events in the ICU. The use of AI-based prediction models developed from EHRs is recommended in risk management studies. Future research is still needed on predicting the risks of other health conditions.
Collapse
|
30
|
Ahmad SR, Tarabochia AD, Budahn L, Lemahieu AM, Anderson B, Vashistha K, Karnatovskaia L, Gajic O. Feasibility of Extracting Meaningful Patient Centered Outcomes From the Electronic Health Record Following Critical Illness in the Elderly. Front Med (Lausanne) 2022; 9:826169. [PMID: 35733861 PMCID: PMC9207323 DOI: 10.3389/fmed.2022.826169] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2022] [Accepted: 05/11/2022] [Indexed: 12/04/2022] Open
Abstract
Background Meaningful patient-centered outcomes of critical illness such as functional status, cognition, and mental health are studied using validated measurement tools that may often be impractical outside the research setting. The electronic health record (EHR) contains a plethora of information pertaining to these domains. We sought to determine how feasible and reliable it is to assess meaningful patient-centered outcomes from the EHR. Methods Two independent investigators reviewed the EHRs of a random sample of ICU patients, looking at documented assessments of the trajectory of functional status, cognition, and mental health. Cohen's kappa was used to measure agreement between the 2 reviewers. Post-ICU health in these domains 12 months after admission was compared to pre-ICU health in the 12 months prior to assess qualitatively whether a patient's condition was "better," "unchanged," or "worse." Days alive and out of hospital/health care facility was a secondary outcome. Results Thirty-six of the 41 randomly selected patients (88%) survived critical illness. The EHR contained sufficient information to determine the difference in health status before and after critical illness in most survivors (86%). Decline in functional status (36%), cognition (11%), and mental health (11%) following ICU admission was observed compared to the premorbid baseline. Agreement between reviewers was excellent (kappa ranging from 0.966 to 1). Eighteen patients (44%) remained home after discharge from hospital and rehabilitation during the 12-month follow-up. Conclusion We demonstrated the feasibility and reliability of assessing the trajectory of changes in functional status, cognition, and selected mental health outcomes from the EHR of critically ill patients. If validated in a larger, representative sample, these outcomes could be used alongside survival in quality improvement studies and pragmatic clinical trials.
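The interrater agreement reported (kappa 0.966 to 1) is Cohen's kappa over the two reviewers' categorical judgments; a toy calculation with invented ratings:

```python
from sklearn.metrics import cohen_kappa_score

# Two reviewers' qualitative ratings of post- vs pre-ICU status
# ("better", "unchanged", "worse") for the same patients.
reviewer_1 = ["worse", "unchanged", "worse", "better", "unchanged", "worse"]
reviewer_2 = ["worse", "unchanged", "worse", "better", "unchanged", "unchanged"]

print("Cohen's kappa:", round(cohen_kappa_score(reviewer_1, reviewer_2), 3))
```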
Collapse
Affiliation(s)
- Sumera R. Ahmad
- Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN, United States
- *Correspondence: Sumera R. Ahmad
| | - Alex D. Tarabochia
- Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States
| | - Luann Budahn
- Anesthesia and Critical Care Research Unit, Mayo Clinic, Rochester, MN, United States
| | - Allison M. Lemahieu
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
| | - Brenda Anderson
- Anesthesia and Critical Care Research Unit, Mayo Clinic, Rochester, MN, United States
| | - Kirtivardhan Vashistha
- Department of Infectious Disease, Multi-disciplinary Epidemiology and Translational Research in Intensive Care Research Group, Mayo Clinic, Rochester, MN, United States
| | | | - Ognjen Gajic
- Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, MN, United States
| |
Collapse
|
31
|
Abstract
PURPOSE OF REVIEW To provide an overview of the systems being used to identify and predict clinical deterioration in hospitalised patients, with focus on the current and future role of artificial intelligence (AI). RECENT FINDINGS There are five leading AI driven systems in this field: the Advanced Alert Monitor (AAM), the electronic Cardiac Arrest Risk Triage (eCART) score, Hospital wide Alert Via Electronic Noticeboard, the Mayo Clinic Early Warning Score, and the Rothman Index (RI). Each uses Electronic Patient Record (EPR) data and machine learning to predict adverse events. Less mature but relevant evolutions are occurring in the fields of Natural Language Processing, Time and Motion Studies, AI Sepsis and COVID-19 algorithms. SUMMARY Research-based AI-driven systems to predict clinical deterioration are increasingly being developed, but few are being implemented into clinical workflows. Escobar et al. (AAM) provide the current gold standard for robust model development and implementation methodology. Multiple technologies show promise, however, the pathway to meaningfully affect patient outcomes remains challenging.
Collapse
Affiliation(s)
- James Malycha
- Discipline of Acute Care Medicine, University of Adelaide, Adelaide
- The Queen Elizabeth Hospital, Department of Intensive Care Medicine, Woodville South
| | - Stephen Bacchi
- Royal Adelaide Hospital, Adelaide, South Australia, Australia
| | - Oliver Redfern
- Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK
| |
Collapse
|
32
|
Chen PF, Chen L, Lin YK, Li GH, Lai F, Lu CW, Yang CY, Chen KC, Lin TY. Predicting Postoperative Mortality With Deep Neural Networks and Natural Language Processing: Model Development and Validation. JMIR Med Inform 2022; 10:e38241. [PMID: 35536634 PMCID: PMC9131148 DOI: 10.2196/38241] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2022] [Revised: 04/18/2022] [Accepted: 04/26/2022] [Indexed: 11/23/2022] Open
Abstract
Background Machine learning (ML) achieves better predictions of postoperative mortality than previous prediction tools. Free-text descriptions of the preoperative diagnosis and the planned procedure are available preoperatively. Because reading these descriptions helps anesthesiologists evaluate the risk of the surgery, we hypothesized that deep learning (DL) models with unstructured text could improve postoperative mortality prediction. However, it is challenging to extract meaningful concept embeddings from this unstructured clinical text. Objective This study aims to develop a fusion DL model containing structured and unstructured features to predict the in-hospital 30-day postoperative mortality before surgery. ML models for predicting postoperative mortality using preoperative data with or without free clinical text were assessed. Methods We retrospectively collected preoperative anesthesia assessments, surgical information, and discharge summaries of patients undergoing general and neuraxial anesthesia from electronic health records (EHRs) from 2016 to 2020. We first compared the deep neural network (DNN) with other models using the same input features to demonstrate effectiveness. Then, we combined the DNN model with bidirectional encoder representations from transformers (BERT) to extract information from clinical texts. The effects of adding text information on the model performance were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Statistical significance was evaluated using P<.05. Results The final cohort contained 121,313 patients who underwent surgeries. A total of 1562 (1.29%) patients died within 30 days of surgery. Our BERT-DNN model achieved the highest AUROC (0.964, 95% CI 0.961-0.967) and AUPRC (0.336, 95% CI 0.276-0.402). The AUROC of the BERT-DNN was significantly higher compared to logistic regression (AUROC=0.952, 95% CI 0.949-0.955) and the American Society of Anesthesiologist Physical Status (ASAPS AUROC=0.892, 95% CI 0.887-0.896) but not significantly higher compared to the DNN (AUROC=0.959, 95% CI 0.956-0.962) and the random forest (AUROC=0.961, 95% CI 0.958-0.964). The AUPRC of the BERT-DNN was significantly higher compared to the DNN (AUPRC=0.319, 95% CI 0.260-0.384), the random forest (AUPRC=0.296, 95% CI 0.239-0.360), logistic regression (AUPRC=0.276, 95% CI 0.220-0.339), and the ASAPS (AUPRC=0.149, 95% CI 0.107-0.203). Conclusions Our BERT-DNN model has an AUPRC significantly higher compared to previously proposed models using no text and an AUROC significantly higher compared to logistic regression and the ASAPS. This technique helps identify patients with higher risk from the surgical description text in EHRs.
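A compact sketch of the fusion idea (a BERT text representation of the preoperative free-text descriptions concatenated with structured features and passed to a small DNN) is given below; the checkpoint, feature choices, and layer sizes are assumptions rather than the study's BERT-DNN configuration, and the `transformers` and `torch` packages are required.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModel

CKPT = "bert-base-uncased"                      # generic stand-in checkpoint
tokenizer = AutoTokenizer.from_pretrained(CKPT)
bert = AutoModel.from_pretrained(CKPT)

class FusionMortalityModel(nn.Module):
    """Concatenate the BERT [CLS] text embedding with structured features."""
    def __init__(self, n_structured, hidden=768):
        super().__init__()
        self.dnn = nn.Sequential(
            nn.Linear(hidden + n_structured, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, texts, structured):
        toks = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        cls = bert(**toks).last_hidden_state[:, 0, :]          # text representation
        fused = torch.cat([cls, structured], dim=1)            # text + structured features
        return torch.sigmoid(self.dnn(fused))                  # P(30-day mortality)

model = FusionMortalityModel(n_structured=3)
texts = ["dx: perforated viscus; planned: exploratory laparotomy"]
structured = torch.tensor([[74.0, 3.0, 1.0]])   # e.g. age, ASA class, emergency flag (illustrative)
print(model(texts, structured).item())
```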
Collapse
Affiliation(s)
- Pei-Fu Chen
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan; Department of Anesthesiology, Far Eastern Memorial Hospital, New Taipei City, Taiwan
| | - Lichin Chen
- Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan
| | - Yow-Kuan Lin
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan; Department of Computer Science, Columbia University, New York, NY, United States
| | - Guo-Hung Li
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
| | - Feipei Lai
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan; Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan; Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan
| | - Cheng-Wei Lu
- Department of Anesthesiology, Far Eastern Memorial Hospital, New Taipei City, Taiwan.,Department of Mechanical Engineering, Yuan Ze University, Taoyuan, Taiwan
| | - Chi-Yu Yang
- Department of Information Technology, Far Eastern Memorial Hospital, New Taipei City, Taiwan.,Section of Cardiovascular Medicine, Cardiovascular Center, Far Eastern Memorial Hospital, New Taipei City, Taiwan
| | - Kuan-Chih Chen
- Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan.,Department of Internal Medicine, Far Eastern Memorial Hospital, New Taipei City, Taiwan
| | - Tzu-Yu Lin
- Department of Anesthesiology, Far Eastern Memorial Hospital, New Taipei City, Taiwan.,Department of Mechanical Engineering, Yuan Ze University, Taoyuan, Taiwan
| |
Collapse
|
33
|
Parikh RB, Manz CR, Nelson MN, Evans CN, Regli SH, O'Connor N, Schuchter LM, Shulman LN, Patel MS, Paladino J, Shea JA. Clinician perspectives on machine learning prognostic algorithms in the routine care of patients with cancer: a qualitative study. Support Care Cancer 2022; 30:4363-4372. [PMID: 35094138 PMCID: PMC10232355 DOI: 10.1007/s00520-021-06774-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2021] [Accepted: 12/18/2021] [Indexed: 10/19/2022]
Abstract
PURPOSE Oncologists may overestimate prognosis for patients with cancer, leading to delayed or missed conversations about patients' goals and subsequent low-quality end-of-life care. Machine learning algorithms may accurately predict mortality risk in cancer, but it is unclear how oncology clinicians would use such algorithms in practice. METHODS The purpose of this qualitative study was to assess oncology clinicians' perceptions of the utility of, and barriers to, machine learning prognostic algorithms for prompting advance care planning. Participants included medical oncology physicians and advanced practice providers (APPs) practicing in tertiary and community practices within a large academic healthcare system. Interview transcripts were coded and analyzed inductively using NVivo software. RESULTS The study included 29 oncology clinicians (19 physicians, 10 APPs) across 6 practice sites (1 tertiary, 5 community) in the USA. Fourteen participants had prior exposure to an automated machine learning-based prognostic algorithm as part of a pragmatic randomized trial. Clinicians believed that there was utility for algorithms in validating their own intuition about prognosis and prompting conversations about patient goals and preferences. However, this enthusiasm was tempered by concerns about algorithm accuracy, over-reliance on algorithm predictions, and the ethical implications around disclosure of an algorithm prediction. There was significant variation in tolerance for false positive vs. false negative predictions. CONCLUSION While oncologists believe there are applications for advanced prognostic algorithms in routine care of patients with cancer, they are concerned about algorithm accuracy, confirmation and automation biases, and ethical issues of prognostic disclosure.
Collapse
Affiliation(s)
- Ravi B Parikh
- Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Blockley 1102, Philadelphia, PA, 19104, USA.
- Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, USA.
- Corporal Michael J. Crescenz Veterans Affairs Medical Center, Philadelphia, PA, USA.
- University of Pennsylvania Health System, Philadelphia, PA, USA.
| | | | - Maria N Nelson
- Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Blockley 1102, Philadelphia, PA, 19104, USA
| | - Chalanda N Evans
- Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Blockley 1102, Philadelphia, PA, 19104, USA
- Penn Medicine Nudge Unit, Philadelphia, PA, USA
| | - Susan H Regli
- University of Pennsylvania Health System, Philadelphia, PA, USA
| | - Nina O'Connor
- Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Blockley 1102, Philadelphia, PA, 19104, USA
- Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, USA
- University of Pennsylvania Health System, Philadelphia, PA, USA
| | - Lynn M Schuchter
- Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Blockley 1102, Philadelphia, PA, 19104, USA
- Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, USA
- University of Pennsylvania Health System, Philadelphia, PA, USA
| | - Lawrence N Shulman
- Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Blockley 1102, Philadelphia, PA, 19104, USA
- Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, USA
- University of Pennsylvania Health System, Philadelphia, PA, USA
| | - Mitesh S Patel
- Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Blockley 1102, Philadelphia, PA, 19104, USA
- Abramson Cancer Center, University of Pennsylvania, Philadelphia, PA, USA
- Corporal Michael J. Crescenz Veterans Affairs Medical Center, Philadelphia, PA, USA
- University of Pennsylvania Health System, Philadelphia, PA, USA
- Penn Medicine Nudge Unit, Philadelphia, PA, USA
- Wharton School of the University of Pennsylvania, Philadelphia, PA, USA
| | - Joanna Paladino
- Ariadne Labs, Brigham and Women's Hospital & Harvard Chan School of Public Health, Boston, MA, USA
| | - Judy A Shea
- Perelman School of Medicine, University of Pennsylvania, 423 Guardian Drive, Blockley 1102, Philadelphia, PA, 19104, USA
| |
Collapse
|
34
|
Devlin JW, Skrobik Y. What language conveys distress and reassurance? Intensive Care Med 2022; 48:599-601. [PMID: 35348819 PMCID: PMC8961086 DOI: 10.1007/s00134-022-06687-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/18/2022] [Indexed: 12/26/2022]
|
35
|
Lee DY, Kim C, Lee S, Son SJ, Cho SM, Cho YH, Lim J, Park RW. Psychosis Relapse Prediction Leveraging Electronic Health Records Data and Natural Language Processing Enrichment Methods. Front Psychiatry 2022; 13:844442. [PMID: 35479497 PMCID: PMC9037331 DOI: 10.3389/fpsyt.2022.844442] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/28/2021] [Accepted: 03/09/2022] [Indexed: 12/30/2022] Open
Abstract
BACKGROUND Identifying patients at a high risk of psychosis relapse is crucial for early interventions. A relevant psychiatric clinical context is often recorded in clinical notes; however, the utilization of unstructured data remains limited. This study aimed to develop psychosis-relapse prediction models using various types of clinical notes and structured data. METHODS Clinical data were extracted from the electronic health records of the Ajou University Medical Center in South Korea. The study population included patients with psychotic disorders, and outcome was psychosis relapse within 1 year. Using only structured data, we developed an initial prediction model, then three natural language processing (NLP)-enriched models using three types of clinical notes (psychological tests, admission notes, and initial nursing assessment) and one complete model. Latent Dirichlet Allocation was used to cluster the clinical context into similar topics. All models applied the least absolute shrinkage and selection operator logistic regression algorithm. We also performed an external validation using another hospital database. RESULTS A total of 330 patients were included, and 62 (18.8%) experienced psychosis relapse. Six predictors were used in the initial model and 10 additional topics from Latent Dirichlet Allocation processing were added in the enriched models. The model derived from all notes showed the highest value of the area under the receiver operating characteristic (AUROC = 0.946) in the internal validation, followed by models based on the psychological test notes, admission notes, initial nursing assessments, and structured data only (0.902, 0.855, 0.798, and 0.784, respectively). The external validation was performed using only the initial nursing assessment note, and the AUROC was 0.616. CONCLUSIONS We developed prediction models for psychosis relapse using the NLP-enrichment method. Models using clinical notes were more effective than models using only structured data, suggesting the importance of unstructured data in psychosis prediction.
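To make the NLP-enrichment idea concrete, here is a toy sketch (not the study's pipeline) that derives Latent Dirichlet Allocation topic proportions from note text with scikit-learn, appends them to structured predictors, and fits an L1 (LASSO) logistic regression; the notes, variables, and labels are invented for illustration.

```python
# Illustrative sketch: enrich structured predictors with LDA topic proportions
# from clinical notes, then fit an L1 (LASSO) logistic model for relapse.
# Toy data only; not the study's code or feature set.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

notes = ["patient reports auditory hallucinations, poor medication adherence",
         "stable mood, attending outpatient follow-up regularly",
         "recent discontinuation of antipsychotic, increasing paranoia"]
structured = np.array([[34, 2], [51, 0], [28, 3]])   # e.g., age, prior admissions (toy values)
relapse = np.array([1, 0, 1])                        # relapse within 1 year (toy labels)

counts = CountVectorizer(stop_words="english").fit_transform(notes)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)

X = np.hstack([structured, topics])                  # structured + topic features
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X, relapse)
print(clf.predict_proba(X)[:, 1])                    # predicted relapse probabilities
```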
Collapse
Affiliation(s)
- Dong Yun Lee
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
| | - Chungsoo Kim
- Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, South Korea
| | - Seongwon Lee
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea.,Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, South Korea
| | - Sang Joon Son
- Department of Psychiatry, Ajou University School of Medicine, Suwon, South Korea
| | - Sun-Mi Cho
- Department of Psychiatry, Ajou University School of Medicine, Suwon, South Korea
| | - Yong Hyuk Cho
- Department of Psychiatry, Ajou University School of Medicine, Suwon, South Korea
| | - Jaegyun Lim
- Department of Laboratory Medicine, Myongji Hospital, Hanyang University College of Medicine, Goyang, South Korea
| | - Rae Woong Park
- Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea.,Department of Biomedical Sciences, Ajou University Graduate School of Medicine, Suwon, South Korea
| |
Collapse
|
36
|
Seinen TM, Fridgeirsson EA, Ioannou S, Jeannetot D, John LH, Kors JA, Markus AF, Pera V, Rekkas A, Williams RD, Yang C, van Mulligen EM, Rijnbeek PR. Use of unstructured text in prognostic clinical prediction models: a systematic review. J Am Med Inform Assoc 2022; 29:1292-1302. [PMID: 35475536] [PMCID: PMC9196702] [DOI: 10.1093/jamia/ocac058] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2021] [Revised: 03/06/2022] [Accepted: 04/11/2022] [Indexed: 11/29/2022] Open
Abstract
Objective This systematic review aims to assess how information from unstructured text is used to develop and validate clinical prognostic prediction models. We summarize the prediction problems and methodological landscape and determine whether using text data in addition to more commonly used structured data improves the prediction performance. Materials and Methods We searched Embase, MEDLINE, Web of Science, and Google Scholar to identify studies that developed prognostic prediction models using information extracted from unstructured text in a data-driven manner, published in the period from January 2005 to March 2021. Data items were extracted and analyzed, and a meta-analysis of the model performance was carried out to assess the added value of text to structured-data models. Results We identified 126 studies that described 145 clinical prediction problems. Combining text and structured data improved model performance, compared with using only text or only structured data. In these studies, a wide variety of dense and sparse numeric text representations were combined with both deep learning and more traditional machine learning methods. External validation, public availability, and attention to the explainability of the developed models were limited. Conclusion The use of unstructured text in the development of prognostic prediction models has been found beneficial in addition to structured data in most studies. The text data are a source of valuable information for prediction model development and should not be neglected. We suggest a future focus on explainability and external validation of the developed models, promoting robust and trustworthy prediction models in clinical practice.
Collapse
Affiliation(s)
- Tom M Seinen
- Corresponding Author: Tom M. Seinen, MSc, Department of Medical Informatics, Erasmus University Medical Center, Molewaterplein 40, 3015 GD Rotterdam, The Netherlands
| | - Egill A Fridgeirsson
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Solomon Ioannou
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Daniel Jeannetot
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Luis H John
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Jan A Kors
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Aniek F Markus
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Victor Pera
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Alexandros Rekkas
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Ross D Williams
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Cynthia Yang
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Erik M van Mulligen
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| | - Peter R Rijnbeek
- Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, The Netherlands
| |
Collapse
|
37
|
Affiliation(s)
- P Elliott Miller
- Section of Cardiovascular Medicine Yale School of Medicine New Haven CT
| | - Jacob Jentzer
- Department of Cardiovascular Medicine Mayo Clinic Rochester MN
| | - Jason N Katz
- Division of Cardiovascular Medicine Duke University Durham NC
| |
Collapse
|
38
|
Gensheimer MF, Aggarwal S, Benson KRK, Carter JN, Henry AS, Wood DJ, Soltys SG, Hancock S, Pollom E, Shah NH, Chang DT. Automated model versus treating physician for predicting survival time of patients with metastatic cancer. J Am Med Inform Assoc 2021; 28:1108-1116. [PMID: 33313792 DOI: 10.1093/jamia/ocaa290] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2020] [Accepted: 11/09/2020] [Indexed: 02/06/2023] Open
Abstract
OBJECTIVE Being able to predict a patient's life expectancy can help doctors and patients prioritize treatments and supportive care. For predicting life expectancy, physicians have been shown to outperform traditional models that use only a few predictor variables. It is possible that a machine learning model that uses many predictor variables and diverse data sources from the electronic medical record can improve on physicians' performance. For patients with metastatic cancer, we compared accuracy of life expectancy predictions by the treating physician, a machine learning model, and a traditional model. MATERIALS AND METHODS A machine learning model was trained using 14 600 metastatic cancer patients' data to predict each patient's distribution of survival time. Data sources included note text, laboratory values, and vital signs. From 2015-2016, 899 patients receiving radiotherapy for metastatic cancer were enrolled in a study in which their radiation oncologist estimated life expectancy. Survival predictions were also made by the machine learning model and a traditional model using only performance status. Performance was assessed with area under the curve for 1-year survival and calibration plots. RESULTS The radiotherapy study included 1190 treatment courses in 899 patients. A total of 879 treatment courses in 685 patients were included in this analysis. Median overall survival was 11.7 months. Physicians, machine learning model, and traditional model had area under the curve for 1-year survival of 0.72 (95% CI 0.63-0.81), 0.77 (0.73-0.81), and 0.68 (0.65-0.71), respectively. CONCLUSIONS The machine learning model's predictions were more accurate than those of the treating physician or a traditional model.
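The evaluation described above (area under the curve for 1-year survival plus calibration plots) can be reproduced in outline with scikit-learn; the snippet below is a hedged sketch on synthetic predictions from three hypothetical predictors, not the study's data or code.

```python
# Sketch of the comparison metrics (not the study pipeline): AUROC for 1-year
# survival plus a simple calibration summary for each predictor. Synthetic data only.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
alive_1yr = rng.integers(0, 2, size=200)                       # observed 1-year survival
preds = {
    "physician":   np.clip(alive_1yr * 0.6 + rng.normal(0.3, 0.25, 200), 0, 1),
    "ml_model":    np.clip(alive_1yr * 0.7 + rng.normal(0.2, 0.20, 200), 0, 1),
    "perf_status": np.clip(alive_1yr * 0.5 + rng.normal(0.3, 0.30, 200), 0, 1),
}
for name, p in preds.items():
    auc = roc_auc_score(alive_1yr, p)
    frac_pos, mean_pred = calibration_curve(alive_1yr, p, n_bins=5)  # observed vs predicted per bin
    print(f"{name}: AUROC={auc:.2f}, observed rate per bin={np.round(frac_pos, 2)}")
```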
Collapse
Affiliation(s)
| | - Sonya Aggarwal
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
| | - Kathryn R K Benson
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
| | - Justin N Carter
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
| | - A Solomon Henry
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Douglas J Wood
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Scott G Soltys
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
| | - Steven Hancock
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
| | - Erqi Pollom
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
| | - Nigam H Shah
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
| | - Daniel T Chang
- Department of Radiation Oncology, Stanford University, Stanford, CA, USA
| |
Collapse
|
39
|
McCulloch CE, Neuhaus JM. Improving Predictions When Interest Focuses on Extreme Random Effects. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2021.1938583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Affiliation(s)
- Charles E. McCulloch
- Division of Biostatistics, Department of Epidemiology and Biostatistics, University of California, San Francisco, CA
| | - John M. Neuhaus
- Division of Biostatistics, Department of Epidemiology and Biostatistics, University of California, San Francisco, CA
| |
Collapse
|
40
|
Abstract
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.
Collapse
Affiliation(s)
- Bethany Percha
- Department of Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10025, USA;
| |
Collapse
|
41
|
George N, Moseley E, Eber R, Siu J, Samuel M, Yam J, Huang K, Celi LA, Lindvall C. Deep learning to predict long-term mortality in patients requiring 7 days of mechanical ventilation. PLoS One 2021; 16:e0253443. [PMID: 34185798 PMCID: PMC8241081 DOI: 10.1371/journal.pone.0253443] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2020] [Accepted: 06/06/2021] [Indexed: 01/12/2023] Open
Abstract
Background Among patients with acute respiratory failure requiring prolonged mechanical ventilation, tracheostomies are typically placed after approximately 7 to 10 days. Yet half of patients admitted to the intensive care unit receiving tracheostomy will die within a year, often within three months. Existing mortality prediction models for prolonged mechanical ventilation, such as the ProVent Score, have poor sensitivity and are not applied until after 14 days of mechanical ventilation. We developed a model to predict 3-month mortality in patients requiring more than 7 days of mechanical ventilation using deep learning techniques and compared this to existing mortality models. Methods Retrospective cohort study. Setting: The Medical Information Mart for Intensive Care III Database. Patients: All adults requiring ≥ 7 days of mechanical ventilation. Measurements: A neural network model for 3-month mortality was created using process-of-care variables, including demographic, physiologic and clinical data. The area under the receiver operator curve (AUROC) was compared to the ProVent model at predicting 3 and 12-month mortality. Shapley values were used to identify the variables with the greatest contributions to the model. Results There were 4,334 encounters divided into a development cohort (n = 3467) and a testing cohort (n = 867). The final deep learning model included 250 variables and had an AUROC of 0.74 for predicting 3-month mortality at day 7 of mechanical ventilation versus 0.59 for the ProVent model. Older age and elevated Simplified Acute Physiology Score II (SAPS II) Score on intensive care unit admission had the largest contribution to predicting mortality. Discussion We developed a deep learning prediction model for 3-month mortality among patients requiring ≥ 7 days of mechanical ventilation using a neural network approach utilizing readily available clinical variables. The model outperforms the ProVent model for predicting mortality among patients requiring ≥ 7 days of mechanical ventilation. This model requires external validation.
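For readers unfamiliar with the attribution step, the sketch below trains a small neural network on synthetic data and ranks features by permutation importance; the paper reports Shapley values, so permutation importance is used here only as a simpler stand-in, and every variable and value is synthetic.

```python
# Hedged sketch: small neural network for 3-month mortality with feature attribution.
# Permutation importance stands in for the paper's Shapley analysis. Synthetic data only.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
age = rng.normal(65, 15, n)
saps2 = rng.normal(45, 15, n)
lactate = rng.normal(2.0, 1.0, n)
X = np.column_stack([age, saps2, lactate])
risk = 0.03 * (age - 65) + 0.04 * (saps2 - 45) + rng.normal(0, 1, n)
y = (risk > 0).astype(int)                      # synthetic 3-month mortality label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0).fit(X_tr, y_tr)
print("AUROC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 2))

# Rank features by the drop in AUROC when each is permuted
imp = permutation_importance(clf, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=0)
for name, score in zip(["age", "SAPS II", "lactate"], imp.importances_mean):
    print(name, round(score, 3))
```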
Collapse
Affiliation(s)
- Naomi George
- Department of Emergency Medicine, Division of Critical Care, University of New Mexico Health Science Center, Albuquerque, New Mexico, United States of America
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Edward Moseley
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Rene Eber
- Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Université de Montpellier, Montpellier, France
| | - Jennifer Siu
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Otolaryngology, Division of Head & Neck Surgery, University of Toronto, Toronto, Canada
| | - Mathew Samuel
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Jonathan Yam
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Kexin Huang
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Leo Anthony Celi
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
| | - Charlotta Lindvall
- Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
| |
Collapse
|
42
|
Impact of Different Approaches to Preparing Notes for Analysis With Natural Language Processing on the Performance of Prediction Models in Intensive Care. Crit Care Explor 2021; 3:e0450. [PMID: 34136824 PMCID: PMC8202578 DOI: 10.1097/cce.0000000000000450] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVES: To evaluate whether different approaches in note text preparation (known as preprocessing) can impact machine learning model performance in the case of ICU mortality prediction. DESIGN: Clinical note text was used to build machine learning models for adults admitted to the ICU. Preprocessing strategies studied were none (raw text), cleaning text, stemming, term frequency-inverse document frequency vectorization, and creation of n-grams. Model performance was assessed by the area under the receiver operating characteristic curve. Models were trained and internally validated on University of California San Francisco data using 10-fold cross validation. These models were then externally validated on Beth Israel Deaconess Medical Center data. SETTING: ICUs at University of California San Francisco and Beth Israel Deaconess Medical Center. SUBJECTS: Ten thousand patients in the University of California San Francisco training and internal testing dataset and 27,058 patients in the external validation dataset, Beth Israel Deaconess Medical Center. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Mortality rate at Beth Israel Deaconess Medical Center and University of California San Francisco was 10.9% and 7.4%, respectively. Data are presented as area under the receiver operating characteristic curve (95% CI) for models validated at University of California San Francisco and area under the receiver operating characteristic curve for models validated at Beth Israel Deaconess Medical Center. Models built and trained on University of California San Francisco data for the prediction of in-hospital mortality improved from the raw note text model (AUROC, 0.84; CI, 0.80–0.89) to the term frequency-inverse document frequency model (AUROC, 0.89; CI, 0.85–0.94). When applying the models developed at University of California San Francisco to Beth Israel Deaconess Medical Center data, there was a similar increase in model performance from raw note text (area under the receiver operating characteristic curve at Beth Israel Deaconess Medical Center: 0.72) to the term frequency-inverse document frequency model (area under the receiver operating characteristic curve at Beth Israel Deaconess Medical Center: 0.83). CONCLUSIONS: Differences in preprocessing strategies for note text impacted model discrimination. Completing a preprocessing pathway including cleaning, stemming, and term frequency-inverse document frequency vectorization resulted in the preprocessing strategy with the greatest improvement in model performance. Further study is needed, with particular emphasis on how to manage author implicit bias present in note text, before natural language processing algorithms are implemented in the clinical setting.
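A compressed sketch of this preprocessing comparison is shown below: a raw token-count pipeline versus a cleaned, stemmed, TF-IDF n-gram pipeline, both scored by cross-validated AUROC. The toy notes, the use of NLTK's PorterStemmer, and the tiny fold count are illustrative assumptions, not the study's implementation.

```python
# Illustrative comparison of note-text preprocessing strategies (toy data only):
# raw counts vs. cleaned + stemmed + TF-IDF n-grams, scored by cross-validated AUROC.
import re
import numpy as np
from nltk.stem import PorterStemmer                      # assumed available; no corpora needed
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

notes = ["intubated overnight, vasopressors weaned", "comfortable, tolerating diet",
         "worsening hypoxia despite diuresis", "ambulating in hallway, pain controlled",
         "new vasopressor requirement and oliguria", "afebrile, plan for floor transfer",
         "escalating ventilator settings", "family meeting, improving mentation",
         "refractory shock, rising lactate", "extubated, stable on nasal cannula",
         "worsening encephalopathy and hypotension", "physical therapy, discharge planning"]
died = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0])    # toy in-hospital mortality labels

stemmer = PorterStemmer()
def clean_and_stem(text):
    # lower-case, keep letters only, stem each token
    return " ".join(stemmer.stem(t) for t in re.findall(r"[a-z]+", text.lower()))

pipelines = {
    "raw counts": make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000)),
    "clean+stem+tfidf+ngrams": make_pipeline(
        TfidfVectorizer(preprocessor=clean_and_stem, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000)),
}
for name, pipe in pipelines.items():
    auc = cross_val_score(pipe, notes, died, cv=3, scoring="roc_auc")
    print(name, round(auc.mean(), 2))
```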
Collapse
|
43
|
Higgins AM, Neto AS, Bailey M, Barrett J, Bellomo R, Cooper DJ, Gabbe BJ, Linke N, Myles PS, Paton M, Philpot S, Shulman M, Young M, Hodgson CL. Predictors of death and new disability after critical illness: a multicentre prospective cohort study. Intensive Care Med 2021; 47:772-781. [PMID: 34089063 DOI: 10.1007/s00134-021-06438-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2021] [Accepted: 05/15/2021] [Indexed: 11/24/2022]
Abstract
PURPOSE This study aimed to determine the prevalence and predictors of death or new disability following critical illness. METHODS Prospective, multicentre cohort study conducted in six metropolitan intensive care units (ICU). Participants were adults admitted to the ICU who received more than 24 h of mechanical ventilation. The primary outcome was death or new disability at 6 months, with new disability defined by a 10% increase in the WHODAS 2.0. RESULTS Of 628 patients with the primary outcome available (median age of 62 [49-71] years), 379 [61.0%] had a medical admission and 370 (58.9%) died or developed new disability by 6 months. Independent predictors of death or new disability included age [OR 1.02 (1.01-1.03), P = 0.001], higher severity of illness (APACHE III) [OR 1.02 (1.01-1.03), P < 0.001] and admission diagnosis. Compared to patients with a surgical admission diagnosis, patients with a cardiac arrest [OR (95% CI) 4.06 (1.89-8.68), P < 0.001], sepsis [OR (95% CI) 2.43 (1.32-4.47), P = 0.004], or trauma [OR (95% CI) 6.24 (3.07-12.71), P < 0.001] diagnosis had higher odds of death or new disability, while patients with a lung transplant [OR (95% CI) 0.21 (0.07-0.58), P = 0.003] diagnosis had lower odds. A model including these three variables had good calibration (Brier score 0.20) and acceptable discriminative power with an area under the receiver operating characteristic curve of 0.76 (95% CI 0.72-0.80). CONCLUSION Less than half of all patients mechanically ventilated for more than 24 h were alive and free of new disability at 6 months after admission to ICU. A model including age, illness severity and admission diagnosis has acceptable discriminative ability to predict death or new disability at 6 months.
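As a worked illustration of the modelling described above, the sketch below fits a logistic model on synthetic data with age, an illness-severity score, and a simplified admission-diagnosis indicator, then reports odds ratios, AUROC, and the Brier score; the coefficients and data are invented, and the single binary diagnosis flag is a simplification of the study's multi-level diagnosis variable.

```python
# Sketch (synthetic data): three-predictor logistic model reported as odds ratios,
# with AUROC and Brier score, mirroring the metrics in the abstract.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

rng = np.random.default_rng(2)
n = 600
age = rng.normal(62, 14, n)
apache3 = rng.normal(60, 25, n)
trauma = rng.integers(0, 2, n)                 # simplified admission-diagnosis indicator
logit = -4 + 0.02 * age + 0.02 * apache3 + 1.8 * trauma
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # death or new disability

X = np.column_stack([age, apache3, trauma])
model = LogisticRegression(max_iter=1000).fit(X, y)
odds_ratios = np.exp(model.coef_[0])           # approximate per-unit odds ratios
p = model.predict_proba(X)[:, 1]
print("ORs (age, APACHE III, trauma):", np.round(odds_ratios, 2))
print("AUROC:", round(roc_auc_score(y, p), 2), "Brier:", round(brier_score_loss(y, p), 2))
```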
Collapse
Affiliation(s)
- A M Higgins
- Australian and New Zealand Intensive Care Research Centre, School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Rd, Melbourne, VIC, 3004, Australia
| | - A Serpa Neto
- Australian and New Zealand Intensive Care Research Centre, School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Rd, Melbourne, VIC, 3004, Australia.,Department of Critical Care, The University of Melbourne, Melbourne, VIC, Australia.,Department of Intensive Care, Austin Health, Melbourne, VIC, Australia.,Department of Critical Care Medicine, Hospital Israelita Albert Einstein, Sao Paulo, Brazil
| | - M Bailey
- Australian and New Zealand Intensive Care Research Centre, School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Rd, Melbourne, VIC, 3004, Australia.,Department of Critical Care, The University of Melbourne, Melbourne, VIC, Australia
| | - J Barrett
- Intensive Care Unit, Epworth Healthcare, Melbourne, VIC, Australia.,Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC, Australia
| | - R Bellomo
- Australian and New Zealand Intensive Care Research Centre, School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Rd, Melbourne, VIC, 3004, Australia.,Department of Critical Care, The University of Melbourne, Melbourne, VIC, Australia.,Department of Intensive Care, Austin Health, Melbourne, VIC, Australia
| | - D J Cooper
- Australian and New Zealand Intensive Care Research Centre, School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Rd, Melbourne, VIC, 3004, Australia.,Department of Intensive Care and Hyperbaric Medicine, The Alfred, Melbourne, VIC, Australia
| | - B J Gabbe
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
| | - N Linke
- Australian and New Zealand Intensive Care Research Centre, School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Rd, Melbourne, VIC, 3004, Australia
| | - P S Myles
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia.,Department of Anaesthesiology and Perioperative Medicine, The Alfred, Melbourne, VIC, Australia
| | - M Paton
- Australian and New Zealand Intensive Care Research Centre, School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Rd, Melbourne, VIC, 3004, Australia.,Department of Physiotherapy, Monash Health, Melbourne, VIC, Australia
| | - S Philpot
- Intensive Care Unit, Cabrini Health, Melbourne, VIC, Australia
| | - M Shulman
- School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia.,Department of Anaesthesiology and Perioperative Medicine, The Alfred, Melbourne, VIC, Australia
| | - M Young
- Department of Intensive Care and Hyperbaric Medicine, The Alfred, Melbourne, VIC, Australia
| | - C L Hodgson
- Australian and New Zealand Intensive Care Research Centre, School of Public Health and Preventive Medicine, Monash University, 553 St Kilda Rd, Melbourne, VIC, 3004, Australia. .,Department of Intensive Care and Hyperbaric Medicine, The Alfred, Melbourne, VIC, Australia.
| | | |
Collapse
|
44
|
Locke S, Bashall A, Al-Adely S, Moore J, Wilson A, Kitchen GB. Natural language processing in medicine: A review. Trends Anaesth Crit Care 2021. [DOI: 10.1016/j.tacc.2021.02.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
|
45
|
Ahn JC, Connell A, Simonetto DA, Hughes C, Shah VH. Application of Artificial Intelligence for the Diagnosis and Treatment of Liver Diseases. Hepatology 2021; 73:2546-2563. [PMID: 33098140 DOI: 10.1002/hep.31603] [Citation(s) in RCA: 88] [Impact Index Per Article: 22.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/29/2020] [Revised: 09/15/2020] [Accepted: 09/29/2020] [Indexed: 12/11/2022]
Abstract
Modern medical care produces large volumes of multimodal patient data, which many clinicians struggle to process and synthesize into actionable knowledge. In recent years, artificial intelligence (AI) has emerged as an effective tool in this regard. The field of hepatology is no exception, with a growing number of studies published that apply AI techniques to the diagnosis and treatment of liver diseases. These have included machine-learning algorithms (such as regression models, Bayesian networks, and support vector machines) to predict disease progression, the presence of complications, and mortality; deep-learning algorithms to enable rapid, automated interpretation of radiologic and pathologic images; and natural-language processing to extract clinically meaningful concepts from vast quantities of unstructured data in electronic health records. This review article will provide a comprehensive overview of hepatology-focused AI research, discuss some of the barriers to clinical implementation and adoption, and suggest future directions for the field.
Collapse
Affiliation(s)
- Joseph C Ahn
- Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN
| | | | | | | | - Vijay H Shah
- Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN
| |
Collapse
|
46
|
Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records. Cardiovasc Digit Health J 2021; 2:156-163. [PMID: 35265904] [PMCID: PMC8890044] [DOI: 10.1016/j.cvdhj.2021.03.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open
Collapse
|
47
|
Khambete MP, Su W, Garcia JC, Badgeley MA. Quantification of BERT Diagnosis Generalizability Across Medical Specialties Using Semantic Dataset Distance. AMIA Jt Summits Transl Sci Proc 2021; 2021:345-354. [PMID: 34457149] [PMCID: PMC8378651] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Deep learning models in healthcare may fail to generalize on data from unseen corpora. Additionally, no quantitative metric exists to tell how existing models will perform on new data. Previous studies demonstrated that NLP models of medical notes generalize variably between institutions, but ignored other levels of healthcare organization. We measured SciBERT diagnosis sentiment classifier generalizability between medical specialties using EHR sentences from MIMIC-III. Models trained on one specialty performed better on internal test sets than mixed or external test sets (mean AUCs 0.92, 0.87, and 0.83, respectively; p = 0.016). When models are trained on more specialties, they have better test performances (p < 1e-4). Model performance on new corpora is directly correlated to the similarity between train and test sentence content (p < 1e-4). Future studies should assess additional axes of generalization to ensure deep learning models fulfil their intended purpose across institutions, specialties, and practices.
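The abstract relates cross-corpus performance to the similarity between training and test sentence content. The snippet below shows one crude, assumption-laden proxy for such a distance (cosine similarity of TF-IDF corpus centroids); the paper's semantic dataset distance is embedding-based, so this is illustrative only, with invented example sentences.

```python
# Hedged sketch: a crude proxy for "dataset distance" between two note corpora,
# using cosine similarity of TF-IDF centroids. Not the paper's embedding-based metric.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

cardiology = ["troponin elevated, started heparin", "ef 35 percent on echo"]
psychiatry = ["auditory hallucinations improved on risperidone", "denies suicidal ideation"]

vec = TfidfVectorizer().fit(cardiology + psychiatry)
c1 = np.asarray(vec.transform(cardiology).mean(axis=0))   # corpus centroid in TF-IDF space
c2 = np.asarray(vec.transform(psychiatry).mean(axis=0))
sim = cosine_similarity(c1, c2)[0, 0]
print("between-specialty similarity:", round(sim, 2))     # lower similarity = larger distance
```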
Collapse
Affiliation(s)
- Mihir P Khambete
- nference LLC, Cambridge, MA
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA
| | - William Su
- nference LLC, Cambridge, MA
- Department of Radiation Oncology, Penn Medicine, University of Pennsylvania Health System, Philadelphia, PA
| | | | - Marcus A Badgeley
- nference LLC, Cambridge, MA
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA
| |
Collapse
|
48
|
Huang B, Liang D, Zou R, Yu X, Dan G, Huang H, Liu H, Liu Y. Mortality prediction for patients with acute respiratory distress syndrome based on machine learning: a population-based study. Ann Transl Med 2021; 9:794. [PMID: 34268407] [PMCID: PMC8246239] [DOI: 10.21037/atm-20-6624] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/25/2020] [Accepted: 01/10/2021] [Indexed: 11/06/2022]
Abstract
Background Traditional scoring systems for patients' outcome prediction in intensive care units such as Oxygenation Saturation Index (OSI) and Oxygenation Index (OI) may not reliably predict the clinical prognosis of patients with acute respiratory distress syndrome (ARDS). Thus, none of them have been widely accepted for mortality prediction in ARDS. This study aimed to develop and validate a mortality prediction method for patients with ARDS based on machine learning using the Medical Information Mart for Intensive Care (MIMIC-III) and Telehealth Intensive Care Unit (eICU) Collaborative Research Database (eICU-CRD) databases. Methods Patients with ARDS were selected based on the Berlin definition in MIMIC-III and eICU-CRD databases. The APPS score (using age, PaO2/FiO2, and plateau pressure), Simplified Acute Physiology Score II (SAPS-II), Sepsis-related Organ Failure Assessment (SOFA), OSI, and OI were calculated. With MIMIC-III data, a mortality prediction model was built based on the random forest (RF) algorithm, and the performance was compared to those of existing scoring systems based on logistic regression. The performance of the proposed RF method was also validated with the combined MIMIC-III and eICU-CRD data. The performance of mortality prediction was evaluated by using the area under the receiver operating characteristics curve (AUROC) and performing calibration using the Hosmer-Lemeshow test. Results With the MIMIC-III dataset (308 patients, for comparisons with the existing scoring systems), the RF model predicted the in-hospital mortality, 30-day mortality, and 1-year mortality with an AUROC of 0.891, 0.883, and 0.892, respectively, which were significantly higher than those of the SAPS-II, APPS, OSI, and OI (all P<0.001). In the multi-source validation (the combined dataset of 2,235 patients in MIMIC-III and 331 patients in eICU-CRD), the RF model achieved an AUROC of 0.905 and 0.736 for predicting in-hospital mortality for the MIMIC-III and eICU-CRD datasets, respectively. The calibration plots suggested good fits for our RF model and these scoring systems for predicting mortality. The platelet count and lactate level were the strongest predictive variables for predicting in-hospital mortality. Conclusions Compared to the existing scoring systems, machine learning significantly improved performance for predicting ARDS mortality. Validation with multi-source datasets showed a relatively robust generalisation ability of our prediction model.
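A minimal sketch of the evaluation pattern described above (random forest, AUROC, and a Hosmer-Lemeshow-style calibration check) is given below on synthetic data; the hosmer_lemeshow helper is a simplified decile-based approximation written for illustration, not the study's code.

```python
# Sketch (synthetic data): random forest mortality model evaluated with AUROC and a
# simplified decile-based Hosmer-Lemeshow-style statistic.
import numpy as np
from scipy.stats import chi2
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 6))                          # e.g., labs/vitals (synthetic)
y = (X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 1.5, n) > 1).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
p = rf.predict_proba(X_te)[:, 1]
print("AUROC:", round(roc_auc_score(y_te, p), 3))

def hosmer_lemeshow(y_true, y_prob, groups=10):
    # Group by risk decile, compare observed vs. expected events (chi-square, g-2 df)
    order = np.argsort(y_prob)
    stat = 0.0
    for b in np.array_split(order, groups):
        n_b, obs, exp = len(b), y_true[b].sum(), y_prob[b].sum()
        stat += (obs - exp) ** 2 / (exp + 1e-9) + ((n_b - obs) - (n_b - exp)) ** 2 / (n_b - exp + 1e-9)
    return stat, chi2.sf(stat, groups - 2)

stat, pval = hosmer_lemeshow(y_te, p)
print("HL statistic:", round(stat, 1), "p-value:", round(pval, 3))
```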
Collapse
Affiliation(s)
- Bingsheng Huang
- Medical AI Lab, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China.,Clinical Research Center for Neurological Diseases, Shenzhen University General Hospital, Shenzhen, China
| | - Dong Liang
- Medical AI Lab, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
| | - Rushi Zou
- Medical AI Lab, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
| | - Xiaxia Yu
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
| | - Guo Dan
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
| | - Haofan Huang
- School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China
| | - Heng Liu
- Medical Imaging Center of Guizhou Province, Department of Radiology, The Affiliated Hospital of Zunyi Medical University, Zunyi, China
| | - Yong Liu
- Department of Intensive Care Unit, Shenzhen Hospital, Southern Medical University, Shenzhen, China
| |
Collapse
|
49
|
Sarkar R, Martin C, Mattie H, Gichoya JW, Stone DJ, Celi LA. Performance of intensive care unit severity scoring systems across different ethnicities in the USA: a retrospective observational study. Lancet Digit Health 2021; 3:e241-e249. [PMID: 33766288] [PMCID: PMC8063502] [DOI: 10.1016/s2589-7500(21)00022-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/14/2020] [Revised: 01/11/2021] [Accepted: 01/29/2021] [Indexed: 12/29/2022]
Abstract
BACKGROUND Despite wide use of severity scoring systems for case-mix determination and benchmarking in the intensive care unit (ICU), the possibility of scoring bias across ethnicities has not been examined. Guidelines on the use of illness severity scores to inform triage decisions for allocation of scarce resources, such as mechanical ventilation, during the current COVID-19 pandemic warrant examination for possible bias in these models. We investigated the performance of the severity scoring systems Acute Physiology and Chronic Health Evaluation IVa (APACHE IVa), Oxford Acute Severity of Illness Score (OASIS), and Sequential Organ Failure Assessment (SOFA) across four ethnicities in two large ICU databases to identify possible ethnicity-based bias. METHODS Data from the electronic ICU Collaborative Research Database (eICU-CRD) and the Medical Information Mart for Intensive Care III (MIMIC-III) database, built from patient episodes in the USA from 2014-15 and 2001-12, respectively, were analysed for score performance in Asian, Black, Hispanic, and White people after appropriate exclusions. Hospital mortality was the outcome of interest. Discrimination and calibration were determined for all three scoring systems in all four groups, using area under receiver operating characteristic (AUROC) curve for different ethnicities to assess discrimination, and standardised mortality ratio (SMR) or proxy measures to assess calibration. FINDINGS We analysed 166 751 participants (122 919 eICU-CRD and 43 832 MIMIC-III). Although measurements of discrimination were significantly different among the groups (AUROC ranging from 0·86 to 0·89 [p=0·016] with APACHE IVa and from 0·75 to 0·77 [p=0·85] with OASIS), they did not display any discernible systematic patterns of bias. However, measurements of calibration indicated persistent, and in some cases statistically significant, patterns of difference between Hispanic people (SMR 0·73 with APACHE IVa and 0·64 with OASIS) and Black people (0·67 and 0·68) versus Asian people (0·77 and 0·95) and White people (0·76 and 0·81). Although calibrations were imperfect for all groups, the scores consistently showed a pattern of overpredicting mortality for Black people and Hispanic people. Similar results were seen using SOFA scores across the two databases. INTERPRETATION The systematic differences in calibration across ethnicities suggest that illness severity scores reflect statistical bias in their predictions of mortality. FUNDING There was no specific funding for this study.
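The two quantities compared across groups in this study, discrimination (AUROC) and calibration (standardized mortality ratio, observed over expected deaths), can be computed per group as in the sketch below; the data frame, outcomes, and group labels are synthetic placeholders, not the eICU-CRD or MIMIC-III data.

```python
# Sketch (synthetic data): group-wise discrimination (AUROC) and calibration (SMR)
# for a severity score's predicted mortality risk.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(4)
n = 4000
df = pd.DataFrame({
    "group": rng.choice(["Asian", "Black", "Hispanic", "White"], size=n),
    "pred_risk": rng.beta(2, 10, size=n),            # severity-score predicted mortality
})
df["died"] = (rng.random(n) < df["pred_risk"] * 0.8).astype(int)   # synthetic outcomes

for g, sub in df.groupby("group"):
    auroc = roc_auc_score(sub["died"], sub["pred_risk"])
    smr = sub["died"].sum() / sub["pred_risk"].sum()  # SMR < 1 means mortality over-predicted
    print(f"{g}: AUROC={auroc:.2f}, SMR={smr:.2f}")
```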
Collapse
Affiliation(s)
- Rahuldeb Sarkar
- Department of Respiratory Medicine, Medway NHS Foundation Trust, Gillingham, Kent, UK; Department of Critical Care, Medway NHS Foundation Trust, Gillingham, Kent, UK; Faculty of Life Sciences, King's College London, London, UK
| | - Christopher Martin
- UCL Institute for Health Informatics, London, UK; Crystallise, Essex, UK
| | - Heather Mattie
- Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA
| | - Judy Wawira Gichoya
- Interventional Radiology and Informatics, Department of Radiology and Imaging Sciences, Emory University, Atlanta, GA, USA
| | - David J Stone
- Department of Anesthesiology, University of Virginia School of Medicine, Charlottesville, VA, USA; Department of Neurosurgery, University of Virginia School of Medicine, Charlottesville, VA, USA; Center for Advanced Medical Analytics, University of Virginia School of Medicine, Charlottesville, VA, USA
| | - Leo Anthony Celi
- Department of Biostatistics, Harvard T H Chan School of Public Health, Boston, MA, USA; Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, USA; Division of Pulmonary, Critical Care and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.
| |
Collapse
|
50
|
Fernandes M, Sun H, Jain A, Alabsi HS, Brenner LN, Ye E, Ge W, Collens SI, Leone MJ, Das S, Robbins GK, Mukerji SS, Westover MB. Classification of the Disposition of Patients Hospitalized with COVID-19: Reading Discharge Summaries Using Natural Language Processing. JMIR Med Inform 2021; 9:e25457. [PMID: 33449908 PMCID: PMC7879729 DOI: 10.2196/25457] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/02/2020] [Revised: 12/09/2020] [Accepted: 12/12/2020] [Indexed: 01/10/2023] Open
Abstract
Background Medical notes are a rich source of patient data; however, the nature of unstructured text has largely precluded the use of these data for large retrospective analyses. Transforming clinical text into structured data can enable large-scale research studies with electronic health records (EHR) data. Natural language processing (NLP) can be used for text information retrieval, reducing the need for labor-intensive chart review. Here we present an application of NLP to large-scale analysis of medical records at 2 large hospitals for patients hospitalized with COVID-19. Objective Our study goal was to develop an NLP pipeline to classify the discharge disposition (home, inpatient rehabilitation, skilled nursing inpatient facility [SNIF], and death) of patients hospitalized with COVID-19 based on hospital discharge summary notes. Methods Text mining and feature engineering were applied to unstructured text from hospital discharge summaries. The study included patients with COVID-19 discharged from 2 hospitals in the Boston, Massachusetts area (Massachusetts General Hospital and Brigham and Women’s Hospital) between March 10, 2020, and June 30, 2020. The data were divided into a training set (70%) and hold-out test set (30%). Discharge summaries were represented as bags-of-words consisting of single words (unigrams), bigrams, and trigrams. The number of features was reduced during training by excluding n-grams that occurred in fewer than 10% of discharge summaries, and further reduced using least absolute shrinkage and selection operator (LASSO) regularization while training a multiclass logistic regression model. Model performance was evaluated using the hold-out test set. Results The study cohort included 1737 adult patients (median age 61 [SD 18] years; 55% men; 45% White and 16% Black; 14% nonsurvivors and 61% discharged home). The model selected 179 from a vocabulary of 1056 engineered features, consisting of combinations of unigrams, bigrams, and trigrams. The top features contributing most to the classification by the model (for each outcome) were the following: “appointments specialty,” “home health,” and “home care” (home); “intubate” and “ARDS” (inpatient rehabilitation); “service” (SNIF); “brief assessment” and “covid” (death). The model achieved a micro-average area under the receiver operating characteristic curve value of 0.98 (95% CI 0.97-0.98) and average precision of 0.81 (95% CI 0.75-0.84) in the testing set for prediction of discharge disposition. Conclusions A supervised learning–based NLP approach is able to classify the discharge disposition of patients hospitalized with COVID-19. This approach has the potential to accelerate and increase the scale of research on patients’ discharge disposition that is possible with EHR data.
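To illustrate the modelling choices reported above (uni- to tri-gram bag-of-words features restricted by document frequency, an L1-regularized multiclass logistic model, and micro-average AUROC), here is a toy sketch; the example summaries and labels are invented, and evaluation is done on the training data purely for brevity.

```python
# Illustrative sketch (toy data): n-gram bag-of-words with a document-frequency floor,
# L1-regularized multinomial logistic regression, and micro-average AUROC.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

summaries = ["discharged home with home care services and specialty appointments",
             "transferred to inpatient rehabilitation after prolonged intubation for ards",
             "discharged to skilled nursing facility for ongoing service needs",
             "patient expired, brief assessment documented, covid pneumonia",
             "discharged home, home health arranged",
             "inpatient rehabilitation recommended after tracheostomy",
             "skilled nursing facility placement for nursing service",
             "comfort measures only, patient died of covid complications"]
dispo = np.array(["home", "rehab", "snif", "death", "home", "rehab", "snif", "death"])

vec = CountVectorizer(ngram_range=(1, 3), min_df=0.1)   # drop n-grams in <10% of notes
X = vec.fit_transform(summaries)
clf = LogisticRegression(penalty="l1", solver="saga", max_iter=5000).fit(X, dispo)

probs = clf.predict_proba(X)
y_bin = label_binarize(dispo, classes=clf.classes_)     # one indicator column per disposition
print("micro-average AUROC:", round(roc_auc_score(y_bin, probs, average="micro"), 2))
```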
Collapse
Affiliation(s)
- Marta Fernandes
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.,Clinical Data Animation Center, Boston, MA, United States.,Harvard Medical School, Boston, MA, United States
| | - Haoqi Sun
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.,Clinical Data Animation Center, Boston, MA, United States.,Harvard Medical School, Boston, MA, United States
| | - Aayushee Jain
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.,Clinical Data Animation Center, Boston, MA, United States
| | - Haitham S Alabsi
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.,Harvard Medical School, Boston, MA, United States
| | - Laura N Brenner
- Harvard Medical School, Boston, MA, United States.,Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, MA, United States.,Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Elissa Ye
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.,Clinical Data Animation Center, Boston, MA, United States
| | - Wendong Ge
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.,Clinical Data Animation Center, Boston, MA, United States.,Harvard Medical School, Boston, MA, United States
| | - Sarah I Collens
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Michael J Leone
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| | - Sudeshna Das
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.,Harvard Medical School, Boston, MA, United States
| | - Gregory K Robbins
- Harvard Medical School, Boston, MA, United States.,Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, United States
| | - Shibani S Mukerji
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.,Harvard Medical School, Boston, MA, United States
| | - M Brandon Westover
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States.,Clinical Data Animation Center, Boston, MA, United States.,Harvard Medical School, Boston, MA, United States.,McCance Center for Brain Health, Massachusetts General Hospital, Boston, MA, United States
| |
Collapse
|