1
Du X, Novoa-Laurentiev J, Plasek JM, Chuang YW, Wang L, Marshall GA, Mueller SK, Chang F, Datta S, Paek H, Lin B, Wei Q, Wang X, Wang J, Ding H, Manion FJ, Du J, Bates DW, Zhou L. Enhancing early detection of cognitive decline in the elderly: a comparative study utilizing large language models in clinical notes. EBioMedicine 2024; 109:105401. [PMID: 39396423 DOI: 10.1016/j.ebiom.2024.105401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 09/28/2024] [Accepted: 09/30/2024] [Indexed: 10/15/2024]
Abstract
BACKGROUND: Large language models (LLMs) have shown promising performance in various healthcare domains, but their effectiveness in identifying specific clinical conditions in real medical records is less explored. This study evaluates LLMs for detecting signs of cognitive decline in real electronic health record (EHR) clinical notes, comparing their error profiles with traditional models. The insights gained will inform strategies for performance enhancement.
METHODS: This study, conducted at Mass General Brigham in Boston, MA, analysed clinical notes from the four years prior to a 2019 diagnosis of mild cognitive impairment in patients aged 50 and older. We developed prompts for two LLMs, Llama 2 and GPT-4, on Health Insurance Portability and Accountability Act (HIPAA)-compliant cloud-computing platforms using multiple approaches (e.g., hard prompting, retrieval augmented generation, and error analysis-based instructions) to select the optimal LLM-based method. Baseline models included a hierarchical attention-based neural network and XGBoost. Subsequently, we constructed an ensemble of the three models using a majority vote approach. Confusion-matrix-based scores were used for model evaluation.
FINDINGS: We used a randomly annotated sample of 4949 note sections from 1969 patients (women: 1046 [53.1%]; age: mean, 76.0 [SD, 13.3] years), filtered with keywords related to cognitive functions, for model development. For testing, a random annotated sample of 1996 note sections from 1161 patients (women: 619 [53.3%]; age: mean, 76.5 [SD, 10.2] years) without keyword filtering was utilised. GPT-4 demonstrated superior accuracy and efficiency compared to Llama 2, but did not outperform traditional models. The ensemble model outperformed the individual models in terms of all evaluation metrics with statistical significance (p < 0.01), achieving a precision of 90.2% [95% CI: 81.9%-96.8%], a recall of 94.2% [95% CI: 87.9%-98.7%], and an F1-score of 92.1% [95% CI: 86.8%-96.4%]. Notably, the ensemble model showed a significant improvement in precision, increasing from a range of 70%-79% to above 90%, compared to the best-performing single model. Error analysis revealed that 63 samples were incorrectly predicted by at least one model; however, only 2 cases (3.2%) were mutual errors across all models, indicating diverse error profiles among them.
INTERPRETATION: LLMs and traditional machine learning models trained using local EHR data exhibited diverse error profiles. The ensemble of these models was found to be complementary, enhancing diagnostic performance. Future research should investigate integrating LLMs with smaller, localised models and incorporating medical data and domain knowledge to enhance performance on specific tasks.
FUNDING: This research was supported by the National Institute on Aging grants (R44AG081006, R01AG080429) and National Library of Medicine grant (R01LM014239).
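Illustrative sketch (not from the paper): the abstract describes a majority vote over three binary classifiers scored with confusion-matrix metrics. The snippet below shows one way that combination could look. The function names, the toy labels, and the variable names `llm`, `attention_net`, and `xgboost_preds` are assumptions made for illustration only; they are not the authors' code or data.

```python
def majority_vote(*model_predictions):
    """Return, per sample, the binary label chosen by more than half of the models."""
    n_models = len(model_predictions)
    return [int(2 * sum(votes) > n_models) for votes in zip(*model_predictions)]


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data (hypothetical): each model makes two errors, but on different samples.
y_true        = [1, 1, 1, 0, 0, 0, 1, 0]
llm           = [1, 0, 1, 1, 0, 0, 1, 0]
attention_net = [1, 1, 0, 0, 0, 1, 1, 0]
xgboost_preds = [0, 1, 1, 0, 1, 0, 1, 0]

ensemble = majority_vote(llm, attention_net, xgboost_preds)
print(precision_recall_f1(y_true, llm))       # (0.75, 0.75, 0.75)
print(precision_recall_f1(y_true, ensemble))  # (1.0, 1.0, 1.0)
```

In this toy case the vote corrects every single-model error because no two models are wrong on the same sample, which is the kind of complementarity the abstract attributes to diverse error profiles.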
Affiliation(s)
- Xinsong Du
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- John Novoa-Laurentiev
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Joseph M Plasek
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Ya-Wen Chuang
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA; Division of Nephrology, Taichung Veterans General Hospital, Taichung, 407219, Taiwan; Department of Post-Baccalaureate Medicine, College of Medicine, National Chung Hsing University, Taichung, 402202, Taiwan; School of Medicine, College of Medicine, China Medical University, Taichung, 406040, Taiwan
- Liqin Wang
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Gad A Marshall
  - Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA; Department of Neurology, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Stephanie K Mueller
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Frank Chang
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA
- Surabhi Datta
  - Intelligent Medical Objects, Rosemont, Illinois, 60018, USA
- Hunki Paek
  - Intelligent Medical Objects, Rosemont, Illinois, 60018, USA
- Bin Lin
  - Intelligent Medical Objects, Rosemont, Illinois, 60018, USA
- Qiang Wei
  - Intelligent Medical Objects, Rosemont, Illinois, 60018, USA
- Xiaoyan Wang
  - Intelligent Medical Objects, Rosemont, Illinois, 60018, USA
- Jingqi Wang
  - Intelligent Medical Objects, Rosemont, Illinois, 60018, USA
- Hao Ding
  - Intelligent Medical Objects, Rosemont, Illinois, 60018, USA
- Frank J Manion
  - Intelligent Medical Objects, Rosemont, Illinois, 60018, USA
- Jingcheng Du
  - Intelligent Medical Objects, Rosemont, Illinois, 60018, USA
- David W Bates
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
- Li Zhou
  - Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, 02115, USA; Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA
2
Wang L, Novoa-Laurentiev J, Cook C, Srivatsan S, Hua Y, Yang J, Miloslavsky E, Choi HK, Zhou L, Wallace ZS. Identification of an ANCA-Associated Vasculitis Cohort Using Deep Learning and Electronic Health Records. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.06.09.24308603. [PMID: 38946986 PMCID: PMC11213085 DOI: 10.1101/2024.06.09.24308603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Background: ANCA-associated vasculitis (AAV) is a rare but serious disease. Traditional case-identification methods using claims data can be time-intensive and may miss important subgroups. We hypothesized that a deep learning model analyzing electronic health records (EHR) can more accurately identify AAV cases.
Methods: We examined the Mass General Brigham (MGB) repository of clinical documentation from 12/1/1979 to 5/11/2021, using expert-curated keywords and ICD codes to identify a large cohort of potential AAV cases. Three labeled datasets (I, II, III) were created, each containing note sections. We trained and evaluated a range of machine learning and deep learning algorithms for note-level classification, using metrics including positive predictive value (PPV), sensitivity, F-score, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The deep learning model was further evaluated for its ability to classify AAV cases at the patient level, compared with rule-based algorithms, in 2,000 randomly chosen samples.
Results: Datasets I, II, and III comprised 6,000, 3,008, and 7,500 note sections, respectively. Deep learning achieved the highest AUROC in all three datasets, with scores of 0.983, 0.991, and 0.991. The deep learning approach also had among the highest PPVs across the three datasets (0.941, 0.954, and 0.800, respectively). In a test cohort of 2,000 cases, the deep learning model achieved a PPV of 0.262 and an estimated sensitivity of 0.975. Compared to the best rule-based algorithm, the deep learning model identified six additional AAV cases, representing 13% of the total.
Conclusion: The deep learning model effectively classifies clinical note sections for AAV diagnosis. Its application to EHR notes can potentially uncover additional cases missed by traditional rule-based methods.
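Illustrative sketch (not from the paper): the abstract reports AUROC, PPV, and sensitivity. The snippet below shows how these quantities are commonly computed from classifier scores, using the pairwise-ranking definition of AUROC. The function names, the toy scores, and the 0.5 threshold are assumptions chosen for illustration, not values from the study.

```python
def auroc(labels, scores):
    """AUROC as the chance that a random positive scores above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv_and_sensitivity(labels, scores, threshold):
    """PPV = TP / (TP + FP); sensitivity = TP / (TP + FN) at a score threshold."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return ppv, sensitivity


# Toy note-level scores (hypothetical).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print(round(auroc(labels, scores), 3))  # 0.889
ppv, sens = ppv_and_sensitivity(labels, scores, threshold=0.5)
print(round(ppv, 3), round(sens, 3))    # 0.667 0.667
```

AUROC is threshold-free, whereas PPV and sensitivity depend on the chosen cut-off, so the two kinds of metric need not move together.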
Affiliation(s)
- Liqin Wang
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- John Novoa-Laurentiev
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Claire Cook
  - Rheumatology and Allergy Clinical Epidemiology Research Center and Division of Rheumatology, Allergy, and Immunology, and Mongan Institute, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Shruthi Srivatsan
  - Rheumatology and Allergy Clinical Epidemiology Research Center and Division of Rheumatology, Allergy, and Immunology, and Mongan Institute, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Yining Hua
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health
- Jie Yang
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Eli Miloslavsky
  - Division of Rheumatology, Allergy, and Immunology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Hyon K. Choi
  - Rheumatology and Allergy Clinical Epidemiology Research Center and Division of Rheumatology, Allergy, and Immunology, and Mongan Institute, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
- Li Zhou
  - Division of General Internal Medicine and Primary Care, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Zachary S. Wallace
  - Rheumatology and Allergy Clinical Epidemiology Research Center and Division of Rheumatology, Allergy, and Immunology, and Mongan Institute, Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
3
Lee S, Skains RM, Magidson PD, Qadoura N, Liu SW, Southerland LT. Enhancing healthcare access for an older population: The age-friendly emergency department. J Am Coll Emerg Physicians Open 2024; 5:e13182. [PMID: 38726466 PMCID: PMC11079440 DOI: 10.1002/emp2.13182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Revised: 12/29/2023] [Accepted: 01/24/2024] [Indexed: 05/12/2024]
Abstract
Healthcare systems face significant challenges in meeting the unique needs of older adults, particularly in the acute setting. Age-friendly healthcare is a comprehensive approach using the 4Ms framework (what matters, medications, mentation, and mobility) to ensure that healthcare settings are responsive to the needs of older patients. The Age-Friendly Emergency Department (AFED) is a crucial component of a holistic age-friendly health system. Our objective is to provide an overview of the AFED model, its core principles, and the benefits to older adults and healthcare clinicians. The AFED optimizes the delivery of emergency care by integrating age-specific considerations into various aspects of (1) ED physical infrastructure, (2) clinical care policies, and (3) care transitions. Physical infrastructure incorporates environmental modifications to enhance patient safety, including adequate lighting, nonslip flooring, and devices for sensory and ambulatory impairment. Clinical care policies address the physiological, cognitive, and psychosocial needs of older adults while preserving focus on emergency issues. Care transitions include communication and the involvement of community partners and case management services. The AFED prioritizes collaboration between interdisciplinary team members (ED clinicians, geriatric specialists, nurses, physical/occupational therapists, and social workers). By adopting an age-friendly approach, EDs have the potential to improve patient-centered outcomes, reduce adverse events and hospitalizations, and enhance functional recovery. Moreover, healthcare clinicians benefit from the AFED model through increased satisfaction, multidisciplinary support, and enhanced training in geriatric care. Policymakers, healthcare administrators, and clinicians must collaborate to standardize guidelines, address barriers to AFEDs, and promote the adoption of age-friendly practices in the ED.
Affiliation(s)
- Sangil Lee
  - Department of Emergency Medicine, University of Iowa Carver College of Medicine, Iowa City, Iowa, USA
- Rachel M. Skains
  - University of Alabama at Birmingham, Birmingham, Alabama, USA
  - Geriatric Research, Education, and Clinical Center, Birmingham VA Medical Center, Birmingham, Alabama, USA
- Nadine Qadoura
  - Department of Emergency Medicine, University of Iowa Carver College of Medicine, Iowa City, Iowa, USA
- Shan W. Liu
  - Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, USA
4
Du X, Novoa-Laurentiev J, Plasaek JM, Chuang YW, Wang L, Marshall G, Mueller SK, Chang F, Datta S, Paek H, Lin B, Wei Q, Wang X, Wang J, Ding H, Manion FJ, Du J, Bates DW, Zhou L. Enhancing Early Detection of Cognitive Decline in the Elderly: A Comparative Study Utilizing Large Language Models in Clinical Notes. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.03.24305298. [PMID: 38633810 PMCID: PMC11023645 DOI: 10.1101/2024.04.03.24305298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2024]
Abstract
Background: Large language models (LLMs) have shown promising performance in various healthcare domains, but their effectiveness in identifying specific clinical conditions in real medical records is less explored. This study evaluates LLMs for detecting signs of cognitive decline in real electronic health record (EHR) clinical notes, comparing their error profiles with traditional models. The insights gained will inform strategies for performance enhancement.
Methods: This study, conducted at Mass General Brigham in Boston, MA, analyzed clinical notes from the four years prior to a 2019 diagnosis of mild cognitive impairment in patients aged 50 and older. We used a randomly annotated sample of 4,949 note sections, filtered with keywords related to cognitive functions, for model development. For testing, a random annotated sample of 1,996 note sections without keyword filtering was utilized. We developed prompts for two LLMs, Llama 2 and GPT-4, on HIPAA-compliant cloud-computing platforms using multiple approaches (e.g., both hard and soft prompting and error analysis-based instructions) to select the optimal LLM-based method. Baseline models included a hierarchical attention-based neural network and XGBoost. Subsequently, we constructed an ensemble of the three models using a majority vote approach.
Results: GPT-4 demonstrated superior accuracy and efficiency compared to Llama 2, but did not outperform traditional models. The ensemble model outperformed the individual models, achieving a precision of 90.3%, a recall of 94.2%, and an F1-score of 92.2%. Notably, the ensemble model showed a significant improvement in precision, increasing from a range of 70%-79% to above 90%, compared to the best-performing single model. Error analysis revealed that 63 samples were incorrectly predicted by at least one model; however, only 2 cases (3.2%) were mutual errors across all models, indicating diverse error profiles among them.
Conclusions: LLMs and traditional machine learning models trained using local EHR data exhibited diverse error profiles. The ensemble of these models was found to be complementary, enhancing diagnostic performance. Future research should investigate integrating LLMs with smaller, localized models and incorporating medical data and domain knowledge to enhance performance on specific tasks.
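Illustrative sketch (not from the paper): the error analysis above compares samples misclassified by at least one model with samples misclassified by all models. One way to express that comparison is with set union and intersection, as below. The function name, the model keys, and the toy sample IDs are hypothetical; only the final arithmetic (2 of 63) comes from the abstract.

```python
def error_overlap(errors_by_model):
    """Summarise how much the models' error sets overlap.

    errors_by_model maps a model name to the IDs of samples it misclassified.
    Returns (IDs wrong for at least one model, IDs wrong for all models,
    share of the former that are also in the latter).
    """
    error_sets = [set(ids) for ids in errors_by_model.values()]
    any_wrong = set.union(*error_sets)
    all_wrong = set.intersection(*error_sets)
    shared_share = len(all_wrong) / len(any_wrong) if any_wrong else 0.0
    return any_wrong, all_wrong, shared_share


# Toy error sets (hypothetical sample IDs).
toy_errors = {
    "llm": {1, 2, 3, 9},
    "attention_net": {3, 4, 5, 9},
    "xgboost": {3, 6, 7, 9},
}
any_wrong, all_wrong, share = error_overlap(toy_errors)
print(len(any_wrong), sorted(all_wrong), share)  # 8 [3, 9] 0.25

# The proportion reported in the abstract: 2 mutual errors out of 63.
print(round(100 * 2 / 63, 1))  # 3.2
```

A small shared share means most errors are specific to one or two models, which is the condition under which a majority vote can correct them.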
Affiliation(s)
- Xinsong Du
  - Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- John Novoa-Laurentiev
  - Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
- Joseph M. Plasaek
  - Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Ya-Wen Chuang
  - Division of Nephrology, Taichung Veterans General Hospital, Taichung, Taiwan, 407219
- Liqin Wang
  - Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Gad Marshall
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
  - Department of Neurology, Brigham and Women’s Hospital, Boston, Massachusetts 02115
- Stephanie K. Mueller
  - Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Frank Chang
  - Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
- Surabhi Datta
  - Intelligent Medical Objects, Rosemont, Illinois, 60018
- Hunki Paek
  - Intelligent Medical Objects, Rosemont, Illinois, 60018
- Bin Lin
  - Intelligent Medical Objects, Rosemont, Illinois, 60018
- Qiang Wei
  - Intelligent Medical Objects, Rosemont, Illinois, 60018
- Xiaoyan Wang
  - Intelligent Medical Objects, Rosemont, Illinois, 60018
- Jingqi Wang
  - Intelligent Medical Objects, Rosemont, Illinois, 60018
- Hao Ding
  - Intelligent Medical Objects, Rosemont, Illinois, 60018
- Jingcheng Du
  - Intelligent Medical Objects, Rosemont, Illinois, 60018
- David W. Bates
  - Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
- Li Zhou
  - Division of General Internal Medicine and Primary Care, Brigham and Women’s Hospital, Boston, Massachusetts 02115
  - Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115
5
Haimovich AD, Shah MN, Southerland LT, Hwang U, Patterson BW. Automating risk stratification for geriatric syndromes in the emergency department. J Am Geriatr Soc 2024; 72:258-267. [PMID: 37811698 PMCID: PMC10866303 DOI: 10.1111/jgs.18594] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/09/2023] [Revised: 08/11/2023] [Accepted: 08/19/2023] [Indexed: 10/10/2023]
Abstract
BACKGROUND: Geriatric emergency department (GED) guidelines endorse screening older patients for geriatric syndromes in the ED, but there have been significant barriers to widespread implementation. The majority of screening programs require engagement of a clinician, nurse, or social worker, adding to already significant workloads at a time of record-breaking ED patient volumes, staff shortages, and hospital boarding crises. Automated, electronic health record (EHR)-embedded risk stratification approaches may be an alternate solution for extending the reach of the GED mission by directing human actions to a smaller subset of higher risk patients.
METHODS: We define the concept of automated risk stratification and screening using existing EHR data. We discuss progress made in three potential use cases in the ED: falls, cognitive impairment, and end-of-life and palliative care, emphasizing the importance of linking automated screening with systems of healthcare delivery.
RESULTS: Research progress and operational deployment vary by use case, ranging from deployed solutions in falls screening to algorithmic validation in cognitive impairment and end-of-life care.
CONCLUSIONS: Automated risk stratification offers a potential solution to one of the most pressing problems in geriatric emergency care: identifying high-risk populations of older adults most appropriate for specific GED care. Future work is needed to realize the promise of improved care with less provider burden by creating tools suitable for widespread deployment as well as best practices for their implementation and governance.
Affiliation(s)
- Adrian D Haimovich
  - Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts, USA
- Manish N Shah
  - BerbeeWalsh Department of Emergency Medicine, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
- Lauren T Southerland
  - Department of Emergency Medicine, The Ohio State University Wexner Medical Center, Columbus, Ohio, USA
- Ula Hwang
  - Geriatric Research, Education and Clinical Center, James J. Peters VAMC, Bronx, New York, USA
  - Department of Emergency Medicine, Yale School of Medicine, New Haven, Connecticut, USA
- Brian W Patterson
  - BerbeeWalsh Department of Emergency Medicine, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, USA
  - Department of Industrial and Systems Engineering, Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, USA
6
Li R, Wang X, Yu H. Two Directions for Clinical Data Generation with Large Language Models: Data-to-Label and Label-to-Data. PROCEEDINGS OF THE CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING. CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING 2023; 2023:7129-7143. [PMID: 38213944 PMCID: PMC10782150 DOI: 10.18653/v1/2023.findings-emnlp.474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2024]
Abstract
Large language models (LLMs) can generate natural language texts for various domains and tasks, but their potential for clinical text mining, a domain with scarce, sensitive, and imbalanced medical data, is under-explored. We investigate whether LLMs can augment clinical data for detecting Alzheimer's Disease (AD)-related signs and symptoms from electronic health records (EHRs), a challenging task that requires high expertise. We create a novel pragmatic taxonomy for AD sign and symptom progression based on expert knowledge and generate three datasets: (1) a gold dataset annotated by human experts on longitudinal EHRs of AD patients; (2) a silver dataset created by the data-to-label method, which labels sentences from a public EHR collection with AD-related signs and symptoms; and (3) a bronze dataset created by the label-to-data method, which generates sentences with AD-related signs and symptoms based on the label definition. We train a system to detect AD-related signs and symptoms from EHRs. We find that the silver and bronze datasets improve the system performance, outperforming the system using only the gold dataset. This shows that LLMs can generate synthetic clinical data for a complex task by incorporating expert knowledge, and our label-to-data method can produce datasets that are free of sensitive information, while maintaining acceptable quality.
Affiliation(s)
- Rumeng Li
  - Umass Amherst, Amherst, MA, USA
  - VA Bedford Healthcare System, Bedford, MA, USA
- Hong Yu
  - Umass Amherst, Amherst, MA, USA
  - VA Bedford Healthcare System, Bedford, MA, USA
  - Umass Lowell, Lowell, MA, USA
7
Pavon JM, Previll L, Woo M, Henao R, Solomon M, Rogers U, Olson A, Fischer J, Leo C, Fillenbaum G, Hoenig H, Casarett D. Machine learning functional impairment classification with electronic health record data. J Am Geriatr Soc 2023; 71:2822-2833. [PMID: 37195174 PMCID: PMC10524844 DOI: 10.1111/jgs.18383] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/27/2023] [Revised: 03/16/2023] [Accepted: 03/19/2023] [Indexed: 05/18/2023]
Abstract
BACKGROUND: Poor functional status is a key marker of morbidity, yet is not routinely captured in clinical encounters. We developed and evaluated the accuracy of a machine learning algorithm that leveraged electronic health record (EHR) data to provide a scalable process for identification of functional impairment.
METHODS: We identified a cohort of patients with an electronically captured screening measure of functional status (Older Americans Resources and Services ADL/IADL) between 2018 and 2020 (N = 6484). Patients were classified using unsupervised learning (K-means and t-distributed Stochastic Neighbor Embedding) into normal function (NF), mild to moderate functional impairment (MFI), and severe functional impairment (SFI) states. Using 11 EHR clinical variable domains (832 variable input features), we trained an Extreme Gradient Boosting supervised machine learning algorithm to distinguish functional status states, and measured prediction accuracies. Data were randomly split into training (80%) and test (20%) sets. The SHapley Additive Explanations (SHAP) feature importance analysis was used to list the EHR features in rank order of their contribution to the outcome.
RESULTS: Median age was 75.3 years; 62% of patients were female and 60% were White. Patients were classified as 53% NF (n = 3453), 30% MFI (n = 1947), and 17% SFI (n = 1084). Model performance for identifying functional status state (NF, MFI, SFI), summarised as AUROC (area under the receiver operating characteristic curve), was 0.92, 0.89, and 0.87, respectively. Age, falls, hospitalization, home health use, labs (e.g., albumin), comorbidities (e.g., dementia, heart failure, chronic kidney disease, chronic pain), and social determinants of health (e.g., alcohol use) were highly ranked features in predicting functional status states.
CONCLUSION: A machine learning algorithm run on EHR clinical data has potential utility for differentiating functional status in the clinical setting. Through further validation and refinement, such algorithms can complement traditional screening methods and result in a population-based strategy for identifying patients with poor functional status who need additional health resources.
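Illustrative sketch (not from the paper): the abstract gives class counts and an 80%/20% random split. The snippet below checks that the reported percentages follow from the counts and shows one simple way a seeded random split could be done. The function name `train_test_split_ids`, the seed, and the use of integer patient IDs are assumptions for illustration; the abstract does not say how the split was implemented.

```python
import random

# Class counts as reported in the abstract (N = 6484).
counts = {"NF": 3453, "MFI": 1947, "SFI": 1084}
total = sum(counts.values())
shares = {state: round(100 * n / total) for state, n in counts.items()}
print(total, shares)  # 6484 {'NF': 53, 'MFI': 30, 'SFI': 17}


def train_test_split_ids(ids, test_fraction=0.2, seed=0):
    """Shuffle IDs with a fixed seed and hold out test_fraction of them."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical patient IDs 0..6483 stand in for the real cohort.
train_ids, test_ids = train_test_split_ids(range(total))
print(len(train_ids), len(test_ids))  # 5187 1297
```

A purely random split like this does not guarantee that NF, MFI, and SFI appear in the same proportions in both sets; a stratified split would be one option if that matters.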
Affiliation(s)
- Juliessa M Pavon
  - Department of Medicine/Division of Geriatrics, Duke University, Durham, North Carolina, USA
  - Geriatric Research Education Clinical Center, Durham Veteran Affairs Health Care System, Durham, North Carolina, USA
  - Claude D. Pepper Center, Duke University, Durham, North Carolina, USA
  - Center for the Study of Aging and Human Development, Duke University, Durham, North Carolina, USA
- Laura Previll
  - Department of Medicine/Division of Geriatrics, Duke University, Durham, North Carolina, USA
  - Geriatric Research Education Clinical Center, Durham Veteran Affairs Health Care System, Durham, North Carolina, USA
  - Center for the Study of Aging and Human Development, Duke University, Durham, North Carolina, USA
- Myung Woo
  - AI Health, Duke University, Durham, North Carolina, USA
  - Department of Medicine/Division of General Internal Medicine/Hospital Medicine, Duke University, Durham, North Carolina, USA
- Ricardo Henao
  - AI Health, Duke University, Durham, North Carolina, USA
- Mary Solomon
  - AI Health, Duke University, Durham, North Carolina, USA
- Ursula Rogers
  - AI Health, Duke University, Durham, North Carolina, USA
- Andrew Olson
  - AI Health, Duke University, Durham, North Carolina, USA
- Jonathan Fischer
  - Department of Community and Family Medicine, Duke University, Durham, North Carolina, USA
- Christopher Leo
  - Department of Medicine/Division of Geriatrics, Duke University, Durham, North Carolina, USA
  - Department of Medicine/Division of General Internal Medicine/Hospital Medicine, Duke University, Durham, North Carolina, USA
- Gerda Fillenbaum
  - Claude D. Pepper Center, Duke University, Durham, North Carolina, USA
  - Center for the Study of Aging and Human Development, Duke University, Durham, North Carolina, USA
  - Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA
- Helen Hoenig
  - Department of Medicine/Division of Geriatrics, Duke University, Durham, North Carolina, USA
  - Geriatric Research Education Clinical Center, Durham Veteran Affairs Health Care System, Durham, North Carolina, USA
  - Claude D. Pepper Center, Duke University, Durham, North Carolina, USA
  - Center for the Study of Aging and Human Development, Duke University, Durham, North Carolina, USA
  - Physical Medicine & Rehabilitation Service, Durham Veteran Affairs Health Care System, Durham, North Carolina, USA
- David Casarett
  - Department of Medicine/Division of General Internal Medicine/Palliative Care, Duke University, Durham, North Carolina, USA
8
Taylor B, Barboi C, Boustani M. Passive digital markers for Alzheimer's disease and other related dementias: A systematic evidence review. J Am Geriatr Soc 2023; 71:2966-2974. [PMID: 37249252 DOI: 10.1111/jgs.18426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2022] [Revised: 04/12/2023] [Accepted: 04/30/2023] [Indexed: 05/31/2023]
Abstract
BACKGROUND: The timely detection of Alzheimer's disease and other related dementias (ADRD) is suboptimal. Digital data already stored in electronic health records (EHR) offer opportunities for enhancing the timely detection of ADRD by facilitating the development of passive digital markers (PDMs). We conducted a systematic evidence review to identify studies that describe the development, performance, and validity of EHR-based PDMs for ADRD.
METHODS: We searched the literature published from January 2000 to August 2022 and reviewed cross-sectional, retrospective, or prospective observational studies with a patient population of 18 years or older, published in English, that collected and interpreted original data, included EHR as a source of digital data, and had the primary purpose of supporting ADRD care. We extracted relevant data from the included studies with guidance from the Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies checklist and used the US Preventive Services Task Force criteria to appraise each study.
RESULTS: We included and appraised 19 studies. Four studies were considered to have a fair quality, and none was considered to have a good quality. The functionality of the PDMs varied from detecting mild cognitive impairment, Alzheimer's disease or ADRD, to forecasting stages of ADRD. Only seven studies used a valid reference diagnostic method. Nine PDMs used only structured EHR data, and five studies provided complete information on the race and ethnicity of their populations. The number of features included in the PDMs ranged from 10 to 853, and the PDMs used a variety of statistical and machine learning algorithms with various time-at-risk windows. The area under the curve (AUC) for the PDMs varied from 0.67 to 0.97.
CONCLUSION: Although we noted heterogeneity in PDM development and performance, there is evidence that these PDMs have the potential to detect ADRD at earlier stages.
Affiliation(s)
- Britain Taylor
- Department of Intelligent Systems Engineering, School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, USA
- Cristina Barboi
- Department of Epidemiology, School of Public Health, Indiana University, Indianapolis, Indiana, USA
- Malaz Boustani
- Center for Health Innovation and Implementation Science, Department of Medicine, School of Medicine, Indiana University, Indianapolis, Indiana, USA
|
9
|
Mao C, Xu J, Rasmussen L, Li Y, Adekkanattu P, Pacheco J, Bonakdarpour B, Vassar R, Shen L, Jiang G, Wang F, Pathak J, Luo Y. AD-BERT: Using pre-trained language model to predict the progression from mild cognitive impairment to Alzheimer's disease. J Biomed Inform 2023; 144:104442. [PMID: 37429512 PMCID: PMC11131134 DOI: 10.1016/j.jbi.2023.104442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 06/13/2023] [Accepted: 07/07/2023] [Indexed: 07/12/2023]
Abstract
OBJECTIVE We develop a deep learning framework based on the pre-trained Bidirectional Encoder Representations from Transformers (BERT) model using unstructured clinical notes from electronic health records (EHRs) to predict the risk of disease progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD). METHODS We identified 3657 patients diagnosed with MCI together with their progress notes from the Northwestern Medicine Enterprise Data Warehouse (NMEDW) between 2000 and 2020. Progress notes dated no later than the first MCI diagnosis were used for the prediction. We first preprocessed the notes by deidentification, cleaning, and splitting into sections, and then pre-trained a BERT model for AD (named AD-BERT), starting from the publicly available Bio+Clinical BERT, on the preprocessed notes. All sections of a patient were embedded into a vector representation by AD-BERT and then combined by global MaxPooling and a fully connected network to compute the probability of MCI-to-AD progression. For validation, we conducted a similar set of experiments on 2563 MCI patients identified at Weill Cornell Medicine (WCM) during the same timeframe. RESULTS Compared with the 7 baseline models, the AD-BERT model achieved the best performance on both datasets, with an Area Under the receiver operating characteristic Curve (AUC) of 0.849 and F1 score of 0.440 on the NMEDW dataset, and an AUC of 0.883 and F1 score of 0.680 on the WCM dataset. CONCLUSION The use of EHRs for AD-related research is promising, and AD-BERT shows superior performance in predicting MCI-to-AD progression. Our study demonstrates the utility of pre-trained language models and clinical notes in predicting MCI-to-AD progression, which could have important implications for improving early detection and intervention for AD.
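The pooling step described in the abstract above (section embeddings combined by global max pooling, then passed to a fully connected layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 3-dimensional embeddings, the weights, and the reduction of the "fully connected network" to a single sigmoid unit are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def global_max_pool(section_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise maximum across all section embeddings of one patient."""
    if not section_vectors:
        raise ValueError("a patient needs at least one section embedding")
    return [max(column) for column in zip(*section_vectors)]


def progression_probability(pooled: Sequence[float],
                            weights: Sequence[float],
                            bias: float) -> float:
    """Toy classifier head: one fully connected unit followed by a sigmoid."""
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical embeddings for a patient with three note sections.
sections = [
    [0.1, -0.4, 0.7],
    [0.9, 0.2, -0.3],
    [0.0, 0.5, 0.1],
]
pooled = global_max_pool(sections)  # one fixed-size vector per patient
prob = progression_probability(pooled, weights=[0.5, -1.0, 0.25], bias=0.0)
```

Max pooling yields a fixed-size patient vector regardless of how many note sections a patient has, which is what lets a single classifier head handle patients with different numbers of sections.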
Affiliation(s)
- Chengsheng Mao
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Jie Xu
- Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, United States; Weill Cornell Medicine, New York, NY, United States
- Luke Rasmussen
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Yikuan Li
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Jennifer Pacheco
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Borna Bonakdarpour
- Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Robert Vassar
- Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
- Li Shen
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, United States
- Fei Wang
- Weill Cornell Medicine, New York, NY, United States
- Yuan Luo
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, United States
|
10
|
Khalid SI, Chilakapati S, Mirpuri P, Eldridge C, Burton M, Adogwa O. The Impact of Cognitive Impairment on Postoperative Complications After Spinal Surgery: A Matched Analysis. World Neurosurg 2023; 171:e172-e185. [PMID: 36574568 DOI: 10.1016/j.wneu.2022.11.114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Accepted: 11/25/2022] [Indexed: 11/29/2022]
Abstract
OBJECTIVE The coprevalence of age-related comorbidities such as cognitive impairment and spinal disorders is increasing. No studies to date have assessed the postoperative spine surgery outcomes of patients with mild cognitive impairment (MCI) or severe cognitive impairment (dementia) compared with those without preexisting cognitive impairment. METHODS Using an all-payer claims database, 235,123 persons undergoing either cervical or lumbar spine procedures between January 2010 and October 2020 were identified. Exact 1:1:1 matching based on baseline patient demographics and comorbidities was used to create a dementia group, an MCI group, and a control group without MCI/dementia (n = 3636). The primary outcome was the rate of any 30-day major postoperative complications. Secondary outcomes included the rates of revision surgery, readmission rates within 30 days, and health care costs within 1 year postoperatively. RESULTS Compared with the control group, patients with dementia had an 8-fold and 5.4-fold increase in all-cause 30-day complications after undergoing cervical and lumbar spine procedures, respectively. Similarly, patients with MCI had a 3.1-fold and 2.2-fold increase in all-cause 30-day complications, respectively. Patients with either MCI or dementia had increased rates of pneumonia and urinary tract infection after either spine procedure compared with controls (P < 0.01). Odds of revision surgery were increased in the lumbar surgery cohort for dementia (3.43; 95% confidence interval, 1.69-6.95) and for MCI (2.41; 95% confidence interval, 1.14-5.05). CONCLUSIONS This is the first study to characterize the postoperative complications profile of patients with preexisting dementia or MCI undergoing cervical and lumbar spine surgery. Both dementia and MCI are associated with increased postoperative complications within 30 days.
Affiliation(s)
- Syed I Khalid
- Department of Neurosurgery, University of Illinois at Chicago, Chicago, Illinois, USA
- Sai Chilakapati
- Department of Neurosurgery, University of Texas Southwestern, Dallas, Texas, USA
- Pranav Mirpuri
- Chicago Medical School, Rosalind Franklin University of Medicine and Science, North Chicago, Illinois, USA
- Cody Eldridge
- Department of Neurosurgery, University of Texas Southwestern, Dallas, Texas, USA
- Michael Burton
- Department of Neuroscience, University of Texas at Dallas, Richardson, Texas, USA
- Owoicho Adogwa
- Department of Neurosurgery, University of Cincinnati, Cincinnati, Ohio, USA
|
11
|
Ball DE, Mattke S, Frank L, Murray JF, Noritake R, MacLeod T, Benham‐Hermetz S, Kurzman A, Ferrell P. A framework for addressing Alzheimer's disease: Without a frame, the work has no aim. Alzheimers Dement 2022; 19:1568-1578. [PMID: 36478657 DOI: 10.1002/alz.12869] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/11/2022] [Revised: 10/07/2022] [Accepted: 10/17/2022] [Indexed: 12/13/2022]
Abstract
Confronting Alzheimer's disease (AD) involves patients, healthcare professionals, supportive services, caregivers, and government agencies interacting along a continuum from initial awareness to diagnosis, treatment, support, and care. This complex scope presents a challenge for health system transformation supporting individuals at risk for, or diagnosed with, AD. The AD systems preparedness framework was developed to help health systems identify specific opportunities to implement and evaluate focused improvement programs. The framework is purposely flexible to permit local adaptation across different health systems and countries. Health systems can develop solutions tailored to system-specific priorities considered within the context of the overall framework. Example metric concepts and initiatives are provided for each of ten areas of focus. Examples of funded projects focusing on screening and early detection are provided. It is our hope that stakeholders utilize the common framework to generate and share additional implementation evidence to benefit individuals with AD.
Affiliation(s)
- Daniel E. Ball
- Davos Alzheimer's Collaborative, Philadelphia, Pennsylvania, USA
- Soeren Mattke
- Center for Economic and Social Research, University of Southern California, Los Angeles, California, USA
- Lori Frank
- The New York Academy of Medicine, New York, New York, USA
- James F. Murray
- Davos Alzheimer's Collaborative, Philadelphia, Pennsylvania, USA
- Ryoji Noritake
- Health and Global Policy Institute, Grand Cube 3F, Otemachi Financial City Global Business Hub, Tokyo, Japan
- Timothy MacLeod
- Davos Alzheimer's Collaborative, Philadelphia, Pennsylvania, USA
- Bridgeable, Toronto, Ontario, USA
- Alissa Kurzman
- Davos Alzheimer's Collaborative, Philadelphia, Pennsylvania, USA
- High Lantern Group, Philadelphia, Pennsylvania, USA
- World Economic Forum, New York, New York, USA
- Phyllis Ferrell
- Davos Alzheimer's Collaborative, Philadelphia, Pennsylvania, USA
- Eli Lilly and Company, Lilly Corporate Center, Indianapolis, Indiana, USA
|
12
|
Development of a Deep Learning Model for Malignant Small Bowel Tumors Survival: A SEER-Based Study. Diagnostics (Basel) 2022; 12:diagnostics12051247. [PMID: 35626403 PMCID: PMC9141623 DOI: 10.3390/diagnostics12051247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 05/11/2022] [Accepted: 05/12/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND This study aims to explore a deep learning (DL) algorithm for developing a prognostic model and performing survival analyses in patients with malignant small bowel tumors (SBTs). METHODS The demographic and clinical features of patients with SBTs were extracted from the Surveillance, Epidemiology and End Results (SEER) database. We randomly split the samples into a training set and a validation set at a ratio of 7:3. Cox proportional hazards (Cox-PH) analysis and the DeepSurv algorithm were used to develop models. The performance of the Cox-PH and DeepSurv models was evaluated using receiver operating characteristic curves, calibration curves, C-statistics, and decision-curve analysis (DCA). A Kaplan-Meier (K-M) survival analysis was performed to further explain the prognostic effect of the Cox-PH model. RESULTS The multivariate analysis demonstrated that seven variables were associated with cancer-specific survival (CSS) (all p < 0.05). The DeepSurv model showed better performance than the Cox-PH model (C-index: 0.871 vs. 0.866). The calibration curves and DCA revealed that the two models had good discrimination and calibration. Moreover, according to the K-M analysis, patients with ileal malignancy and N2 stage disease did not respond to surgery. CONCLUSIONS This study reported a DeepSurv model that performed well in predicting CSS in SBT patients. It might offer insights into future research to explore more DL algorithms in cohort studies.
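The C-index used above to compare the two models can be illustrated with a minimal sketch of Harrell's concordance index. The abstract does not state which estimator or software the authors used, so the function below, its handling of ties (tied risk scores count as half-concordant, tied times are skipped), and the toy data are assumptions for illustration only.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      risk_scores: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and
    a strictly shorter follow-up time than subject j. The pair is
    concordant when subject i also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event flags, and predicted risks.
times = [2, 4, 6, 8]
events = [True, True, False, True]
risks = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.871 and 0.866 indicate that both models rank patients well and differ only slightly.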
|