1
Ryvicker M, Barrón Y, Song J, Zolnoori M, Shah S, Burgdorf J, Noble JM, Topaz M. Using Natural Language Processing to Identify Home Health Care Patients at Risk for Diagnosis of Alzheimer's Disease and Related Dementias. J Appl Gerontol 2024; 43:1461-1472. [PMID: 38556756 PMCID: PMC11368608 DOI: 10.1177/07334648241242321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2024]
Abstract
This study aimed to: (1) validate a natural language processing (NLP) system developed for the home health care setting to identify signs and symptoms of Alzheimer's disease and related dementias (ADRD) documented in clinicians' free-text notes; (2) determine whether signs and symptoms detected via NLP help to identify patients at risk of a new ADRD diagnosis within four years after admission. This study applied NLP to a longitudinal dataset including medical record and Medicare claims data for 56,652 home health care patients and Cox proportional hazard models to the subset of 24,874 patients admitted without an ADRD diagnosis. Selected ADRD signs and symptoms were associated with increased risk of a new ADRD diagnosis during follow-up, including: motor issues; hoarding/cluttering; uncooperative behavior; delusions or hallucinations; mention of ADRD disease names; and caregiver stress. NLP can help to identify patients in need of ADRD-related evaluation and support services.
Affiliation(s)
- Jiyoun Song
  - University of Pennsylvania School of Nursing
- Shivani Shah
  - Center for Home Care Policy & Research at VNS Health
- Maxim Topaz
  - Center for Home Care Policy & Research at VNS Health
  - Columbia University Medical Center
2
Prakash R, Dupre ME, Østbye T, Xu H. Extracting Critical Information from Unstructured Clinicians' Notes Data to Identify Dementia Severity Using a Rule-Based Approach: Feasibility Study. JMIR Aging 2024; 7:e57926. [PMID: 39316421 PMCID: PMC11462099 DOI: 10.2196/57926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Revised: 07/08/2024] [Accepted: 07/24/2024] [Indexed: 09/25/2024]
Abstract
BACKGROUND The severity of Alzheimer disease and related dementias (ADRD) is rarely documented in structured data fields in electronic health records (EHRs). Although this information is important for clinical monitoring and decision-making, it is often undocumented or "hidden" in unstructured text fields and not readily available for clinicians to act upon. OBJECTIVE We aimed to assess the feasibility and potential bias in using keywords and rule-based matching for obtaining information about the severity of ADRD from EHR data. METHODS We used EHR data from a large academic health care system that included patients with a primary discharge diagnosis of ADRD based on ICD-9 (International Classification of Diseases, Ninth Revision) and ICD-10 (International Statistical Classification of Diseases, Tenth Revision) codes between 2014 and 2019. We first assessed the presence of ADRD severity information and then the severity of ADRD in the EHR. Clinicians' notes were used to determine the severity of ADRD based on two criteria: (1) scores from the Mini Mental State Examination and Montreal Cognitive Assessment and (2) explicit terms for ADRD severity (eg, "mild dementia" and "advanced Alzheimer disease"). We compiled a list of common ADRD symptoms, cognitive test names, and disease severity terms, refining it iteratively based on previous literature and clinical expertise. Subsequently, we used rule-based matching in Python using standard open-source data analysis libraries to identify the context in which specific words or phrases were mentioned. We estimated the prevalence of documented ADRD severity and assessed the performance of our rule-based algorithm. RESULTS We included 9115 eligible patients with over 65,000 notes from the providers. Overall, 22.93% (2090/9115) of patients were documented with mild ADRD, 20.87% (1902/9115) were documented with moderate or severe ADRD, and 56.20% (5123/9115) did not have any documentation of the severity of their ADRD. 
For the task of determining the presence of any ADRD severity information, our algorithm achieved an accuracy of >95%, specificity of >95%, sensitivity of >90%, and an F1-score of >83%. For the specific task of identifying the actual severity of ADRD, the algorithm performed well with an accuracy of >91%, specificity of >80%, sensitivity of >88%, and F1-score of >92%. Compared with patients with mild ADRD, those with more advanced ADRD tended to be older, were more likely to be female and Black, and were more likely to have received their diagnoses in primary care or in-hospital settings. Relative to patients with undocumented ADRD severity, those with documented ADRD severity had a similar distribution in terms of sex, race, and rural or urban residence. CONCLUSIONS Our study demonstrates the feasibility of using a rule-based matching algorithm to identify ADRD severity from unstructured EHR report data. However, it is essential to acknowledge potential biases arising from differences in documentation practices across various health care systems.
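The two criteria named in the abstract above (cognitive test scores and explicit severity terms) can be illustrated with a minimal rule-based sketch using only Python's standard `re` module. Everything specific here is an assumption for illustration: the phrase patterns, the single score cut-off `MILD_CUTOFF`, and the function name `classify_severity` are hypothetical and are not the study's actual rules, term lists, or thresholds.

```python
import re

# Hypothetical explicit-severity phrases (assumption: the study's real term
# list is not reproduced in the abstract).
SEVERITY_TERMS = {
    "mild": re.compile(r"\bmild\s+(?:dementia|alzheimer)", re.I),
    "moderate_or_severe": re.compile(
        r"\b(?:moderate|severe|advanced)\s+(?:dementia|alzheimer)", re.I
    ),
}

# Matches mentions such as "MMSE 22/30" or "MoCA score of 9".
SCORE_PATTERN = re.compile(r"\b(MMSE|MoCA)\b\D{0,15}?(\d{1,2})(?:\s*/\s*30)?", re.I)

# Placeholder cut-off chosen only for this sketch; not a clinical threshold.
MILD_CUTOFF = 20


def classify_severity(note):
    """Return 'mild', 'moderate_or_severe', or None for one free-text note."""
    # Criterion 2 in the abstract: explicit severity terms.
    for label, pattern in SEVERITY_TERMS.items():
        if pattern.search(note):
            return label
    # Criterion 1 in the abstract: a cognitive test score found in the text.
    match = SCORE_PATTERN.search(note)
    if match:
        score = int(match.group(2))
        if 0 <= score <= 30:
            return "mild" if score >= MILD_CUTOFF else "moderate_or_severe"
    return None  # no severity information documented
```

A sketch like this ignores negation and context (for example, "no evidence of mild dementia" would still match), which is one reason a rule-based system needs validation against manually reviewed notes, as the abstract describes.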
Affiliation(s)
- Ravi Prakash
  - Thomas Lord Department of Mechanical Engineering and Materials Science, Pratt School of Engineering, Duke University, Durham, NC, United States
- Matthew E Dupre
  - Department of Population Health Sciences, School of Medicine, Duke University, Durham, NC, United States
  - Department of Sociology, Trinity College of Arts & Sciences, Duke University, Durham, NC, United States
- Truls Østbye
  - Department of Population Health Sciences, School of Medicine, Duke University, Durham, NC, United States
  - Department of Family Medicine and Community Health, School of Medicine, Duke University, Durham, NC, United States
- Hanzhang Xu
  - Department of Family Medicine and Community Health, School of Medicine, Duke University, Durham, NC, United States
  - School of Nursing, Duke University, Durham, NC, United States
  - Center for the Study of Aging and Human Development, Duke University, Durham, NC, United States
  - Health Services and Systems Research (HSSR), Duke-NUS Medical School, Singapore, Singapore
3
Wei R. Automated Medical Records Review for Mild Cognitive Impairment and Dementia. RESEARCH SQUARE 2024:rs.3.rs-5046441. [PMID: 39315274 PMCID: PMC11419186 DOI: 10.21203/rs.3.rs-5046441/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]
Abstract
Objectives Unstructured and structured data in electronic health records (EHR) are a rich source of information for research and quality improvement studies. However, extracting accurate information from EHR is labor-intensive. Here we introduce an automated EHR phenotyping model to identify patients with Alzheimer's Disease, related dementias (ADRD), or mild cognitive impairment (MCI). Methods We assembled medical notes and associated International Classification of Diseases (ICD) codes and medication prescriptions from 3,626 outpatient adults from two hospitals seen between February 2015 and June 2022. Ground truth annotations regarding the presence vs. absence of a diagnosis of MCI or ADRD were determined through manual chart review. Indicators extracted from notes included the presence of keywords and phrases in unstructured clinical notes, prescriptions of medications associated with MCI/ADRD, and ICD codes associated with MCI/ADRD. We trained a regularized logistic regression model to predict the ground truth annotations. Model performance was evaluated using area under the receiver operating curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, specificity, precision/positive predictive value, recall/sensitivity, and F1 score (harmonic mean of precision and recall). Results Thirty percent of patients in the cohort carried diagnoses of MCI/ADRD based on manual review. When evaluated on a held-out test set, the best model using clinical notes, ICDs, and medications achieved an AUROC of 0.98, an AUPRC of 0.98, an accuracy of 0.93, a sensitivity (recall) of 0.91, a specificity of 0.96, a precision of 0.96, and an F1 score of 0.93. The estimated overall accuracy for patients randomly selected from EHRs was 99.88%. Conclusion Automated EHR phenotyping accurately identifies patients with MCI/ADRD based on clinical notes, ICD codes, and medication records. This approach holds potential for large-scale MCI/ADRD research utilizing EHR databases.
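As a reading aid for the threshold-based metrics listed in the abstract above (accuracy, specificity, precision, recall, and F1 as the harmonic mean of precision and recall), the following minimal sketch computes them from confusion-matrix counts. The function name `classification_metrics` and the example counts are hypothetical; the counts are invented for illustration and are not the study's confusion matrix.

```python
def _safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    precision = _safe_div(tp, tp + fp)    # positive predictive value
    recall = _safe_div(tp, tp + fn)       # sensitivity
    specificity = _safe_div(tn, tn + fp)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "sensitivity": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=91, fp=4, tn=96, fn=9)
```

AUROC and AUPRC are not covered by this sketch because they summarize performance across all thresholds rather than at a single one.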
4
Zhou J, Liu W, Zhou H, Lau KK, Wong GH, Chan WC, Zhang Q, Knapp M, Wong IC, Luo H. Identifying dementia from cognitive footprints in hospital records among Chinese older adults: a machine-learning study. THE LANCET REGIONAL HEALTH. WESTERN PACIFIC 2024; 46:101060. [PMID: 38638410 PMCID: PMC11025003 DOI: 10.1016/j.lanwpc.2024.101060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/09/2024] [Accepted: 03/25/2024] [Indexed: 04/20/2024]
Abstract
Background By combining theory-driven and data-driven methods, this study aimed to develop dementia predictive algorithms among Chinese older adults guided by the cognitive footprint theory. Methods Electronic medical records from the Clinical Data Analysis and Reporting System in Hong Kong were employed. We included patients with dementia diagnosed at 65+ between 2010 and 2018, and 1:1 matched dementia-free controls. We identified 51 features, comprising exposures to established modifiable factors and other factors before and after 65 years old. The performances of four machine learning models, including LASSO, Multilayer perceptron (MLP), XGBoost, and LightGBM, were compared with logistic regression models, for all patients and subgroups by age. Findings A total of 159,920 individuals (40.5% male; mean age [SD]: 83.97 [7.38]) were included. Compared with the model included established modifiable factors only (area under the curve [AUC] 0.689, 95% CI [0.684, 0.694]), the predictive accuracy substantially improved for models with all factors (0.774, [0.770, 0.778]). Machine learning and logistic regression models performed similarly, with AUC ranged between 0.773 (0.768, 0.777) for LASSO and 0.780 (0.776, 0.784) for MLP. Antipsychotics, education, antidepressants, head injury, and stroke were identified as the most important predictors in the total sample. Age-specific models identified different important features, with cardiovascular and infectious diseases becoming prominent in older ages. Interpretation The models showed satisfactory performances in identifying dementia. These algorithms can be used in clinical practice to assist decision making and allow timely interventions cost-effectively. Funding The Research Grants Council of Hong Kong under the Early Career Scheme 27110519.
Affiliation(s)
- Jiayi Zhou
  - Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong SAR, China
- Wenlong Liu
  - Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Huiquan Zhou
  - Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
- Kui Kai Lau
  - Department of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Gloria H.Y. Wong
  - Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong SAR, China
- Wai Chi Chan
  - Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
- Qingpeng Zhang
  - Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong SAR, China
- Martin Knapp
  - Care Policy and Evaluation Centre (CPEC), The London School of Economics and Political Science, London, UK
- Ian C.K. Wong
  - Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Laboratory of Data Discovery for Health (D24H), Hong Kong Science and Technology Park, Sha Tin, Hong Kong SAR, China
  - Aston Pharmacy School, Aston University, Birmingham B4 7ET, UK
- Hao Luo
  - Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong SAR, China
  - Department of Computer Science, The University of Hong Kong, Hong Kong SAR, China
5
Bucholc M, James C, Khleifat AA, Badhwar A, Clarke N, Dehsarvi A, Madan CR, Marzi SJ, Shand C, Schilder BM, Tamburin S, Tantiangco HM, Lourida I, Llewellyn DJ, Ranson JM. Artificial intelligence for dementia research methods optimization. Alzheimers Dement 2023; 19:5934-5951. [PMID: 37639369 DOI: 10.1002/alz.13441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Revised: 07/19/2023] [Accepted: 07/23/2023] [Indexed: 08/31/2023]
Abstract
Artificial intelligence (AI) and machine learning (ML) approaches are increasingly being used in dementia research. However, several methodological challenges exist that may limit the insights we can obtain from high-dimensional data and our ability to translate these findings into improved patient outcomes. To improve reproducibility and replicability, researchers should make their well-documented code and modeling pipelines openly available. Data should also be shared where appropriate. To enhance the acceptability of models and AI-enabled systems to users, researchers should prioritize interpretable methods that provide insights into how decisions are generated. Models should be developed using multiple, diverse datasets to improve robustness, generalizability, and reduce potentially harmful bias. To improve clarity and reproducibility, researchers should adhere to reporting guidelines that are co-produced with multiple stakeholders. If these methodological challenges are overcome, AI and ML hold enormous promise for changing the landscape of dementia research and care. HIGHLIGHTS: Machine learning (ML) can improve diagnosis, prevention, and management of dementia. Inadequate reporting of ML procedures affects reproduction/replication of results. ML models built on unrepresentative datasets do not generalize to new datasets. Obligatory metrics for certain model structures and use cases have not been defined. Interpretability and trust in ML predictions are barriers to clinical translation.
Affiliation(s)
- Magda Bucholc
  - Cognitive Analytics Research Lab, School of Computing, Engineering & Intelligent Systems, Ulster University, Derry, UK
- Charlotte James
  - NIHR Bristol Biomedical Research Centre, University Hospitals Bristol and Weston NHS Foundation Trust and University of Bristol, Bristol, UK
- Ahmad Al Khleifat
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- AmanPreet Badhwar
  - Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal, Montréal, Quebec, Canada
  - Institut de génie biomédical, Université de Montréal, Montréal, Quebec, Canada
  - Département de Pharmacologie et Physiologie, Université de Montréal, Montréal, Quebec, Canada
- Natasha Clarke
  - Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal, Montréal, Quebec, Canada
- Amir Dehsarvi
  - Aberdeen Biomedical Imaging Centre, School of Medicine, Medical Sciences, and Nutrition, University of Aberdeen, Aberdeen, UK
- Sarah J Marzi
  - UK Dementia Research Institute, Imperial College London, London, UK
  - Department of Brain Sciences, Imperial College London, London, UK
- Cameron Shand
  - Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
- Brian M Schilder
  - UK Dementia Research Institute, Imperial College London, London, UK
  - Department of Brain Sciences, Imperial College London, London, UK
- Stefano Tamburin
  - Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
- David J Llewellyn
  - University of Exeter Medical School, Exeter, UK
  - The Alan Turing Institute, London, UK
6
Zolnoori M, Barrón Y, Song J, Noble J, Burgdorf J, Ryvicker M, Topaz M. HomeADScreen: Developing Alzheimer's disease and related dementia risk identification model in home healthcare. Int J Med Inform 2023; 177:105146. [PMID: 37454558 PMCID: PMC10529395 DOI: 10.1016/j.ijmedinf.2023.105146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Revised: 06/22/2023] [Accepted: 07/07/2023] [Indexed: 07/18/2023]
Abstract
BACKGROUND More than 50 % of patients with Alzheimer's disease and related dementia (ADRD) remain undiagnosed. This is specifically the case for home healthcare (HHC) patients. OBJECTIVES This study aimed at developing HomeADScreen, an ADRD risk screening model built on the combination of HHC patients' structured data and information extracted from HHC clinical notes. METHODS The study's sample included 15,973 HHC patients with no diagnosis of ADRD and 8,901 patients diagnosed with ADRD across four follow-up time windows. First, we applied two natural language processing methods, Word2Vec and topic modeling methods, to extract ADRD risk factors from clinical notes. Next, we built the risk identification model on the combination of the Outcome and Assessment Information Set (OASIS-structured data collected in the HHC setting) and clinical notes-risk factors across the four-time windows. RESULTS The top-performing machine learning algorithm attained an Area under the Curve = 0.76 for a four-year risk prediction time window. After optimizing the cut-off value for screening patients with ADRD (cut-off-value = 0.31), we achieved sensitivity = 0.75 and an F1-score = 0.63. For the first-year time window, adding clinical note-derived risk factors to OASIS data improved the overall performance of the risk identification model by 60 %. We observed a similar trend of increasing the model's overall performance across other time windows. Variables associated with increased risk of ADRD were "hearing impairment" and "impaired patient ability in the use of telephone." On the other hand, being "non-Hispanic White" and the "absence of impairment with prior daily functioning" were associated with a lower risk of ADRD. CONCLUSION HomeADScreen has a strong potential to be translated into clinical practice and assist HHC clinicians in assessing patients' cognitive function and referring them for further neurological assessment.
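The cut-off optimization mentioned in the abstract above can be pictured as a sweep over candidate thresholds applied to predicted risk scores. The sketch below is an assumption-laden illustration: the abstract does not state which criterion was optimized, so maximizing F1 is an assumed choice here, and the function names `sensitivity_and_f1` and `best_cutoff` as well as the toy labels and scores are hypothetical.

```python
def sensitivity_and_f1(y_true, scores, cutoff):
    """Screen-positive means score >= cutoff; return (sensitivity, F1)."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in pairs if y == 0 and s >= cutoff)
    fn = sum(1 for y, s in pairs if y == 1 and s < cutoff)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return sensitivity, f1


def best_cutoff(y_true, scores, cutoffs):
    """Pick the candidate cut-off with the highest F1 (assumed criterion)."""
    return max(cutoffs, key=lambda c: sensitivity_and_f1(y_true, scores, c)[1])


# Toy data invented for illustration: 1 = later ADRD diagnosis, 0 = none.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.35, 0.4, 0.2, 0.1, 0.05, 0.25]
chosen = best_cutoff(y_true, scores, [0.1, 0.2, 0.3, 0.4, 0.5])
```

For a screening tool, lowering the cut-off raises sensitivity at the cost of more false positives, so a relatively low cut-off such as the 0.31 reported in the abstract is consistent with favoring sensitivity.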
Affiliation(s)
- Maryam Zolnoori
  - Columbia University Irving Medical Center, New York, NY, USA; Center for Home Care Policy & Research, VNS Health, New York, NY, USA; School of Nursing, Columbia University, USA.
- Yolanda Barrón
  - Center for Home Care Policy & Research, VNS Health, New York, NY, USA
- James Noble
  - Columbia University Irving Medical Center, New York, NY, USA
- Julia Burgdorf
  - Center for Home Care Policy & Research, VNS Health, New York, NY, USA
- Miriam Ryvicker
  - Center for Home Care Policy & Research, VNS Health, New York, NY, USA
- Maxim Topaz
  - Center for Home Care Policy & Research, VNS Health, New York, NY, USA; School of Nursing, Columbia University, USA
7
Cho E, Kim S, Heo SJ, Shin J, Hwang S, Kwon E, Lee S, Kim S, Kang B. Machine learning-based predictive models for the occurrence of behavioral and psychological symptoms of dementia: model development and validation. Sci Rep 2023; 13:8073. [PMID: 37202454 DOI: 10.1038/s41598-023-35194-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Accepted: 05/14/2023] [Indexed: 05/20/2023]
Abstract
The behavioral and psychological symptoms of dementia (BPSD) are challenging aspects of dementia care. This study used machine learning models to predict the occurrence of BPSD among community-dwelling older adults with dementia. We included 187 older adults with dementia for model training and 35 older adults with dementia for external validation. Demographic and health data and premorbid personality traits were examined at the baseline, and actigraphy was utilized to monitor sleep and activity levels. A symptom diary tracked caregiver-perceived symptom triggers and the daily occurrence of 12 BPSD classified into seven subsyndromes. Several prediction models were also employed, including logistic regression, random forest, gradient boosting machine, and support vector machine. The random forest models revealed the highest area under the receiver operating characteristic curve (AUC) values for hyperactivity, euphoria/elation, and appetite and eating disorders; the gradient boosting machine models for psychotic and affective symptoms; and the support vector machine model showed the highest AUC. The gradient boosting machine model achieved the best performance in terms of average AUC scores across the seven subsyndromes. Caregiver-perceived triggers demonstrated higher feature importance values across the seven subsyndromes than other features. Our findings demonstrate the possibility of predicting BPSD using a machine learning approach.
Affiliation(s)
- Eunhee Cho
  - Mo-Im Kim Nursing Research Institute, Yonsei University College of Nursing, 50-1, Yonsei-Ro, Seodaemun-gu, Seoul, 03722, Republic of Korea
- Sujin Kim
  - Department of Nursing, Yong-In Arts and Science University, Gyeonggi-do, Korea
- Seok-Jae Heo
  - Division of Biostatistics, Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea
- Jinhee Shin
  - College of Nursing, Woosuk University, Jeollabuk-do, Korea
- Sinwoo Hwang
  - Korea Armed Forces Nursing Academy, Daejeon, Korea
- Eunji Kwon
  - Korea Armed Forces Nursing Academy, Daejeon, Korea
- Bada Kang
  - Mo-Im Kim Nursing Research Institute, Yonsei University College of Nursing, 50-1, Yonsei-Ro, Seodaemun-gu, Seoul, 03722, Republic of Korea.
8
Bucholc M, James C, Al Khleifat A, Badhwar A, Clarke N, Dehsarvi A, Madan CR, Marzi SJ, Shand C, Schilder BM, Tamburin S, Tantiangco HM, Lourida I, Llewellyn DJ, Ranson JM. Artificial Intelligence for Dementia Research Methods Optimization. ARXIV 2023:arXiv:2303.01949v1. [PMID: 36911275 PMCID: PMC10002770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/14/2023]
Abstract
INTRODUCTION Machine learning (ML) has been extremely successful in identifying key features from high-dimensional datasets and executing complicated tasks with human expert levels of accuracy or greater. METHODS We summarize and critically evaluate current applications of ML in dementia research and highlight directions for future research. RESULTS We present an overview of ML algorithms most frequently used in dementia research and highlight future opportunities for the use of ML in clinical practice, experimental medicine, and clinical trials. We discuss issues of reproducibility, replicability and interpretability and how these impact the clinical applicability of dementia research. Finally, we give examples of how state-of-the-art methods, such as transfer learning, multi-task learning, and reinforcement learning, may be applied to overcome these issues and aid the translation of research to clinical practice in the future. DISCUSSION ML-based models hold great promise to advance our understanding of the underlying causes and pathological mechanisms of dementia.
Affiliation(s)
- Magda Bucholc
  - Cognitive Analytics Research Lab, School of Computing, Engineering & Intelligent Systems, Ulster University, Derry, UK
- Charlotte James
  - NIHR Bristol Biomedical Research Centre, University Hospitals Bristol and Weston NHS Foundation Trust and University of Bristol, Bristol, UK
- Ahmad Al Khleifat
  - Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, United Kingdom
- AmanPreet Badhwar
  - Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l’Institut Universitaire de Gériatrie de Montréal, Montréal, Canada
  - Institut de génie biomédical, Université de Montréal, Montréal, Canada
  - Département de Pharmacologie et Physiologie, Université de Montréal, Montréal, Canada
- Natasha Clarke
  - Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l’Institut Universitaire de Gériatrie de Montréal, Montréal, Canada
- Amir Dehsarvi
  - Aberdeen Biomedical Imaging Centre, School of Medicine, Medical Sciences, and Nutrition, University of Aberdeen, Aberdeen, UK
- Sarah J. Marzi
  - UK Dementia Research Institute, Imperial College London, London, UK
  - Department of Brain Sciences, Imperial College London, London, UK
- Cameron Shand
  - Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
- Brian M. Schilder
  - UK Dementia Research Institute, Imperial College London, London, UK
  - Department of Brain Sciences, Imperial College London, London, UK
- Stefano Tamburin
  - Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
- David J. Llewellyn
  - University of Exeter Medical School, Exeter, UK
  - The Alan Turing Institute, London, UK
9
Maclagan LC, Abdalla M, Harris DA, Stukel TA, Chen B, Candido E, Swartz RH, Iaboni A, Jaakkimainen RL, Bronskill SE. Can Patients with Dementia Be Identified in Primary Care Electronic Medical Records Using Natural Language Processing? JOURNAL OF HEALTHCARE INFORMATICS RESEARCH 2023; 7:42-58. [PMID: 36910911 PMCID: PMC9995630 DOI: 10.1007/s41666-023-00125-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2022] [Revised: 12/23/2022] [Accepted: 01/07/2023] [Indexed: 01/24/2023]
Abstract
Dementia and mild cognitive impairment can be underrecognized in primary care practice and research. Free-text fields in electronic medical records (EMRs) are a rich source of information which might support increased detection and enable a better understanding of populations at risk of dementia. We used natural language processing (NLP) to identify dementia-related features in EMRs and compared the performance of supervised machine learning models to classify patients with dementia. We assembled a cohort of primary care patients aged 66 + years in Ontario, Canada, from EMR notes collected until December 2016: 526 with dementia and 44,148 without dementia. We identified dementia-related features by applying published lists, clinician input, and NLP with word embeddings to free-text progress and consult notes and organized features into thematic groups. Using machine learning models, we compared the performance of features to detect dementia, overall and during time periods relative to dementia case ascertainment in health administrative databases. Over 900 dementia-related features were identified and grouped into eight themes (including symptoms, social, function, cognition). Using notes from all time periods, LASSO had the best performance (F1 score: 77.2%, sensitivity: 71.5%, specificity: 99.8%). Model performance was poor when notes written before case ascertainment were included (F1 score: 14.4%, sensitivity: 8.3%, specificity 99.9%) but improved as later notes were added. While similar models may eventually improve recognition of cognitive issues and dementia in primary care EMRs, our findings suggest that further research is needed to identify which additional EMR components might be useful to promote early detection of dementia. Supplementary Information The online version contains supplementary material available at 10.1007/s41666-023-00125-6.
Affiliation(s)
- Mohamed Abdalla
  - Department of Computer Science, University of Toronto, Toronto, Canada
- Daniel A. Harris
  - Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Therese A. Stukel
  - ICES, G1-06, 2075 Bayview Avenue, Toronto, M4N 3M5 Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
  - Sunnybrook Research Institute, Sunnybrook Health Sciences Centre, Toronto, Canada
- Branson Chen
  - ICES, G1-06, 2075 Bayview Avenue, Toronto, M4N 3M5 Canada
- Elisa Candido
  - ICES, G1-06, 2075 Bayview Avenue, Toronto, M4N 3M5 Canada
- Richard H. Swartz
  - ICES, G1-06, 2075 Bayview Avenue, Toronto, M4N 3M5 Canada
  - Sunnybrook Research Institute, Sunnybrook Health Sciences Centre, Toronto, Canada
  - Department of Medicine (Neurology), Sunnybrook Health Sciences Centre, University of Toronto, Toronto, Canada
- Andrea Iaboni
  - KITE Research Institute, Toronto Rehabilitation Institute, University Health Network, Toronto, Canada
  - Department of Psychiatry, University of Toronto, Toronto, Canada
- R. Liisa Jaakkimainen
  - ICES, G1-06, 2075 Bayview Avenue, Toronto, M4N 3M5 Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
  - Sunnybrook Research Institute, Sunnybrook Health Sciences Centre, Toronto, Canada
  - Department of Family and Community Medicine, University of Toronto, Toronto, Canada
- Susan E. Bronskill
  - ICES, G1-06, 2075 Bayview Avenue, Toronto, M4N 3M5 Canada
  - Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
  - Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Canada
  - Sunnybrook Research Institute, Sunnybrook Health Sciences Centre, Toronto, Canada
  - Women’s College Research Institute, Women’s College Hospital, Toronto, Canada
|
10
Javeed A, Dallora AL, Berglund JS, Ali A, Ali L, Anderberg P. Machine Learning for Dementia Prediction: A Systematic Review and Future Research Directions. J Med Syst 2023; 47:17. [PMID: 36720727 PMCID: PMC9889464 DOI: 10.1007/s10916-023-01906-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Received: 07/16/2022] [Accepted: 01/03/2023] [Indexed: 02/02/2023]
Abstract
Nowadays, Artificial Intelligence (AI) and machine learning (ML) have successfully provided automated solutions to numerous real-world problems. Healthcare is one of the most important research areas for ML researchers, with the aim of developing automated disease prediction systems. One of the disease detection problems that AI and ML researchers have focused on is dementia detection using ML methods. Numerous automated diagnostic systems based on ML techniques for early prediction of dementia have been proposed in the literature. Few systematic literature reviews (SLR) have been conducted for dementia prediction based on ML techniques in the past. However, these SLR focused on a single type of data modality for the detection of dementia. Hence, the purpose of this study is to conduct a comprehensive evaluation of ML-based automated diagnostic systems considering different types of data modalities such as images, clinical-features, and voice data. We collected the research articles from 2011 to 2022 using the keywords dementia, machine learning, feature selection, data modalities, and automated diagnostic systems. The selected articles were critically analyzed and discussed. It was observed that image data driven ML models yields promising results in terms of dementia prediction compared to other data modalities, i.e., clinical feature-based data and voice data. Furthermore, this SLR highlighted the limitations of the previously proposed automated methods for dementia and presented future directions to overcome these limitations.
Affiliation(s)
- Ashir Javeed
- Aging Research Center, Karolinska Institutet, Tomtebodavagen, Stockholm, 17165, Solna, Sweden
- Department of Health, Blekinge Institute of Technology, Valhallavägen 1, Karlskrona, 37141, Blekinge, Sweden
| | - Ana Luiza Dallora
- Department of Health, Blekinge Institute of Technology, Valhallavägen 1, Karlskrona, 37141, Blekinge, Sweden
| | - Johan Sanmartin Berglund
- Department of Health, Blekinge Institute of Technology, Valhallavägen 1, Karlskrona, 37141, Blekinge, Sweden
| | - Arif Ali
- Department of Computer Science, University of Science and Technology Bannu, Township, Bannu, 28100, Khyber-Pakhtunkhwa, Pakistan
| | - Liaqat Ali
- Department of Electrical Engineering, University of Science and Technology Bannu, Township, Bannu, 28100, Khyber-Pakhtunkhwa, Pakistan
| | - Peter Anderberg
- Department of Health, Blekinge Institute of Technology, Valhallavägen 1, Karlskrona, 37141, Blekinge, Sweden
- School of Health Sciences, University of Skovde, Högskolevägen 1, Skövde, SE-541 28, Skövde, Sweden
| |
|
11
|
Shehzad A, Rockwood K, Stanley J, Dunn T, Howlett SE. Use of Patient-Reported Symptoms from an Online Symptom Tracking Tool for Dementia Severity Staging: Development and Validation of a Machine Learning Approach. J Med Internet Res 2020; 22:e20840. [PMID: 33174853 PMCID: PMC7688393 DOI: 10.2196/20840] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/01/2020] [Revised: 08/17/2020] [Accepted: 10/24/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: SymptomGuide Dementia (DGI Clinical Inc) is a publicly available online symptom tracking tool to support caregivers of persons living with dementia. The value of such data is enhanced when the specific dementia stage is identified. OBJECTIVE: We aimed to develop a supervised machine learning algorithm to classify dementia stages based on tracked symptoms. METHODS: We employed clinical data from 717 people from 3 sources: (1) a memory clinic; (2) long-term care; and (3) an open-label trial of donepezil in vascular and mixed dementia (VASPECT). Symptoms were captured with SymptomGuide Dementia. A clinician classified participants into 4 groups (mild cognitive impairment, mild dementia, moderate dementia, or severe dementia) using either the Functional Assessment Staging Test or the Global Deterioration Scale. Individualized symptom profiles from the pooled data were used to train machine learning models to predict dementia severity. Models trained with 6 different machine learning algorithms were compared using nested cross-validation to identify the best performing model. Model performance was assessed using measures of balanced accuracy, precision, recall, Cohen κ, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The best performing algorithm was used to train a model optimized for balanced accuracy. RESULTS: The study population was mostly female (424/717, 59.1%), older adults (mean 77.3 years, SD 10.6, range 40-100) with mild to moderate dementia (332/717, 46.3%). Age, duration of symptoms, 37 unique dementia symptoms, and 10 symptom-derived variables were used to distinguish dementia stages. A model trained with a support vector machine learning algorithm using a one-versus-rest approach showed the best performance. The correct dementia stage was identified with 83% balanced accuracy (Cohen κ=0.81, AUPRC 0.91, AUROC 0.96). The best performance was seen when classifying severe dementia (AUROC 0.99). CONCLUSIONS: A supervised machine learning algorithm exhibited excellent performance in identifying dementia stages based on dementia symptoms reported in an online environment. This novel dementia staging algorithm can be used to describe dementia stage based on user-reported symptoms. This type of symptom recording offers real-world data that reflect important symptoms in people with dementia.
Affiliation(s)
| | - Kenneth Rockwood
- DGI Clinical Inc, Halifax, NS, Canada; Geriatric Medicine Research Unit, Nova Scotia Health Authority, Halifax, NS, Canada; Division of Geriatric Medicine, Dalhousie University, Halifax, NS, Canada
| | | | | | - Susan E Howlett
- DGI Clinical Inc, Halifax, NS, Canada; Division of Geriatric Medicine, Dalhousie University, Halifax, NS, Canada; Department of Pharmacology, Dalhousie University, Halifax, NS, Canada
| |
|