1
Lin K, Washington PY. Multimodal deep learning for dementia classification using text and audio. Sci Rep 2024; 14:13887. [PMID: 38880810 PMCID: PMC11180654 DOI: 10.1038/s41598-024-64438-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 06/10/2024] [Indexed: 06/18/2024] Open
Abstract
Dementia is a progressive neurological disorder that affects the daily lives of older adults, impacting their verbal communication and cognitive function. Early diagnosis is important to enhance the lifespan and quality of life for affected individuals. Despite its importance, diagnosing dementia is a complex process. Automated machine learning solutions involving multiple types of data have the potential to improve the process of automated dementia screening. In this study, we build deep learning models to classify dementia cases from controls using the Pitt Cookie Theft dataset from DementiaBank, a database of short participant responses to the structured task of describing a picture of a cookie theft. We fine-tune Wav2vec and Word2vec baseline models to make binary predictions of dementia from audio recordings and text transcripts, respectively. We conduct experiments with four versions of the dataset: (1) the original data, (2) the data with short sentences removed, (3) text-based augmentation of the original data, and (4) text-based augmentation of the data with short sentences removed. Our results indicate that synonym-based text data augmentation generally enhances the performance of models that incorporate the text modality. Without data augmentation, models using the text modality achieve around 60% accuracy and 70% AUROC scores, and with data augmentation, the models achieve around 80% accuracy and 90% AUROC scores. We do not observe significant improvements in performance with the addition of audio or timestamp information into the model. We include a qualitative error analysis of the sentences that are misclassified under each study condition. This study provides preliminary insights into the effects of both text-based data augmentation and multimodal deep learning for automated dementia classification.
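The synonym-based text augmentation mentioned in this abstract can be illustrated with a minimal sketch. The synonym table, the function names `augment` and `augment_dataset`, and the replacement probability `p` are all assumptions made for illustration; the abstract does not say which synonym source or sampling scheme the authors used.

```python
import random

# Hypothetical toy synonym table; the study's actual synonym source is not given in the abstract.
SYNONYMS = {
    "boy": ["kid", "child"],
    "taking": ["grabbing", "getting"],
    "cookie": ["biscuit"],
}

def augment(sentence, synonyms, rng, p=1.0):
    """Return a copy of `sentence` in which words found in `synonyms`
    are swapped for a randomly chosen synonym with probability `p`."""
    out = []
    for word in sentence.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)

def augment_dataset(rows, synonyms, n_copies=2, seed=0):
    """Keep every (text, label) row and append `n_copies` augmented
    variants that carry the same label."""
    rng = random.Random(seed)
    augmented = []
    for text, label in rows:
        augmented.append((text, label))
        for _ in range(n_copies):
            augmented.append((augment(text, synonyms, rng), label))
    return augmented

rows = [("the boy is taking a cookie", 1)]
for text, label in augment_dataset(rows, SYNONYMS):
    print(label, text)
```

In a setup like this, augmentation would normally be applied to the training split only, so that variants of a held-out sentence do not leak into training.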
Affiliation(s)
- Kaiying Lin
  - Department of Information and Computer Science, University of Hawai'i, Honolulu, 96822, USA.
  - Department of Linguistics, University of Hawai'i, Honolulu, 96822, USA.
- Peter Y Washington
  - Department of Information and Computer Science, University of Hawai'i, Honolulu, 96822, USA.
2
Lukic S, Fan Z, García AM, Welch AE, Ratnasiri BM, Wilson SM, Henry ML, Vonk J, Deleon J, Miller BL, Miller Z, Mandelli ML, Gorno-Tempini ML. Discriminating nonfluent/agrammatic and logopenic PPA variants with automatically extracted morphosyntactic measures from connected speech. Cortex 2024; 173:34-48. [PMID: 38359511 DOI: 10.1016/j.cortex.2023.12.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 10/15/2023] [Accepted: 12/12/2023] [Indexed: 02/17/2024]
Abstract
Morphosyntactic assessments are important for characterizing individuals with nonfluent/agrammatic variant primary progressive aphasia (nfvPPA). Yet, standard tests are subject to examiner bias and often fail to differentiate between nfvPPA and logopenic variant PPA (lvPPA). Moreover, relevant neural signatures remain underexplored. Here, we leverage natural language processing tools to automatically capture morphosyntactic disturbances and their neuroanatomical correlates in 35 individuals with nfvPPA relative to 10 healthy controls (HC) and 26 individuals with lvPPA. Participants described a picture, and ensuing transcripts were analyzed via part-of-speech tagging to extract sentence-related features (e.g., subordinating and coordinating conjunctions), verbal-related features (e.g., tense markers), and nominal-related features (e.g., subjective and possessive pronouns). Gradient boosting machines were used to classify between groups using all features. We identified the most discriminant morphosyntactic marker via a feature importance algorithm and examined its neural correlates via voxel-based morphometry. Individuals with nfvPPA produced fewer morphosyntactic elements than the other two groups. Such features robustly discriminated them from both individuals with lvPPA and HCs, with AUCs of .95 and .82, respectively. The most discriminatory feature corresponded to subordinating conjunctions and was correlated with cortical atrophy within the left posterior inferior frontal gyrus across groups (pFWE < .05). Automated morphosyntactic analysis can efficiently differentiate nfvPPA from lvPPA. Also, the most sensitive morphosyntactic markers correlate with a core atrophy region of nfvPPA. Our approach, thus, can contribute to a key challenge in PPA diagnosis.
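The feature-extraction step described in this abstract (part-of-speech tags turned into morphosyntactic counts) might look roughly like the sketch below. It assumes the transcript has already been tagged with Universal POS tags; the tag-to-feature mapping, the per-100-token normalization, and the function name `morphosyntactic_rates` are illustrative assumptions, not the authors' pipeline. In particular, the plain `VERB` count stands in for the finer verbal features (such as tense markers) that the abstract mentions.

```python
from collections import Counter

# Assumed mapping from Universal POS tags to feature families named in the abstract.
TAG_TO_FEATURE = {
    "SCONJ": "subordinating_conjunctions",
    "CCONJ": "coordinating_conjunctions",
    "PRON": "pronouns",
    "VERB": "verbs",
}

def morphosyntactic_rates(tagged_tokens):
    """Return occurrences per 100 tokens for each tracked feature.

    `tagged_tokens` is a list of (word, tag) pairs; an empty list yields all zeros.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    n = len(tagged_tokens)
    return {
        feature: (100.0 * counts[tag] / n if n else 0.0)
        for tag, feature in TAG_TO_FEATURE.items()
    }

# Hand-tagged example; in practice a POS tagger would produce these pairs.
sample = [
    ("the", "DET"), ("boy", "NOUN"), ("falls", "VERB"), ("because", "SCONJ"),
    ("he", "PRON"), ("reaches", "VERB"), ("and", "CCONJ"), ("she", "PRON"),
    ("laughs", "VERB"), ("loudly", "ADV"),
]
print(morphosyntactic_rates(sample))
```

Feature dictionaries of this kind could then be passed to a gradient boosting classifier, which is the model family the abstract names.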
Affiliation(s)
- Sladjana Lukic
  - University of California, San Francisco Memory and Aging Center, CA, USA; Ruth S. Ammon College of Education and Health Sciences, Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY, USA.
- Zekai Fan
  - Heinz College of Information Systems and Public Policy, Carnegie Mellon University, Pittsburgh, PA, USA
- Adolfo M García
  - Global Brain Health Institute (GBHI), University of California, San Francisco, CA, USA; Cognitive Neuroscience Center, Universidad de San Andrés, Buenos Aires, Argentina; Departamento de Lingüística y Literatura, Facultad de Humanidades, Universidad de Santiago de Chile, Santiago, Chile
- Ariane E Welch
  - Ruth S. Ammon College of Education and Health Sciences, Department of Communication Sciences and Disorders, Adelphi University, Garden City, NY, USA
- Stephen M Wilson
  - School of Health and Rehabilitation Sciences, University of Queensland, Brisbane, QLD, Australia
- Maya L Henry
  - University of Texas at Austin Moody College of Communication, Austin, TX, USA
- Jet Vonk
  - University of California, San Francisco Memory and Aging Center, CA, USA
- Jessica Deleon
  - University of California, San Francisco Memory and Aging Center, CA, USA
- Bruce L Miller
  - University of California, San Francisco Memory and Aging Center, CA, USA
- Zachary Miller
  - University of California, San Francisco Memory and Aging Center, CA, USA
3
Gagliardi G. Natural language processing techniques for studying language in pathological ageing: A scoping review. Int J Lang Commun Disord 2024; 59:110-122. [PMID: 36960885 DOI: 10.1111/1460-6984.12870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2022] [Accepted: 02/27/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND: In the past few years there has been a growing interest in the employment of verbal productions as digital biomarkers, namely objective, quantifiable behavioural data that can be collected and measured by means of digital devices, allowing for low-cost pathology detection, classification and monitoring. Numerous research papers have been published on the automatic detection of subtle verbal alterations, starting from written texts, raw speech recordings and transcripts, and such linguistic analysis has been singled out as a cost-effective method for diagnosing dementia and other medical conditions common among elderly patients (e.g., cognitive dysfunctions associated with metabolic disorders, dysarthria). AIMS: To provide a critical appraisal and synthesis of evidence concerning the application of natural language processing (NLP) techniques for clinical purposes in the geriatric population. In particular, we discuss the state of the art on studying language in healthy and pathological ageing, focusing on the latest research efforts to build non-intrusive language-based tools for the early identification of cognitive frailty due to dementia. We also discuss some challenges and open problems raised by this approach. METHODS & PROCEDURES: We performed a scoping review to examine emerging evidence about this novel domain. Potentially relevant studies published up to November 2021 were identified from the databases of MEDLINE, Cochrane and Web of Science. We also browsed the proceedings of leading international conferences (e.g., ACL, COLING, Interspeech, LREC) from 2017 to 2021, and checked the reference lists of relevant studies and reviews. MAIN CONTRIBUTION: The paper provides an introductory, but complete, overview of the application of NLP techniques for studying language disruption due to dementia. We also suggest that this technique can be fruitfully applied to other medical conditions (e.g., cognitive dysfunctions associated with dysarthria, cerebrovascular disease and mood disorders). CONCLUSIONS & IMPLICATIONS: Although several critical points need to be addressed by the scientific community, a growing body of empirical evidence shows that NLP techniques can represent a promising tool for studying language changes in pathological ageing, with a high potential to lead a significant shift in clinical practice. WHAT THIS PAPER ADDS: What is already known on this subject: Speech and language abilities change due to non-pathological neurocognitive ageing and neurodegenerative processes. These subtle verbal modifications can be measured through NLP techniques and used as biomarkers for screening/diagnostic purposes in the geriatric population (i.e., digital linguistic biomarkers, DLBs). What this paper adds to existing knowledge: The review shows that DLBs can represent a promising clinical tool, with a high potential to spark a major shift in dementia assessment in the elderly. Some challenges and open problems are also discussed. What are the potential or actual clinical implications of this work? This methodological review represents a starting point for clinicians approaching the DLB research field for studying language in healthy and pathological ageing. It summarizes the state of the art and future research directions of this novel approach.
Affiliation(s)
- Gloria Gagliardi
  - Department of Classical Philology and Italian Studies, University of Bologna, Bologna, Italy
4
Liu N, Wang L. An approach for assisting diagnosis of Alzheimer's disease based on natural language processing. Front Aging Neurosci 2023; 15:1281726. [PMID: 38035270 PMCID: PMC10687444 DOI: 10.3389/fnagi.2023.1281726] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 10/17/2023] [Indexed: 12/02/2023] Open
Abstract
Introduction: Alzheimer's disease (AD) is a common form of dementia that affects patients' linguistic function, memory, cognition and visuospatial ability. Language has been shown to be related to AD, which raises the prospect that AD could soon be diagnosed in a doctor's office. Methods: In this study, the Pitt dataset, which is balanced in gender and age, is used to detect AD. First, a pretrained Bidirectional Encoder Representations from Transformers (BERT) model is used to obtain word vectors. Then two channels are constructed in the feature extraction layer, a convolutional neural network (CNN) and a long short-term memory (LSTM) model, to extract local and global features, respectively. The local and global features are concatenated to generate feature vectors containing rich semantics, which are sent to a softmax classifier for classification. Results: We obtain a best accuracy of 89.3%, which is comparable to that of other studies. We also run comparative experiments with the TextCNN and LSTM models individually; the combined model performs best and TextCNN takes second place. Discussion: The performance illustrates the feasibility of predicting AD effectively by using acoustic and linguistic datasets.
Affiliation(s)
- Ning Liu
  - School of Science/School of Big Data Science, Zhejiang University of Science and Technology, Zhejiang, China
- Lingxing Wang
  - Department of Neurology, Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian, China
5
Shi M, Cheung G, Shahamiri SR. Speech and language processing with deep learning for dementia diagnosis: A systematic review. Psychiatry Res 2023; 329:115538. [PMID: 37864994 DOI: 10.1016/j.psychres.2023.115538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Revised: 10/06/2023] [Accepted: 10/08/2023] [Indexed: 10/23/2023]
Abstract
Dementia is a progressive neurodegenerative disease that burdens the person living with the disease, their families, and medical and social services. Timely diagnosis of dementia could be followed by introducing interventions that may slow down its progression or reduce its burdens. However, the diagnostic process of dementia is often complex and resource intensive. Access to diagnostic services is also an issue in low- and middle-income countries. The abundance and easy accessibility of speech and language data have created new possibilities for utilizing Deep Learning (DL) technologies as part of the dementia diagnostic process. This systematic review included studies published between 2012 and 2022 that utilized such technologies to aid in diagnosing dementia. We identified 72 studies using the PRISMA 2020 protocol, extracted and analyzed data from these studies and reported the related DL technologies. We found these technologies effectively differentiated between healthy individuals and those with a dementia diagnosis, highlighting their potential in the diagnosis of dementia. This systematic review provides insights into the contributions of DL-based speech and language techniques to support the dementia diagnostic process. It also offers an understanding of the advancements made in this field thus far and highlights some challenges that still need to be addressed.
Affiliation(s)
- Mengke Shi
  - Department of Electrical, Computer and Software Engineering, Faculty of Engineering, University of Auckland, Private Bag 92019, Building 405, Level 6, Room 669, 3 Grafton Road, Auckland 1142, New Zealand
- Gary Cheung
  - Department of Psychological Medicine, Faculty of Medical and Health Sciences, University of Auckland, Private Bag 92019, Building 405, Level 6, Room 669, 3 Grafton Road, Auckland 1142, New Zealand
- Seyed Reza Shahamiri
  - Department of Electrical, Computer and Software Engineering, Faculty of Engineering, University of Auckland, Private Bag 92019, Building 405, Level 6, Room 669, 3 Grafton Road, Auckland 1142, New Zealand.
6
Parsapoor M. AI-based assessments of speech and language impairments in dementia. Alzheimers Dement 2023; 19:4675-4687. [PMID: 37578167 DOI: 10.1002/alz.13395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 06/03/2023] [Accepted: 06/05/2023] [Indexed: 08/15/2023]
Abstract
Recent advancements in the artificial intelligence (AI) domain have revolutionized the early detection of cognitive impairments associated with dementia. This has motivated clinicians to use AI-powered dementia detection systems, particularly systems developed based on individuals' and patients' speech and language, for a quick and accurate identification of patients with dementia. This paper reviews articles about developing assessment tools using machine learning and deep learning algorithms trained by vocal and textual datasets.
Affiliation(s)
- Mahboobeh Parsapoor
  - Centre de Recherche Informatique de Montréal: CRIM, Montreal, Quebec, Canada
7
Liu N, Yuan Z, Chen Y, Liu C, Wang L. Learning implicit sentiments in Alzheimer's disease recognition with contextual attention features. Front Aging Neurosci 2023; 15:1122799. [PMID: 37266402 PMCID: PMC10231228 DOI: 10.3389/fnagi.2023.1122799] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 04/05/2023] [Indexed: 06/03/2023] Open
Abstract
Background: Alzheimer's disease (AD) is difficult to diagnose on the basis of language because of the implicit emotion of transcripts, which is defined as a supervised fuzzy implicit emotion classification at the document level. Recent neural network-based approaches have not paid attention to the implicit sentiments entailed in AD transcripts. Method: A two-level attention mechanism is proposed to detect deep semantic information toward words and sentences, which enables it to attend to more words and fewer sentences differentially when constructing document representation. Specifically, a document vector was built by progressively aggregating important words into sentence vectors and important sentences into document vectors. Results: Experimental results showed that our method achieved the best accuracy of 91.6% on annotated public Pitt corpora, which validates its effectiveness in learning implicit sentiment representation for our model. Conclusion: The proposed model can qualitatively select informative words and sentences using attention layers, and this method also provides good inspiration for AD diagnosis based on implicit sentiment transcripts.
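The two-level aggregation described in this abstract (words pooled into sentence vectors, sentences pooled into a document vector) can be sketched in plain Python. This is a simplified illustration under stated assumptions: the attention scores here are raw dot products with fixed context vectors, whereas a trained model would learn those vectors and typically apply a projection first. The names `attention_pool` and `document_vector` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(vectors, context):
    """Weighted average of `vectors`; weights are the softmax of each
    vector's dot product with `context`."""
    scores = [sum(a * b for a, b in zip(v, context)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]
    return pooled, weights

def document_vector(doc, word_context, sentence_context):
    """`doc` is a list of sentences, each a list of word vectors.
    Level 1 pools words into sentence vectors; level 2 pools sentences."""
    sentence_vectors = [attention_pool(words, word_context)[0] for words in doc]
    return attention_pool(sentence_vectors, sentence_context)

# Two sentences with 2-dimensional toy word vectors.
doc = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]]]
vector, sentence_weights = document_vector(doc, [0.0, 0.0], [0.0, 0.0])
print(vector, sentence_weights)
```

With all-zero context vectors every item gets equal weight, so the example reduces to plain averaging; non-zero contexts shift weight toward the words and sentences that align with them.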
Affiliation(s)
- Ning Liu
  - School of Science/School of Big Data Science, Zhejiang University of Science and Technology, Hangzhou, China
- Zhenming Yuan
  - School of Information Science and Technology, Hangzhou Normal University, Hangzhou, Zhejiang, China
- Yan Chen
  - International Unresponsive Wakefulness Syndrome and Consciousness Science Institute, Hangzhou Normal University, Hangzhou, China
- Chuan Liu
  - School of Mathematics and Computer Science, Quanzhou Normal University, Quanzhou, Fujian, China
- Lingxing Wang
  - Department of Neurology, Second Affiliated Hospital of Fujian Medical University, Quanzhou, Fujian, China
8
Artificial Intelligence-Enabled End-To-End Detection and Assessment of Alzheimer's Disease Using Voice. Brain Sci 2022; 13:brainsci13010028. [PMID: 36672010 PMCID: PMC9856143 DOI: 10.3390/brainsci13010028] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/29/2022] [Revised: 12/13/2022] [Accepted: 12/20/2022] [Indexed: 12/25/2022] Open
Abstract
There is currently no simple, widely available screening method for Alzheimer's disease (AD), partly because the diagnosis of AD is complex and typically involves expensive and sometimes invasive tests not commonly available outside highly specialized clinical settings. Here, we developed an artificial intelligence (AI)-powered end-to-end system to detect AD and predict its severity directly from voice recordings. At the core of our system is the pre-trained data2vec model, the first high-performance self-supervised algorithm that works for speech, vision, and text. Our model was internally evaluated on the ADReSSo (Alzheimer's Dementia Recognition through Spontaneous Speech only) dataset containing voice recordings of subjects describing the Cookie Theft picture, and externally validated on a test dataset from DementiaBank. The AI model can detect AD with average area under the curve (AUC) of 0.846 and 0.835 on held-out and external test set, respectively. The model was well-calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.9616). Moreover, the model can reliably predict the subject's cognitive testing score solely based on raw voice recordings. Our study demonstrates the feasibility of using the AI-powered end-to-end model for early AD diagnosis and severity prediction directly based on voice, showing its potential for screening Alzheimer's disease in a community setting.
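For readers less familiar with the AUC figures reported in this and the other abstracts, the metric can be computed directly from its definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below is a generic pairwise implementation with made-up labels and scores; it is not the evaluation code of any study listed here.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up example: 1 = AD, 0 = control; scores are model outputs.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; rank-based formulas give the same value more efficiently on large sets.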
9
Ivanova O, Meilán JJG, Martínez-Sánchez F, Martínez-Nicolás I, Llorente TE, González NC. Discriminating speech traits of Alzheimer's disease assessed through a corpus of reading task for Spanish language. Comput Speech Lang 2022. [DOI: 10.1016/j.csl.2021.101341] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 10/19/2022]
10
Ambadi PS, Basche K, Koscik RL, Berisha V, Liss JM, Mueller KD. Spatio-Semantic Graphs From Picture Description: Applications to Detection of Cognitive Impairment. Front Neurol 2021; 12:795374. [PMID: 34956070 PMCID: PMC8696356 DOI: 10.3389/fneur.2021.795374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022] Open
Abstract
Clinical assessments often use complex picture description tasks to elicit natural speech patterns and magnify changes occurring in brain regions implicated in Alzheimer's disease and dementia. As the Cookie Theft picture description task is used in the largest Alzheimer's disease and dementia cohort studies available, we aimed to create algorithms that could characterize the visual narrative path a participant takes in describing what is happening in this image. We proposed spatio-semantic graphs, models based on graph theory that transform the participants' narratives into graphs that retain semantic order and encode the visuospatial information between content units in the image. The resulting graphs differ between Cognitively Impaired and Unimpaired participants in several important ways. Cognitively Impaired participants consistently scored higher on features that are heavily associated with symptoms of cognitive decline, including repetition, evidence of short-term memory lapses, and generally disorganized narrative descriptions, while Cognitively Unimpaired participants produced more efficient narrative paths. These results provide evidence that spatio-semantic graph analysis of these tasks can generate important insights into a participant's cognitive performance that cannot be generated from semantic analysis alone.
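The idea of a narrative path through the picture can be made concrete with a small sketch. It treats the ordered list of mentioned content units as a directed path and derives a few summary features (distinct units, repeat visits, total distance travelled). The coordinates, unit names, feature names, and the function `narrative_path_features` are all hypothetical; the abstract does not specify the paper's graph construction or feature set.

```python
import math
from collections import Counter

# Hypothetical (x, y) positions of content units in the picture, in arbitrary units.
UNIT_POSITIONS = {
    "boy": (0.0, 0.0),
    "cookie_jar": (0.0, 3.0),
    "mother": (4.0, 0.0),
    "sink": (4.0, 3.0),
}

def narrative_path_features(mentions, positions):
    """Summarize an ordered list of mentioned content units as a directed path.

    Edges connect consecutive mentions, so semantic order is preserved, and
    each edge is weighted by the spatial distance between the two units.
    """
    edges = list(zip(mentions, mentions[1:]))
    path_length = sum(math.dist(positions[a], positions[b]) for a, b in edges)
    visits = Counter(mentions)
    return {
        "unique_units": len(visits),
        "repeat_visits": sum(count - 1 for count in visits.values()),
        "path_length": path_length,
        "edge_counts": Counter(edges),
    }

features = narrative_path_features(
    ["boy", "cookie_jar", "mother", "boy"], UNIT_POSITIONS
)
print(features)
```

In this toy example the speaker returns to the boy after mentioning the mother, which shows up as one repeat visit and a longer path than a description that covers the same three units without backtracking.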
Affiliation(s)
- Pranav S. Ambadi
  - College of Health Solutions, Arizona State University, Tempe, AZ, United States
- Kristin Basche
  - Division of Geriatrics, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
- Rebecca L. Koscik
  - Wisconsin Alzheimer's Institute, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
- Visar Berisha
  - College of Health Solutions, Arizona State University, Tempe, AZ, United States
- Julie M. Liss
  - College of Health Solutions, Arizona State University, Tempe, AZ, United States
- Kimberly D. Mueller
  - Division of Geriatrics, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
  - Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, United States