1
Lin K, Washington PY. Multimodal deep learning for dementia classification using text and audio. Sci Rep 2024; 14:13887. [PMID: 38880810 PMCID: PMC11180654 DOI: 10.1038/s41598-024-64438-1] [Received: 11/03/2023] [Accepted: 06/10/2024] [Indexed: 06/18/2024]
Abstract
Dementia is a progressive neurological disorder that affects the daily lives of older adults, impacting their verbal communication and cognitive function. Early diagnosis is important to enhance the lifespan and quality of life for affected individuals. Despite its importance, diagnosing dementia is a complex process. Automated machine learning solutions involving multiple types of data have the potential to improve the process of automated dementia screening. In this study, we build deep learning models to classify dementia cases from controls using the Pitt Cookie Theft dataset from DementiaBank, a database of short participant responses to the structured task of describing a picture of a cookie theft. We fine-tune Wav2vec and Word2vec baseline models to make binary predictions of dementia from audio recordings and text transcripts, respectively. We conduct experiments with four versions of the dataset: (1) the original data, (2) the data with short sentences removed, (3) text-based augmentation of the original data, and (4) text-based augmentation of the data with short sentences removed. Our results indicate that synonym-based text data augmentation generally enhances the performance of models that incorporate the text modality. Without data augmentation, models using the text modality achieve around 60% accuracy and 70% AUROC scores, and with data augmentation, the models achieve around 80% accuracy and 90% AUROC scores. We do not observe significant improvements in performance with the addition of audio or timestamp information into the model. We include a qualitative error analysis of the sentences that are misclassified under each study condition. This study provides preliminary insights into the effects of both text-based data augmentation and multimodal deep learning for automated dementia classification.
Affiliation(s)
- Kaiying Lin
- Department of Information and Computer Science, University of Hawai'i, Honolulu, 96822, USA.
- Department of Linguistics, University of Hawai'i, Honolulu, 96822, USA.
- Peter Y Washington
- Department of Information and Computer Science, University of Hawai'i, Honolulu, 96822, USA.
2
Luz S, Haider F, Fromm D, Lazarou I, Kompatsiaris I, MacWhinney B. An Overview of the ADReSS-M Signal Processing Grand Challenge on Multilingual Alzheimer's Dementia Recognition Through Spontaneous Speech. IEEE Open Journal of Signal Processing 2024; 5:738-749. [PMID: 38957540 PMCID: PMC11218814 DOI: 10.1109/ojsp.2024.3378595] [Indexed: 07/04/2024]
Abstract
The ADReSS-M Signal Processing Grand Challenge was held at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023. The challenge targeted difficult automatic prediction problems of great societal and medical relevance, namely, the detection of Alzheimer's Dementia (AD) and the estimation of cognitive test scores. Participants were invited to create models for the assessment of cognitive function based on spontaneous speech data. Most of these models employed signal processing and machine learning methods. The ADReSS-M challenge was designed to assess the extent to which predictive models built based on speech in one language generalise to another language. The language data compiled and made available for ADReSS-M comprised English, for model training, and Greek, for model testing and validation. To the best of our knowledge, no previous shared research task investigated acoustic features of the speech signal or linguistic characteristics in the context of multilingual AD detection. This paper describes the context of the ADReSS-M challenge, its data sets, its predictive tasks, the evaluation methodology we employed, our baseline models and results, and the top five submissions. The paper concludes with a summary discussion of the ADReSS-M results, and our critical assessment of the future outlook in this field.
Affiliation(s)
- Saturnino Luz
- Usher Institute, Edinburgh Medical School, The University of Edinburgh, EH16 4UX Edinburgh, U.K.
- Fasih Haider
- School of Engineering, The University of Edinburgh, EH9 3JW Edinburgh, U.K.
- Davida Fromm
- Department of Psychology, Carnegie Mellon University, Pittsburgh 15213, PA USA
- Ioulietta Lazarou
- Information Technologies Institute, CERTH, Thessaloniki, Thermi-Thessaloniki 57001, Greece
- Ioannis Kompatsiaris
- Information Technologies Institute, CERTH, Thessaloniki, Thermi-Thessaloniki 57001, Greece
- Brian MacWhinney
- Department of Psychology, Carnegie Mellon University, Pittsburgh 15213, PA USA
3
Runde BS, Alapati A, Bazan NG. The Optimization of a Natural Language Processing Approach for the Automatic Detection of Alzheimer's Disease Using GPT Embeddings. Brain Sci 2024; 14:211. [PMID: 38539600 PMCID: PMC10968873 DOI: 10.3390/brainsci14030211] [Received: 01/27/2024] [Revised: 02/19/2024] [Accepted: 02/22/2024] [Indexed: 04/04/2024]
Abstract
The development of noninvasive and cost-effective methods of detecting Alzheimer's disease (AD) is essential for its early prevention and mitigation. We optimize the detection of AD using natural language processing (NLP) of spontaneous speech through the use of audio enhancement techniques and novel transcription methodologies. Specifically, we utilized Boll Spectral Subtraction to improve audio fidelity and created transcriptions using state-of-the-art AI services (locally based Wav2Vec and Whisper, alongside cloud-based IBM Cloud and Rev AI), evaluating their performance against traditional manual transcription methods. Support Vector Machine (SVM) classifiers were then trained and tested using GPT-based embeddings of transcriptions. Our findings revealed that AI-based transcriptions largely outperformed traditional manual ones, with Wav2Vec (enhanced audio) achieving the best accuracy and F-1 score (0.99 for both metrics) for locally based systems and Rev AI (standard audio) performing the best for cloud-based systems (0.96 for both metrics). Furthermore, this study revealed the detrimental effects of interviewer speech on model performance in addition to the minimal effect of audio enhancement. Based on our findings, current AI transcription and NLP technologies are highly effective at accurately detecting AD with available data but struggle to classify probable AD and mild cognitive impairment (MCI), a prodromal stage of AD, due to a lack of training data, laying the groundwork for the future implementation of an automatic AD detection system.
Affiliation(s)
- Benjamin S. Runde
- Science Engineering Research Center, The Potomac School, McLean, VA 22101, USA
- Ajit Alapati
- Neuroscience Center of Excellence, School of Medicine, New Orleans, LA 70112, USA
- Nicolas G. Bazan
- Neuroscience Center of Excellence, School of Medicine, New Orleans, LA 70112, USA
4
Runde BS, Alapati A, Bazan NG. The Optimization of a Natural Language Processing Approach for the Automatic Detection of Alzheimer's Disease Using GPT Embeddings. medRxiv 2024:2024.01.14.24301297. [PMID: 38293012 PMCID: PMC10827239 DOI: 10.1101/2024.01.14.24301297] [Indexed: 02/01/2024]
Abstract
As the impact of Alzheimer's disease (AD) is projected to grow in the coming decades as the world's population ages, the development of noninvasive and cost-effective methods of detecting AD is essential for the early prevention and mitigation of the progressive disease, alleviating its expected global impact. This study analyzes audio processing techniques and transcription methodologies to optimize the detection of AD through the natural language processing (NLP) of spontaneous speech. We enhanced audio fidelity using Boll Spectral Subtraction and evaluated the transcription accuracy of state-of-the-art AI services (locally based Wav2Vec and Whisper, alongside cloud-based IBM Cloud and Rev AI) against traditional manual transcription methods. The choice between local and cloud-based solutions hinges on a trade-off between privacy, ongoing costs, and computational requirements. Leveraging OpenAI's GPT for word embeddings, we enhanced the training of Support Vector Machine (SVM) classifiers, which were crucial in analyzing transcripts and refining detection accuracy. Our findings reveal that AI-driven transcriptions significantly outperform manual counterparts when classifying AD and Control samples, with Wav2Vec using enhanced audio exhibiting the highest accuracy and F-1 scores (0.99 for both metrics) for locally based systems and Rev AI using unenhanced audio leading cloud-based methods with comparable precision (0.96 for both metrics). The study also uncovers the detrimental effect of including interviewer speech in recordings on model performance, advocating for the exclusion of such interactions to improve data quality for AD classification algorithms. Our comprehensive evaluation demonstrates that AI transcription (both Cloud and Local) and NLP technologies in their current forms can classify AD, as well as probable AD and mild cognitive impairment (MCI), a prodromal stage of AD, accurately but suffer from a lack of available training data. The insights garnered from this research lay the groundwork for future advancements in the noninvasive monitoring and early detection of cognitive impairments through linguistic analysis.
Affiliation(s)
- Ajit Alapati
- Neuroscience Center of Excellence, School of Medicine, Louisiana State University
- Nicolas G Bazan
- Neuroscience Center of Excellence, School of Medicine, Louisiana State University
5
Fromm D, Dalton SG, Brick A, Olaiya G, Hill S, Greenhouse J, MacWhinney B. The Case of the Cookie Jar: Differences in Typical Language Use in Dementia. J Alzheimers Dis 2024; 100:1417-1434. [PMID: 38995772 PMCID: PMC11380261 DOI: 10.3233/jad-230844] [Indexed: 07/14/2024]
Abstract
Background: Findings from language sample analyses can provide efficient and effective indicators of cognitive impairment in older adults. Objective: This study used newly automated core lexicon analyses of Cookie Theft picture descriptions to assess differences in typical use across three groups. Methods: Participants included adults without diagnosed cognitive impairments (Control), adults diagnosed with Alzheimer's disease (ProbableAD), and adults diagnosed with mild cognitive impairment (MCI). Cookie Theft picture descriptions were transcribed and analyzed using CLAN. Results: Results showed that the ProbableAD group used significantly fewer core lexicon words overall than the MCI and Control groups. For core lexicon content words (nouns, verbs), however, both the MCI and ProbableAD groups produced significantly fewer words than the Control group. The groups did not differ in their use of core lexicon function words. The ProbableAD group was also slower to produce most of the core lexicon words than the MCI and Control groups. The MCI group was slower than the Control group for only two of the core lexicon content words. All groups mentioned a core lexicon word in the top left quadrant of the picture early in the description. The ProbableAD group was then significantly slower than the other groups to mention a core lexicon word in the other quadrants. Conclusions: This standard and simple-to-administer task reveals group differences in overall core lexicon scores and the amount of time until the speaker produces the key items. Clinicians and researchers can use these tools for both early assessment and measurement of change over time.
Affiliation(s)
- Davida Fromm
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
- Sarah Grace Dalton
- Department of Speech Pathology and Audiology, Marquette University, Milwaukee, WI, USA
- Alexander Brick
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Gbenuola Olaiya
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Sophia Hill
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Joel Greenhouse
- Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Brian MacWhinney
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
6
Wang C, Liu S, Li A, Liu J. Text Dialogue Analysis for Primary Screening of Mild Cognitive Impairment: Development and Validation Study. J Med Internet Res 2023; 25:e51501. [PMID: 38157230 PMCID: PMC10787336 DOI: 10.2196/51501] [Received: 08/02/2023] [Revised: 09/28/2023] [Accepted: 11/27/2023] [Indexed: 01/03/2024]
Abstract
BACKGROUND: Artificial intelligence models tailored to diagnose cognitive impairment have shown excellent results. However, it is unclear whether large linguistic models can rival specialized models by text alone. OBJECTIVE: In this study, we explored the performance of ChatGPT for primary screening of mild cognitive impairment (MCI) and standardized the design steps and components of the prompts. METHODS: We gathered a total of 174 participants from the DementiaBank screening and classified 70% of them into the training set and 30% of them into the test set. Only text dialogues were kept. Sentences were cleaned using a macro code, followed by a manual check. The prompt consisted of 5 main parts, including character setting, scoring system setting, indicator setting, output setting, and explanatory information setting. Three dimensions of variables from published studies were included: vocabulary (ie, word frequency and word ratio, phrase frequency and phrase ratio, and lexical complexity), syntax and grammar (ie, syntactic complexity and grammatical components), and semantics (ie, semantic density and semantic coherence). We used R 4.3.0 for the analysis of variables and diagnostic indicators. RESULTS: Three additional indicators related to the severity of MCI were incorporated into the final prompt for the model. These indicators were effective in discriminating between MCI and cognitively normal participants: tip-of-the-tongue phenomenon (P<.001), difficulty with complex ideas (P<.001), and memory issues (P<.001). The final GPT-4 model achieved a sensitivity of 0.8636, a specificity of 0.9487, and an area under the curve of 0.9062 on the training set; on the test set, the sensitivity, specificity, and area under the curve reached 0.7727, 0.8333, and 0.8030, respectively. CONCLUSIONS: ChatGPT was effective in the primary screening of participants with possible MCI. Improved standardization of prompts by clinicians would also improve the performance of the model. It is important to note that ChatGPT is not a substitute for a clinician making a diagnosis.
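Editorial note on the reported metrics: the areas under the curve quoted in this abstract coincide, to four decimal places, with the mean of the reported sensitivity and specificity. That is the value obtained when a classifier yields a single binary decision, so that the ROC curve has only one interior point. The sketch below checks this arithmetic only; the function name is hypothetical, and nothing here is a statement about how the authors actually computed the area under the curve.

```python
def single_threshold_auc(sensitivity: float, specificity: float) -> float:
    """Area under the two-segment ROC curve through one operating point.

    With only the points (0, 0), (1 - specificity, sensitivity) and (1, 1),
    the trapezoidal area reduces to (sensitivity + specificity) / 2.
    """
    return (sensitivity + specificity) / 2.0


# Values as reported in the abstract (training set, then test set).
train_auc = single_threshold_auc(0.8636, 0.9487)
test_auc = single_threshold_auc(0.7727, 0.8333)

# Compare against the reported 0.9062 and 0.8030 within rounding tolerance.
print(abs(train_auc - 0.9062) < 1e-4, abs(test_auc - 0.8030) < 1e-4)
```

Under this reading, the reported area is equivalent to balanced accuracy at the chosen decision rule rather than a threshold-swept curve.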
Affiliation(s)
- Changyu Wang
- Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
- West China College of Stomatology, Sichuan University, Chengdu, China
- Siru Liu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Aiqing Li
- Department of Neurology, West China Hospital, Sichuan University, Chengdu, China
- Jialin Liu
- Department of Medical Informatics, West China Medical School, Sichuan University, Chengdu, China
- Information Center, West China Hospital, Sichuan University, Chengdu, China
- Department of Otolaryngology-Head and Neck Surgery, West China Hospital, Sichuan University, Chengdu, China
7
Liu H, MacWhinney B, Fromm D, Lanzi A. Automation of Language Sample Analysis. J Speech Lang Hear Res 2023; 66:2421-2433. [PMID: 37348510 PMCID: PMC10555460 DOI: 10.1044/2023_jslhr-22-00642] [Received: 11/11/2022] [Revised: 01/14/2023] [Accepted: 04/13/2023] [Indexed: 06/24/2023]
Abstract
PURPOSE: A major barrier to the wider use of language sample analysis (LSA) is the fact that transcription is very time intensive. Methods that can reduce the required time and effort could help in promoting the use of LSA for clinical practice and research. METHOD: This article describes an automated pipeline, called Batchalign, that takes raw audio and creates full transcripts in Codes for the Human Analysis of Talk (CHAT) transcription format, complete with utterance- and word-level time alignments and morphosyntactic analysis. The pipeline only requires major human intervention for final checking. It combines a series of existing tools with additional novel reformatting processes. The steps in the pipeline are (a) automatic speech recognition, (b) utterance tokenization, (c) automatic corrections, (d) speaker ID assignment, (e) forced alignment, (f) user adjustments, and (g) automatic morphosyntactic and profiling analyses. RESULTS: For work with recordings from adults with language disorders, six major results were obtained: (a) the word error rate ranged from 2.4% for controls to 3.4% for patients, (b) utterance tokenization accuracy was at the level reported for speakers without language disorders, (c) word-level diarization accuracy was at 93% for control participants and 83% for participants with language disorders, (d) utterance-level diarization accuracy based on word-level diarization was high, (e) adherence to CHAT format was fully accurate, and (f) human transcriber time was reduced by up to 75%. CONCLUSION: The pipeline dramatically shortens the time gap between data collection and data analysis and provides an output superior to that typically generated by human transcribers.
Affiliation(s)
- Brian MacWhinney
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA
- Davida Fromm
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA
- Alyssa Lanzi
- Communication Sciences and Disorders Department, University of Delaware, Newark
8
Brem AK, Kuruppu S, de Boer C, Muurling M, Diaz-Ponce A, Gove D, Curcic J, Pilotto A, Ng WF, Cummins N, Malzbender K, Nies VJM, Erdemli G, Graeber J, Narayan VA, Rochester L, Maetzler W, Aarsland D. Digital endpoints in clinical trials of Alzheimer's disease and other neurodegenerative diseases: challenges and opportunities. Front Neurol 2023; 14:1210974. [PMID: 37435159 PMCID: PMC10332162 DOI: 10.3389/fneur.2023.1210974] [Received: 04/23/2023] [Accepted: 05/26/2023] [Indexed: 07/13/2023]
Abstract
Alzheimer's disease (AD) and other neurodegenerative diseases such as Parkinson's disease (PD) and Huntington's disease (HD) are associated with progressive cognitive, motor, affective and consequently functional decline considerably affecting Activities of Daily Living (ADL) and quality of life. Standard assessments, such as questionnaires and interviews, cognitive testing, and mobility assessments, lack sensitivity, especially in early stages of neurodegenerative diseases and in the disease progression, and have therefore a limited utility as outcome measurements in clinical trials. Major advances in the last decade in digital technologies have opened a window of opportunity to introduce digital endpoints into clinical trials that can reform the assessment and tracking of neurodegenerative symptoms. The Innovative Health Initiative (IMI)-funded projects RADAR-AD (Remote assessment of disease and relapse-Alzheimer's disease), IDEA-FAST (Identifying digital endpoints to assess fatigue, sleep and ADL in neurodegenerative disorders and immune-mediated inflammatory diseases) and Mobilise-D (Connecting digital mobility assessment to clinical outcomes for regulatory and clinical endorsement) aim to identify digital endpoints relevant for neurodegenerative diseases that provide reliable, objective, and sensitive evaluation of disability and health-related quality of life. In this article, we will draw from the findings and experiences of the different IMI projects in discussing (1) the value of remote technologies to assess neurodegenerative diseases; (2) feasibility, acceptability and usability of digital assessments; (3) challenges related to the use of digital tools; (4) public involvement and the implementation of patient advisory boards; (5) regulatory learnings; and (6) the significance of inter-project exchange and data- and algorithm-sharing.
Affiliation(s)
- Anna-Katharine Brem
- Department of Old Age Psychiatry, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- University Hospital of Old Age Psychiatry, University of Bern, Bern, Switzerland
- Sajini Kuruppu
- Department of Old Age Psychiatry, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Casper de Boer
- Alzheimer Center Amsterdam, Neurology, Vrije Universiteit Amsterdam, Amsterdam UMC Location VUmc, Amsterdam, Netherlands
- Amsterdam Neuroscience, Neurodegeneration, Amsterdam, Netherlands
- Marijn Muurling
- Alzheimer Center Amsterdam, Neurology, Vrije Universiteit Amsterdam, Amsterdam UMC Location VUmc, Amsterdam, Netherlands
- Amsterdam Neuroscience, Neurodegeneration, Amsterdam, Netherlands
- Jelena Curcic
- Novartis Institutes for Biomedical Research (NIBR), Basel, Switzerland
- Andrea Pilotto
- Neurology Unit, Department of Clinical and Experimental Sciences, University of Brescia, Brescia, Italy
- Laboratory of Digital Neurology and Biosensors, University of Brescia, Brescia, Italy
- Neurology Unit, Department of Continuity of Care and Frailty, ASST Spedali Civili Brescia Hospital, Brescia, Italy
- Wan-Fai Ng
- Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- NIHR Newcastle Biomedical Research Centre and Clinical Research Facility, Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, United Kingdom
- Nicholas Cummins
- Department of Biostats and Health Informatics, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Gul Erdemli
- Novartis Pharmaceuticals Corporations, Cambridge, MA, United States
- Johanna Graeber
- Institute of General Practice, University Medical Center Schleswig-Holstein, Kiel University, Kiel, Germany
- Lynn Rochester
- Faculty of Medical Sciences, Translational and Clinical Research Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- National Institute for Health and Care Research (NIHR) Newcastle Biomedical Research Centre (BRC), Newcastle University and The Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, United Kingdom
- Walter Maetzler
- Department of Neurology, University Hospital Schleswig-Holstein and Kiel University, Kiel, Germany
- Dag Aarsland
- Department of Old Age Psychiatry, King’s College London, Institute of Psychiatry, Psychology and Neuroscience, London, United Kingdom
- Centre for Age-Related Medicine, Stavanger University Hospital, Stavanger, Norway