1. Parsapoor M. AI-based assessments of speech and language impairments in dementia. Alzheimers Dement 2023;19:4675-4687. [PMID: 37578167] [DOI: 10.1002/alz.13395]
Abstract
Recent advances in the artificial intelligence (AI) domain have revolutionized the early detection of the cognitive impairments associated with dementia. This has motivated clinicians to adopt AI-powered dementia detection systems, particularly those developed from patients' speech and language, for quick and accurate identification of people with dementia. This paper reviews articles on the development of assessment tools using machine learning and deep learning algorithms trained on vocal and textual datasets.
Affiliation(s)
- Mahboobeh Parsapoor
- Centre de Recherche Informatique de Montréal: CRIM, Montreal, Quebec, Canada
2. Walker G, Pevy N, O'Malley R, Mirheidari B, Reuber M, Christensen H, Blackburn DJ. Speech patterns in responses to questions asked by an intelligent virtual agent can help to distinguish between people with early stage neurodegenerative disorders and healthy controls. Clin Linguist Phon 2023:1-22. [PMID: 37722818] [DOI: 10.1080/02699206.2023.2254458]
Abstract
Previous research has provided strong evidence that speech patterns can help to distinguish between people with early stage neurodegenerative disorders (ND) and healthy controls. This study examined speech patterns in responses to questions asked by an intelligent virtual agent (IVA): a talking head on a computer which asks pre-recorded questions. The study investigated whether measures of response length, speech rate and pausing in responses to questions asked by an IVA help to distinguish between healthy control participants and people diagnosed with Mild Cognitive Impairment (MCI) or Alzheimer's disease (AD). The study also considered whether those measures can further help to distinguish between people with MCI, people with AD, and healthy control participants (HC). There were 38 people with ND (31 people with MCI, 7 people with AD) and 26 HC. All interactions took place in English. People with MCI spoke fewer words compared to HC, and people with AD and people with MCI spoke for less time than HC. People with AD spoke at a slower rate than people with MCI and HC. There were significant differences across all three groups for the proportion of time spent pausing and the average pause duration: silent pauses make up the greatest proportion of responses from people with AD, who also have the longest average silent pause duration, followed by people with MCI then HC. Therefore, the study demonstrates the potential of an IVA as a method for collecting data showing patterns which can help to distinguish between diagnostic groups.
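The timing measures examined above (response length, speech rate, and pausing) can be computed directly from time-aligned transcripts. A minimal sketch, assuming hypothetical word-level timestamps rather than the authors' actual pipeline:

```python
def timing_features(words, response_end):
    """Compute response-level timing features from word timestamps.

    words: ordered list of (word, start_sec, end_sec) tuples.
    response_end: end time of the response in seconds.
    Silent pauses are the gaps between consecutive words.
    """
    n_words = len(words)
    response_start = words[0][1]
    total_time = response_end - response_start
    gaps = [words[i + 1][1] - words[i][2] for i in range(n_words - 1)]
    pauses = [g for g in gaps if g > 0]
    pause_time = sum(pauses)
    return {
        "n_words": n_words,
        "speech_rate_wps": n_words / total_time,      # words per second
        "pause_proportion": pause_time / total_time,  # share of the response spent silent
        "mean_pause_sec": pause_time / len(pauses) if pauses else 0.0,
    }

# A short toy response with two silent pauses (0.5 s and 1.0 s).
words = [("the", 0.0, 0.4), ("cat", 0.9, 1.3), ("sat", 2.3, 2.7)]
feats = timing_features(words, response_end=2.7)
```

Group-level comparisons such as those in the study would then be run on these per-response feature values.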
Affiliation(s)
- Gareth Walker
- School of English, University of Sheffield, Sheffield, UK
- Nathan Pevy
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Ronan O'Malley
- Department of Neuroscience, University of Sheffield, Sheffield, UK
- Bahman Mirheidari
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Markus Reuber
- Academic Neurology Unit, Royal Hallamshire Hospital, University of Sheffield, Sheffield, UK
- Heidi Christensen
- Department of Computer Science, University of Sheffield, Sheffield, UK
3. Pevy N, Christensen H, Walker T, Reuber M. Differentiating between epileptic and functional/dissociative seizures using semantic content analysis of transcripts of routine clinic consultations. Epilepsy Behav 2023;143:109217. [PMID: 37119579] [DOI: 10.1016/j.yebeh.2023.109217]
Abstract
The common causes of Transient Loss of Consciousness (TLOC) are syncope, epilepsy, and functional/dissociative seizures (FDS). Simple, questionnaire-based decision-making tools for non-specialists who may have to deal with TLOC (such as clinicians working in primary or emergency care) reliably differentiate between patients who have experienced syncope and those who have had one or more seizures but are more limited in their ability to differentiate between epileptic seizures and FDS. Previous conversation analysis research has demonstrated that qualitative expert analysis of how people talk to clinicians about their seizures can help distinguish between these two TLOC causes. This paper investigates whether automated language analysis - using semantic categories measured by the Linguistic Inquiry and Word Count (LIWC) toolkit - can contribute to the distinction between epilepsy and FDS. Using patient-only talk manually transcribed from recordings of 58 routine doctor-patient clinic interactions, we compared the word frequencies for 21 semantic categories and explored the predictive performance of these categories using 5 different machine learning algorithms. Machine learning algorithms trained using the chosen semantic categories and leave-one-out cross-validation were able to predict the diagnosis with an accuracy of up to 81%. The results of this proof of principle study suggest that the analysis of semantic variables in seizure descriptions could improve clinical decision tools for patients presenting with TLOC.
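The evaluation protocol described above, training classifiers on per-category word frequencies and scoring them with leave-one-out cross-validation, can be sketched in a few lines. The feature vectors and the nearest-centroid classifier below are illustrative stand-ins, not the LIWC categories or the five algorithms used in the study:

```python
def leave_one_out_accuracy(X, y):
    """Leave-one-out CV: hold out each sample once, train a
    nearest-centroid classifier on the rest, and score the held-out item."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        # Per-class centroids of the training vectors.
        centroids = {}
        for lab in set(l for _, l in train):
            rows = [x for x, l in train if l == lab]
            centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
        # Predict the class whose centroid is nearest (squared Euclidean).
        pred = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(X[i], centroids[lab])),
        )
        correct += pred == y[i]
    return correct / len(X)

# Toy per-transcript frequencies for two hypothetical semantic categories.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.1, 0.9], [0.2, 0.8], [0.1, 0.7]]
y = ["epilepsy", "epilepsy", "epilepsy", "FDS", "FDS", "FDS"]
acc = leave_one_out_accuracy(X, y)
```

Leave-one-out is a natural choice for a dataset of only 58 consultations, since it uses all but one sample for training in each fold.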
Affiliation(s)
- Nathan Pevy
- Department of Neuroscience, The University of Sheffield, United Kingdom.
- Heidi Christensen
- Department of Computer Science, The University of Sheffield, United Kingdom
- Traci Walker
- Division of Human Communication Sciences, The University of Sheffield, United Kingdom
- Markus Reuber
- Academic Neurology Unit, University of Sheffield, United Kingdom
4. Zhang J, Wu J, Qiu Y, Song A, Li W, Li X, Liu Y. Intelligent speech technologies for transcription, disease diagnosis, and medical equipment interactive control in smart hospitals: A review. Comput Biol Med 2023;153:106517. [PMID: 36623438] [PMCID: PMC9814440] [DOI: 10.1016/j.compbiomed.2022.106517]
Abstract
The growth and aging of the world population have driven a shortage of medical resources in recent years, especially during the COVID-19 pandemic. Fortunately, the rapid development of robotics and artificial intelligence technologies is helping the healthcare field adapt to these challenges. Among them, intelligent speech technology (IST) has served doctors and patients, improving the efficiency of medical work and alleviating the medical burden. However, problems such as noise interference in complex medical scenarios and pronunciation differences between patients and healthy people hamper the broad application of IST in hospitals. In recent years, technologies such as machine learning have developed rapidly in intelligent speech recognition and are expected to solve these problems. This paper first introduces IST's procedure and system architecture and analyzes its application in medical scenarios. Secondly, we review existing IST applications in smart hospitals in detail, including electronic medical documentation, disease diagnosis and evaluation, and human-medical equipment interaction. In addition, we elaborate on an application case of IST in the early recognition, diagnosis, rehabilitation training, evaluation, and daily care of stroke patients. Finally, we discuss IST's limitations, challenges, and future directions in the medical field. Furthermore, we propose a novel medical voice analysis system architecture that employs active hardware, active software, and human-computer interaction to realize intelligent and evolvable speech recognition. This comprehensive review and the proposed architecture offer directions for future studies on IST and its applications in smart hospitals.
Affiliation(s)
- Jun Zhang
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China. Corresponding author.
- Jingyue Wu
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Yiyi Qiu
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Aiguo Song
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Weifeng Li
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, 510080, China
- Xin Li
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, 510080, China
- Yecheng Liu
- Emergency Department, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, 100730, China
5. Ho SYC, Chien TW, Lin ML, Tsai KT. An app for predicting patient dementia classes using convolutional neural networks (CNN) and artificial neural networks (ANN): Comparison of prediction accuracy in Microsoft Excel. Medicine (Baltimore) 2023;102:e32670. [PMID: 36705387] [PMCID: PMC9875960] [DOI: 10.1097/md.0000000000032670]
Abstract
BACKGROUND Dementia is a progressive disease in which cognitive abilities deteriorate over time. Effective preventive interventions require early detection. However, there are no reports in the literature of apps developed to predict patients' dementia classes (DCs). This study aimed to develop an app that automatically and accurately predicts the DC of patients responding to the clinical dementia rating (CDR) instrument. METHODS The CDR was administered to 366 outpatients at a hospital in Taiwan, with 25 items endorsed by patients and 49 by family members. Two models, convolutional neural networks (CNN) and artificial neural networks (ANN), were used to examine prediction accuracy for five classes (i.e., no cognitive decline, very mild, mild, moderate, and severe) in four scenarios: all 74 items, the 25 patient items, the 49 family items, and a combination strategy that selects the best of the preceding scenarios using a forest plot. With patient and family CDR scores on the two axes, patients were dispersed on a radar plot. An app was developed to predict patient DC. RESULTS ANN achieved higher accuracy rates than CNN in three of the four scenarios. The highest accuracy rate (93.72%) was observed in the combination scenario of the ANN. A significant difference was observed between the CNN and ANN in terms of accuracy rate. A working ANN-based app for predicting DC in patients was successfully developed and demonstrated in this study. CONCLUSION On the basis of a combination strategy and a decision rule, a 74-item ANN model with 285 estimated parameters was developed and included in the app. Such an app could assist clinicians in predicting DC in clinical settings in the near future.
Affiliation(s)
- Sam Yu-Chieh Ho
- Department of Emergency Medicine, Chi Mei Medical Center, Tainan, Taiwan
- Department of Geriatrics and Gerontology, Chi Mei Medical Center, Tainan, Taiwan
- Tsair-Wei Chien
- Department of Medical Research, Chi Mei Medical Center, Tainan, Taiwan
- Mei-Lien Lin
- Department of Examination Room, Chi Mei Medical Center, Tainan, Taiwan
- Kang-Ting Tsai
- Department of Geriatrics and Gerontology, Chi Mei Medical Center, Tainan, Taiwan
- Center for Integrative Medicine, Chi Mei Medical Center, Tainan, Taiwan
- Department of Nursing, Chung Hwa University of Medical Technology, Tainan, Taiwan
- Correspondence: Kang-Ting Tsai, Department of Geriatrics and Gerontology, Chi-Mei Medical Center, 901 Chung Hwa Road, Yung Kung Dist., Tainan 710, Taiwan
6. Mahon E, Lachman ME. Voice biomarkers as indicators of cognitive changes in middle and later adulthood. Neurobiol Aging 2022;119:22-35. [PMID: 35964541] [PMCID: PMC9487188] [DOI: 10.1016/j.neurobiolaging.2022.06.010]
Abstract
Voice prosody measures have been linked with Alzheimer's disease (AD), but it is unclear whether they are associated with normal cognitive aging. We assessed relationships between voice measures and 10-year cognitive changes in the MIDUS national sample of middle-aged and older adults ages 42-92, with a mean age of 64.09 (standard deviation = 11.23) at the second wave. Seven cognitive tests were assessed in 2003-2004 (Wave 2) and 2013-2014 (Wave 3). Voice measures were collected at Wave 3 (N = 2585) from audio recordings of the cognitive interviews. Analyses controlled for age, education, depressive symptoms, and health. As predicted, higher jitter was associated with greater declines in episodic memory, verbal fluency, and attention switching. Lower pulse was related to greater decline in episodic memory, and fewer voice breaks were related to greater declines in episodic memory and verbal fluency, although the direction of these effects was contrary to hypotheses. Findings suggest that voice biomarkers may offer a promising approach for early detection of risk factors for cognitive impairment or AD.
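Jitter, the prosody measure highlighted above, quantifies cycle-to-cycle variability in vocal-fold vibration. A minimal sketch of local jitter computed from glottal period durations (a simplified illustration, not the extraction pipeline used in the study):

```python
def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal periods, divided by the mean period. Often reported as a %.

    periods: durations (in seconds) of successive vocal-fold cycles.
    """
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(periods[i + 1] - periods[i]) for i in range(len(periods) - 1)]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_abs_diff / mean_period

# A steady ~200 Hz voice (5 ms periods) with slight cycle-to-cycle variation.
periods = [0.0050, 0.0051, 0.0049, 0.0050, 0.0052]
jitter_pct = 100 * local_jitter(periods)
```

In practice the period durations would come from pitch-tracking software applied to the interview recordings; higher jitter values indicate a less stable voice source.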
Affiliation(s)
- Elizabeth Mahon
- Brandeis University, Department of Psychology, Waltham, MA, USA.
7. Liang X, Batsis JA, Zhu Y, Driesse TM, Roth RM, Kotz D, MacWhinney B. Evaluating Voice-Assistant Commands for Dementia Detection. Comput Speech Lang 2022;72:101297. [PMID: 34764541] [PMCID: PMC8577405] [DOI: 10.1016/j.csl.2021.101297]
Abstract
Early detection of the cognitive decline involved in Alzheimer's Disease and Related Dementias (ADRD) in older adults living alone is essential for developing, planning, and initiating interventions and support systems to improve users' everyday function and quality of life. In this paper, we explore voice commands issued to a Voice-Assistant System (VAS), i.e., Amazon Alexa, by 40 older adults aged 65 or older who were either Healthy Control (HC) or Mild Cognitive Impairment (MCI) participants. We evaluated the data collected from voice commands, cognitive assessments, and interviews and surveys using a structured protocol. We extracted 163 unique command-relevant features from each participant's use of the VAS. We then built machine-learning models, including 1-layer/2-layer neural networks, support vector machines, decision trees, and random forests, for classification and for comparison with standard cognitive assessment scores, e.g., the Montreal Cognitive Assessment (MoCA). Our classification models using fusion features achieved an accuracy of 68%, and our regression model achieved a Root-Mean-Square Error (RMSE) of 3.53. Our Decision Tree (DT) and Random Forest (RF) models using selected features achieved higher classification accuracies of 80-90%. Finally, we analyzed the contribution of each feature set to the model output, revealing the commands and features most useful for inferring participants' cognitive status. We found that features of overall performance, music-related commands, call-related commands, and Automatic Speech Recognition (ASR) were the four feature sets with the greatest impact on inference accuracy. The results of this controlled study demonstrate the promise of future home-based cognitive assessments using Voice-Assistant Systems.
Affiliation(s)
- Xiaohui Liang
- Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Blvd., Boston, MA 02125-3393 USA
- John A Batsis
- Division of Geriatric Medicine, University of North Carolina at Chapel Hill, 5017 Old Clinic Building, Chapel Hill, NC 27599 USA
- Youxiang Zhu
- Department of Computer Science, University of Massachusetts Boston, 100 Morrissey Blvd., Boston, MA 02125-3393 USA
- Tiffany M Driesse
- Division of Geriatric Medicine, University of North Carolina at Chapel Hill, 5017 Old Clinic Building, Chapel Hill, NC 27599 USA
- Robert M Roth
- Department of Psychiatry, Geisel School of Medicine at Dartmouth/DHMC, Lebanon, NH 03756 USA
- David Kotz
- Department of Computer Science, Dartmouth College, Hanover, NH 03755 USA
- Brian MacWhinney
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
8. Li R, Wang X, Lawler K, Garg S, Bai Q, Alty J. Applications of Artificial Intelligence to aid detection of dementia: a scoping review on current capabilities and future directions. J Biomed Inform 2022;127:104030. [DOI: 10.1016/j.jbi.2022.104030]
9. Verbal fluency in normal aging and cognitive decline: Results of a longitudinal study. Comput Speech Lang 2021. [DOI: 10.1016/j.csl.2021.101195]
10. Nasreen S, Rohanian M, Hough J, Purver M. Alzheimer's Dementia Recognition From Spontaneous Speech Using Disfluency and Interactional Features. Front Comput Sci 2021. [DOI: 10.3389/fcomp.2021.640669]
Abstract
Alzheimer’s disease (AD) is a progressive, neurodegenerative disorder mainly characterized by memory loss with deficits in other cognitive domains, including language, visuospatial abilities, and changes in behavior. Detecting diagnostic biomarkers that are noninvasive and cost-effective is of great value not only for clinical assessments and diagnostics but also for research purposes. Several previous studies have investigated AD diagnosis via the acoustic, lexical, syntactic, and semantic aspects of speech and language. Other studies include approaches from conversation analysis that look at more interactional aspects, showing that disfluencies such as fillers and repairs, and purely nonverbal features such as inter-speaker silence, can be key features of AD conversations. These kinds of features, if useful for diagnosis, may have many advantages: They are simple to extract and relatively language-, topic-, and task-independent. This study aims to quantify the role and contribution of these features of interaction structure in predicting whether a dialogue participant has AD. We used a subset of the Carolinas Conversation Collection dataset of patients with AD at moderate stage within the age range 60–89 and similar-aged non-AD patients with other health conditions. Our feature analysis comprised two sets: disfluency features, including indicators such as self-repairs and fillers, and interactional features, including overlaps, turn-taking behavior, and distributions of different types of silence both within patient speech and between patient and interviewer speech. Statistical analysis showed significant differences between AD and non-AD groups for several disfluency features (edit terms, verbatim repeats, and substitutions) and interactional features (lapses, gaps, attributable silences, turn switches per minute, standardized phonation time, and turn length). For the classification of AD patient conversations vs. 
non-AD patient conversations, we achieved 83% accuracy with disfluency features, 83% accuracy with interactional features, and an overall accuracy of 90% when combining both feature sets using support vector machine classifiers. The discriminative power of these features, perhaps combined with more conventional linguistic features, therefore shows potential for integration into noninvasive clinical assessments for AD at advanced stages.
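The fusion step reported above, combining the disfluency and interactional feature sets in a support vector machine, amounts to concatenating the two feature vectors per conversation before training. A minimal scikit-learn sketch on tiny synthetic placeholder features (not the study's actual measures):

```python
from sklearn.svm import SVC

# Hypothetical per-conversation features: disfluency set and interactional set.
disfluency = [[5, 2], [6, 3], [7, 2], [1, 0], [2, 1], [1, 1]]        # e.g. repeats, edit terms
interactional = [[0.8, 4.0], [0.9, 3.5], [0.7, 4.2],
                 [0.2, 1.0], [0.3, 0.8], [0.1, 1.2]]                 # e.g. lapses/min, mean gap (s)
labels = [1, 1, 1, 0, 0, 0]                                          # 1 = AD, 0 = non-AD

# Early fusion: concatenate the two feature sets for each conversation.
fused = [d + i for d, i in zip(disfluency, interactional)]

clf = SVC(kernel="linear").fit(fused, labels)
preds = clf.predict(fused).tolist()
```

With real data the classifier would of course be scored on held-out conversations (e.g. cross-validation), not on its own training set as in this toy illustration.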
11. Martinc M, Haider F, Pollak S, Luz S. Temporal Integration of Text Transcripts and Acoustic Features for Alzheimer's Diagnosis Based on Spontaneous Speech. Front Aging Neurosci 2021;13:642647. [PMID: 34194313] [PMCID: PMC8236853] [DOI: 10.3389/fnagi.2021.642647]
Abstract
Background: Advances in machine learning (ML) technology have opened new avenues for detection and monitoring of cognitive decline. In this study, a multimodal approach to Alzheimer's dementia detection based on the patient's spontaneous speech is presented. This approach was tested on a standard, publicly available Alzheimer's speech dataset for comparability. The data comprise voice samples from 156 participants (1:1 ratio of Alzheimer's to control), matched by age and gender. Materials and Methods: A recently developed Active Data Representation (ADR) technique for voice processing was employed as a framework for fusion of acoustic and textual features at sentence and word level. Temporal aspects of textual features were investigated in conjunction with acoustic features in order to shed light on the temporal interplay between paralinguistic (acoustic) and linguistic (textual) aspects of Alzheimer's speech. Combinations between several configurations of ADR features and more traditional bag-of-n-grams approaches were used in an ensemble of classifiers built and evaluated on a standardised dataset containing recorded speech of scene descriptions and textual transcripts. Results: Employing only semantic bag-of-n-grams features, an accuracy of 89.58% was achieved in distinguishing between Alzheimer's patients and healthy controls. Adding temporal and structural information by combining bag-of-n-grams features with ADR audio/textual features, the accuracy could be improved to 91.67% on the test set. An accuracy of 93.75% was achieved through late fusion of the three best feature configurations, which corresponds to a 4.7% improvement over the best result reported in the literature for this dataset. Conclusion: The proposed combination of ADR audio and textual features is capable of successfully modelling temporal aspects of the data. 
The machine learning approach toward dementia detection achieves best performance when ADR features are combined with strong semantic bag-of-n-grams features. This combination leads to state-of-the-art performance on the AD classification task.
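Late fusion, used above to reach the best accuracy, combines the outputs of separately trained models rather than their input features. A minimal majority-vote sketch over three hypothetical classifiers' predictions:

```python
from collections import Counter

def late_fusion(predictions_per_model):
    """Majority vote across models.

    predictions_per_model: list of per-model prediction lists,
    each holding one label per test sample.
    """
    fused = []
    for sample_preds in zip(*predictions_per_model):
        fused.append(Counter(sample_preds).most_common(1)[0][0])
    return fused

# Predictions from three hypothetical feature configurations for four samples.
m1 = ["AD", "HC", "AD", "HC"]
m2 = ["AD", "AD", "AD", "HC"]
m3 = ["HC", "HC", "AD", "HC"]
fused = late_fusion([m1, m2, m3])
```

With an odd number of models, the binary vote can never tie; other late-fusion schemes instead average the models' class probabilities.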
Affiliation(s)
- Matej Martinc
- Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia
- Fasih Haider
- Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh, United Kingdom
- Senja Pollak
- Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia
- Saturnino Luz
- Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh, United Kingdom
12. Pevy N, Christensen H, Walker T, Reuber M. Feasibility of using an automated analysis of formulation effort in patients' spoken seizure descriptions in the differential diagnosis of epileptic and nonepileptic seizures. Seizure 2021;91:141-145. [PMID: 34157636] [DOI: 10.1016/j.seizure.2021.06.009]
Abstract
OBJECTIVE There are three common causes of Transient Loss of Consciousness (TLOC): syncope, epileptic seizures, and psychogenic nonepileptic seizures (PNES). Many individuals who have experienced TLOC initially receive an incorrect diagnosis and inappropriate treatment. Whereas syncope can be distinguished relatively easily with a small number of "yes"/"no" questions, the differentiation of the other two causes of TLOC is more challenging. Previous qualitative research based on the methodology of Conversation Analysis has demonstrated that the descriptions of epileptic seizures contain more formulation effort than accounts of PNES. This research investigates whether features likely to reflect the level of formulation effort can be automatically elicited from audio recordings and transcripts of speech and used to differentiate between epileptic and nonepileptic seizures. METHOD Verbatim transcripts of conversations between patients and neurologists were manually produced from video and audio recordings of 45 interactions (21 epilepsy and 24 PNES). The subsection of each transcript containing the person's account of their first seizure was manually extracted for the analysis. Seven automatically detectable features were designed as markers of formulation effort. These features were used to train a Random Forest machine learning classifier. RESULTS There were significantly more hesitations and repetitions in descriptions of epileptic than nonepileptic seizures. Using a nested leave-one-out cross-validation approach, 71% of seizures were correctly classified by the Random Forest classifier. DISCUSSION This pilot study provides proof of principle that linguistic features automatically extracted from audio recordings and transcripts could be used to distinguish between epileptic seizures and PNES and thereby contribute to the differential diagnosis of TLOC.
Future research should explore whether additional observations can be incorporated into a diagnostic stratification tool and compare the performance of these features when they are combined with additional information provided by patients and witnesses about seizure manifestations and medical history.
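Two of the markers above, hesitations and verbatim repetitions, are straightforward to count automatically from a transcript. A rough sketch in which the filler inventory and tokenization are illustrative assumptions, not the authors' seven features:

```python
import re

FILLERS = {"um", "uh", "erm", "er"}  # illustrative filler list, not the study's

def formulation_effort(transcript):
    """Count simple markers of formulation effort in a verbatim transcript."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    # Hesitations: filled pauses such as "um" and "uh".
    hesitations = sum(t in FILLERS for t in tokens)
    # Verbatim repetitions: the same (non-filler) word produced twice in a row.
    repetitions = sum(
        a == b for a, b in zip(tokens, tokens[1:]) if a not in FILLERS
    )
    return {"hesitations": hesitations, "repetitions": repetitions}

feats = formulation_effort("I was, um, I I just felt felt strange, uh, dizzy")
```

Per-description counts like these would then be fed, alongside the other markers, into the Random Forest classifier.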
Affiliation(s)
- Nathan Pevy
- Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, United Kingdom.
- Heidi Christensen
- Department of Computer Science, University of Sheffield, Sheffield, United Kingdom
- Traci Walker
- Division of Human Communication Sciences, University of Sheffield, Sheffield, United Kingdom
- Markus Reuber
- Academic Neurology Unit, University of Sheffield, Royal Hallamshire Hospital, Sheffield, United Kingdom
13. Clarke N, Barrick TR, Garrard P. A Comparison of Connected Speech Tasks for Detecting Early Alzheimer's Disease and Mild Cognitive Impairment Using Natural Language Processing and Machine Learning. Front Comput Sci 2021. [DOI: 10.3389/fcomp.2021.634360]
Abstract
Alzheimer's disease (AD) has a long pre-clinical period, so there is a crucial need for early detection, including of Mild Cognitive Impairment (MCI). Computational analysis of connected speech using Natural Language Processing and machine learning has been found to indicate disease and could be utilized as a rapid, scalable test for early diagnosis. However, there has been a focus on the Cookie Theft picture description task, which has been criticized. Fifty participants were recruited: 25 healthy controls (HC) and 25 with mild AD or MCI (AD+MCI). All completed five connected speech tasks: picture description, a conversational map-reading task, recall of an overlearned narrative, procedural recall, and narration of a wordless picture book. A high-dimensional set of linguistic features was automatically extracted from each transcript and used to train Support Vector Machines to classify the groups. Performance varied, with accuracy for HC vs. AD+MCI classification ranging from 62% using picture book narration to 78% using overlearned narrative features. This study shows that, importantly, the conditions of the speech task have an impact on the discourse produced, which influences accuracy in detecting AD beyond the effect of sample length. Further, we report the features important for classification using different tasks, showing that a focus on the Cookie Theft picture description task may narrow the understanding of how early AD pathology impacts speech.
14. Zhu Y, Liang X, Batsis JA, Roth RM. Exploring Deep Transfer Learning Techniques for Alzheimer's Dementia Detection. Front Comput Sci 2021;3:624683. [PMID: 34046588] [PMCID: PMC8153512] [DOI: 10.3389/fcomp.2021.624683]
Abstract
Examination of speech datasets for detecting dementia, collected via various speech tasks, has revealed links between speech and cognitive abilities. However, the speech dataset available for this research is extremely limited because the collection process of speech and baseline data from patients with dementia in clinical settings is expensive. In this paper, we study the spontaneous speech dataset from a recent ADReSS challenge, a Cookie Theft Picture (CTP) dataset with balanced groups of participants in age, gender, and cognitive status. We explore state-of-the-art deep transfer learning techniques from image, audio, speech, and language domains. We envision that one advantage of transfer learning is to eliminate the design of handcrafted features based on the tasks and datasets. Transfer learning further mitigates the limited dementia-relevant speech data problem by inheriting knowledge from similar but much larger datasets. Specifically, we built a variety of transfer learning models using commonly employed MobileNet (image), YAMNet (audio), Mockingjay (speech), and BERT (text) models. Results indicated that the transfer learning models of text data showed significantly better performance than those of audio data. Performance gains of the text models may be due to the high similarity between the pre-training text dataset and the CTP text dataset. Our multi-modal transfer learning introduced a slight improvement in accuracy, demonstrating that audio and text data provide limited complementary information. Multi-task transfer learning resulted in limited improvements in classification and a negative impact in regression. By analyzing the meaning behind the AD/non-AD labels and Mini-Mental State Examination (MMSE) scores, we observed that the inconsistency between labels and scores could limit the performance of the multi-task learning, especially when the outputs of the single-task models are highly consistent with the corresponding labels/scores. 
In sum, we conducted a large comparative analysis of varying transfer learning models, focusing less on model customization and more on pre-trained models and pre-training datasets. We revealed insightful relations among models, data types, and data labels in this research area.
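The frozen-encoder pattern this abstract describes can be sketched in miniature. Everything below is hypothetical and illustrative: the toy `frozen_encoder` stands in for a real pre-trained model such as BERT, the features and example transcripts are invented, and only a small logistic-regression head is trained while the encoder stays fixed.

```python
import math

def frozen_encoder(transcript):
    # Stand-in for a frozen pre-trained model: maps a transcript to a
    # fixed-length feature vector (word count, mean word length, bias).
    words = transcript.split()
    n = len(words)
    avg_len = sum(len(w) for w in words) / n if n else 0.0
    return [n / 100.0, avg_len / 10.0, 1.0]

def train_head(texts, labels, lr=1.0, epochs=1000):
    # Train only a logistic-regression head on the frozen features;
    # the encoder itself is never updated (the transfer-learning idea).
    feats = [frozen_encoder(t) for t in texts]
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            w = [wi + lr * (y - p) * xi for wi, xi in zip(w, x)]
    return w

def predict(w, text):
    x = frozen_encoder(text)
    return 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x)))) >= 0.5
```

In a real pipeline the encoder would emit high-dimensional embeddings; the point of the sketch is only that the head's weights change while the pre-trained encoder does not.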
Affiliation(s)
- Youxiang Zhu
- Computer Science, University of Massachusetts Boston, Boston, MA, USA
- Xiaohui Liang
- Computer Science, University of Massachusetts Boston, Boston, MA, USA
- John A. Batsis
- School of Medicine, University of North Carolina, Chapel Hill, NC, USA
- Robert M. Roth
- Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
15
Jonell P, Moëll B, Håkansson K, Henter GE, Kucherenko T, Mikheeva O, Hagman G, Holleman J, Kivipelto M, Kjellström H, Gustafson J, Beskow J. Multimodal Capture of Patient Behaviour for Improved Detection of Early Dementia: Clinical Feasibility and Preliminary Results. Frontiers in Computer Science 2021. [DOI: 10.3389/fcomp.2021.642633] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/17/2022]
Abstract
Non-invasive automatic screening for Alzheimer’s disease has the potential to improve diagnostic accuracy while lowering healthcare costs. Previous research has shown that patterns in speech, language, gaze, and drawing can help detect early signs of cognitive decline. In this paper, we describe a highly multimodal system for unobtrusively capturing data during real clinical interviews conducted as part of cognitive assessments for Alzheimer’s disease. The system uses nine different sensor devices (smartphones, a tablet, an eye tracker, a microphone array, and a wristband) to record interaction data during a specialist’s first clinical interview with a patient, and is currently in use at Karolinska University Hospital in Stockholm, Sweden. Furthermore, complementary information in the form of brain imaging, psychological tests, speech therapist assessment, and clinical meta-data is also available for each patient. We detail our data-collection and analysis procedure and present preliminary findings that relate measures extracted from the multimodal recordings to clinical assessments and established biomarkers, based on data from the 25 patients gathered thus far. Our findings demonstrate the feasibility of our proposed methodology and indicate that the collected data can be used to improve clinical assessments of early dementia.
16
Walker G, Morris LA, Christensen H, Mirheidari B, Reuber M, Blackburn DJ. Characterising spoken responses to an intelligent virtual agent by persons with mild cognitive impairment. Clinical Linguistics & Phonetics 2021; 35:237-252. [PMID: 32552087 DOI: 10.1080/02699206.2020.1777586] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/29/2020] [Revised: 05/27/2020] [Accepted: 05/31/2020] [Indexed: 06/11/2023]
Abstract
The diagnosis of Mild Cognitive Impairment (MCI) characterises patients at risk of dementia and may provide an opportunity for disease-modifying interventions. Distinguishing persons with MCI (PwMCI) from adults of a similar age without cognitive complaints is a significant challenge. The main aims of this study were to determine whether generic speech differences were evident between PwMCI and healthy controls (HC), whether such differences were identifiable in responses to recent or remote memory questions, and which speech variables showed the clearest between-group differences. This study analysed recordings of 8 PwMCI (5 females, 3 males) and 14 HC of a similar age (8 females, 6 males). Participants were recorded interacting with an intelligent virtual agent: a computer-generated talking head on a computer screen which asks pre-recorded questions when the interviewee presses the 'next' key on a computer keyboard. Responses to recent and remote memory questions were analysed. Mann-Whitney U tests were used to test for statistically significant differences between PwMCI and HC on each of 12 speech variables relating to temporal characteristics, the number of words produced and pitch. It was found that, compared to HC, PwMCI produce speech for less time and in shorter chunks, pause more often and for longer, take longer to begin speaking and produce fewer words in their answers. It was also found that PwMCI and HC were more alike when responding to remote memory questions than when responding to recent memory questions. These findings show great promise and suggest that detailed speech analysis can make an important contribution to diagnostic and stratification systems for patients with memory complaints.
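Measures of the kind analysed in this study and its follow-up (speaking time, pausing, words produced, speech rate) can be derived from time-aligned speech segments. A minimal sketch, assuming a hypothetical input of (start, end) speech intervals in seconds plus a word count; the function name and exact measure set are illustrative, not the authors' code:

```python
def speech_measures(segments, n_words):
    # segments: ordered list of (start_s, end_s) speech intervals
    # within one response; gaps between intervals are silent pauses.
    response_start, response_end = segments[0][0], segments[-1][1]
    duration = response_end - response_start
    speaking = sum(end - start for start, end in segments)
    pauses = [b[0] - a[1] for a, b in zip(segments, segments[1:])]
    pause_time = sum(pauses)
    return {
        "duration_s": duration,                 # total response duration
        "speaking_time_s": speaking,            # time spent speaking
        "n_pauses": len(pauses),                # number of silent pauses
        "mean_pause_s": pause_time / len(pauses) if pauses else 0.0,
        "pause_proportion": pause_time / duration if duration else 0.0,
        "speech_rate_wps": n_words / speaking if speaking else 0.0,
    }
```

For example, two 2-second stretches of speech separated by a 1-second pause give a pause proportion of 0.2 and, with 8 words, a speech rate of 2 words per second.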
Affiliation(s)
- Gareth Walker
- School of English, University of Sheffield, Sheffield, UK
- Lee-Anne Morris
- Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
- Heidi Christensen
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Bahman Mirheidari
- Department of Computer Science, University of Sheffield, Sheffield, UK
- Markus Reuber
- Academic Neurology Unit, Royal Hallamshire Hospital, University of Sheffield, Sheffield, UK
- Daniel J Blackburn
- Sheffield Institute for Translational Neuroscience (SITraN), University of Sheffield, Sheffield, UK
17
Clarke N, Foltz P, Garrard P. How to do things with (thousands of) words: Computational approaches to discourse analysis in Alzheimer's disease. Cortex 2020; 129:446-463. [PMID: 32622173 DOI: 10.1016/j.cortex.2020.05.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/15/2019] [Revised: 01/30/2020] [Accepted: 05/07/2020] [Indexed: 12/28/2022]
Abstract
Natural Language Processing (NLP) is an ever-growing field of computational science that aims to model natural human language. Combined with advances in machine learning, which learns patterns in data, it offers practical capabilities, including automated language analysis. These approaches have garnered interest from clinical researchers seeking to understand the breakdown of language due to pathological changes in the brain, offering fast, replicable and objective methods. The study of Alzheimer's disease (AD) and its prodromal stage, Mild Cognitive Impairment (MCI), suggests that changes in discourse (connected speech or writing) may be key to early detection of disease. There is currently no disease-modifying treatment for AD, the leading cause of dementia in people over the age of 65, but detection of those at risk of developing the disease could help with the identification and testing of medications that can take effect before the underlying pathology has irreversibly spread. We outline important components of natural language, as well as the NLP tools and approaches with which they can be extracted, analysed and used for disease identification and risk prediction. We review literature using these tools to model discourse across the spectrum of AD, including the contribution of machine learning approaches and Automatic Speech Recognition (ASR). We conclude that NLP and machine learning techniques are starting to greatly enhance research in the field, with measurable and quantifiable language components showing promise for early detection of disease, but research and practical challenges remain for clinical implementation of these approaches. Challenges discussed include the availability of large and diverse datasets, the ethics of data collection and sharing, diagnostic specificity and clinical acceptability.
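Two of the language components this review covers, vocabulary diversity and utterance length, are simple to extract from a transcript. A minimal sketch assuming plain-text input; the regex tokenisation and the feature set are hypothetical simplifications of what full NLP toolkits provide:

```python
import re

def discourse_features(transcript):
    # Split into utterances at sentence-final punctuation, then
    # tokenise into lowercase word tokens.
    utterances = [u for u in re.split(r"[.!?]+", transcript) if u.strip()]
    tokens = re.findall(r"[a-z']+", transcript.lower())
    types = set(tokens)
    return {
        "n_tokens": len(tokens),
        # Type-token ratio: proportion of distinct words (diversity).
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        # Mean utterance length in words.
        "mean_utterance_len": len(tokens) / len(utterances) if utterances else 0.0,
    }
```

Measures like these are typically computed per speaker and compared across diagnostic groups or tracked over time.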
Affiliation(s)
- Natasha Clarke
- Neurosciences Research Centre, Molecular & Clinical Sciences Research Institute, St George's, University of London, Cranmer Terrace, London, UK.
- Peter Foltz
- Institute of Cognitive Science, University of Colorado, Boulder, USA.
- Peter Garrard
- Neurosciences Research Centre, Molecular & Clinical Sciences Research Institute, St George's, University of London, Cranmer Terrace, London, UK.
18
Cirillo D, Catuara-Solarz S, Morey C, Guney E, Subirats L, Mellino S, Gigante A, Valencia A, Rementeria MJ, Chadha AS, Mavridis N. Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare. NPJ Digit Med 2020; 3:81. [PMID: 32529043 PMCID: PMC7264169 DOI: 10.1038/s41746-020-0288-5] [Citation(s) in RCA: 153] [Impact Index Per Article: 38.3] [Received: 07/18/2019] [Accepted: 04/28/2020] [Indexed: 01/10/2023]
Abstract
Precision Medicine implies a deep understanding of inter-individual differences in health and disease that are due to genetic and environmental factors. Acquiring such understanding requires the implementation of different types of technologies based on artificial intelligence (AI) that enable the identification of biomedically relevant patterns, facilitating progress towards individually tailored preventative and therapeutic interventions. Despite the significant scientific advances achieved so far, most of the currently used biomedical AI technologies do not account for bias detection. Furthermore, the design of the majority of algorithms ignores the sex and gender dimension and its contribution to health and disease differences among individuals. Failure to account for these differences will generate sub-optimal results and produce mistakes as well as discriminatory outcomes. In this review, we examine the current sex and gender gaps in a subset of biomedical technologies used in relation to Precision Medicine. In addition, we provide recommendations to optimize their utilization to improve the global health and disease landscape and decrease inequalities.
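One basic form of the bias detection this review finds missing is to compare a model's performance across sex or gender subgroups. A minimal sketch with invented labels and groups; what disparity counts as unacceptable is left to the analyst and is not specified by the review:

```python
def subgroup_accuracy(y_true, y_pred, groups):
    # Per-group accuracy plus the largest accuracy gap between any
    # two groups: a simple first check for disparate performance.
    acc = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        acc[g] = sum(y_true[i] == y_pred[i] for i in idx) / len(idx)
    gap = max(acc.values()) - min(acc.values())
    return acc, gap
```

The same pattern extends to sensitivity, specificity, or calibration per subgroup, which matter more than raw accuracy for screening tools.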
Affiliation(s)
- Davide Cirillo
- Barcelona Supercomputing Center (BSC), C/ Jordi Girona, 29, 08034 Barcelona, Spain
- Silvina Catuara-Solarz
- Telefonica Innovation Alpha Health, Torre Telefonica, Plaça d’Ernest Lluch i Martin, 5, 08019 Barcelona, Spain
- The Women’s Brain Project (WBP), Guntershausen, Switzerland
- Czuee Morey
- The Women’s Brain Project (WBP), Guntershausen, Switzerland
- Wega Informatik AG, Aeschengraben 20, CH-4051 Basel, Switzerland
- Emre Guney
- Research Programme on Biomedical Informatics (GRIB), Hospital del Mar Research Institute and Pompeu Fabra University, Dr. Aiguader, 88, 08003 Barcelona, Spain
- Laia Subirats
- Eurecat - Centre Tecnològic de Catalunya, C/ Bilbao, 72, Edifici A, 08005 Barcelona, Spain
- eHealth Center, Universitat Oberta de Catalunya, Rambla del Poblenou, 156, 08018 Barcelona, Spain
- Simona Mellino
- The Women’s Brain Project (WBP), Guntershausen, Switzerland
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), C/ Jordi Girona, 29, 08034 Barcelona, Spain
- ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain
- Nikolaos Mavridis
- The Women’s Brain Project (WBP), Guntershausen, Switzerland
- Interactive Robots and Media Laboratory (IRML), Abu Dhabi, United Arab Emirates
19
de la Fuente Garcia S, Ritchie CW, Luz S. Artificial Intelligence, Speech, and Language Processing Approaches to Monitoring Alzheimer's Disease: A Systematic Review. J Alzheimers Dis 2020; 78:1547-1574. [PMID: 33185605 PMCID: PMC7836050 DOI: 10.3233/jad-200888] [Citation(s) in RCA: 63] [Impact Index Per Article: 15.8] [Indexed: 01/05/2023]
Abstract
BACKGROUND Language is a valuable source of clinical information in Alzheimer's disease, as it declines concurrently with neurodegeneration. Consequently, speech and language data have been extensively studied in connection with its diagnosis. OBJECTIVE Firstly, to summarize the existing findings on the use of artificial intelligence, speech, and language processing to predict cognitive decline in the context of Alzheimer's disease. Secondly, to detail current research procedures, highlight their limitations, and suggest strategies to address them. METHODS Systematic review of original research between 2000 and 2019, registered in PROSPERO (reference CRD42018116606). An interdisciplinary search covered six databases on engineering (ACM and IEEE), psychology (PsycINFO), medicine (PubMed and Embase), and Web of Science. Bibliographies of relevant papers were screened until December 2019. RESULTS From 3,654 search results, 51 articles were selected against the eligibility criteria. Four tables summarize their findings: study details (aim, population, interventions, comparisons, methods, and outcomes), data details (size, type, modalities, annotation, balance, availability, and language of study), methodology (pre-processing, feature generation, machine learning, evaluation, and results), and clinical applicability (research implications, clinical potential, risk of bias, and strengths/limitations). CONCLUSION Promising results are reported across nearly all 51 studies, but very few have been implemented in clinical research or practice. The main limitations of the field are poor standardization, limited comparability of results, and a degree of disconnect between study aims and clinical applications. Active attempts to close these gaps will support translation of future research into clinical practice.
Affiliation(s)
- Craig W. Ritchie
- Centre for Clinical Brain Sciences, The University of Edinburgh, Scotland, UK
- Saturnino Luz
- Usher Institute, Edinburgh Medical School, The University of Edinburgh, Scotland, UK