1. Pourramezan Fard A, Mahoor MH, Alsuhaibani M, Dodge HH. Linguistic-based Mild Cognitive Impairment detection using Informative Loss. Comput Biol Med 2024; 176:108606. PMID: 38763068. DOI: 10.1016/j.compbiomed.2024.108606.
Abstract
This paper presents a deep learning method using Natural Language Processing (NLP) techniques to distinguish between Mild Cognitive Impairment (MCI) and Normal Cognitive (NC) conditions in older adults. We propose a framework that analyzes transcripts generated from video interviews collected within the I-CONECT study, a randomized controlled trial aimed at improving cognitive function through video chats. Our proposed NLP framework consists of two Transformer-based modules, namely Sentence Embedding (SE) and Sentence Cross Attention (SCA). First, the SE module captures contextual relationships between words within each sentence. Subsequently, the SCA module extracts temporal features from the sequence of sentences. These features are then used by a Multi-Layer Perceptron (MLP) to classify subjects as MCI or NC. To build a robust model, we propose a novel loss function, called InfoLoss, that considers the reduction in entropy gained by observing each sequence of sentences, ultimately enhancing classification accuracy. The results of our comprehensive model evaluation using the I-CONECT dataset show that our framework can distinguish between MCI and NC with an average area under the curve of 84.75%.
Affiliation(s)
- Ali Pourramezan Fard
- Ritchie School of Engineering and Computer Science, University of Denver, Denver, CO 80208, USA.
- Mohammad H Mahoor
- Ritchie School of Engineering and Computer Science, University of Denver, Denver, CO 80208, USA; DreamFace Technologies LLC, Centennial, CO 8011, USA.
- Muath Alsuhaibani
- Ritchie School of Engineering and Computer Science, University of Denver, Denver, CO 80208, USA; Department of Electrical Engineering, Prince Sattam Bin Abdulaziz University, Al-Kharj, 11942, Saudi Arabia.
- Hiroko H Dodge
- Department of Neurology, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA.
2. Ambrosini E, Giangregorio C, Lomurno E, Moccia S, Milis M, Loizou C, Azzolino D, Cesari M, Cid Gala M, Galán de Isla C, Gomez-Raja J, Borghese NA, Matteucci M, Ferrante S. Automatic Spontaneous Speech Analysis for the Detection of Cognitive Functional Decline in Older Adults: Multilanguage Cross-Sectional Study. JMIR Aging 2024; 7:e50537. PMID: 38386279. DOI: 10.2196/50537.
Abstract
BACKGROUND The rise in life expectancy is associated with an increase in long-term, gradual cognitive decline. Treatment is most effective at the early stage of the disease, so there is a need for low-cost, ecological solutions for mass screening of community-dwelling older adults. OBJECTIVE This work aims to exploit automatic analysis of free speech to identify signs of cognitive decline. METHODS A sample of 266 participants older than 65 years was recruited in Italy and Spain and divided into 3 groups according to their Mini-Mental Status Examination (MMSE) scores. Participants were asked to tell a story and describe a picture, and voice recordings were used to automatically extract high-level features on different time scales. Based on these features, machine learning algorithms were trained to solve binary and multiclass classification problems using both mono- and cross-lingual approaches. The algorithms were enriched with Shapley Additive Explanations for model explainability. RESULTS In the Italian data set, healthy participants (MMSE score ≥27) were automatically discriminated from participants with mildly impaired cognitive function (MMSE score 20-26) and from those with moderate to severe impairment of cognitive function (MMSE score 11-19) with accuracies of 80% and 86%, respectively. Slightly lower performance was achieved in the Spanish and multilanguage data sets. CONCLUSIONS This work proposes a transparent and unobtrusive assessment method that could be included in a mobile app for large-scale monitoring of cognitive functionality in older adults. Voice is confirmed to be an important biomarker of cognitive decline owing to its noninvasive and easily accessible nature.
Affiliation(s)
- Emilia Ambrosini
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy
- Chiara Giangregorio
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy
- Eugenio Lomurno
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy
- Sara Moccia
- BioRobotics Institute and Department of Excellence in Robotics and AI, Scuola Superiore Sant'Anna, Pisa, Italy
- Christos Loizou
- Department of Electrical Engineering, Computer Engineering and Informatics, Cyprus University of Technology, Limassol, Cyprus
- Domenico Azzolino
- Geriatric Unit, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico Ca' Granda Ospedale Maggiore Policlinico, Milano, Italy
- Matteo Cesari
- Ageing and Health Unit, Department of Maternal, Newborn, Child, Adolescent Health and Ageing, World Health Organization, Geneva, Switzerland
- Manuel Cid Gala
- Consejería de Sanidad y Servicios Sociales, Junta de Extremadura, Merida, Spain
- Jonathan Gomez-Raja
- Consejería de Sanidad y Servicios Sociales, Junta de Extremadura, Merida, Spain
- Matteo Matteucci
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy
- Simona Ferrante
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milano, Italy
- Laboratory of E-Health Technologies and Artificial Intelligence Research in Neurology, Joint Research Platform, Fondazione Istituto di Ricovero e Cura a Carattere Scientifico Istituto Neurologico Carlo Besta, Milano, Italy
3. Xu X, Li J, Zhu Z, Zhao L, Wang H, Song C, Chen Y, Zhao Q, Yang J, Pei Y. A Comprehensive Review on Synergy of Multi-Modal Data and AI Technologies in Medical Diagnosis. Bioengineering (Basel) 2024; 11:219. PMID: 38534493. DOI: 10.3390/bioengineering11030219.
Abstract
Disease diagnosis represents a critical and arduous endeavor within the medical field. Artificial intelligence (AI) techniques, spanning from machine learning and deep learning to large model paradigms, stand poised to significantly augment physicians in rendering more evidence-based decisions, thus presenting a pioneering solution for clinical practice. Traditionally, the amalgamation of diverse medical data modalities (e.g., image, text, speech, genetic data, physiological signals) is imperative to facilitate a comprehensive disease analysis, a topic of burgeoning interest among both researchers and clinicians in recent times. Hence, there exists a pressing need to synthesize the latest strides in multi-modal data and AI technologies in the realm of medical diagnosis. In this paper, we narrow our focus to five specific disorders (Alzheimer's disease, breast cancer, depression, heart disease, epilepsy), elucidating advanced endeavors in their diagnosis and treatment through the lens of artificial intelligence. Our survey not only delineates detailed diagnostic methodologies across varying modalities but also underscores commonly utilized public datasets, the intricacies of feature engineering, prevalent classification models, and envisaged challenges for future endeavors. In essence, our research endeavors to contribute to the advancement of diagnostic methodologies, furnishing invaluable insights for clinical decision making.
Affiliation(s)
- Xi Xu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jianqiang Li
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Zhichao Zhu
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Linna Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Huina Wang
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Changwei Song
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Yining Chen
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Qing Zhao
- Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
- Jijiang Yang
- Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
- Yan Pei
- School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu 965-8580, Japan
4. Bucholc M, James C, Khleifat AA, Badhwar A, Clarke N, Dehsarvi A, Madan CR, Marzi SJ, Shand C, Schilder BM, Tamburin S, Tantiangco HM, Lourida I, Llewellyn DJ, Ranson JM. Artificial intelligence for dementia research methods optimization. Alzheimers Dement 2023; 19:5934-5951. PMID: 37639369. DOI: 10.1002/alz.13441.
Abstract
Artificial intelligence (AI) and machine learning (ML) approaches are increasingly being used in dementia research. However, several methodological challenges exist that may limit the insights we can obtain from high-dimensional data and our ability to translate these findings into improved patient outcomes. To improve reproducibility and replicability, researchers should make their well-documented code and modeling pipelines openly available. Data should also be shared where appropriate. To enhance the acceptability of models and AI-enabled systems to users, researchers should prioritize interpretable methods that provide insights into how decisions are generated. Models should be developed using multiple, diverse datasets to improve robustness, generalizability, and reduce potentially harmful bias. To improve clarity and reproducibility, researchers should adhere to reporting guidelines that are co-produced with multiple stakeholders. If these methodological challenges are overcome, AI and ML hold enormous promise for changing the landscape of dementia research and care. HIGHLIGHTS: Machine learning (ML) can improve diagnosis, prevention, and management of dementia. Inadequate reporting of ML procedures affects reproduction/replication of results. ML models built on unrepresentative datasets do not generalize to new datasets. Obligatory metrics for certain model structures and use cases have not been defined. Interpretability and trust in ML predictions are barriers to clinical translation.
Affiliation(s)
- Magda Bucholc
- Cognitive Analytics Research Lab, School of Computing, Engineering & Intelligent Systems, Ulster University, Derry, UK
- Charlotte James
- NIHR Bristol Biomedical Research Centre, University Hospitals Bristol and Weston NHS Foundation Trust and University of Bristol, Bristol, UK
- Ahmad Al Khleifat
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- AmanPreet Badhwar
- Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal, Montréal, Quebec, Canada
- Institut de génie biomédical, Université de Montréal, Montréal, Quebec, Canada
- Département de Pharmacologie et Physiologie, Université de Montréal, Montréal, Quebec, Canada
- Natasha Clarke
- Multiomics Investigation of Neurodegenerative Diseases (MIND) Lab, Centre de Recherche de l'Institut Universitaire de Gériatrie de Montréal, Montréal, Quebec, Canada
- Amir Dehsarvi
- Aberdeen Biomedical Imaging Centre, School of Medicine, Medical Sciences, and Nutrition, University of Aberdeen, Aberdeen, UK
- Sarah J Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Cameron Shand
- Centre for Medical Image Computing, Department of Computer Science, University College London, London, UK
- Brian M Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Stefano Tamburin
- Department of Neurosciences, Biomedicine and Movement Sciences, University of Verona, Verona, Italy
- David J Llewellyn
- University of Exeter Medical School, Exeter, UK
- The Alan Turing Institute, London, UK
5. Shi M, Cheung G, Shahamiri SR. Speech and language processing with deep learning for dementia diagnosis: A systematic review. Psychiatry Res 2023; 329:115538. PMID: 37864994. DOI: 10.1016/j.psychres.2023.115538.
Abstract
Dementia is a progressive neurodegenerative disease that burdens the person living with the disease, their families, and medical and social services. Timely diagnosis of dementia allows interventions that may slow its progression or reduce its burden. However, the diagnostic process for dementia is often complex and resource-intensive, and access to diagnostic services is an issue in low- and middle-income countries. The abundance and easy accessibility of speech and language data have created new possibilities for utilizing Deep Learning (DL) technologies as part of the dementia diagnostic process. This systematic review included studies published between 2012 and 2022 that utilized such technologies to aid in diagnosing dementia. We identified 72 studies using the PRISMA 2020 protocol, extracted and analyzed data from these studies, and reported the related DL technologies. We found that these technologies effectively differentiated between healthy individuals and those with a dementia diagnosis, highlighting their potential in the diagnosis of dementia. This systematic review provides insights into the contributions of DL-based speech and language techniques to the dementia diagnostic process, offers an understanding of the advancements made in this field thus far, and highlights challenges that still need to be addressed.
Affiliation(s)
- Mengke Shi
- Department of Electrical, Computer and Software Engineering, Faculty of Engineering, University of Auckland, Private Bag 92019, Building 405, Level 6, Room 669, 3 Grafton Road, Auckland 1142, New Zealand
- Gary Cheung
- Department of Psychological Medicine, Faculty of Medical and Health Sciences, University of Auckland, Private Bag 92019, Building 405, Level 6, Room 669, 3 Grafton Road, Auckland 1142, New Zealand
- Seyed Reza Shahamiri
- Department of Electrical, Computer and Software Engineering, Faculty of Engineering, University of Auckland, Private Bag 92019, Building 405, Level 6, Room 669, 3 Grafton Road, Auckland 1142, New Zealand.
6. Zolnoori M, Zolnour A, Topaz M. ADscreen: A speech processing-based screening system for automatic identification of patients with Alzheimer's disease and related dementia. Artif Intell Med 2023; 143:102624. PMID: 37673583; PMCID: PMC10483114. DOI: 10.1016/j.artmed.2023.102624.
Abstract
Alzheimer's disease and related dementias (ADRD) present a looming public health crisis, affecting roughly 5 million people and 11% of older adults in the United States. Despite nationwide efforts toward timely diagnosis, more than 50% of patients with ADRD are undiagnosed and unaware of their disease. To address this challenge, we developed ADscreen, an innovative speech-processing-based screening algorithm for the proactive identification of patients with ADRD. ADscreen consists of five major components: (i) noise reduction to remove background noise from audio-recorded patient speech, (ii) modeling the patient's ability in phonetic motor planning using acoustic parameters of the patient's voice, (iii) modeling the patient's ability at the semantic and syntactic levels of language organization using linguistic parameters of the patient's speech, (iv) extracting vocal and semantic psycholinguistic cues from the patient's speech, and (v) building and evaluating the screening algorithm. To identify important speech parameters (features) associated with ADRD, we used Joint Mutual Information Maximization (JMIM), an effective feature selection method for high-dimensional, small-sample-size datasets. The relationship between speech parameters and the outcome variable (presence/absence of ADRD) was modeled using three different machine learning (ML) architectures capable of joining informative acoustic and linguistic features with contextual word embedding vectors obtained from DistilBERT (a distilled version of Bidirectional Encoder Representations from Transformers). We evaluated ADscreen on audio-recorded patient speech (verbal descriptions) for the Cookie-Theft picture description task, which is publicly available in the dementia databank. The joint fusion of acoustic and linguistic parameters with contextual word embedding vectors from DistilBERT achieved an F1-score of 84.64 (standard deviation [std] ±3.58) and AUC-ROC of 92.53 (std ±3.34) on the training dataset, and an F1-score of 89.55 and AUC-ROC of 93.89 on the test dataset. In summary, ADscreen has strong potential to be integrated into the clinical workflow to address the need for an ADRD screening tool, so that patients with cognitive impairment can receive appropriate and timely care.
Affiliation(s)
- Maryam Zolnoori
- Columbia University Medical Center, New York, NY, United States of America; School of Nursing, Columbia University, New York, NY, United States of America.
- Ali Zolnour
- School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
- Maxim Topaz
- Columbia University Medical Center, New York, NY, United States of America; School of Nursing, Columbia University, New York, NY, United States of America
7. Teng Y, Yuan Q, Wu Y, Wu S, Su J, Zhang P, Zhang Y. Research on the Chemical Constituents against Alzheimer's Disease of the Fruits of Physalis alkekengi L. var. franchetii (Mast.) Makino. Chem Biodivers 2023; 20:e202301075. PMID: 37505462. DOI: 10.1002/cbdv.202301075.
Abstract
Physalis alkekengi L. var. franchetii (Mast.) Makino (PA) is a plant utilised as a traditional herbal medicine, with properties effective against inflammation and free radical damage. In the present study, the major constituents of four extraction parts of the fruits of PA (PAF) were investigated by ultra-performance liquid chromatography combined with quadrupole time-of-flight mass spectrometry (UPLC-Q-TOF-MS). A mouse model of Alzheimer's disease (AD) induced by aluminum chloride (AlCl3) combined with D-galactose (D-gal) was established to elucidate the mechanism behind PAF's anti-AD activity from both behavioural and pathological perspectives. The results showed that all four extraction parts of PAF (PAFE) had favorable anti-AD effects, with the ethyl acetate (EA) group showing the best activity. UPLC-Q-TOF-MS analysis identified Physalin B, Nobiletin, and Caffeic acid as the main anti-AD active constituents in the EA extract. This study reveals that PAF can reduce neuroinflammatory damage by inhibiting the p38 mitogen-activated protein kinase (p38 MAPK) signaling pathway, providing a theoretical basis for the clinical development and utilization of PAF in AD therapy.
Affiliation(s)
- Yang Teng
- College of Pharmacy, Jiamusi University, Jiamusi, 154007, China
- Heilongjiang Pharmaceutical Research Institute, Jiamusi, 154007, China
- Qi Yuan
- College of Pharmacy, Jiamusi University, Jiamusi, 154007, China
- You Wu
- College of Pharmacy, Jiamusi University, Jiamusi, 154007, China
- Shuang Wu
- College of Pharmacy, Jiamusi University, Jiamusi, 154007, China
- Jin Su
- College of Pharmacy, Jiamusi University, Jiamusi, 154007, China
- Heilongjiang Pharmaceutical Research Institute, Jiamusi, 154007, China
- Pengxia Zhang
- College of Pharmacy, Jiamusi University, Jiamusi, 154007, China
- Heilongjiang Pharmaceutical Research Institute, Jiamusi, 154007, China
- Yu Zhang
- College of Pharmacy, Jiamusi University, Jiamusi, 154007, China
- Heilongjiang Pharmaceutical Research Institute, Jiamusi, 154007, China
8. Idrisoglu A, Dallora AL, Anderberg P, Berglund JS. Applied Machine Learning Techniques to Diagnose Voice-Affecting Conditions and Disorders: Systematic Literature Review. J Med Internet Res 2023; 25:e46105. PMID: 37467031; PMCID: PMC10398366. DOI: 10.2196/46105.
Abstract
BACKGROUND Normal voice production depends on the synchronized cooperation of multiple physiological systems, which makes the voice sensitive to changes. Any systemic, neurological, or aerodigestive disorder is prone to affect voice production through reduced cognitive, pulmonary, and muscular functionality. This sensitivity has inspired the use of voice as a biomarker for disorders that affect it. Technological improvements and emerging machine learning (ML) technologies have made it possible to extract digital vocal features from the voice for automated diagnosis and monitoring systems. OBJECTIVE This study aims to provide a comprehensive view of research on voice-affecting disorders that uses ML techniques for diagnosis and monitoring through voice samples, with systemic conditions, nonlaryngeal aerodigestive disorders, and neurological disorders of specific interest. METHODS This systematic literature review (SLR) investigated the state of the art of voice-based diagnostic and monitoring systems with ML technologies, targeting voice-affecting disorders without a direct relation to the voice box, from the point of view of applied health technology. Through a comprehensive search string, studies published from 2012 to 2022 in the Scopus, PubMed, and Web of Science databases were scanned and collected for assessment. To minimize bias, retrieval of the relevant references in other studies in the field was ensured, and 2 authors assessed the collected studies. Low-quality studies were removed through a quality assessment, and relevant data were extracted through summary tables for analysis. The articles were checked for similarities between author groups to prevent cumulative redundancy bias during the screening process, with only 1 article included from the same author group.
RESULTS In the analysis of the 145 included studies, support vector machines were the most utilized ML technique (51/145, 35.2%), and the most studied disease was Parkinson disease (PD), reported in 87/145 (60%) studies. After 2017, 16 additional voice-affecting disorders were examined, in contrast to the 3 investigated previously, and an upsurge in the use of artificial neural network-based architectures was observed. Almost half of the included studies were published in the last 2 years (2021 and 2022), and broad interest from many countries was observed. Notably, nearly one-half (n=75) of the studies relied on 10 distinct data sets, and 11/145 (7.6%) used demographic data as an input for ML models. CONCLUSIONS This SLR revealed considerable interest across multiple countries in using ML techniques for diagnosing and monitoring voice-affecting disorders, with PD the most studied. However, the review identified several gaps, including limited and unbalanced data set usage and a focus on diagnostic testing rather than disorder-specific monitoring. Despite being constrained to peer-reviewed publications written in English, the SLR provides valuable insights into the current state of research on ML-based diagnosis and monitoring of voice-affecting disorders and highlights areas to address in future research.
Affiliation(s)
- Alper Idrisoglu
- Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- Ana Luiza Dallora
- Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- Peter Anderberg
- Department of Health, Blekinge Institute of Technology, Karlskrona, Sweden
- School of Health Sciences, University of Skövde, Skövde, Sweden
9. Tang L, Zhang Z, Feng F, Yang LZ, Li H. Explainable Alzheimer's Disease Detection Using Linguistic Features from Automatic Speech Recognition. Dement Geriatr Cogn Disord 2023; 52:240-248. PMID: 37433284. DOI: 10.1159/000531818.
Abstract
INTRODUCTION Alzheimer's disease (AD) is the most prevalent type of dementia and can cause abnormal cognitive function and progressive loss of essential life skills. Early screening is thus necessary for the prevention of, and intervention in, AD. Speech dysfunction is an early-onset symptom in patients with AD. Recent studies have demonstrated the promise of automated assessment using acoustic or linguistic features extracted from speech. However, most previous studies have relied on manual transcription of text to extract linguistic features, which weakens the efficiency of automated assessment. The present study thus investigates the effectiveness of automatic speech recognition (ASR) in building an end-to-end automated speech analysis model for AD detection. METHODS We implemented three publicly available ASR engines and compared classification performance on the ADReSS-IS2020 dataset. The SHapley Additive exPlanations algorithm was then used to identify the critical features that contributed most to model performance. RESULTS The three automatic transcription tools produced transcripts with mean word error rates of 32%, 43%, and 40%, respectively. These automated transcripts achieved similar or even better model performance than manual transcripts for detecting dementia, with classification accuracies of 89.58%, 83.33%, and 81.25%, respectively. CONCLUSION Our best model, using ensemble learning, is comparable to state-of-the-art manual-transcription-based methods, suggesting the possibility of an end-to-end medical assistance system for AD detection built on ASR engines. Moreover, the critical linguistic features may provide insight into further studies on the mechanism of AD.
Affiliation(s)
- Lijuan Tang
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Zhenglin Zhang
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
- University of Science and Technology of China, Hefei, China
- Feifan Feng
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Department of Biomedical Engineering, Anhui Medical University, Hefei, China
- Li-Zhuang Yang
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
- University of Science and Technology of China, Hefei, China
- Hai Li
- Anhui Province Key Laboratory of Medical Physics and Technology, Institute of Health and Medical Technology, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Hefei Cancer Hospital, Chinese Academy of Sciences, Hefei, China
- University of Science and Technology of China, Hefei, China
10. Detecting dementia from speech and transcripts using transformers. Comput Speech Lang 2023. DOI: 10.1016/j.csl.2023.101485.
11. Yang Q, Li X, Ding X, Xu F, Ling Z. Deep learning-based speech analysis for Alzheimer's disease detection: a literature review. Alzheimers Res Ther 2022; 14:186. PMID: 36517837; PMCID: PMC9749308. DOI: 10.1186/s13195-022-01131-3.
Abstract
BACKGROUND Alzheimer's disease has become one of the most common neurodegenerative diseases worldwide and seriously affects the health of the elderly. Early detection and intervention are currently the most effective prevention methods. Compared with traditional detection methods such as scale tests, electroencephalograms, and magnetic resonance imaging, speech analysis is more convenient for automatic large-scale Alzheimer's disease detection and has attracted extensive attention from researchers. In particular, deep learning-based speech analysis and language processing techniques for Alzheimer's disease detection have been studied and have achieved impressive results. METHODS To integrate the latest research progress, hundreds of relevant papers were retrieved from the ACM, DBLP, IEEE, PubMed, Scopus, and Web of Science electronic databases, among other sources. The following keywords were used for the paper search: (Alzheimer OR dementia OR cognitive impairment) AND (speech OR voice OR audio) AND (deep learning OR neural network). CONCLUSIONS Fifty-two papers were retained after screening. We reviewed and presented the speech databases, deep learning methods, and model performances of these studies. Finally, we pointed out the mainstream approaches and limitations in the current studies and provided a direction for future research.
Affiliation(s)
- Qin Yang
- iFlytek Research, iFlytek Co., Ltd., Hefei, China.
- Xin Li
- NELSLIP, University of Science and Technology of China, Hefei, China.
- iFlytek Research, iFlytek Co., Ltd., Hefei, China.
- Xinyun Ding
- iFlytek Research, iFlytek Co., Ltd., Hefei, China.
- Feiyang Xu
- iFlytek Research, iFlytek Co., Ltd., Hefei, China.
- Zhenhua Ling
- NELSLIP, University of Science and Technology of China, Hefei, China.
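The boolean keyword combination stated in the review's METHODS can be assembled programmatically when querying database search interfaces. The sketch below is illustrative only: `build_query` is a hypothetical helper, not code from the review, and it simply ANDs together OR-groups of synonyms, quoting multiword terms.

```python
# Illustrative sketch: build the review's boolean search string from concept groups.
concepts = [
    ["Alzheimer", "dementia", "cognitive impairment"],
    ["speech", "voice", "audio"],
    ["deep learning", "neural network"],
]

def build_query(concept_groups):
    """AND together parenthesized OR-groups of synonyms, quoting multiword terms."""
    def term(t):
        return f'"{t}"' if " " in t else t
    groups = ["(" + " OR ".join(term(t) for t in g) + ")" for g in concept_groups]
    return " AND ".join(groups)

query = build_query(concepts)
# query == '(Alzheimer OR dementia OR "cognitive impairment") AND '
#          '(speech OR voice OR audio) AND ("deep learning" OR "neural network")'
```

Quoting multiword terms matters in practice: most database search syntaxes otherwise split them into independent tokens.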
12
Meng W, Zhang Q, Ma S, Cai M, Liu D, Liu Z, Yang J. A lightweight CNN and Transformer hybrid model for mental retardation screening among children from spontaneous speech. Comput Biol Med 2022; 151:106281. [PMID: 36399858 DOI: 10.1016/j.compbiomed.2022.106281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 10/17/2022] [Accepted: 10/30/2022] [Indexed: 11/06/2022]
Abstract
Mental retardation (MR) refers to a group of mental disorders characterized by low intelligence and difficulties with social adjustment. Early diagnosis enables timely intervention for children with MR and can reduce the degree of disability. Children with MR typically show impaired speech function compared with typically developing children, which is a significant cue for clinical diagnosis. Building on this observation, our study proposes a spontaneous-speech-based framework (MT-Net) for screening MR that merges mobile inverted bottleneck convolutional blocks (MBConv) with visual Transformer blocks. MT-Net takes log-mel spectrograms converted from raw interview speech as input and uses MBConv blocks and visual Transformer blocks to learn low-level and high-level features, respectively. In addition, SpecAugment, a data augmentation strategy, is used to expand our audio dataset and further improve the performance of MT-Net. Experimental results show that the proposed MT-Net outperforms a Transformer network (ViT) and convolutional neural networks (ResNet18, MobileNetV2, EfficientNetV2), achieving an accuracy of 91.60% with SpecAugment. MT-Net has fewer parameters, lower computational cost, and higher prediction accuracy, and is expected to serve as an auxiliary screening tool for MR.
Affiliation(s)
- Wei Meng
- School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China.
- Qianhong Zhang
- School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China.
- Simeng Ma
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China.
- Mincheng Cai
- School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China.
- Dujuan Liu
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China.
- Zhongchun Liu
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China.
- Jun Yang
- School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China.
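The SpecAugment step mentioned in this abstract zeroes out random frequency bands and time spans of a log-mel spectrogram. The sketch below follows that general recipe but is not the authors' implementation: the function name, mask widths, and spectrogram size are all illustrative assumptions, and a random array stands in for a real log-mel spectrogram.

```python
import numpy as np

def spec_augment(spec, n_freq_masks=2, n_time_masks=2, max_f=8, max_t=10, rng=None):
    """SpecAugment-style masking: zero out random frequency bands and time spans
    of a (n_mels, n_frames) log-mel spectrogram. Parameters are illustrative."""
    rng = np.random.default_rng(0) if rng is None else rng
    out = spec.copy()
    n_mels, n_frames = out.shape
    for _ in range(n_freq_masks):
        f = int(rng.integers(0, max_f + 1))        # mask width in mel bands
        f0 = int(rng.integers(0, n_mels - f + 1))  # mask start band
        out[f0:f0 + f, :] = 0.0
    for _ in range(n_time_masks):
        t = int(rng.integers(0, max_t + 1))          # mask width in frames
        t0 = int(rng.integers(0, n_frames - t + 1))  # mask start frame
        out[:, t0:t0 + t] = 0.0
    return out

# Stand-in for a real log-mel spectrogram (e.g. 40 mel bands x 100 frames).
spec = np.random.default_rng(1).random((40, 100)) + 1.0
aug = spec_augment(spec)
```

Because masking only zeroes existing cells, augmented examples keep the original shape and can be fed to the same CNN/Transformer input layer as the clean spectrograms.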
13
Ilias L, Askounis D. Multimodal Deep Learning Models for Detecting Dementia From Speech and Transcripts. Front Aging Neurosci 2022; 14:830943. [PMID: 35370608 PMCID: PMC8969102 DOI: 10.3389/fnagi.2022.830943] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2021] [Accepted: 02/17/2022] [Indexed: 11/13/2022] Open
Abstract
Alzheimer's dementia (AD) entails negative psychological, social, and economic consequences not only for patients but also for their families, relatives, and society in general. Despite the significance of this phenomenon and the importance of early diagnosis, limitations remain. The main limitation concerns the way the speech and transcript modalities are combined in a single neural network: existing works add or concatenate the image and text representations, employ majority voting, or average the predictions of many separately trained textual and speech models. To address these limitations, this article presents new methods to detect AD patients and predict Mini-Mental State Examination (MMSE) scores in an end-to-end trainable manner, combining BERT, a Vision Transformer, co-attention, a multimodal shifting gate, and a variant of the self-attention mechanism. Specifically, we convert audio to log-Mel spectrograms together with their delta and delta-delta (acceleration) values. First, we pass each transcript and spectrogram image through a BERT model and a Vision Transformer, respectively, with a co-attention layer on top that generates image and word attention simultaneously. Second, we propose an architecture that integrates multimodal information into a BERT model via a multimodal shifting gate. Finally, we introduce an approach that captures both inter- and intra-modal interactions by concatenating the textual and visual representations and applying a self-attention mechanism that includes a gate model. Experiments on the ADReSS Challenge dataset show that the introduced models offer advantages over existing approaches, achieving competitive results in both the AD classification and MMSE regression tasks. Our best-performing model attains an accuracy of 90.00% on AD classification and a root mean squared error (RMSE) of 3.61 on MMSE regression, a new state of the art for the MMSE regression task.
Affiliation(s)
- Loukas Ilias
- Decision Support Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece.
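Gated multimodal fusion of the kind this abstract describes can be illustrated, in highly simplified form, as a learned elementwise gate over two modality embeddings. The sketch below is a toy under stated assumptions, not the authors' model: the embedding dimension, the single linear gate, and the random weights are all illustrative stand-ins for trained BERT/ViT representations and learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_text, h_audio, W, b):
    """Fuse two modality embeddings with an elementwise gate g in (0, 1):
    g decides, per dimension, how much of each modality to keep."""
    g = sigmoid(np.concatenate([h_text, h_audio]) @ W + b)
    return g * h_text + (1.0 - g) * h_audio

rng = np.random.default_rng(0)
d = 8                              # toy embedding size
h_text = rng.standard_normal(d)    # stand-in for a BERT transcript embedding
h_audio = rng.standard_normal(d)   # stand-in for a ViT spectrogram embedding
W = 0.1 * rng.standard_normal((2 * d, d))  # gate weights (randomly initialized here)
b = np.zeros(d)
fused = gated_fusion(h_text, h_audio, W, b)
```

Because the gate output lies in (0, 1), each fused dimension is a convex combination of the two modalities, which keeps the fused representation on the same scale as its inputs and makes the mixing weights interpretable per dimension.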