1
Skibińska J, Hosek J. Computerized analysis of hypomimia and hypokinetic dysarthria for improved diagnosis of Parkinson's disease. Heliyon 2023; 9:e21175. [PMID: 37908703 PMCID: PMC10613914 DOI: 10.1016/j.heliyon.2023.e21175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 10/07/2023] [Accepted: 10/17/2023] [Indexed: 11/02/2023]
Abstract
Background and Objective: An aging society requires easy-to-use approaches for the diagnosis and monitoring of neurodegenerative disorders, such as Parkinson's disease (PD), so that clinicians can effectively adjust treatment and improve patients' quality of life. Current methods of PD diagnosis and monitoring usually require patients to come to a hospital, where they undergo several neurological and neuropsychological examinations. These examinations are usually time-consuming, expensive, and performed only a few times per year. Hence, this study explores fusing computerized analysis of hypomimia and hypokinetic dysarthria (two motor symptoms manifested in the majority of PD patients), with the goal of proposing a new methodology for PD diagnosis that could be easily integrated into mHealth systems. Methods: We enrolled 73 PD patients and 46 age- and gender-matched healthy controls, who performed several speech/voice tasks while being recorded by a microphone and a camera. Acoustic signals were parametrized in terms of phonation, articulation, and prosody. Video recordings of the face were analyzed in terms of facial landmark movement. Both modalities were subsequently modeled with the XGBoost algorithm. Results: The acoustic analysis enabled diagnosis of PD with 77% balanced accuracy, while the facial analysis achieved 81% balanced accuracy. Fusing both modalities increased the balanced accuracy to 83% (88% sensitivity and 78% specificity). The most informative speech exercise in the multimodal system turned out to be a tongue twister. Additionally, we identified muscle movements that are characteristic of hypomimia. Conclusions: The introduced methodology, which is based on a range of speech exercises and on both the audio and video modalities, detects PD with a balanced accuracy of up to 83%. Tongue twisters proved to be the most valuable speech exercise from the clinical point of view.
Additionally, the clinical interpretation of the created models is illustrated. The presented computer-supported methodology could serve as an extra tool for neurologists in PD detection, and the proposed mHealth solution could make life easier for both patients and doctors.
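A worked check of the fused-model figures above: balanced accuracy is the mean of sensitivity and specificity, which suits a class-imbalanced cohort such as 73 patients versus 46 controls, and (0.88 + 0.78) / 2 = 0.83. The Python sketch below computes it from confusion-matrix counts; the function name and the counts are hypothetical illustrations, not the study's data or code.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on PD) and specificity (recall on controls)."""
    sensitivity = tp / (tp + fn)  # fraction of PD patients correctly detected
    specificity = tn / (tn + fp)  # fraction of healthy controls correctly rejected
    return (sensitivity + specificity) / 2


# Hypothetical counts (not the study's data) chosen to give 88% sensitivity
# and 78% specificity, matching the fused audio-video result quoted above.
print(round(balanced_accuracy(tp=88, fn=12, tn=78, fp=22), 2))  # 0.83
```

Unlike plain accuracy, this metric does not reward a classifier for favoring the larger class, which is why it is the natural figure to compare across the acoustic, facial, and fused models.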
Affiliation(s)
- Justyna Skibińska
- Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 12, Brno, 61600, Czechia
- Unit of Electrical Engineering, Tampere University, Kalevantie 4, Tampere, 33100, Finland
- Jiri Hosek
- Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 12, Brno, 61600, Czechia
2
Hireš M, Drotár P, Pah ND, Ngo QC, Kumar DK. On the inter-dataset generalization of machine learning approaches to Parkinson's disease detection from voice. Int J Med Inform 2023; 179:105237. [PMID: 37801807 DOI: 10.1016/j.ijmedinf.2023.105237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 09/20/2023] [Accepted: 09/24/2023] [Indexed: 10/08/2023]
Abstract
BACKGROUND AND OBJECTIVE: Parkinson's disease is the second most common neurodegenerative disorder; it affects motor skills, cognitive processes, mood, and everyday tasks such as speaking and walking. The voices of people with Parkinson's disease may become weak, breathy, or hoarse and may sound emotionless, with slurred words and mumbling. Algorithms for computerized voice analysis have been proposed and have shown highly accurate results. However, these algorithms were developed on single, limited datasets whose participants had similar demographics. Such models are prone to overfitting and generalize poorly, yet generalization is essential in real-world applications. METHODS: We evaluated the computerized Parkinson's disease diagnosis performance of various machine learning models and showed that their performance degraded rapidly when they were applied to different datasets. We evaluated two mainstream state-of-the-art approaches: one based on deep convolutional neural networks, and another based on voice feature extraction followed by a shallow classifier (namely, extreme gradient boosting, XGBoost). RESULTS: An investigation with four datasets (CzechPD, PC-GITA, ITA, and RMIT-PD) showed that even when the algorithms performed excellently on a single dataset, the results obtained on new data, or even on a mix of datasets, were very unsatisfactory. CONCLUSIONS: More work is needed to make computerized voice analysis methods for Parkinson's disease diagnosis suitable for real-world applications.
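The abstract does not spell out the cross-dataset protocol. One common way to probe inter-dataset generalization is leave-one-dataset-out evaluation: train on all corpora except one and test on the held-out corpus, so that no test recording shares a corpus (language, microphone, recording protocol) with the training data. The Python sketch below generates such folds for the four corpora named above; the function name and the recording IDs are hypothetical placeholders, and this is an assumed protocol, not necessarily the authors' exact setup.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_dataset_out(
    datasets: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_name, train_items, test_items), one fold per corpus."""
    for held_out in datasets:
        # Train on every corpus except the held-out one...
        train = [item for name, items in datasets.items()
                 if name != held_out for item in items]
        # ...and test only on the held-out corpus.
        test = list(datasets[held_out])
        yield held_out, train, test


# Hypothetical recording IDs; real folds would hold feature vectors or file paths.
corpora = {
    "CzechPD": ["cz_01", "cz_02"],
    "PC-GITA": ["gita_01"],
    "ITA": ["ita_01", "ita_02"],
    "RMIT-PD": ["rmit_01"],
}

for held_out, train, test in leave_one_dataset_out(corpora):
    print(held_out, len(train), len(test))
```

Comparing each fold's score with a within-corpus cross-validation score is what exposes the kind of degradation the abstract reports.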
Affiliation(s)
- Máté Hireš
- Intelligent Information Systems Lab, Technical University of Kosice, Letna 9, 42001 Kosice, Slovakia
- Peter Drotár
- Intelligent Information Systems Lab, Technical University of Kosice, Letna 9, 42001 Kosice, Slovakia
- Nemuel Daniel Pah
- Biosignals Lab, RMIT University, Melbourne, Australia; Universitas Surabaya, Surabaya, Indonesia
3
B SB, Y BJ. Vision-based gait analysis for real-time Parkinson disease identification and diagnosis system. Health Syst (Basingstoke) 2022. [DOI: 10.1080/20476965.2022.2125838] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2022]
Affiliation(s)
- Sathya Bama B
- Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India
- Bevish Jinila Y
- Sathyabama Institute of Science and Technology, Chennai, India
4
Kothare H, Roesler O, Burke W, Neumann M, Liscombe J, Exner A, Snyder S, Cornish A, Habberstad D, Pautler D, Suendermann-Oeft D, Huber J, Ramanarayanan V. Speech, Facial and Fine Motor Features for Conversation-Based Remote Assessment and Monitoring of Parkinson's Disease. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:3464-3467. [PMID: 36086652 DOI: 10.1109/embc48229.2022.9871375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
We present a cloud-based multimodal dialogue platform for the remote assessment and monitoring of speech, facial, and fine motor function in Parkinson's disease (PD) at scale, along with a preliminary investigation of the efficacy of the various metrics automatically extracted by the platform. Twenty-two healthy controls and 38 people with Parkinson's disease (pPD) were instructed to complete four interactive sessions on the platform, spaced a week apart. Each session involved a battery of tasks designed to elicit speech, facial movements, and finger movements. We find that speech, facial kinematic, and finger movement dexterity metrics show statistically significant differences between controls and pPD. We further investigate the sensitivity, specificity, reliability, and generalisability of these metrics. Our results offer encouraging evidence for the utility of automatically extracted audiovisual analytics in the remote monitoring of PD and other movement disorders.
5
Gómez A, Tsanas A, Gómez P, Palacios-Alonso D, Rodellar V, Álvarez A. Acoustic to kinematic projection in Parkinson’s disease dysarthria. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102422] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 01/19/2023]
6
Gómez A, Gómez P, Palacios D, Rodellar V, Nieto V, Álvarez A, Tsanas A. A Neuromotor to Acoustical Jaw-Tongue Projection Model With Application in Parkinson's Disease Hypokinetic Dysarthria. Front Hum Neurosci 2021; 15:622825. [PMID: 33790751 PMCID: PMC8005556 DOI: 10.3389/fnhum.2021.622825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2020] [Accepted: 02/17/2021] [Indexed: 11/13/2022]
Abstract
Aim: The present work studies the neuromotor activity of the masseter-jaw-tongue articulation during diadochokinetic exercises in order to establish functional statistical relationships between surface electromyography (sEMG), 3D accelerometry (3DAcc), and acoustic features extracted from the speech signal, with the aim of characterizing hypokinetic dysarthria (HD). A database of multi-trait signal recordings from age-matched control and PD participants is used in the experimental study. Hypothesis: The main assumption is that the information shared between sEMG, 3D acceleration, and acoustic features can be quantified using linear regression methods. Methods: Recordings from a cohort of eight age-matched control participants (4 males, 4 females) and eight PD participants (4 males, 4 females) were collected during the utterance of a diadochokinetic exercise (fast repetition of the diphthong [aI]). The dynamic and acoustic absolute kinematic velocities produced during the exercises were estimated by acoustic filter inversion and by numerical integration and differentiation of the speech signal. The amplitude distributions of the absolute kinematic and acoustic velocities (AKV and AFV) are estimated to allow comparisons in terms of mutual information. Results: The regression results show the relationships between sEMG and the dynamic and acoustic estimates. The projection methodology may help in understanding basic neuromotor muscle activity in neurodegenerative speech, in the remote monitoring of neuromotor and neurocognitive diseases using speech as the vehicular tool, and in the study of other speech-related disorders. The study also showed strong and significant cross-correlations between articulation kinematics in both the control and the PD cohorts. The absolute kinematic variables present an observable difference for the PD participants compared to the control group.
Conclusion: Kinematic distributions derived from acoustic analysis may be useful biomarkers for characterizing HD in neuromotor disorders, providing new insights into PD.
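The abstract compares the AKV and AFV amplitude distributions in terms of mutual information but does not state which estimator is used. A minimal Python/NumPy sketch, assuming a simple plug-in estimate from a 2-D histogram of paired samples (one common choice; the function name, the bin count, and the synthetic signals are all assumptions, not the authors' data or method):

```python
import numpy as np


def mutual_information(x: np.ndarray, y: np.ndarray, bins: int = 16) -> float:
    """Plug-in estimate of I(X;Y) in bits from a 2-D histogram of paired samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()             # joint probability of each (x, y) bin
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nz = p_xy > 0                          # empty bins contribute 0 (0 * log 0 := 0)
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz])))


# Synthetic stand-ins for absolute velocity traces (not real sEMG/speech data).
rng = np.random.default_rng(0)
akv = np.abs(rng.normal(size=5000))
afv_related = akv + 0.1 * np.abs(rng.normal(size=5000))  # depends on akv
afv_unrelated = np.abs(rng.normal(size=5000))             # independent of akv

print(mutual_information(akv, afv_related) > mutual_information(akv, afv_unrelated))
```

Plug-in estimates are biased upward for small samples or many bins, so with only eight participants per cohort a bias correction or a permutation baseline would be a sensible addition.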
Affiliation(s)
- Andrés Gómez
- Old Medical School, Medical School, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- NeuSpeLab, Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
- Pedro Gómez
- NeuSpeLab, Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
- Daniel Palacios
- NeuSpeLab, Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
- Escuela Técnica Superior de Ingeniería Informática-Universidad Rey Juan Carlos, Móstoles, Spain
- Victoria Rodellar
- NeuSpeLab, Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
- Víctor Nieto
- NeuSpeLab, Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
- Agustín Álvarez
- NeuSpeLab, Center for Biomedical Technology, Universidad Politécnica de Madrid, Madrid, Spain
- Athanasios Tsanas
- Old Medical School, Medical School, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
7
Gupta S, Patil AT, Purohit M, Parmar M, Patel M, Patil HA, Guido RC. Residual Neural Network precisely quantifies dysarthria severity-level based on short-duration speech segments. Neural Netw 2021; 139:105-117. [PMID: 33684609 DOI: 10.1016/j.neunet.2021.02.008] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 09/02/2020] [Revised: 01/24/2021] [Accepted: 02/08/2021] [Indexed: 10/22/2022]
Abstract
Recently, deep learning methodologies have gained significant attention for severity-based classification of dysarthric speech. Detecting dysarthria and quantifying its severity are of paramount importance in various real-life applications, such as assessing patients' progress during treatment, which includes adequate planning of their therapy, and improving speech-based interactive systems so that they handle pathologically affected voices automatically. Notably, current speech-powered tools often deal with short-duration speech segments and, consequently, are less efficient at handling impaired speech, even when using convolutional neural networks (CNNs). Thus, detecting dysarthria severity level from short speech segments might help improve the performance and applicability of those systems. To achieve this goal, we propose a novel residual network (ResNet)-based technique that receives short-duration speech segments as input. Statistically meaningful objective analysis of our experiments, reported on the standard Universal Access corpus, shows average improvements of 21.35% in classification accuracy and 22.48% in F1-score compared with the baseline CNN. For additional comparison, tests with Gaussian mixture models and Light CNNs were also performed. Overall, the proposed ResNet approach achieved a classification accuracy of 98.90% and an F1-score of 98.00%, confirming its efficacy and supporting its practical applicability.
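The defining feature of a residual network is the identity shortcut: each block outputs relu(x + F(x)), so it only has to learn the residual F rather than the full mapping. The Python/NumPy sketch below illustrates that idea generically; it is not the authors' architecture, and it deliberately uses dense weight matrices in place of the convolutions (and omits the batch normalization) found in typical ResNet blocks. The function names are hypothetical.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    return np.maximum(z, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """y = relu(x + F(x)) with F(x) = w2 @ relu(w1 @ x); dense layers stand in for convolutions."""
    f = w2 @ relu(w1 @ x)  # residual branch F(x)
    return relu(x + f)     # identity shortcut, then the final activation


x = np.array([0.5, 1.0, 2.0])
hidden = 4
rng = np.random.default_rng(0)

# With zero weights the residual branch is 0, so the block passes x through unchanged.
y_zero = residual_block(x, np.zeros((hidden, 3)), np.zeros((3, hidden)))
print(np.allclose(y_zero, x))  # True

# With small random weights the output stays close to x: the block adds a small correction.
y_small = residual_block(x, 0.01 * rng.normal(size=(hidden, 3)), 0.01 * rng.normal(size=(3, hidden)))
print(y_small.shape)  # (3,)
```

Because a block with a near-zero residual branch behaves almost like the identity, stacking many blocks does not force the signal through a long chain of unconstrained transformations, which is one commonly cited reason deep ResNets remain trainable.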
Affiliation(s)
- Siddhant Gupta
- Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar 382007, India
- Ankur T Patil
- Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar 382007, India
- Mirali Purohit
- Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar 382007, India
- Maitreya Patel
- Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar 382007, India
- Hemant A Patil
- Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar 382007, India
- Rodrigo Capobianco Guido
- Instituto de Biociências, Letras e Ciências Exatas, Unesp - Univ Estadual Paulista (São Paulo State University), Rua Cristóvão Colombo 2265, Jd Nazareth, 15054-000, São José do Rio Preto - SP, Brazil
8
Carmona-Duarte C, Ferrer MA, Plamondon R, Gómez-Rodellar A, Gómez-Vilda P. Sigma-Lognormal Modeling of Speech. Cognit Comput 2021; 13:488-503. [PMID: 33786072 PMCID: PMC7943521 DOI: 10.1007/s12559-020-09803-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2020] [Accepted: 11/30/2020] [Indexed: 11/26/2022]
Abstract
Human movement studies and analyses have been fundamental in many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, health care to sports, and beyond. Previous speech motor models were proposed to understand how speech movement is produced and how the resulting speech varies when some parameters are changed. However, such models do not support the inverse approach, in which the muscular response parameters and the subject's age are derived from real continuous speech. In the handwriting field, by contrast, the kinematic theory of rapid human movements and its associated Sigma-lognormal model have been applied successfully to obtain the muscular response parameters. This work presents a speech kinematics-based model that can be used to study, analyze, and reconstruct complex speech kinematics in a simplified manner. A method based on the kinematic theory of rapid human movements and its associated Sigma-lognormal model is applied to describe and parameterize the asymptotic impulse response of the neuromuscular networks involved in speech as a response to a neuromotor command. The method used to transform formants into a movement observation is also presented. Experiments carried out with the (English) VTR-TIMIT database and the (German) Saarbrücken Voice Database, which include people of different ages with and without laryngeal pathologies, corroborate the link between the extracted parameters and aging, on the one hand, and the proportion between the first and second formants required in applying the kinematic theory of rapid human movements, on the other. The results should drive innovative developments in the modeling and understanding of speech kinematics.
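In the kinematic theory of rapid human movements, each stroke's speed follows a lognormal impulse response, |v_i(t)| = D_i / (sigma_i * sqrt(2*pi) * (t - t0_i)) * exp(-(ln(t - t0_i) - mu_i)^2 / (2 * sigma_i^2)) for t > t0_i, and the Sigma-lognormal model combines several such strokes. A minimal Python/NumPy sketch, assuming a one-dimensional simplification with signed amplitudes along a single axis (the full model also gives each stroke a direction that rotates over its duration, and the paper's formant-to-movement transformation is not reproduced here); the function names and stroke parameters are hypothetical:

```python
from typing import List, Tuple

import numpy as np


def lognormal_stroke(t: np.ndarray, d: float, t0: float, mu: float, sigma: float) -> np.ndarray:
    """Velocity of one stroke: d * lognormal(t - t0; mu, sigma), zero for t <= t0."""
    v = np.zeros_like(t, dtype=float)
    dt = t - t0
    active = dt > 0
    v[active] = (d / (sigma * np.sqrt(2.0 * np.pi) * dt[active])
                 * np.exp(-((np.log(dt[active]) - mu) ** 2) / (2.0 * sigma ** 2)))
    return v


def sigma_lognormal_1d(t: np.ndarray, strokes: List[Tuple[float, float, float, float]]) -> np.ndarray:
    """1-D simplification: sum of lognormal strokes with signed amplitudes."""
    return sum(lognormal_stroke(t, *s) for s in strokes)


# Hypothetical stroke parameters (D, t0, mu, sigma); times in seconds.
strokes = [(1.0, 0.0, -1.5, 0.3), (0.5, 0.2, -1.2, 0.25)]
t = np.linspace(0.0, 2.0, 20001)
v = sigma_lognormal_1d(t, strokes)

# Each lognormal integrates to 1, so the area under v(t) is D_1 + D_2.
displacement = float(np.sum(v) * (t[1] - t[0]))
print(round(displacement, 3))  # 1.5
```

The area check is a quick sanity test for any parameter set; fitting the parameters to an observed trajectory (the inverse problem the abstract emphasizes) would require a separate extraction step that this sketch does not attempt.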
Affiliation(s)
- C. Carmona-Duarte
- Instituto Universitario Para El Desarrollo Tecnológico Y La Innovación en Comunicaciones, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
- M. A. Ferrer
- Instituto Universitario Para El Desarrollo Tecnológico Y La Innovación en Comunicaciones, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
- R. Plamondon
- Laboratoire Scribens, Département de Génie Électrique, Polytechnique Montréal, Montreal, QC Canada
- A. Gómez-Rodellar
- Facultad de Informática, Universidad Politécnica de Madrid, Campus de Monte-Gancedo, s/n, 28660 Boadilla del Monte, Madrid, Spain
- P. Gómez-Vilda
- Facultad de Informática, Universidad Politécnica de Madrid, Campus de Monte-Gancedo, s/n, 28660 Boadilla del Monte, Madrid, Spain
9
Tsanas A, Little MA, Ramig LO. Remote Assessment of Parkinson's Disease Symptom Severity Using the Simulated Cellular Mobile Telephone Network. IEEE Access 2021; 9:11024-11036. [PMID: 33495722 PMCID: PMC7821632 DOI: 10.1109/access.2021.3050524] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/05/2020] [Accepted: 12/25/2020] [Indexed: 06/12/2023]
Abstract
Telemonitoring of Parkinson's disease (PD) has attracted considerable research interest because of its potential to make a lasting, positive impact on the lives of patients and their carers. Purpose-built devices have been developed that record various signals which can be associated with average PD symptom severity, as quantified on standard clinical metrics such as the Unified Parkinson's Disease Rating Scale (UPDRS). Speech signals are particularly promising in this regard because they can be easily recorded without expensive, dedicated hardware. Previous studies have replicated clinical raters' UPDRS assessments of symptom severity with an error of less than 2 points, using high-quality speech signals collected with dedicated telemonitoring hardware. Here, we investigate the potential of using the standard voice-over-GSM (2G) or UMTS (3G) cellular mobile telephone networks, which together have more than 5 billion subscribers worldwide, for PD telemonitoring. We test the robustness of this approach using a simulated noisy mobile communication network over which speech signals are transmitted, and approximately 6000 recordings from 42 PD subjects. We show that UPDRS can be estimated with an error of less than 3.5 points relative to the clinical raters' assessment, which is clinically useful given that the inter-rater variability for UPDRS can be as high as 4-5 UPDRS points. This provides compelling evidence that the existing voice telephone network has the potential to facilitate inexpensive, mass-scale PD symptom telemonitoring applications.
Affiliation(s)
- Athanasios Tsanas
- Edinburgh Medical School, Usher Institute, The University of Edinburgh, Edinburgh EH16 4UX, U.K.
- Max A. Little
- School of Computer Science, University of Birmingham, Birmingham B15 2TT, U.K.
- Lorraine O. Ramig
- Department of Speech, Language, and Hearing Science, University of Colorado Boulder, Boulder, CO 80309, USA
- National Center for Voice and Speech, Denver, CO 80014, USA
10
Hilbert spectrum analysis for automatic detection and evaluation of Parkinson’s speech. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2020.102050] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/23/2022]
11
Principal component analysis of the spectrogram of the speech signal: Interpretation and application to dysarthric speech. Comput Speech Lang 2020. [DOI: 10.1016/j.csl.2019.07.001] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/19/2022]