1. Ramanarayanan V. Multimodal Technologies for Remote Assessment of Neurological and Mental Health. J Speech Lang Hear Res 2024:1-13. [PMID: 38984943] [DOI: 10.1044/2024_jslhr-24-00142]
Abstract
PURPOSE Automated remote assessment and monitoring of patients' neurological and mental health is increasingly becoming an essential component of the digital clinic and telehealth ecosystem, especially after the COVID-19 pandemic. This article reviews the various modalities of health information that are useful for developing such remote clinical assessments in the real world at scale. APPROACH We first present an overview of the various modalities of health information (speech acoustics, natural language, conversational dynamics, orofacial or full-body movement, eye gaze, respiration, cardiopulmonary, and neural), each of which can be extracted from one or more signal sources (audio, video, text, or sensors). We further motivate their clinical utility with examples of how information from each modality can help characterize how different disorders affect different aspects of patients' spoken communication. We then elucidate the advantages of combining one or more of these modalities toward a more holistic, informative, and robust assessment. FINDINGS We find that combining multiple modalities of health information allows for improved scientific interpretability, improved performance on downstream health applications such as early detection and progress monitoring, improved technological robustness, and improved user experience. We illustrate how these principles can be leveraged for remote clinical assessment at scale using a real-world case study of the Modality assessment platform. CONCLUSION This review motivates the combination of human-centric information from multiple modalities to measure various aspects of patients' health, arguing that remote clinical assessment integrating this complementary information can be more effective and lead to better clinical outcomes than using any one data stream in isolation.
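The review's central claim, that combining complementary modalities outperforms any single data stream, is most simply realized as late fusion: one model per modality, with per-modality scores merged into a single decision. Below is a minimal, hypothetical sketch of this idea (synthetic data, illustrative classifier choice; not the Modality platform's implementation):

```python
# Illustrative late-fusion sketch (not from the paper): one classifier per
# modality, with predicted probabilities averaged into a single decision.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
# Hypothetical per-modality feature matrices (acoustic, facial, linguistic).
modalities = {
    "acoustic": rng.normal(size=(n, 12)),
    "facial": rng.normal(size=(n, 8)),
    "linguistic": rng.normal(size=(n, 5)),
}
y = rng.integers(0, 2, size=n)  # hypothetical labels: 0 = control, 1 = patient

# Train one model per modality.
models = {m: LogisticRegression(max_iter=1000).fit(X, y)
          for m, X in modalities.items()}

# Late fusion: average the per-modality probabilities of the positive class.
probs = np.mean([models[m].predict_proba(X)[:, 1]
                 for m, X in modalities.items()], axis=0)
fused_prediction = (probs >= 0.5).astype(int)
```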
Affiliation(s)
- Vikram Ramanarayanan
- Modality.AI, Inc., San Francisco, CA
- Department of Otolaryngology-Head and Neck Surgery, University of California, San Francisco
2. Neumann M, Kothare H, Ramanarayanan V. Multimodal Speech Biomarkers for Remote Monitoring of ALS Disease Progression. medRxiv [Preprint] 2024:2024.06.26.24308811. [PMID: 38978682] [PMCID: PMC11230328] [DOI: 10.1101/2024.06.26.24308811]
Abstract
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that severely impacts affected persons' speech and motor functions, yet early detection and tracking of disease progression remain challenging. The current gold standard for monitoring ALS progression, the ALS Functional Rating Scale-Revised (ALSFRS-R), is based on subjective ratings of symptom severity and may not capture subtle but clinically meaningful changes due to its lack of granularity. Multimodal speech measures that can be automatically collected from patients in a remote fashion can bridge this gap because they are continuous-valued and therefore potentially more granular in capturing disease progression. Here we investigate the responsiveness and sensitivity of multimodal speech measures in persons with ALS (pALS), collected via a remote patient monitoring platform, in an effort to quantify how long it takes to detect a clinically meaningful change associated with disease progression. We recorded audio and video from 278 participants and automatically extracted multimodal speech biomarkers (acoustic, orofacial, linguistic) from the data. We find that the timing alignment of pALS speech relative to a canonical elicitation of the same prompt and the number of words used to describe a picture are the most responsive measures at detecting such change in both pALS with bulbar (n = 36) and non-bulbar onset (n = 107). Interestingly, the responsiveness of these measures is stable even at small sample sizes. We further found that certain speech measures are sensitive enough to track bulbar decline even when there is no patient-reported clinical change, i.e., the ALSFRS-R speech score remains unchanged at 3 out of a total possible score of 4. These findings have the potential to facilitate improved, accelerated, and more cost-effective clinical trials and care.
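Responsiveness of a longitudinal measure is commonly quantified with statistics such as the standardized response mean (SRM), the mean within-subject change divided by the standard deviation of that change. The sketch below illustrates this generic computation on hypothetical data; the paper's exact responsiveness statistic may differ:

```python
# Hedged sketch: responsiveness of a longitudinal speech measure via the
# standardized response mean (SRM = mean change / SD of change). The paper's
# exact responsiveness statistic may differ; values here are hypothetical.
import numpy as np

def standardized_response_mean(baseline: np.ndarray, followup: np.ndarray) -> float:
    """SRM of within-subject change between two visits."""
    change = followup - baseline
    return change.mean() / change.std(ddof=1)

# Hypothetical speaking-rate measurements (words/min) at two visits.
baseline = np.array([110.0, 98.0, 120.0, 105.0, 90.0, 115.0])
followup = np.array([100.0, 95.0, 112.0, 96.0, 85.0, 108.0])
print(f"SRM = {standardized_response_mean(baseline, followup):.2f}")
```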
Affiliation(s)
- Vikram Ramanarayanan
- Modality.AI, Inc., San Francisco, CA, USA
- University of California, San Francisco, CA, USA
3. Tessler I, Primov-Fever A, Soffer S, Anteby R, Gecel NA, Livneh N, Alon EE, Zimlichman E, Klang E. Deep learning in voice analysis for diagnosing vocal cord pathologies: a systematic review. Eur Arch Otorhinolaryngol 2024; 281:863-871. [PMID: 38091100] [DOI: 10.1007/s00405-023-08362-6]
Abstract
OBJECTIVES With smartphones and wearable devices becoming ubiquitous, they offer an opportunity for large-scale voice sampling. This systematic review explores the application of deep learning models to the automated analysis of voice samples for detecting vocal cord pathologies. METHODS We conducted a systematic literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines. We searched the MEDLINE and Embase databases for original publications on deep learning applications for diagnosing vocal cord pathologies between 2002 and 2022. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. RESULTS Across the 14 studies that met the inclusion criteria, data from a total of 3037 patients were analyzed. All studies were retrospective. Deep learning applications targeted the detection of Reinke's edema, nodules, polyps, cysts, unilateral cord paralysis, and vocal fold cancer. Most pathologies had detection accuracy above 90%. Thirteen studies (93%) exhibited a high risk of bias and concerns about applicability. CONCLUSIONS This technology holds promise for enhancing the screening and diagnosis of vocal cord pathologies. While current research is limited, the presented studies offer proof of concept for developing larger-scale solutions.
Affiliation(s)
- Idit Tessler
- Department of Otolaryngology Head and Neck Surgery, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- ARC Innovation Center, Sheba Medical Center, Tel Hashomer, Israel
- Adi Primov-Fever
- Department of Otolaryngology Head and Neck Surgery, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Shelly Soffer
- Internal Medicine B, Assuta Medical Center, Ashdod, Israel
- Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Roi Anteby
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Department of Surgery and Transplantation B, Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Nir A Gecel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Nir Livneh
- Department of Otolaryngology Head and Neck Surgery, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eran E Alon
- Department of Otolaryngology Head and Neck Surgery, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eyal Zimlichman
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- ARC Innovation Center, Sheba Medical Center, Tel Hashomer, Israel
- Eyal Klang
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Department of Diagnostic Imaging, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- ARC Innovation Center, Sheba Medical Center, Tel Hashomer, Israel
4. Saab R, Balachandar A, Mahdi H, Nashnoush E, Perri LX, Waldron AL, Sadeghian A, Rubenfeld G, Crowley M, Boulos MI, Murray BJ, Khosravani H. Machine-learning assisted swallowing assessment: a deep learning-based quality improvement tool to screen for post-stroke dysphagia. Front Neurosci 2023; 17:1302132. [PMID: 38130696] [PMCID: PMC10734030] [DOI: 10.3389/fnins.2023.1302132]
Abstract
Introduction Post-stroke dysphagia is common and associated with significant morbidity and mortality, making bedside screening of significant clinical importance. Using voice as a biomarker coupled with deep learning has the potential to improve patient access to screening and mitigate the subjectivity of detecting voice change, a component of several validated screening protocols. Methods In this single-center study, we developed a proof-of-concept model for automated dysphagia screening and evaluated its performance on training and testing cohorts. Participants were recruited on a rolling basis from patients admitted to a comprehensive stroke center who were primary English speakers and could follow commands without significant aphasia. The primary outcome was classification as the equivalent of a pass or fail on a dysphagia screening test, which served as the label. Voice data were recorded from patients speaking a standardized set of vowels, words, and sentences from the National Institutes of Health Stroke Scale. Seventy patients were recruited and 68 were included in the analysis, 40 in the training cohort and 28 in the testing cohort. Speech was segmented into 1,579 audio clips, from which 6,655 Mel-spectrogram images were computed and used as inputs for deep learning models (DenseNet and ConvNeXt, separately and together). Clip-level and participant-level swallowing status predictions were obtained through a voting method. Results The models demonstrated clip-level dysphagia screening sensitivity of 71% and specificity of 77% (F1 = 0.73, AUC = 0.80 [95% CI: 0.78-0.82]). At the participant level, sensitivity and specificity were 89% and 79%, respectively (F1 = 0.81, AUC = 0.91 [95% CI: 0.77-1.05]). Discussion This study is the first to demonstrate the feasibility of applying deep learning to classify vocalizations for detecting post-stroke dysphagia. Our findings suggest potential for enhancing dysphagia screening in clinical settings. Code: https://github.com/UofTNeurology/masa-open-source.
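As a rough illustration of the generic pipeline described above (Mel-spectrogram extraction followed by majority voting from clip-level to participant-level predictions), the sketch below uses librosa defaults and a stub in place of the trained DenseNet/ConvNeXt models; it is not the authors' released code, which is linked above:

```python
# Illustrative sketch of two generic steps in such a pipeline: Mel-spectrogram
# extraction and majority voting from clip-level to participant-level labels.
# Not the authors' code (see their GitHub repo); the classifier is a stub.
import numpy as np
import librosa

def clip_to_melspectrogram(clip: np.ndarray, sr: int) -> np.ndarray:
    """Convert an audio clip to a log-scaled Mel spectrogram (dB)."""
    mel = librosa.feature.melspectrogram(y=clip, sr=sr, n_mels=128)
    return librosa.power_to_db(mel, ref=np.max)

def classify_clip(mel_db: np.ndarray) -> int:
    """Stub for a CNN classifier (e.g., DenseNet/ConvNeXt): 1 = fail, 0 = pass."""
    return int(mel_db.mean() > -40.0)  # placeholder rule, not a trained model

def participant_prediction(clips: list[np.ndarray], sr: int) -> int:
    """Majority vote over clip-level predictions for one participant."""
    votes = [classify_clip(clip_to_melspectrogram(c, sr)) for c in clips]
    return int(np.mean(votes) >= 0.5)

# Hypothetical usage with 3 one-second clips of synthetic audio at 16 kHz.
sr = 16000
clips = [np.random.randn(sr).astype(np.float32) * 0.01 for _ in range(3)]
print("participant label:", participant_prediction(clips, sr))
```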
Affiliation(s)
- Rami Saab
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Arjun Balachandar
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Hamza Mahdi
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Eptehal Nashnoush
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Lucas X. Perri
- Goodfellow-Waldron Initiative in Stroke Innovation and Recovery, Division of Neurology, Neurology Quality and Innovation Lab, University of Toronto, Toronto, ON, Canada
- Ashley L. Waldron
- Goodfellow-Waldron Initiative in Stroke Innovation and Recovery, Division of Neurology, Neurology Quality and Innovation Lab, University of Toronto, Toronto, ON, Canada
- Alireza Sadeghian
- Department of Computer Science, Faculty of Science, Toronto Metropolitan University, Toronto, ON, Canada
- Gordon Rubenfeld
- Institute of Medical Science, University of Toronto, Toronto, ON, Canada
- Interdepartmental Division of Critical Care, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Mark Crowley
- Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada
- Mark I. Boulos
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Institute of Medical Science, University of Toronto, Toronto, ON, Canada
- Brian J. Murray
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Houman Khosravani
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Goodfellow-Waldron Initiative in Stroke Innovation and Recovery, Division of Neurology, Neurology Quality and Innovation Lab, University of Toronto, Toronto, ON, Canada
5. Tognetti A, Thunell E, Zakrzewska M, Olofsson J, Lekander M, Axelsson J, Olsson MJ. Discriminating between sick and healthy faces based on early sickness cues: an exploratory analysis of sex differences. Evol Med Public Health 2023; 11:386-396. [PMID: 37941735] [PMCID: PMC10629974] [DOI: 10.1093/emph/eoad032]
Abstract
Background and objectives It has been argued that sex and disease-related traits should influence how observers respond to sensory sickness cues. There is evidence that humans can detect sensory cues of infection in others, but a lack of statistical power in earlier studies has prevented firm conclusions about whether perception of sickness cues is associated with sex or with disease-related personality traits. Here, we tested whether women (relative to men) and individuals who report poorer health, higher disgust sensitivity, greater vulnerability to disease, or more health concern overestimate the presence of sickness cues and/or are better at detecting them. Methodology In a large online study, 343 women and 340 men were instructed to identify the sick faces in a series of sick and healthy photographs of volunteers with an induced acute experimental inflammation. Participants also completed several disease-related questionnaires. Results While both men and women discriminated between sick and healthy individuals above chance level, exploratory analyses revealed that women outperformed men in both accuracy and speed of discrimination. Furthermore, higher disgust sensitivity to body odors was associated with a more liberal decision criterion for categorizing faces as sick. Conclusion Our findings provide strong support for the human ability to discriminate between sick and healthy individuals based on early facial cues of sickness, and suggest that women are significantly, although only slightly, better at this task. If this finding is replicated, future studies should determine whether women's better performance is related to increased avoidance of sick individuals.
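Discrimination accuracy and decision criterion in tasks like this are conventionally analyzed with signal detection theory: sensitivity d' = z(hit rate) - z(false-alarm rate) and criterion c = -(z(hit rate) + z(false-alarm rate))/2, where a negative c indicates a liberal bias toward responding "sick". A sketch with hypothetical counts (the paper's exact analysis may differ):

```python
# Hedged sketch: signal detection theory indices commonly used in such tasks.
# d' measures discrimination ability; criterion c measures response bias
# (negative c = liberal bias toward responding "sick"). Counts are hypothetical.
from scipy.stats import norm

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d-prime, criterion c) from raw response counts."""
    # Log-linear correction avoids infinite z-scores at rates of 0 or 1.
    h = (hits + 0.5) / (hits + misses + 1)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z_h, z_fa = norm.ppf(h), norm.ppf(fa)
    return z_h - z_fa, -(z_h + z_fa) / 2

d_prime, criterion = sdt_indices(hits=70, misses=30, false_alarms=40,
                                 correct_rejections=60)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```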
Affiliation(s)
- Arnaud Tognetti
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- CEE-M, CNRS, INRAE, Institut Agro, University of Montpellier, Montpellier, France
- Evelina Thunell
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Marta Zakrzewska
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Jonas Olofsson
- Department of Psychology, Stockholm University, Stockholm, Sweden
- Mats Lekander
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Department of Psychology, Stress Research Institute, Stockholm University, Stockholm, Sweden
- Osher Center for Integrative Health, Karolinska Institutet, Stockholm, Sweden
- John Axelsson
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Department of Psychology, Stress Research Institute, Stockholm University, Stockholm, Sweden
- Mats J Olsson
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
6. Triantafyllopoulos A, Kathan A, Baird A, Christ L, Gebhard A, Gerczuk M, Karas V, Hübner T, Jing X, Liu S, Mallol-Ragolta A, Milling M, Ottl S, Semertzidou A, Rajamani ST, Yan T, Yang Z, Dineley J, Amiriparian S, Bartl-Pokorny KD, Batliner A, Pokorny FB, Schuller BW. HEAR4Health: a blueprint for making computer audition a staple of modern healthcare. Front Digit Health 2023; 5:1196079. [PMID: 37767523] [PMCID: PMC10520966] [DOI: 10.3389/fdgth.2023.1196079]
Abstract
Recent years have seen a rapid increase in digital medicine research aiming to transform traditional healthcare systems into modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies, first and foremost in medical imaging, but also in wearables and other intelligent sensors. By comparison, computer audition lags behind, at least in terms of commercial interest. Yet audition has long been a staple assistant for medical practitioners, the stethoscope being the quintessential symbol of doctors around the world. Transforming this traditional technology with AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, the advances needed in computational and data efficiency; Attentively, accounting for individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, ensuring compliance with the ethical standards accorded to the field of medicine. We thus provide an overview and perspective of HEAR4Health: the sketch of a modern, ubiquitous sensing system that can bring computer audition on par with other AI technologies in the pursuit of improved healthcare systems.
Affiliation(s)
- Andreas Triantafyllopoulos
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alexander Kathan
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alice Baird
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Lukas Christ
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alexander Gebhard
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Maurice Gerczuk
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Vincent Karas
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Tobias Hübner
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Xin Jing
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Shuo Liu
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Adria Mallol-Ragolta
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- Manuel Milling
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Sandra Ottl
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Anastasia Semertzidou
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Tianhao Yan
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Zijiang Yang
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Judith Dineley
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Shahin Amiriparian
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Katrin D. Bartl-Pokorny
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Division of Phoniatrics, Medical University of Graz, Graz, Austria
- Anton Batliner
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Florian B. Pokorny
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Division of Phoniatrics, Medical University of Graz, Graz, Austria
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- Björn W. Schuller
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- GLAM – Group on Language, Audio, & Music, Imperial College London, London, United Kingdom
7. Neumann M, Kothare H, Ramanarayanan V. Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in Amyotrophic Lateral Sclerosis. Interspeech 2023; 2023:2353-2357. [PMID: 39006832] [PMCID: PMC11246072] [DOI: 10.21437/interspeech.2023-2100]
Abstract
Multiple speech biomarkers have been shown to carry useful information regarding amyotrophic lateral sclerosis (ALS) pathology. We propose a two-step framework to compute optimal linear combinations (indexes) of these biomarkers that are more discriminative and noise-robust than the individual markers, which is important for clinical care and pharmaceutical trial applications. First, we use a method based on hierarchical clustering to select representative speech metrics from a dataset comprising 143 people with ALS and 135 age- and sex-matched healthy controls. Second, we analyze three methods of index computation that optimize linear discriminability, the Youden index, and the sparsity of logistic regression model weights, respectively, and evaluate their performance with 5-fold cross-validation. We find that the proposed indexes are generally more discriminative of bulbar vs. non-bulbar onset in ALS than both their individual component metrics and an equally weighted baseline.
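A hedged sketch of the two-step idea follows: hierarchical clustering over feature correlations to pick cluster representatives, then a sparse linear index via L1-penalized logistic regression evaluated with 5-fold cross-validation. Data, thresholds, and hyperparameters are hypothetical stand-ins, not the paper's exact procedure:

```python
# Hedged sketch of a two-step index: (1) hierarchical clustering of correlated
# features with one representative per cluster, (2) a sparse linear index via
# L1-penalized logistic regression, evaluated with 5-fold cross-validation.
# Data and thresholds are hypothetical, not the paper's exact procedure.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(278, 20))        # hypothetical speech metrics
y = rng.integers(0, 2, size=278)      # hypothetical bulbar vs non-bulbar labels

# Step 1: cluster features by correlation distance; keep one per cluster.
corr_dist = 1 - np.abs(np.corrcoef(X, rowvar=False))
Z = linkage(corr_dist[np.triu_indices(20, k=1)], method="average")
clusters = fcluster(Z, t=0.7, criterion="distance")
representatives = [np.flatnonzero(clusters == c)[0] for c in np.unique(clusters)]

# Step 2: sparse logistic regression over the representative features.
index_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
scores = cross_val_score(index_model, X[:, representatives], y, cv=5,
                         scoring="roc_auc")
print(f"5-fold AUC: {scores.mean():.2f}")
```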
8. Richter V, Neumann M, Green JR, Richburg B, Roesler O, Kothare H, Ramanarayanan V. Remote Assessment for ALS using Multimodal Dialog Agents: Data Quality, Feasibility and Task Compliance. Interspeech 2023; 2023:5441-5445. [PMID: 37791043] [PMCID: PMC10547018] [DOI: 10.21437/interspeech.2023-2115]
Abstract
We investigate the feasibility, task compliance, and audiovisual data quality of a multimodal dialog-based solution for remote assessment of amyotrophic lateral sclerosis (ALS). 53 people with ALS and 52 healthy controls interacted with Tina, a cloud-based conversational agent, performing speech tasks designed to probe various aspects of motor speech function while their audio and video were recorded. We rated a total of 250 recordings for audio/video quality and participant task compliance, noting the relative frequency of the different issues observed. We observed excellent rates of task compliance (98%), audio quality (95.2%), and video quality (84.8%), for an overall yield of 80.8% of recordings that were both compliant and of high quality. Furthermore, recording quality and compliance were not affected by level of speech severity and did not differ significantly across end devices. These findings support the utility of dialog systems for remote monitoring of speech in ALS.
9. Costantini G, Cesarini V, Di Leo P, Amato F, Suppa A, Asci F, Pisani A, Calculli A, Saggio G. Artificial Intelligence-Based Voice Assessment of Patients with Parkinson's Disease Off and On Treatment: Machine vs. Deep-Learning Comparison. Sensors (Basel) 2023; 23:2293. [PMID: 36850893] [PMCID: PMC9962335] [DOI: 10.3390/s23042293]
Abstract
Parkinson's disease (PD) is one of the most common incurable neurodegenerative diseases. Diagnosis is achieved clinically on the basis of different symptoms, with considerable delay from the onset of neurodegenerative processes in the central nervous system. In this study, we investigated early and full-blown PD patients based on the analysis of their voice characteristics with the aid of the most commonly employed machine learning (ML) techniques. A custom dataset was built from high-fidelity recordings of vocal tasks gathered from Italian healthy control subjects and PD patients, the latter divided into early-diagnosed, off-medication patients on the one hand, and mid-advanced patients treated with L-Dopa on the other. Following the current state of the art, several ML pipelines were compared using different feature selection and classification algorithms, and deep learning was also explored with a custom CNN architecture. Results show that feature-based ML and deep learning achieve comparable classification results, with KNN, SVM, and naïve Bayes classifiers performing similarly and a slight edge for KNN. Much more evident is the predominance of correlation-based feature selection (CFS) as the best feature selector. The selected features act as relevant vocal biomarkers capable of differentiating healthy subjects, early untreated PD patients, and mid-advanced L-Dopa-treated patients.
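The feature-selection-plus-classifier comparison described above can be sketched as follows. Note that scikit-learn has no CFS implementation, so mutual-information-based selection stands in for it here; the data and hyperparameters are hypothetical:

```python
# Hedged sketch of a feature-based ML comparison like the one described:
# feature selection followed by KNN, SVM, and naive Bayes classifiers.
# scikit-learn has no CFS implementation, so mutual-information-based
# selection stands in here; data are hypothetical.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 30))    # hypothetical vocal features
y = rng.integers(0, 3, size=120)  # hypothetical classes: HC, early PD, advanced PD

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "NaiveBayes": GaussianNB(),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(),
                         SelectKBest(mutual_info_classif, k=10), clf)
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```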
Affiliation(s)
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Valerio Cesarini
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Pietro Di Leo
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Federica Amato
- Department of Control and Computer Engineering, Polytechnic University of Turin, 10129 Turin, Italy
- Antonio Suppa
- Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
- IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Francesco Asci
- Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
- IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Antonio Pisani
- Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
- IRCCS Mondino Foundation, 27100 Pavia, Italy
- Alessandra Calculli
- Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
- IRCCS Mondino Foundation, 27100 Pavia, Italy
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy