1. Ramanarayanan V. Multimodal Technologies for Remote Assessment of Neurological and Mental Health. J Speech Lang Hear Res 2024:1-13. [PMID: 38984943] [DOI: 10.1044/2024_jslhr-24-00142]
Abstract
PURPOSE Automated remote assessment and monitoring of patients' neurological and mental health is increasingly becoming an essential component of the digital clinic and telehealth ecosystem, especially after the COVID-19 pandemic. This article reviews the various modalities of health information that are useful for developing such remote clinical assessments in the real world at scale. APPROACH We first present an overview of the various modalities of health information (speech acoustics, natural language, conversational dynamics, orofacial or full-body movement, eye gaze, respiration, cardiopulmonary, and neural), each of which can be extracted from one or more signal sources (audio, video, text, or sensors). We further motivate their clinical utility with examples of how information from each modality can help characterize how different disorders affect different aspects of patients' spoken communication. We then elucidate the advantages of combining one or more of these modalities toward a more holistic, informative, and robust assessment. FINDINGS We find that combining multiple modalities of health information allows for improved scientific interpretability, improved performance on downstream health applications such as early detection and progress monitoring, improved technological robustness, and improved user experience. We illustrate how these principles can be leveraged for remote clinical assessment at scale using a real-world case study of the Modality assessment platform. CONCLUSION This review motivates the combination of human-centric information from multiple modalities to measure various aspects of patients' health, arguing that remote clinical assessment integrating this complementary information can be more effective and lead to better clinical outcomes than using any one data stream in isolation.
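The review's central claim, that combining complementary modalities outperforms any single data stream, is most simply realized as late fusion: one model per modality, with per-modality scores merged into a single decision. Below is a minimal, hypothetical sketch of this idea (synthetic data, illustrative classifier choice; not the Modality platform's implementation):

```python
# Illustrative late-fusion sketch (not from the paper): one classifier per
# modality, with predicted probabilities averaged into a single decision.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
# Hypothetical per-modality feature matrices (acoustic, facial, linguistic).
modalities = {
    "acoustic": rng.normal(size=(n, 12)),
    "facial": rng.normal(size=(n, 8)),
    "linguistic": rng.normal(size=(n, 5)),
}
y = rng.integers(0, 2, size=n)  # hypothetical labels: 0 = control, 1 = patient

# Train one model per modality.
models = {m: LogisticRegression(max_iter=1000).fit(X, y)
          for m, X in modalities.items()}

# Late fusion: average the per-modality probabilities of the positive class.
probs = np.mean([models[m].predict_proba(X)[:, 1]
                 for m, X in modalities.items()], axis=0)
fused_prediction = (probs >= 0.5).astype(int)
```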
Affiliation(s)
- Vikram Ramanarayanan
- Modality.AI, Inc., San Francisco, CA
- Department of Otolaryngology-Head and Neck Surgery, University of California, San Francisco
2. Neumann M, Kothare H, Ramanarayanan V. Multimodal Speech Biomarkers for Remote Monitoring of ALS Disease Progression. medRxiv [Preprint] 2024:2024.06.26.24308811. [PMID: 38978682] [PMCID: PMC11230328] [DOI: 10.1101/2024.06.26.24308811]
Abstract
Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disease that severely impacts affected persons' speech and motor functions, yet early detection and tracking of disease progression remain challenging. The current gold standard for monitoring ALS progression, the ALS Functional Rating Scale-Revised (ALSFRS-R), is based on subjective ratings of symptom severity and may not capture subtle but clinically meaningful changes due to its lack of granularity. Multimodal speech measures that can be automatically collected from patients in a remote fashion can bridge this gap because they are continuous-valued and therefore potentially more granular in capturing disease progression. Here we investigate the responsiveness and sensitivity of multimodal speech measures in persons with ALS (pALS), collected via a remote patient monitoring platform, in an effort to quantify how long it takes to detect a clinically meaningful change associated with disease progression. We recorded audio and video from 278 participants and automatically extracted multimodal speech biomarkers (acoustic, orofacial, linguistic) from the data. We find that the timing alignment of pALS speech relative to a canonical elicitation of the same prompt and the number of words used to describe a picture are the most responsive measures at detecting such change in both pALS with bulbar (n = 36) and non-bulbar onset (n = 107). Interestingly, the responsiveness of these measures is stable even at small sample sizes. We further found that certain speech measures are sensitive enough to track bulbar decline even when there is no patient-reported clinical change, i.e., the ALSFRS-R speech score remains unchanged at 3 out of a total possible score of 4. These findings have the potential to facilitate improved, accelerated, and more cost-effective clinical trials and care.
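Responsiveness of a longitudinal measure is commonly quantified with statistics such as the standardized response mean (SRM), the mean within-subject change divided by the standard deviation of that change. The sketch below illustrates this generic computation on hypothetical data; the paper's exact responsiveness statistic may differ:

```python
# Hedged sketch: responsiveness of a longitudinal speech measure via the
# standardized response mean (SRM = mean change / SD of change). The paper's
# exact responsiveness statistic may differ; values here are hypothetical.
import numpy as np

def standardized_response_mean(baseline: np.ndarray, followup: np.ndarray) -> float:
    """SRM of within-subject change between two visits."""
    change = followup - baseline
    return change.mean() / change.std(ddof=1)

# Hypothetical speaking-rate measurements (words/min) at two visits.
baseline = np.array([110.0, 98.0, 120.0, 105.0, 90.0, 115.0])
followup = np.array([100.0, 95.0, 112.0, 96.0, 85.0, 108.0])
print(f"SRM = {standardized_response_mean(baseline, followup):.2f}")
```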
Affiliation(s)
- Vikram Ramanarayanan
- Modality.AI, Inc., San Francisco, CA, USA
- University of California, San Francisco, CA, USA
3. Tessler I, Primov-Fever A, Soffer S, Anteby R, Gecel NA, Livneh N, Alon EE, Zimlichman E, Klang E. Deep learning in voice analysis for diagnosing vocal cord pathologies: a systematic review. Eur Arch Otorhinolaryngol 2024; 281:863-871. [PMID: 38091100] [DOI: 10.1007/s00405-023-08362-6]
Abstract
OBJECTIVES With smartphones and wearable devices becoming ubiquitous, they offer an opportunity for large-scale voice sampling. This systematic review explores the application of deep learning models to the automated analysis of voice samples for detecting vocal cord pathologies. METHODS We conducted a systematic literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) reporting guidelines. We searched the MEDLINE and Embase databases for original publications on deep learning applications for diagnosing vocal cord pathologies between 2002 and 2022. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. RESULTS Across the 14 studies that met the inclusion criteria, data from a total of 3037 patients were analyzed. All studies were retrospective. Deep learning applications targeted the detection of Reinke's edema, nodules, polyps, cysts, unilateral cord paralysis, and vocal fold cancer. Most pathologies had detection accuracy above 90%. Thirteen studies (93%) exhibited a high risk of bias and concerns about applicability. CONCLUSIONS This technology holds promise for enhancing the screening and diagnosis of vocal cord pathologies. While current research is limited, the presented studies offer proof of concept for developing larger-scale solutions.
Affiliation(s)
- Idit Tessler
- Department of Otolaryngology Head and Neck Surgery, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- ARC Innovation Center, Sheba Medical Center, Tel Hashomer, Israel
- Adi Primov-Fever
- Department of Otolaryngology Head and Neck Surgery, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Shelly Soffer
- Internal Medicine B, Assuta Medical Center, Ashdod, Israel
- Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Roi Anteby
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Department of Surgery and Transplantation B, Chaim Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Nir A Gecel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Nir Livneh
- Department of Otolaryngology Head and Neck Surgery, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eran E Alon
- Department of Otolaryngology Head and Neck Surgery, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Eyal Zimlichman
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- ARC Innovation Center, Sheba Medical Center, Tel Hashomer, Israel
- Eyal Klang
- Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Department of Diagnostic Imaging, Sheba Medical Center, Tel Hashomer, Ramat Gan, Israel
- ARC Innovation Center, Sheba Medical Center, Tel Hashomer, Israel
4. Saab R, Balachandar A, Mahdi H, Nashnoush E, Perri LX, Waldron AL, Sadeghian A, Rubenfeld G, Crowley M, Boulos MI, Murray BJ, Khosravani H. Machine-learning assisted swallowing assessment: a deep learning-based quality improvement tool to screen for post-stroke dysphagia. Front Neurosci 2023; 17:1302132. [PMID: 38130696] [PMCID: PMC10734030] [DOI: 10.3389/fnins.2023.1302132]
Abstract
Introduction Post-stroke dysphagia is common and associated with significant morbidity and mortality, making bedside screening of significant clinical importance. Using voice as a biomarker coupled with deep learning has the potential to improve patient access to screening and mitigate the subjectivity of detecting voice change, a component of several validated screening protocols. Methods In this single-center study, we developed a proof-of-concept model for automated dysphagia screening and evaluated its performance on training and testing cohorts. Participants were recruited on a rolling basis from patients admitted to a comprehensive stroke center who were primary English speakers and could follow commands without significant aphasia. The primary outcome was classification as the equivalent of a pass or fail on a dysphagia screening test, which served as the label. Voice data were recorded from patients speaking a standardized set of vowels, words, and sentences from the National Institutes of Health Stroke Scale. Seventy patients were recruited and 68 were included in the analysis, 40 in the training cohort and 28 in the testing cohort. Speech was segmented into 1,579 audio clips, from which 6,655 Mel-spectrogram images were computed and used as inputs for deep learning models (DenseNet and ConvNeXt, separately and together). Clip-level and participant-level swallowing status predictions were obtained through a voting method. Results The models demonstrated clip-level dysphagia screening sensitivity of 71% and specificity of 77% (F1 = 0.73, AUC = 0.80 [95% CI: 0.78-0.82]). At the participant level, sensitivity and specificity were 89% and 79%, respectively (F1 = 0.81, AUC = 0.91 [95% CI: 0.77-1.05]). Discussion This study is the first to demonstrate the feasibility of applying deep learning to classify vocalizations for detecting post-stroke dysphagia. Our findings suggest potential for enhancing dysphagia screening in clinical settings. Code: https://github.com/UofTNeurology/masa-open-source.
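As a rough illustration of the generic pipeline described above (Mel-spectrogram extraction followed by majority voting from clip-level to participant-level predictions), the sketch below uses librosa defaults and a stub in place of the trained DenseNet/ConvNeXt models; it is not the authors' released code, which is linked above:

```python
# Illustrative sketch of two generic steps in such a pipeline: Mel-spectrogram
# extraction and majority voting from clip-level to participant-level labels.
# Not the authors' code (see their GitHub repo); the classifier is a stub.
import numpy as np
import librosa

def clip_to_melspectrogram(clip: np.ndarray, sr: int) -> np.ndarray:
    """Convert an audio clip to a log-scaled Mel spectrogram (dB)."""
    mel = librosa.feature.melspectrogram(y=clip, sr=sr, n_mels=128)
    return librosa.power_to_db(mel, ref=np.max)

def classify_clip(mel_db: np.ndarray) -> int:
    """Stub for a CNN classifier (e.g., DenseNet/ConvNeXt): 1 = fail, 0 = pass."""
    return int(mel_db.mean() > -40.0)  # placeholder rule, not a trained model

def participant_prediction(clips: list[np.ndarray], sr: int) -> int:
    """Majority vote over clip-level predictions for one participant."""
    votes = [classify_clip(clip_to_melspectrogram(c, sr)) for c in clips]
    return int(np.mean(votes) >= 0.5)

# Hypothetical usage with 3 one-second clips of synthetic audio at 16 kHz.
sr = 16000
clips = [np.random.randn(sr).astype(np.float32) * 0.01 for _ in range(3)]
print("participant label:", participant_prediction(clips, sr))
```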
Affiliation(s)
- Rami Saab
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Arjun Balachandar
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Hamza Mahdi
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Eptehal Nashnoush
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Lucas X. Perri
- Goodfellow-Waldron Initiative in Stroke Innovation and Recovery, Division of Neurology, Neurology Quality and Innovation Lab, University of Toronto, Toronto, ON, Canada
- Ashley L. Waldron
- Goodfellow-Waldron Initiative in Stroke Innovation and Recovery, Division of Neurology, Neurology Quality and Innovation Lab, University of Toronto, Toronto, ON, Canada
- Alireza Sadeghian
- Department of Computer Science, Faculty of Science, Toronto Metropolitan University, Toronto, ON, Canada
- Gordon Rubenfeld
- Institute of Medical Science, University of Toronto, Toronto, ON, Canada
- Interdepartmental Division of Critical Care, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
- Mark Crowley
- Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada
- Mark I. Boulos
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Institute of Medical Science, University of Toronto, Toronto, ON, Canada
- Brian J. Murray
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Houman Khosravani
- Hurvitz Brain Sciences Program, Division of Neurology, Department of Medicine, Sunnybrook Health Sciences Centre, University of Toronto, Toronto, ON, Canada
- Goodfellow-Waldron Initiative in Stroke Innovation and Recovery, Division of Neurology, Neurology Quality and Innovation Lab, University of Toronto, Toronto, ON, Canada
5. Tognetti A, Thunell E, Zakrzewska M, Olofsson J, Lekander M, Axelsson J, Olsson MJ. Discriminating between sick and healthy faces based on early sickness cues: an exploratory analysis of sex differences. Evol Med Public Health 2023; 11:386-396. [PMID: 37941735] [PMCID: PMC10629974] [DOI: 10.1093/emph/eoad032]
Abstract
Background and objectives It has been argued that sex and disease-related traits should influence how observers respond to sensory sickness cues. There is evidence that humans can detect sensory cues of infection in others, but a lack of statistical power in earlier studies has prevented firm conclusions about whether perception of sickness cues is associated with sex or with disease-related personality traits. Here, we tested whether women (relative to men) and individuals who report poorer health, higher disgust sensitivity, greater vulnerability to disease, or more health concern overestimate the presence of sickness cues and/or are better at detecting them. Methodology In a large online study, 343 women and 340 men were instructed to identify the sick faces in a series of sick and healthy photographs of volunteers with an induced acute experimental inflammation. Participants also completed several disease-related questionnaires. Results While both men and women discriminated between sick and healthy individuals above chance level, exploratory analyses revealed that women outperformed men in both accuracy and speed of discrimination. Furthermore, higher disgust sensitivity to body odors was associated with a more liberal decision criterion for categorizing faces as sick. Conclusion Our findings provide strong support for the human ability to discriminate between sick and healthy individuals based on early facial cues of sickness, and suggest that women are significantly, although only slightly, better at this task. If this finding is replicated, future studies should determine whether women's better performance is related to increased avoidance of sick individuals.
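Discrimination accuracy and decision criterion in tasks like this are conventionally analyzed with signal detection theory: sensitivity d' = z(hit rate) - z(false-alarm rate) and criterion c = -(z(hit rate) + z(false-alarm rate))/2, where a negative c indicates a liberal bias toward responding "sick". A sketch with hypothetical counts (the paper's exact analysis may differ):

```python
# Hedged sketch: signal detection theory indices commonly used in such tasks.
# d' measures discrimination ability; criterion c measures response bias
# (negative c = liberal bias toward responding "sick"). Counts are hypothetical.
from scipy.stats import norm

def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d-prime, criterion c) from raw response counts."""
    # Log-linear correction avoids infinite z-scores at rates of 0 or 1.
    h = (hits + 0.5) / (hits + misses + 1)
    fa = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z_h, z_fa = norm.ppf(h), norm.ppf(fa)
    return z_h - z_fa, -(z_h + z_fa) / 2

d_prime, criterion = sdt_indices(hits=70, misses=30, false_alarms=40,
                                 correct_rejections=60)
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
```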
Affiliation(s)
- Arnaud Tognetti
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- CEE-M, CNRS, INRAE, Institut Agro, University of Montpellier, Montpellier, France
- Evelina Thunell
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Marta Zakrzewska
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Jonas Olofsson
- Department of Psychology, Stockholm University, Stockholm, Sweden
- Mats Lekander
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Department of Psychology, Stress Research Institute, Stockholm University, Stockholm, Sweden
- Osher Center for Integrative Health, Karolinska Institutet, Stockholm, Sweden
- John Axelsson
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
- Department of Psychology, Stress Research Institute, Stockholm University, Stockholm, Sweden
- Mats J Olsson
- Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden
6. Triantafyllopoulos A, Kathan A, Baird A, Christ L, Gebhard A, Gerczuk M, Karas V, Hübner T, Jing X, Liu S, Mallol-Ragolta A, Milling M, Ottl S, Semertzidou A, Rajamani ST, Yan T, Yang Z, Dineley J, Amiriparian S, Bartl-Pokorny KD, Batliner A, Pokorny FB, Schuller BW. HEAR4Health: a blueprint for making computer audition a staple of modern healthcare. Front Digit Health 2023; 5:1196079. [PMID: 37767523] [PMCID: PMC10520966] [DOI: 10.3389/fdgth.2023.1196079]
Abstract
Recent years have seen a rapid increase in digital medicine research aiming to transform traditional healthcare systems into modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies, first and foremost in medical imaging, but also in wearables and other intelligent sensors. By comparison, computer audition lags behind, at least in terms of commercial interest. Yet audition has long been a staple assistant for medical practitioners, the stethoscope being the quintessential symbol of doctors around the world. Transforming this traditional technology with AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, the advances needed in computational and data efficiency; Attentively, accounting for individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, ensuring compliance with the ethical standards accorded to the field of medicine. We thus provide an overview and perspective of HEAR4Health: the sketch of a modern, ubiquitous sensing system that can bring computer audition on par with other AI technologies in the pursuit of improved healthcare systems.
Affiliation(s)
- Andreas Triantafyllopoulos
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alexander Kathan
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alice Baird
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Lukas Christ
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Alexander Gebhard
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Maurice Gerczuk
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Vincent Karas
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Tobias Hübner
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Xin Jing
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Shuo Liu
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Adria Mallol-Ragolta
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- Manuel Milling
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Sandra Ottl
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Anastasia Semertzidou
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Tianhao Yan
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Zijiang Yang
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Judith Dineley
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Shahin Amiriparian
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Katrin D. Bartl-Pokorny
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Division of Phoniatrics, Medical University of Graz, Graz, Austria
- Anton Batliner
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Florian B. Pokorny
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Division of Phoniatrics, Medical University of Graz, Graz, Austria
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- Björn W. Schuller
- EIHW – Chair of Embedded Intelligence for Healthcare and Wellbeing, University of Augsburg, Augsburg, Germany
- Centre for Interdisciplinary Health Research, University of Augsburg, Augsburg, Germany
- GLAM – Group on Language, Audio, & Music, Imperial College London, London, United Kingdom
7. Neumann M, Kothare H, Ramanarayanan V. Combining Multiple Multimodal Speech Features into an Interpretable Index Score for Capturing Disease Progression in Amyotrophic Lateral Sclerosis. Interspeech 2023; 2023:2353-2357. [PMID: 39006832] [PMCID: PMC11246072] [DOI: 10.21437/interspeech.2023-2100]
Abstract
Multiple speech biomarkers have been shown to carry useful information regarding amyotrophic lateral sclerosis (ALS) pathology. We propose a two-step framework to compute optimal linear combinations (indexes) of these biomarkers that are more discriminative and noise-robust than the individual markers, which is important for clinical care and pharmaceutical trial applications. First, we use a method based on hierarchical clustering to select representative speech metrics from a dataset comprising 143 people with ALS and 135 age- and sex-matched healthy controls. Second, we analyze three methods of index computation that optimize linear discriminability, the Youden index, and the sparsity of logistic regression model weights, respectively, and evaluate their performance with 5-fold cross-validation. We find that the proposed indexes are generally more discriminative of bulbar vs. non-bulbar onset in ALS than both their individual component metrics and an equally weighted baseline.
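A hedged sketch of the two-step idea follows: hierarchical clustering over feature correlations to pick cluster representatives, then a sparse linear index via L1-penalized logistic regression evaluated with 5-fold cross-validation. Data, thresholds, and hyperparameters are hypothetical stand-ins, not the paper's exact procedure:

```python
# Hedged sketch of a two-step index: (1) hierarchical clustering of correlated
# features with one representative per cluster, (2) a sparse linear index via
# L1-penalized logistic regression, evaluated with 5-fold cross-validation.
# Data and thresholds are hypothetical, not the paper's exact procedure.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(278, 20))        # hypothetical speech metrics
y = rng.integers(0, 2, size=278)      # hypothetical bulbar vs non-bulbar labels

# Step 1: cluster features by correlation distance; keep one per cluster.
corr_dist = 1 - np.abs(np.corrcoef(X, rowvar=False))
Z = linkage(corr_dist[np.triu_indices(20, k=1)], method="average")
clusters = fcluster(Z, t=0.7, criterion="distance")
representatives = [np.flatnonzero(clusters == c)[0] for c in np.unique(clusters)]

# Step 2: sparse logistic regression over the representative features.
index_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
scores = cross_val_score(index_model, X[:, representatives], y, cv=5,
                         scoring="roc_auc")
print(f"5-fold AUC: {scores.mean():.2f}")
```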
8. Richter V, Neumann M, Green JR, Richburg B, Roesler O, Kothare H, Ramanarayanan V. Remote Assessment for ALS using Multimodal Dialog Agents: Data Quality, Feasibility and Task Compliance. Interspeech 2023; 2023:5441-5445. [PMID: 37791043] [PMCID: PMC10547018] [DOI: 10.21437/interspeech.2023-2115]
Abstract
We investigate the feasibility, task compliance, and audiovisual data quality of a multimodal dialog-based solution for remote assessment of amyotrophic lateral sclerosis (ALS). 53 people with ALS and 52 healthy controls interacted with Tina, a cloud-based conversational agent, performing speech tasks designed to probe various aspects of motor speech function while their audio and video were recorded. We rated a total of 250 recordings for audio/video quality and participant task compliance, noting the relative frequency of the different issues observed. We observed excellent rates of task compliance (98%), audio quality (95.2%), and video quality (84.8%), for an overall yield of 80.8% of recordings that were both compliant and of high quality. Furthermore, recording quality and compliance were not affected by level of speech severity and did not differ significantly across end devices. These findings support the utility of dialog systems for remote monitoring of speech in ALS.
9. Costantini G, Cesarini V, Di Leo P, Amato F, Suppa A, Asci F, Pisani A, Calculli A, Saggio G. Artificial Intelligence-Based Voice Assessment of Patients with Parkinson's Disease Off and On Treatment: Machine vs. Deep-Learning Comparison. Sensors (Basel) 2023; 23:2293. [PMID: 36850893] [PMCID: PMC9962335] [DOI: 10.3390/s23042293]
Abstract
Parkinson's disease (PD) is one of the most common incurable neurodegenerative diseases. Diagnosis is achieved clinically on the basis of different symptoms, with considerable delay from the onset of neurodegenerative processes in the central nervous system. In this study, we investigated early and full-blown PD patients based on the analysis of their voice characteristics with the aid of the most commonly employed machine learning (ML) techniques. A custom dataset was built from high-fidelity recordings of vocal tasks gathered from Italian healthy control subjects and PD patients, the latter divided into early-diagnosed, off-medication patients on the one hand, and mid-advanced patients treated with L-Dopa on the other. Following the current state of the art, several ML pipelines were compared using different feature selection and classification algorithms, and deep learning was also explored with a custom CNN architecture. Results show that feature-based ML and deep learning achieve comparable classification results, with KNN, SVM, and naïve Bayes classifiers performing similarly and a slight edge for KNN. Much more evident is the predominance of correlation-based feature selection (CFS) as the best feature selector. The selected features act as relevant vocal biomarkers capable of differentiating healthy subjects, early untreated PD patients, and mid-advanced L-Dopa-treated patients.
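The feature-selection-plus-classifier comparison described above can be sketched as follows. Note that scikit-learn has no CFS implementation, so mutual-information-based selection stands in for it here; the data and hyperparameters are hypothetical:

```python
# Hedged sketch of a feature-based ML comparison like the one described:
# feature selection followed by KNN, SVM, and naive Bayes classifiers.
# scikit-learn has no CFS implementation, so mutual-information-based
# selection stands in here; data are hypothetical.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 30))    # hypothetical vocal features
y = rng.integers(0, 3, size=120)  # hypothetical classes: HC, early PD, advanced PD

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "NaiveBayes": GaussianNB(),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(),
                         SelectKBest(mutual_info_classif, k=10), clf)
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```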
Affiliation(s)
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Valerio Cesarini
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Pietro Di Leo
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Federica Amato
- Department of Control and Computer Engineering, Polytechnic University of Turin, 10129 Turin, Italy
- Antonio Suppa
- Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
- IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Francesco Asci
- Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
- IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Antonio Pisani
- Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
- IRCCS Mondino Foundation, 27100 Pavia, Italy
- Alessandra Calculli
- Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
- IRCCS Mondino Foundation, 27100 Pavia, Italy
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy