1
Rogers HP, Hseu A, Kim J, Silberholz E, Jo S, Dorste A, Jenkins K. Voice as a Biomarker of Pediatric Health: A Scoping Review. Children (Basel) 2024; 11:684. PMID: 38929263; PMCID: PMC11201680; DOI: 10.3390/children11060684.
Abstract
The human voice has the potential to serve as a valuable biomarker for the early detection, diagnosis, and monitoring of pediatric conditions. This scoping review synthesizes the current knowledge on the application of artificial intelligence (AI) in analyzing pediatric voice as a biomarker for health. The included studies featured voice recordings from pediatric populations aged 0-17 years, utilized feature extraction methods, and analyzed pathological biomarkers using AI models. Data from 62 studies were extracted, encompassing study and participant characteristics, recording sources, feature extraction methods, and AI models. Data from 39 models across 35 studies were evaluated for accuracy, sensitivity, and specificity. The review showed a global representation of pediatric voice studies, with a focus on developmental, respiratory, speech, and language conditions. The most frequently studied conditions were autism spectrum disorder, intellectual disabilities, asphyxia, and asthma. Mel-Frequency Cepstral Coefficients were the most utilized feature extraction method, while Support Vector Machines were the predominant AI model. The analysis of pediatric voice using AI demonstrates promise as a non-invasive, cost-effective biomarker for a broad spectrum of pediatric conditions. Further research is necessary to standardize the feature extraction methods and AI models utilized for the evaluation of pediatric voice as a biomarker for health. Standardization has significant potential to enhance the accuracy and applicability of these tools in clinical settings across a variety of conditions and voice recording types. Further development of this field has enormous potential for the creation of innovative diagnostic tools and interventions for pediatric populations globally.
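Since Mel-Frequency Cepstral Coefficients and Support Vector Machines were the most common feature-extraction method and model among the reviewed studies, the basic pipeline can be sketched as below. This is a minimal illustration with synthetic stand-in features (real work would extract MFCCs from recordings with an audio library); it does not reproduce any specific study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-recording MFCC vectors (e.g., 13
# coefficients averaged over frames); the "clinical" class is
# shifted slightly to mimic a distinguishable acoustic signature.
n_per_class, n_mfcc = 100, 13
healthy = rng.normal(0.0, 1.0, size=(n_per_class, n_mfcc))
clinical = rng.normal(0.6, 1.0, size=(n_per_class, n_mfcc))
X = np.vstack([healthy, clinical])
y = np.array([0] * n_per_class + [1] * n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standardize features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```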
Affiliation(s)
- Hannah Paige Rogers
- Department of Cardiology, Boston Children’s Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
- Anne Hseu
- Department of Otolaryngology, Boston Children’s Hospital, 333 Longwood Ave, Boston, MA 02115, USA
- Jung Kim
- Department of Pediatrics, Boston Children’s Hospital, Boston, MA 02115, USA
- Stacy Jo
- Department of Otolaryngology, Boston Children’s Hospital, 333 Longwood Ave, Boston, MA 02115, USA
- Anna Dorste
- Boston Children’s Hospital, 300 Longwood Avenue, Boston, MA 02115, USA
- Kathy Jenkins
- Department of Cardiology, Boston Children’s Hospital, Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA
2
Wang M, Zhao X, Li F, Wu L, Li Y, Tang R, Yao J, Lin S, Zheng Y, Ling Y, Ren K, Chen Z, Yin X, Wang Z, Gao Z, Zhang X. Using sustained vowels to identify patients with mild Parkinson's disease in a Chinese dataset. Front Aging Neurosci 2024; 16:1377442. PMID: 38765774; PMCID: PMC11102047; DOI: 10.3389/fnagi.2024.1377442.
Abstract
Introduction Parkinson's disease (PD) is the second most common neurodegenerative disease and affects millions of people. Accurate diagnosis and treatment in the early stages can slow disease progression. However, making an accurate diagnosis of PD at an early stage is challenging. Previous studies have revealed that even movement disorder specialists find it difficult to differentiate patients with PD from healthy individuals until the average modified Hoehn-Yahr stage (mH&Y) reaches 1.8. Recent research has shown that dysarthria provides good indicators for computer-assisted diagnosis of patients with PD. However, few studies have focused on diagnosing PD in the early stages, specifically in patients with mH&Y ≤ 1.5. Method We used a machine learning algorithm to analyze voice features and developed diagnostic models for differentiating between healthy controls (HCs) and patients with PD, and between HCs and patients with mild PD (mH&Y ≤ 1.5). The models were independently validated using separate datasets. Results The model demonstrated remarkable diagnostic performance in distinguishing patients with mild PD (mH&Y ≤ 1.5) from HCs, with an area under the ROC curve of 0.93 (95% CI: 0.85–1.00), accuracy of 0.85, sensitivity of 0.95, and specificity of 0.75. Conclusion These results support screening for PD at an early stage in communities and primary medical institutions that lack movement disorder specialists and specialized equipment.
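For reference, the reported metrics (accuracy, sensitivity, specificity, area under the ROC curve) are defined as below; the labels and scores here are hypothetical illustrations, not the study's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical held-out labels (1 = PD, 0 = HC) and model scores,
# used only to illustrate the metric definitions.
y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.75, 0.6, 0.3, 0.4, 0.35, 0.2, 0.1, 0.05])
y_pred = (y_score >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true-positive rate among PD patients
specificity = tn / (tn + fp)   # true-negative rate among controls
accuracy = (tp + tn) / len(y_true)
auc = roc_auc_score(y_true, y_score)  # threshold-free ranking quality
print(sensitivity, specificity, accuracy, auc)
```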
Affiliation(s)
- Miao Wang
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
- Xingli Zhao
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
- Fengzhu Li
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
- Lingyu Wu
- Gyenno Science Co., Ltd., Shenzhen, China
- HUST-GYENNO CNS Intelligent Digital Medicine Technology Center, Wuhan, China
- Yifan Li
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
- Ruonan Tang
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
- Jiarui Yao
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
- Shinuan Lin
- Gyenno Science Co., Ltd., Shenzhen, China
- HUST-GYENNO CNS Intelligent Digital Medicine Technology Center, Wuhan, China
- Yuan Zheng
- Gyenno Science Co., Ltd., Shenzhen, China
- HUST-GYENNO CNS Intelligent Digital Medicine Technology Center, Wuhan, China
- Yun Ling
- Gyenno Science Co., Ltd., Shenzhen, China
- HUST-GYENNO CNS Intelligent Digital Medicine Technology Center, Wuhan, China
- Kang Ren
- Gyenno Science Co., Ltd., Shenzhen, China
- HUST-GYENNO CNS Intelligent Digital Medicine Technology Center, Wuhan, China
- Zhonglue Chen
- Gyenno Science Co., Ltd., Shenzhen, China
- HUST-GYENNO CNS Intelligent Digital Medicine Technology Center, Wuhan, China
- Xi Yin
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
- Zhenfu Wang
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
- Zhongbao Gao
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
- Xi Zhang
- Department of Geriatric Neurology, The Second Medical Center and National Clinical Research Center for Geriatric Disease, Chinese PLA General Hospital, Beijing, China
3
Favaro A, Tsai YT, Butala A, Thebaud T, Villalba J, Dehak N, Moro-Velázquez L. Interpretable speech features vs. DNN embeddings: What to use in the automatic assessment of Parkinson's disease in multi-lingual scenarios. Comput Biol Med 2023; 166:107559. PMID: 37852107; DOI: 10.1016/j.compbiomed.2023.107559.
Abstract
Speech-based approaches for assessing Parkinson's Disease (PD) often rely on feature extraction for automatic classification or detection. While many studies prioritize accuracy by using non-interpretable embeddings from Deep Neural Networks, this work systematically explores the predictive capabilities and language robustness of both feature types. As interpretable features, prosodic, linguistic, and cognitive descriptors were adopted, while x-vectors, Wav2Vec 2.0, HuBERT, and TRILLsson representations were used as non-interpretable features. Mono-lingual, multi-lingual, and cross-lingual machine learning experiments were conducted on six data sets comprising speech recordings in various languages: American English, Castilian Spanish, Colombian Spanish, Italian, German, and Czech. For interpretable feature-based models, the mean of the best F1-scores obtained for each language was 81% in mono-lingual, 81% in multi-lingual, and 71% in cross-lingual experiments. For non-interpretable feature-based models, the corresponding means were 85%, 88%, and 79%. In general, models based on non-interpretable features outperformed interpretable ones, especially in cross-lingual experiments; TRILLsson provided the most stable and accurate results across tasks and data sets. At the same time, both types of features showed some degree of language robustness in multi-lingual and cross-lingual experiments. Overall, these results suggest that interpretable feature-based models can be used by clinicians to evaluate the deterioration of the speech of patients with PD, while non-interpretable feature-based models can be leveraged to achieve higher detection accuracy.
Affiliation(s)
- Anna Favaro
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218, United States of America
- Yi-Ting Tsai
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218, United States of America
- Ankur Butala
- Department of Neurology, The Johns Hopkins University, Baltimore, MD 21218, United States of America; Department of Psychiatry and Behavioral Sciences, The Johns Hopkins University, Baltimore, MD 21218, United States of America
- Thomas Thebaud
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218, United States of America
- Jesús Villalba
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218, United States of America
- Najim Dehak
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218, United States of America
- Laureano Moro-Velázquez
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD 21218, United States of America
4
Scimeca S, Amato F, Olmo G, Asci F, Suppa A, Costantini G, Saggio G. Robust and language-independent acoustic features in Parkinson's disease. Front Neurol 2023; 14:1198058. PMID: 37384279; PMCID: PMC10294689; DOI: 10.3389/fneur.2023.1198058.
Abstract
Introduction The analysis of vocal samples from patients with Parkinson's disease (PDP) can be relevant in supporting early diagnosis and disease monitoring. However, speech analysis involves several complexities influenced by speaker characteristics (e.g., gender and language) and recording conditions (e.g., professional microphones or smartphones; supervised or unsupervised data collection). Moreover, the set of vocal tasks performed, such as sustained phonation, reading text, or monologue, strongly affects the speech dimension investigated, the features extracted, and, as a consequence, the performance of the overall algorithm. Methods We employed six datasets, including a cohort of 176 Healthy Control (HC) participants and 178 PDP of different nationalities (i.e., Italian, Spanish, Czech), recorded in variable scenarios through various devices (i.e., professional microphones and smartphones), and performing several speech exercises (i.e., vowel phonation, sentence repetition). Aiming to identify the effectiveness of different vocal tasks and the trustworthiness of features independent of external co-factors such as language, gender, and data collection modality, we performed several intra- and inter-corpora statistical analyses. In addition, we compared the performance of different feature selection and classification models to evaluate the most robust and best-performing pipeline. Results According to our results, the combined use of sustained phonation and sentence repetition should be preferred over a single exercise. As for the set of features, Mel-Frequency Cepstral Coefficients proved to be among the most effective parameters in discriminating between HC and PDP, even in the presence of heterogeneous languages and acquisition techniques. Conclusion Although preliminary, the results of this work can be exploited to define a speech protocol that effectively captures vocal alterations while minimizing the effort required of the patient. Moreover, the statistical analysis identified a set of features minimally dependent on gender, language, and recording modality. This discloses the feasibility of extensive cross-corpora tests to develop robust and reliable tools for disease monitoring, staging, and PDP follow-up.
Affiliation(s)
- Sabrina Scimeca
- Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy
- Federica Amato
- Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy
- Gabriella Olmo
- Department of Control and Computer Engineering, Polytechnic University of Turin, Turin, Italy
- Francesco Asci
- Department of Human Neuroscience, Sapienza University of Rome, Rome, Italy
- Antonio Suppa
- Department of Human Neuroscience, Sapienza University of Rome, Rome, Italy
- IRCCS Neuromed Institute, Pozzilli, Italy
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy
5
Exploring facial expressions and action unit domains for Parkinson detection. PLoS One 2023; 18:e0281248. PMID: 36730168; PMCID: PMC9894465; DOI: 10.1371/journal.pone.0281248.
Abstract
BACKGROUND AND OBJECTIVE Patients suffering from Parkinson's disease (PD) present a reduction in facial movements called hypomimia. In this work, we propose using machine-learning-based facial expression analysis of face images, grounded in action unit domains, to improve PD detection. We propose different domain adaptation techniques to exploit the latest advances in automatic face analysis and face action unit detection. METHODS Three different approaches are explored to model facial expressions of PD patients: (i) face analysis using single-frame images and sequences of images, (ii) transfer learning from face analysis to action unit recognition, and (iii) triplet-loss functions to improve the automatic classification between patients and healthy subjects. RESULTS Real face images from PD patients show that it is possible to properly model elicited facial expressions using image sequences (neutral, onset-transition, apex, offset-transition, and neutral), with accuracy improvements of up to 5.5% (from 72.9% to 78.4%) with respect to single-image PD detection. We also show that the proposed action unit domain adaptation provides improvements of up to 8.9% (from 78.4% to 87.3%) with respect to face analysis. Finally, triplet-loss functions provide improvements of up to 3.6% (from 78.8% to 82.4%) with respect to action unit domain adaptation applied upon models created from scratch. The code of the experiments is available at https://github.com/luisf-gomez/Explorer-FE-AU-in-PD. CONCLUSIONS Domain adaptation via transfer learning seems to be a promising strategy to model hypomimia in PD patients. Considering the good results and the fact that only up to five images per participant are considered in each sequence, we believe this work is a step forward in the development of inexpensive computational systems suitable for modeling and quantifying facial expression problems in PD patients.
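The triplet-loss idea used above can be illustrated in a few lines: the loss penalizes an anchor embedding that sits closer to a sample of the opposite class than to one of its own class, by at least a margin. The embeddings and margin below are hypothetical, chosen only to show the mechanics.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: encourage d(anchor, positive) + margin
    to stay below d(anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

# Hypothetical 3-D face embeddings: anchor and positive from PD
# patients, negative from a healthy control.
anchor = np.array([1.0, 0.0, 0.0])
positive = np.array([1.1, 0.1, 0.0])   # close to anchor -> small d_pos
negative = np.array([0.0, 1.0, 1.0])   # far from anchor -> large d_neg

print(triplet_loss(anchor, positive, negative))
```

A well-separated triplet like this one yields zero loss; swapping the positive and negative produces a positive loss, which is what drives the embedding network during training.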
6
Mishra S, Dash TK, Panda G. Speech phoneme and spectral smearing based non-invasive COVID-19 detection. Front Artif Intell 2023; 5:1035805. PMID: 36686850; PMCID: PMC9847386; DOI: 10.3389/frai.2022.1035805.
Abstract
COVID-19 is a deadly viral infection that mainly affects the nasopharyngeal and oropharyngeal cavities before reaching the lungs. Early detection followed by immediate treatment can potentially reduce lung invasion and decrease fatality. Recently, several COVID-19 detection methods have been proposed that use cough and breath sounds. However, little work has examined phoneme analysis and spectral smearing of the audio signal for COVID-19 detection. In this paper, this problem is addressed by classifying speech samples into COVID-19-positive and healthy classes. Additionally, grouping of the phonemes based on reference classification accuracies is proposed for effective and faster detection of the disease at an early stage. The Mel and Gammatone cepstral coefficients and their derivatives are used as features for five standard machine-learning classifiers. The generalized additive model provides the highest accuracy of 97.22% for the phoneme grouping "/t//r//n//g//l/." This smearing-based phoneme classification technique could also be applied in the future to detect other speech-related diseases.
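As a rough illustration of the classification stage described above, the sketch below compares a few standard classifiers on synthetic stand-ins for cepstral features and their derivatives. The generalized additive model that performed best in the paper has no scikit-learn implementation, so common substitute classifiers are shown; the feature values and labels are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic stand-ins for cepstral coefficients plus derivatives
# (13 + 13 columns); toy labels depend on the first four features
# so the task is learnable.
n_samples, n_features = 120, 26
X = rng.normal(size=(n_samples, n_features))
y = (X[:, :4].sum(axis=1) > 0).astype(int)

classifiers = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "random forest": RandomForestClassifier(random_state=0),
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy {acc:.2f}")
```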
Affiliation(s)
- Soumya Mishra
- Department of Electronics and Communication Engineering, C. V. Raman Global University, Bhubaneswar, India
- Tusar Kanti Dash
- Department of Electronics and Communication Engineering, C. V. Raman Global University, Bhubaneswar, India
- Ganapati Panda
- Department of Electronics and Communication Engineering, C. V. Raman Global University, Bhubaneswar, India
7
Things to Consider When Automatically Detecting Parkinson’s Disease Using the Phonation of Sustained Vowels: Analysis of Methodological Issues. Appl Sci (Basel) 2022. DOI: 10.3390/app12030991.
Abstract
Diagnosing Parkinson’s Disease (PD) necessitates monitoring symptom progression. Unfortunately, diagnostic confirmation often occurs years after disease onset. A more sensitive and objective approach is paramount to the expedient diagnosis and treatment of persons with PD (PwPDs). Recent studies have shown that accurate models can be trained to detect signs of PD from audio recordings of confirmed PwPDs. However, disparities exist between studies and may be caused, in part, by differences in the corpora or methodologies employed. Our hypothesis is that unaccounted-for covariates in methodology, experimental design, and data preparation have produced overly optimistic results in studies of automatic PD detection employing sustained vowels. These issues include record-wise rather than subject-wise fold creation; an imbalance of age between the PwPD and control classes; using a corpus that is too small relative to the size of the feature vectors; performing cross-validation without including development data; and the absence of cross-corpora testing to confirm results. In this paper, we evaluate the influence of these methodological issues on the automatic detection of PD employing sustained vowels. We perform several experiments isolating each issue to measure its influence using three different corpora. Moreover, we analyze whether the perceived dysphonia of the speakers could be causing differences in results between the corpora. Results suggest that each methodological issue analyzed independently affects classification accuracy. Consequently, we recommend a list of methodological steps to be considered in future experiments to avoid overoptimistic or misleading results.
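The first issue on that list, record-wise versus subject-wise fold creation, is easy to demonstrate with scikit-learn's cross-validation splitters. The corpus below is hypothetical (10 speakers with 3 sustained-vowel recordings each): plain KFold can place a speaker's recordings on both sides of a split, letting a model score well by recognizing the speaker rather than the pathology, whereas GroupKFold keeps each speaker's recordings together.

```python
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

# Hypothetical corpus: 10 speakers, 3 recordings each.
speakers = np.repeat(np.arange(10), 3)
recordings = np.arange(len(speakers))

# Record-wise folds (KFold): the same speaker can appear in both the
# training and the test split of a fold.
record_wise = KFold(n_splits=5, shuffle=True, random_state=0)
leaks = any(
    set(speakers[tr]) & set(speakers[te])
    for tr, te in record_wise.split(recordings)
)
print("record-wise speaker overlap:", leaks)

# Subject-wise folds (GroupKFold): each speaker's recordings stay on
# one side of every split.
subject_wise = GroupKFold(n_splits=5)
clean = all(
    not (set(speakers[tr]) & set(speakers[te]))
    for tr, te in subject_wise.split(recordings, groups=speakers)
)
print("subject-wise folds leak-free:", clean)
```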
8
Advances in Parkinson's Disease detection and assessment using voice and speech: A review of the articulatory and phonatory aspects. Biomed Signal Process Control 2021. DOI: 10.1016/j.bspc.2021.102418.
9
A machine learning perspective on the emotional content of Parkinsonian speech. Artif Intell Med 2021; 115:102061. PMID: 34001321; DOI: 10.1016/j.artmed.2021.102061.
Abstract
Patients with Parkinson's disease (PD) have distinctive voice patterns, often perceived as expressing sad emotion. While this characteristic of Parkinsonian speech has been supported from the perspective of listeners, with both PD and healthy control (HC) subjects repeating the same speaking tasks, it has never been explored through a machine learning modelling approach. Our work provides an objective evaluation of this characteristic of PD speech by building a transfer learning system to assess how PD pathology affects perceived sadness. To do so, we introduce a Mixture-of-Experts (MoE) architecture for speech emotion recognition designed to be transferable across datasets. First, relying on publicly available emotional speech corpora, we train the MoE model; we then use it to quantify perceived sadness in never-before-seen PD and matched HC speech recordings. To build our models (experts), we extracted spectral features of the voiced parts of speech and trained a gradient boosting decision trees model in each corpus to predict happiness vs. sadness. MoE predictions are created by weighting each expert's prediction according to the distance between the new sample and the expert-specific training samples. The MoE approach systematically infers more negative emotional characteristics in PD speech than in HC speech. Crucially, these judgments are related to disease severity and the severity of speech impairment in the PD patients: the more impairment, the more likely the speech is to be judged as sad. Our findings pave the way towards a better understanding of the characteristics of PD speech and show how publicly available datasets can be used to train models that provide interesting insights on clinical data.
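The distance-based expert weighting described above can be sketched in a few lines. The softmax-over-negative-distances weighting and all numbers below are illustrative assumptions, not necessarily the paper's exact scheme; the point is only that experts whose training data lie closer to the new sample receive more weight.

```python
import numpy as np

def moe_predict(expert_preds, distances):
    """Combine expert predictions, giving more weight to experts whose
    training data lie closer to the new sample. A softmax over negative
    distances is one simple weighting choice."""
    w = np.exp(-np.asarray(distances, dtype=float))
    w /= w.sum()
    return float(np.dot(w, expert_preds))

# Hypothetical: three corpus-specific experts score sadness in [0, 1]
# for a new PD recording; the sample lies closest to expert 0's corpus.
expert_preds = [0.8, 0.4, 0.2]
distances = [0.5, 2.0, 3.0]
print(f"weighted sadness score: {moe_predict(expert_preds, distances):.3f}")
```

Because expert 0 is nearest, the combined score lands much closer to its prediction of 0.8 than to the unweighted mean.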