1
Schraut T, Döllinger M, Kunduk M, Echternach M, Dürr S, Werz J, Schützenberger A. Machine Learning-Based Estimation of Hoarseness Severity Using Acoustic Signals Recorded During High-Speed Videoendoscopy. J Voice 2025:S0892-1997(24)00437-5. [PMID: 39755525 DOI: 10.1016/j.jvoice.2024.12.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 12/02/2024] [Accepted: 12/04/2024] [Indexed: 01/06/2025]
Abstract
OBJECTIVES This study investigates the use of sustained phonations recorded during high-speed videoendoscopy (HSV) for machine learning-based assessment of hoarseness severity (H). The performance of this approach is compared with conventional recordings obtained during voice therapy to evaluate key differences and limitations of HSV-derived acoustic recordings. METHODS A database of 617 voice recordings with a duration of 250 ms was gathered during HSV examination (HS). Two databases comprising 809 vowels recorded during voice therapy were used for comparison, examining recording durations of 1 second (VT-1) and 250 ms (VT-2). A total of 490 features were extracted, including perturbation and noise characteristics, spectral and cepstral coefficients, as well as features based on modulation spectrum, nonlinear dynamic analysis, entropy, and empirical mode decomposition. Model development focused on selecting a minimal-optimal feature subset and suitable classification algorithms. Recordings were classified into two groups of hoarseness based on auditory-perceptual ratings by experts, yielding a continuous hoarseness score ŷ. Model performance was evaluated based on classification accuracy, correlation between predicted scores ŷ ∈ [0, 1] and subjective ratings H ∈ {0, 1, 2, 3}, and correlation between the relative change in quantitative and subjective ratings. RESULTS Logistic regression combined with five acoustic features achieved a classification accuracy of 0.863 (VT-1), 0.847 (VT-2), and 0.742 (HS) on the test sets. A correlation of 0.797 (VT-1), 0.763 (VT-2), and 0.637 (HS) was obtained between ŷ and H, respectively. For 21 test subjects with two recordings, the model yielded a correlation of 0.592 (VT-1), 0.486 (VT-2), and 0.088 (HS) between Δŷ and ΔH.
CONCLUSION While acoustic signals recorded during HSV show potential for quantitative hoarseness assessment, they are less reliable than voice therapy recordings due to practical challenges associated with oral laryngeal examination. Addressing these limitations, for example, through the use of flexible nasal endoscopy, could improve the quality of HSV-derived acoustic recordings and voice assessments.
Affiliation(s)
- Tobias Schraut
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany.
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, LA 70803
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Munich, Ludwig-Maximilian-Universität München, 81377 Munich, Germany
- Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, University Hospital Regensburg, Universität Regensburg, 93053 Regensburg, Germany
- Julia Werz
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
2
Farhah N. Utilizing deep learning models in an intelligent spiral drawing classification system for Parkinson's disease classification. Front Med (Lausanne) 2024; 11:1453743. [PMID: 39296906 PMCID: PMC11410056 DOI: 10.3389/fmed.2024.1453743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Accepted: 08/23/2024] [Indexed: 09/21/2024] Open
Abstract
Introduction Parkinson's disease (PD) is a neurodegenerative illness that impairs normal human movement. The primary cause of PD is the deficiency of dopamine in the human brain. PD also leads to several other challenges, including insomnia, eating disturbances, excessive sleepiness, fluctuations in blood pressure, sexual dysfunction, and other issues. Methods The suggested system is an extremely promising technological strategy that may help medical professionals provide accurate and unbiased disease diagnoses. This is accomplished by utilizing significant and unique traits taken from spiral drawings connected to Parkinson's disease. While PD cannot be cured, early administration of drugs may significantly improve the condition of a patient with PD. An expeditious and accurate clinical classification of PD ensures that efficacious therapeutic interventions can commence promptly, potentially impeding the advancement of the disease and enhancing the quality of life for both patients and their caregivers. Transfer learning models have been applied to diagnose PD by analyzing important and distinctive characteristics extracted from hand-drawn spirals. The studies were carried out in conjunction with a comparison analysis employing 102 spiral drawings. This work enhances current research by analyzing the effectiveness of transfer learning models, including VGG19, InceptionV3, ResNet50v2, and DenseNet169, for identifying PD using hand-drawn spirals. Results Transfer machine learning models demonstrate highly encouraging outcomes in providing a precise and reliable classification of PD. Actual results demonstrate that the InceptionV3 model achieved a high accuracy of 89% when learning from spiral drawing images and had a superior receiver operating characteristic (ROC) curve value of 95%. Discussion The comparison results suggest that PD identification using these models is currently at the forefront of PD research. 
The dataset will be enlarged, transfer learning strategies will be investigated, and the system's integration into a comprehensive Parkinson's monitoring and evaluation platform will be looked into as future research areas. The results of this study could lead to a better quality of life for Parkinson's sufferers, individualized treatment, and an early classification.
Affiliation(s)
- Nesren Farhah
- Department of Health Informatics, College of Health Sciences, Saudi Electronic University, Riyadh, Saudi Arabia
3
Tirronen S, Kadiri SR, Alku P. The Effect of the MFCC Frame Length in Automatic Voice Pathology Detection. J Voice 2024; 38:975-982. [PMID: 35490081 DOI: 10.1016/j.jvoice.2022.03.021] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/26/2022] [Accepted: 03/21/2022] [Indexed: 10/18/2022]
Abstract
Automatic voice pathology detection is a research topic that has gained increasing interest recently. Although methods based on deep learning are becoming popular, the classical pipeline systems based on a two-stage architecture consisting of a feature extraction stage and a classifier stage are still widely used. In these classical detection systems, frame-wise computation of mel-frequency cepstral coefficients (MFCCs) is the most popular feature extraction method. However, no systematic study has been conducted to investigate the effect of the MFCC frame length on automatic voice pathology detection. In this work, we studied the effect of the MFCC frame length in voice pathology detection using three disorders (hyperkinetic dysphonia, hypokinetic dysphonia and reflux laryngitis) from the Saarbrücken Voice Disorders (SVD) database. The detection performance was compared between speaker-dependent and speaker-independent scenarios as well as between speaking task-dependent and speaking task-independent scenarios. The Support Vector Machine, which is the most widely used classifier in the study area, was used as the classifier. The results show that the detection accuracy depended on the MFCC frame length in all the scenarios studied. The best detection accuracy was obtained by using an MFCC frame length of 500 ms with a shift of 5 ms.
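To make the frame length and shift in this abstract concrete, the following is a minimal sketch of the framing step only; it is not the authors' code. The function name `frame_signal` and the 16 kHz sample rate in the usage note are assumptions for illustration, and the MFCC computation that would follow each frame is not shown.

```python
def frame_signal(samples, sample_rate_hz, frame_ms, shift_ms):
    """Split a signal into overlapping frames.

    Hypothetical helper: trailing samples that do not fill a whole
    frame are dropped (no padding), which is an assumption.
    """
    frame_len = int(round(sample_rate_hz * frame_ms / 1000))
    shift = int(round(sample_rate_hz * shift_ms / 1000))
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame length and shift must be positive")
    if len(samples) < frame_len:
        return []
    # Number of full frames that fit when advancing by `shift` samples.
    n_frames = 1 + (len(samples) - frame_len) // shift
    return [samples[i * shift : i * shift + frame_len] for i in range(n_frames)]


# One second of silence at an assumed 16 kHz sample rate.
signal = [0.0] * 16000
long_frames = frame_signal(signal, 16000, frame_ms=500, shift_ms=5)
short_frames = frame_signal(signal, 16000, frame_ms=25, shift_ms=5)
```

With the same 5 ms shift, a 500 ms frame covers far more of a sustained vowel per feature vector than a conventional short frame, at the cost of fewer frames per recording.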
Affiliation(s)
- Saska Tirronen
- Department of Signal Processing and Acoustics, Aalto University, Finland
- Paavo Alku
- Department of Signal Processing and Acoustics, Aalto University, Finland
4
Kang H, He B, Song R, Wang W. ECAPA-TDNN based online discussion activity-level evaluation. Sci Rep 2024; 14:14744. [PMID: 38926429 PMCID: PMC11208574 DOI: 10.1038/s41598-024-63874-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 06/03/2024] [Indexed: 06/28/2024] Open
Abstract
With the continuous development and application of online interactive activities and network transmission technology, online interactive behaviors such as online discussion meetings and online teaching have become indispensable in people's studies and work. However, the effectiveness of working with online discussions and feedback from participants on their conference performance has been a major concern, and this is the issue examined in this post. Based on the above issues, this paper designs an online discussion activity-level evaluation system based on voiceprint recognition technology. The application system developed in this project is divided into two parts; the first part is to segment the online discussion audio into multiple independent audio segments by audio segmentation technology and train the voiceprint recognition model to predict the speaker's identity in each separate audio component. In the second part, we propose a linear normalized online meeting activity-level calculation model based on the modified main indexes by traversing and counting each participant's speaking frequency and total speaking time as the main indexes for activity-level evaluation. To make the evaluation results more objective, reasonable, and distinguishable, the activity score of each participant is calculated, and each participant's activity-level in the discussion meeting is derived by combining the fuzzy membership function. To test the system's performance, we designed an experiment with 25 participants in an online discussion meeting, with two assistants manually recording the discussion and a host moderating the meeting. The results of the experiment showed that the system's evaluation results matched those recorded by the two assistants. The system can fulfill the task of distinguishing the level of activity of participants in online discussions.
Affiliation(s)
- Hongbo Kang
- School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, China
- Botao He
- School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, China
- Ruoyang Song
- School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, China
- Wenqing Wang
- School of Automation, Xi'an University of Posts and Telecommunications, Xi'an, China
5
Ibarra EJ, Arias-Londoño JD, Zañartu M, Godino-Llorente JI. Towards a Corpus (and Language)-Independent Screening of Parkinson's Disease from Voice and Speech through Domain Adaptation. Bioengineering (Basel) 2023; 10:1316. [PMID: 38002440 PMCID: PMC10669342 DOI: 10.3390/bioengineering10111316] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/09/2023] [Revised: 11/03/2023] [Accepted: 11/10/2023] [Indexed: 11/26/2023] Open
Abstract
End-to-end deep learning models have shown promising results for the automatic screening of Parkinson's disease by voice and speech. However, these models often suffer degradation in their performance when applied to scenarios involving multiple corpora. In addition, they also show corpus-dependent clusterings. These facts indicate a lack of generalisation or the presence of certain shortcuts in the decision, and also suggest the need for developing new corpus-independent models. In this respect, this work explores the use of domain adversarial training as a viable strategy to develop models that retain their discriminative capacity to detect Parkinson's disease across diverse datasets. The paper presents three deep learning architectures and their domain adversarial counterparts. The models were evaluated with sustained vowels and diadochokinetic recordings extracted from four corpora with different demographics, dialects or languages, and recording conditions. The results showed that the space distribution of the embedding features extracted by the domain adversarial networks exhibits a higher intra-class cohesion. This behaviour is supported by a decrease in the variability and inter-domain divergence computed within each class. The findings suggest that domain adversarial networks are able to learn the common characteristics present in Parkinsonian voice and speech, which are supposed to be corpus, and consequently, language independent. Overall, this effort provides evidence that domain adaptation techniques refine the existing end-to-end deep learning approaches for Parkinson's disease detection from voice and speech, achieving more generalizable models.
Affiliation(s)
- Emiro J. Ibarra
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Avenida España 1680, Casilla 110-V, Valparaíso 2390123, Chile
- Julián D. Arias-Londoño
- Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avda. Ciudad Universitaria, 30, 28040 Madrid, Spain
- Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Avenida España 1680, Casilla 110-V, Valparaíso 2390123, Chile
- Juan I. Godino-Llorente
- Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avda. Ciudad Universitaria, 30, 28040 Madrid, Spain
6
Friedman L, Lauber M, Behroozmand R, Fogerty D, Kunecki D, Berry-Kravis E, Klusek J. Atypical vocal quality in women with the FMR1 premutation: an indicator of impaired sensorimotor control. Exp Brain Res 2023; 241:1975-1987. [PMID: 37347418 PMCID: PMC10863608 DOI: 10.1007/s00221-023-06653-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2023] [Accepted: 06/13/2023] [Indexed: 06/23/2023]
Abstract
Women with the FMR1 premutation are susceptible to motor involvement related to atypical cerebellar function, including risk for developing fragile X tremor ataxia syndrome. Vocal quality analyses are sensitive to subtle differences in motor skills but have not yet been applied to the FMR1 premutation. This study examined whether women with the FMR1 premutation demonstrate differences in vocal quality, and whether such differences relate to FMR1 genetic, executive, motor, or health features of the FMR1 premutation. Participants included 35 women with the FMR1 premutation and 45 age-matched women without the FMR1 premutation who served as a comparison group. Three sustained /a/ vowels were analyzed for pitch (mean F0), variability of pitch (standard deviation of F0), and overall vocal quality (jitter, shimmer, and harmonics-to-noise ratio). Executive, motor, and health indices were obtained from direct and self-report measures and genetic samples were analyzed for FMR1 CGG repeat length and activation ratio. Women with the FMR1 premutation had a lower pitch, larger pitch variability, and poorer vocal quality than the comparison group. Working memory was related to harmonics-to-noise ratio and shimmer in women with the FMR1 premutation. Vocal quality abnormalities differentiated women with the FMR1 premutation from the comparison group and were evident even in the absence of other clinically evident motor deficits. This study supports vocal quality analyses as a tool that may prove useful in the detection of early signs of motor involvement in this population.
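The perturbation measures named in this abstract can be illustrated with a minimal sketch. The abstract does not state which jitter and shimmer variants were used, so the common "local" definition (mean absolute difference between consecutive cycles, divided by the mean) is an assumption here, and `local_perturbation` is a hypothetical helper name. Extracting per-cycle periods and peak amplitudes from the audio is not shown.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value.

    Applied to glottal periods this gives a local jitter estimate;
    applied to per-cycle peak amplitudes it gives a local shimmer estimate.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Illustrative per-cycle measurements (made-up numbers, not study data).
periods_s = [0.010, 0.011, 0.010, 0.011]
amplitudes = [0.80, 0.80, 0.80, 0.80]

jitter_local = local_perturbation(periods_s)    # alternating periods: nonzero
shimmer_local = local_perturbation(amplitudes)  # constant amplitude: zero
```

Higher values of either measure indicate less regular vocal fold vibration, which is the sense in which the abstract treats them as markers of poorer vocal quality.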
Affiliation(s)
- Laura Friedman
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, USA
- Meagan Lauber
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, USA
- Roozbeh Behroozmand
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, USA
- Daniel Fogerty
- Department of Speech and Hearing Science, University of Illinois Urbana-Champaign, Champaign, USA
- Dariusz Kunecki
- Department of Pediatrics, Rush University Medical Center, Chicago, USA
- Jessica Klusek
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, USA
7
Fujiki RB, Braden M, Thibeault SL. Voice Therapy Improves Acoustic and Auditory-Perceptual Outcomes in Children. Laryngoscope 2023; 133:977-983. [PMID: 35754165 PMCID: PMC9790974 DOI: 10.1002/lary.30263] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/27/2022] [Revised: 06/02/2022] [Accepted: 06/14/2022] [Indexed: 12/27/2022]
Abstract
PURPOSE This study employed acoustic measures as well as auditory-perceptual assessments to examine the effects of voice therapy in children presenting with benign vocal fold lesions. METHODS A retrospective, observational cohort design was employed. Sustained vowels produced by 129 children diagnosed with benign vocal fold lesions were analyzed, as well as connected speech samples produced by 47 children. Treatment outcome measures included Consensus of Auditory-Perceptual Evaluation of Voice (CAPE-V), jitter, shimmer, Noise-to-Harmonic Ratio (NHR), cepstral peak prominence (CPP), and Low-to-High Ratio (LHR) on sustained vowels, and CPP and LHR on connected speech. RESULTS Following voice therapy, significant improvements in CAPE-V ratings (p < 0.001) were observed. Additionally, jitter (p = 0.041), NHR (p = 0.019), and CPP (p < 0.01) on sustained vowels, and CPP (p = 0.002), and LHR (p = 0.008) on connected speech significantly improved following voice therapy. CPP increased with age in males but did not change in females. CAPE-V ratings and perturbation measures indicated that dysphonia was more severe in younger children pre and post-therapy. CONCLUSIONS Auditory-perceptual and acoustic measures demonstrated improved voice quality following voice therapy in children with dysphonia. CPP effectively quantified voice therapy gains and allowed for analysis of connected speech, in addition to sustained vowels. These findings demonstrate the value of CPP as a tool in assessing therapy outcomes and support the efficacy of voice therapy for children presenting with vocal fold lesions. LEVEL OF EVIDENCE 4 Laryngoscope, 133:977-983, 2023.
Affiliation(s)
- Maia Braden
- Department of Communication Sciences and Disorders, University of Wisconsin Madison, Madison, Wisconsin, U.S.A
- Susan L Thibeault
- Department of Surgery, University of Wisconsin Madison, Madison, Wisconsin, U.S.A
- Department of Communication Sciences and Disorders, University of Wisconsin Madison, Madison, Wisconsin, U.S.A
8
Costantini G, Cesarini V, Di Leo P, Amato F, Suppa A, Asci F, Pisani A, Calculli A, Saggio G. Artificial Intelligence-Based Voice Assessment of Patients with Parkinson's Disease Off and On Treatment: Machine vs. Deep-Learning Comparison. Sensors (Basel, Switzerland) 2023; 23:2293. [PMID: 36850893 PMCID: PMC9962335 DOI: 10.3390/s23042293] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 01/24/2023] [Revised: 02/13/2023] [Accepted: 02/16/2023] [Indexed: 06/18/2023]
Abstract
Parkinson's Disease (PD) is one of the most common non-curable neurodegenerative diseases. Diagnosis is achieved clinically on the basis of different symptoms with considerable delays from the onset of neurodegenerative processes in the central nervous system. In this study, we investigated early and full-blown PD patients based on the analysis of their voice characteristics with the aid of the most commonly employed machine learning (ML) techniques. A custom dataset was made with hi-fi quality recordings of vocal tasks gathered from Italian healthy control subjects and PD patients, divided into early diagnosed, off-medication patients on the one hand, and mid-advanced patients treated with L-Dopa on the other. Following the current state-of-the-art, several ML pipelines were compared using different feature selection and classification algorithms, and deep learning was also explored with a custom CNN architecture. Results show how feature-based ML and deep learning achieve comparable results in terms of classification, with KNN, SVM and naïve Bayes classifiers performing similarly, with a slight edge for KNN. Much more evident is the predominance of CFS as the best feature selector. The selected features act as relevant vocal biomarkers capable of differentiating healthy subjects, early untreated PD patients and mid-advanced L-Dopa treated patients.
Affiliation(s)
- Giovanni Costantini
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Valerio Cesarini
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Pietro Di Leo
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
- Federica Amato
- Department of Control and Computer Engineering, Polytechnic University of Turin, 10129 Turin, Italy
- Antonio Suppa
- Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
- IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Francesco Asci
- Department of Human Neurosciences, Sapienza University of Rome, 00185 Rome, Italy
- IRCCS Neuromed Institute, 86077 Pozzilli, Italy
- Antonio Pisani
- Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
- IRCCS Mondino Foundation, 27100 Pavia, Italy
- Alessandra Calculli
- Department of Brain and Behavioral Sciences, University of Pavia, 27100 Pavia, Italy
- IRCCS Mondino Foundation, 27100 Pavia, Italy
- Giovanni Saggio
- Department of Electronic Engineering, University of Rome Tor Vergata, 00133 Rome, Italy
9
Addressing smartphone mismatch in Parkinson’s disease detection aid systems based on speech. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2022.104281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
10
Calà F, Manfredi C, Battilocchi L, Frassineti L, Cantarella G. Speaking with mask in the COVID-19 era: Multiclass machine learning classification of acoustic and perceptual parameters. The Journal of the Acoustical Society of America 2023; 153:1204. [PMID: 36859154 DOI: 10.1121/10.0017244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2022] [Accepted: 01/26/2023] [Indexed: 06/18/2023]
Abstract
The intensive use of personal protective equipment often requires increasing voice intensity, with possible development of voice disorders. This paper exploits machine learning approaches to investigate the impact of different types of masks on sustained vowels /a/, /i/, and /u/ and the sequence /a'jw/ inside a standardized sentence. Both objective acoustical parameters and subjective ratings were used for statistical analysis, multiple comparisons, and in multivariate machine learning classification experiments. Significant differences were found between mask+shield configuration and no-mask and between mask and mask+shield conditions. Power spectral density decreases with statistical significance above 1.5 kHz when wearing masks. Subjective ratings confirmed increasing discomfort from no-mask condition to protective masks and shield. Machine learning techniques proved that masks alter voice production: in a multiclass experiment, random forest (RF) models were able to distinguish amongst seven mask conditions with up to 94% validation accuracy, separating masked from unmasked conditions with up to 100% validation accuracy and detecting the shield presence with up to 86% validation accuracy. Moreover, an RF classifier allowed distinguishing male from female subjects in masked conditions with 100% validation accuracy. Combining acoustic and perceptual analysis represents a robust approach to characterize mask configurations and quantify the corresponding level of discomfort.
Affiliation(s)
- F Calà
- Department of Information Engineering, Università degli Studi di Firenze, Firenze, Italy
- C Manfredi
- Department of Information Engineering, Università degli Studi di Firenze, Firenze, Italy
- L Battilocchi
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
- L Frassineti
- Department of Information Engineering, Università degli Studi di Firenze, Firenze, Italy
- G Cantarella
- Department of Clinical Sciences and Community Health, University of Milan, Milan, Italy
11
Things to Consider When Automatically Detecting Parkinson’s Disease Using the Phonation of Sustained Vowels: Analysis of Methodological Issues. Applied Sciences-Basel 2022. [DOI: 10.3390/app12030991] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/04/2023]
Abstract
Diagnosing Parkinson’s Disease (PD) necessitates monitoring symptom progression. Unfortunately, diagnostic confirmation often occurs years after disease onset. A more sensitive and objective approach is paramount to the expedient diagnosis and treatment of persons with PD (PwPDs). Recent studies have shown that we can train accurate models to detect signs of PD from audio recordings of confirmed PwPDs. However, disparities exist between studies and may be caused, in part, by differences in employed corpora or methodologies. Our hypothesis is that unaccounted covariates in methodology, experimental design, and data preparation resulted in overly optimistic results in studies of PD automatic detection employing sustained vowels. These issues include record-wise fold creation rather than subject-wise; an imbalance of age between the PwPD and control classes; using too small of a corpus compared to the sizes of feature vectors; performing cross-validation without including development data; and the absence of cross-corpora testing to confirm results. In this paper, we evaluate the influence of these methodological issues in the automatic detection of PD employing sustained vowels. We perform several experiments isolating each issue to measure its influence employing three different corpora. Moreover, we analyze if the perceived dysphonia of the speakers could be causing differences in results between the corpora. Results suggest that each independent methodological issue analyzed has an effect on classification accuracy. Consequently, we recommend a list of methodological steps to be considered in future experiments to avoid overoptimistic or misleading results.
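The difference between record-wise and subject-wise fold creation mentioned in this abstract can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: `subject_wise_folds` is a hypothetical helper that assigns subjects to folds round-robin, without shuffling or class stratification, so that every recording of a given speaker lands in the same fold.

```python
def subject_wise_folds(subject_ids, n_folds):
    """Return one fold index per recording, grouped by subject.

    Hypothetical helper: subjects are sorted and assigned to folds
    round-robin, so no speaker is split across train and test.
    """
    if n_folds < 2:
        raise ValueError("need at least two folds")
    unique_subjects = sorted(set(subject_ids))
    fold_of_subject = {s: i % n_folds for i, s in enumerate(unique_subjects)}
    return [fold_of_subject[s] for s in subject_ids]


# Seven recordings from four speakers (made-up identifiers).
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3", "s4"]
folds = subject_wise_folds(subject_ids, n_folds=2)

# For each held-out fold, the train and test speaker sets never overlap.
for k in range(2):
    test_subjects = {s for s, f in zip(subject_ids, folds) if f == k}
    train_subjects = {s for s, f in zip(subject_ids, folds) if f != k}
    assert test_subjects.isdisjoint(train_subjects)
```

A record-wise split would instead assign each recording independently, so repeated recordings of one speaker could appear on both sides of the split; this is the leakage the abstract identifies as a source of overly optimistic results.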
12
Hireš M, Gazda M, Drotár P, Pah ND, Motin MA, Kumar DK. Convolutional neural network ensemble for Parkinson's disease detection from voice recordings. Comput Biol Med 2021; 141:105021. [PMID: 34799077 DOI: 10.1016/j.compbiomed.2021.105021] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/04/2021] [Revised: 11/02/2021] [Accepted: 11/03/2021] [Indexed: 11/03/2022]
Abstract
The computerized detection of Parkinson's disease (PD) will facilitate population screening and frequent monitoring and provide a more objective measure of symptoms, benefiting both patients and healthcare providers. Dysarthria is an early symptom of the disease and examining it for computerized diagnosis and monitoring has been proposed. Deep learning-based approaches have advantages for such applications because they do not require manual feature extraction, and while this approach has achieved excellent results in speech recognition, its utilization in the detection of pathological voices is limited. In this work, we present an ensemble of convolutional neural networks (CNNs) for the detection of PD from the voice recordings of 50 healthy people and 50 people with PD obtained from PC-GITA, a publicly available database. We propose a multiple-fine-tuning method to train the base CNN. This approach reduces the semantical gap between the source task that has been used for network pretraining and the target task by expanding the training process by including training on another dataset. Training and testing were performed for each vowel separately, and a 10-fold validation was performed to test the models. The performance was measured by using accuracy, sensitivity, specificity and area under the ROC curve (AUC). The results show that this approach was able to distinguish between the voices of people with PD and those of healthy people for all vowels. While there were small differences between the different vowels, the best performance was when /a/ was considered; we achieved 99% accuracy, 86.2% sensitivity, 93.3% specificity and 89.6% AUC. This shows that the method has potential for use in clinical practice for the screening, diagnosis and monitoring of PD, with the advantage that vowel-based voice recordings can be performed online without requiring additional hardware.
Affiliation(s)
- Máté Hireš
- Intelligent Information Systems Lab, Technical University of Košice, Letná 9, 42001, Košice, Slovakia
- Matej Gazda
- Intelligent Information Systems Lab, Technical University of Košice, Letná 9, 42001, Košice, Slovakia
- Peter Drotár
- Intelligent Information Systems Lab, Technical University of Košice, Letná 9, 42001, Košice, Slovakia
13
Naranjo L, Pérez CJ, Campos-Roca Y, Madruga M. Replication-based regularization approaches to diagnose Reinke's edema by using voice recordings. Artif Intell Med 2021; 120:102162. [PMID: 34629154 DOI: 10.1016/j.artmed.2021.102162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2021] [Revised: 08/21/2021] [Accepted: 08/31/2021] [Indexed: 10/20/2022]
Abstract
Reinke's edema is one of the most prevalent laryngeal pathologies. Its detection can be addressed by using computer-aided diagnosis systems based on features extracted from speech recordings. When acoustic features are extracted from different voice recordings of a particular subject at a given moment, imperfections in technology and inherent biological variability result in values that are close but not identical. This suggests that the within-subject variability must be properly addressed in the statistical methodology. Regularization-based regression approaches can be used to reduce the classification errors by favoring the best predictors and penalizing the worst ones. Three replication-based regularization approaches for variable selection and classification have been specifically designed and implemented to take into account the underlying within-subject variability. To illustrate the applicability of these approaches, an experiment was conducted to discriminate Reinke's edema patients (30 subjects) from healthy people (30 subjects) in a hospital environment. The features were extracted from four phonations of the sustained vowel /a/ recorded for each subject, leading to a database that fed the proposed machine learning approaches. The proposed replication-based approaches proved to be reliable in terms of selected features and predictive ability, leading to a stable accuracy rate of 0.89 under a cross-validation framework. In addition, a comparison with traditional independence-based regularization methods shows that the latter vary greatly in terms of selected features and accuracy metrics. Therefore, the proposed approaches help to fill a gap in the scientific literature on statistical approaches considering within-subject variability and can be used to build a robust expert system.
Affiliation(s)
- Lizbeth Naranjo
  - Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, 04510 Ciudad de México, Mexico
- Carlos J Pérez
  - Departamento de Matemáticas, Facultad de Veterinaria, Universidad de Extremadura, 10003 Cáceres, Spain
- Yolanda Campos-Roca
  - Departamento de Tecnologías de los Computadores y de las Comunicaciones, Escuela Politécnica, Universidad de Extremadura, 10003 Cáceres, Spain
- Mario Madruga
  - Departamento de Matemáticas, Facultad de Veterinaria, Universidad de Extremadura, 10003 Cáceres, Spain
|
14
|
Clarfeld LA, Gramling R, Rizzo DM, Eppstein MJ. A general model of conversational dynamics and an example application in serious illness communication. PLoS One 2021; 16:e0253124. [PMID: 34197490 PMCID: PMC8248661 DOI: 10.1371/journal.pone.0253124] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/27/2020] [Accepted: 05/29/2021] [Indexed: 11/19/2022] Open
Abstract
Conversation has been a primary means for the exchange of information since ancient times. Understanding patterns of information flow in conversations is a critical step in assessing and improving communication quality. In this paper, we describe COnversational DYnamics Model (CODYM) analysis, a novel approach for studying patterns of information flow in conversations. CODYMs are Markov Models that capture sequential dependencies in the lengths of speaker turns. The proposed method is automated and scalable, and preserves the privacy of the conversational participants. The primary function of CODYM analysis is to quantify and visualize patterns of information flow, concisely summarized over sequential turns from one or more conversations. Our approach is general and complements existing methods, providing a new tool for use in the analysis of any type of conversation. As an important first application, we demonstrate the model on transcribed conversations between palliative care clinicians and seriously ill patients. These conversations are dynamic and complex, taking place amidst heavy emotions, and include difficult topics such as end-of-life preferences and patient values. We use CODYMs to identify normative patterns of information flow in serious illness conversations, show how these normative patterns change over the course of the conversations, and show how they differ in conversations where the patient does or doesn’t audibly express anger or fear. Potential applications of CODYMs range from assessment and training of effective healthcare communication to comparing conversational dynamics across languages, cultures, and contexts with the prospect of identifying universal similarities and unique “fingerprints” of information flow.
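The idea of a Markov model over speaker-turn lengths can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each turn is reduced to a short/long state by a word-count threshold, and the threshold value, function names, and toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def turn_states(turn_lengths, threshold):
    """Map each turn's word count to 'S' (short) or 'L' (long).

    The binary discretization and the threshold are assumptions of this sketch.
    """
    return ["S" if n <= threshold else "L" for n in turn_lengths]


def transition_probabilities(states, order=2):
    """Estimate P(next state | previous `order` states) from one sequence."""
    counts = defaultdict(Counter)
    for i in range(order, len(states)):
        context = tuple(states[i - order:i])
        counts[context][states[i]] += 1
    return {
        context: {s: c / sum(nxt.values()) for s, c in nxt.items()}
        for context, nxt in counts.items()
    }


# Made-up word counts per turn, alternating brief and extended turns.
turn_lengths = [3, 25, 4, 30, 2, 28, 5, 31]
states = turn_states(turn_lengths, threshold=7)
probs = transition_probabilities(states, order=2)
```

Because only turn lengths enter the model, no words from the conversation need to be retained, which is consistent with the privacy-preserving property the abstract describes.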
Affiliation(s)
- Laurence A. Clarfeld
  - Department of Computer Science, University of Vermont, Burlington, VT, United States of America
- Robert Gramling
  - Department of Family Medicine, University of Vermont, Burlington, VT, United States of America
- Donna M. Rizzo
  - Department of Civil and Environmental Engineering, University of Vermont, Burlington, VT, United States of America
  - Vermont Complex Systems Center, University of Vermont, Burlington, VT, United States of America
- Margaret J. Eppstein
  - Department of Computer Science, University of Vermont, Burlington, VT, United States of America
  - Vermont Complex Systems Center, University of Vermont, Burlington, VT, United States of America
|
15
|
Madruga M, Campos-Roca Y, Pérez CJ. Impact of noise on the performance of automatic systems for vocal fold lesions detection. Biocybern Biomed Eng 2021. [DOI: 10.1016/j.bbe.2021.07.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
16
|
Classification of ALS patients based on acoustic analysis of sustained vowel phonations. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022]
|
17
|
Lin H, Karjadi C, Ang TFA, Prajakta J, McManus C, Alhanai TW, Glass J, Au R. Identification of digital voice biomarkers for cognitive health. Exploration of Medicine 2020; 1:406-417. [PMID: 33665648 PMCID: PMC7929495 DOI: 10.37349/emed.2020.00028] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/19/2020] [Accepted: 09/04/2020] [Indexed: 01/03/2023] Open
Abstract
AIM The human voice contains rich information. Few longitudinal studies have been conducted to investigate the potential of voice to monitor cognitive health. The objective of this study is to identify voice biomarkers that are predictive of future dementia. METHODS Participants were recruited from the Framingham Heart Study. The vocal responses to neuropsychological tests were recorded and then diarized to identify participant voice segments. Acoustic features were extracted with the OpenSMILE toolkit (v2.1). The association of each acoustic feature with incident dementia was assessed by Cox proportional hazards models. RESULTS Our study included 6,528 voice recordings from 4,849 participants (mean age 63 ± 15 years old, 54.6% women). The majority of participants (71.2%) had one voice recording, 23.9% had two voice recordings, and the remaining participants (4.9%) had three or more voice recordings. Although all were asymptomatic at the time of examination, participants who developed dementia tended to have shorter segments than those who were dementia free (P < 0.001). Additionally, 14 acoustic features were significantly associated with dementia after adjusting for multiple testing (P < 0.05/48 = 1 × 10⁻³). The most significant acoustic feature was jitterDDP_sma_de (P = 7.9 × 10⁻⁷), which represents the differential frame-to-frame jitter. A voice-based linear classifier was also built that was capable of predicting incident dementia with an area under the curve of 0.812. CONCLUSIONS Multiple acoustic and linguistic features are identified that are associated with incident dementia among asymptomatic participants, which could be used to build better prediction models for passive cognitive health monitoring.
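The multiple-testing threshold quoted in this abstract has the form of a Bonferroni correction, and the arithmetic can be checked with a short sketch. The values 0.05, 48, and 7.9 × 10⁻⁷ are taken from the abstract; the function name is hypothetical, and treating the adjustment as a plain Bonferroni division is an assumption based on the expression 0.05/48.

```python
ALPHA = 0.05   # family-wise significance level stated in the abstract
N_TESTS = 48   # number of tests used as the divisor in the abstract

# Per-test threshold, roughly 1.04e-3, which the abstract rounds to 1e-3.
threshold = ALPHA / N_TESTS


def is_significant(p_value, alpha=ALPHA, n_tests=N_TESTS):
    """Return True if p_value passes the Bonferroni-adjusted threshold."""
    return p_value < alpha / n_tests


# The p-value reported for jitterDDP_sma_de clears the adjusted threshold.
jitter_passes = is_significant(7.9e-7)
```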
Affiliation(s)
- Honghuang Lin
  - Section of Computational Biomedicine, Department of Medicine, Boston University School of Medicine, Boston, MA 02118, USA
  - The Framingham Heart Study, Boston University School of Medicine, Boston, MA 02118, USA
- Cody Karjadi
  - The Framingham Heart Study, Boston University School of Medicine, Boston, MA 02118, USA
  - Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA 02118, USA
- Ting F. A. Ang
  - The Framingham Heart Study, Boston University School of Medicine, Boston, MA 02118, USA
  - Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA 02118, USA
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA 02118, USA
  - Slone Epidemiology Center, Boston University School of Medicine, Boston, MA 02118, USA
- Joshi Prajakta
  - The Framingham Heart Study, Boston University School of Medicine, Boston, MA 02118, USA
  - Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA 02118, USA
- Chelsea McManus
  - The Framingham Heart Study, Boston University School of Medicine, Boston, MA 02118, USA
  - Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA 02118, USA
- Tuka W. Alhanai
  - Department of Electrical and Computer Engineering, New York University Abu Dhabi, Abu Dhabi, UAE
- James Glass
  - Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Rhoda Au
  - The Framingham Heart Study, Boston University School of Medicine, Boston, MA 02118, USA
  - Department of Anatomy and Neurobiology, Boston University School of Medicine, Boston, MA 02118, USA
  - Department of Epidemiology, Boston University School of Public Health, Boston, MA 02118, USA
  - Slone Epidemiology Center, Boston University School of Medicine, Boston, MA 02118, USA
  - Department of Neurology, Boston University School of Medicine, Boston, MA 02118, USA
|
18
|
Miramont JM, Restrepo JF, Codino J, Jackson-Menaldi C, Schlotthauer G. Voice Signal Typing Using a Pattern Recognition Approach. J Voice 2020; 36:34-42. [PMID: 32376059 DOI: 10.1016/j.jvoice.2020.03.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/13/2020] [Revised: 03/24/2020] [Accepted: 03/26/2020] [Indexed: 11/29/2022]
Abstract
Voice signal classification into three types according to their degree of periodicity, a task known as signal typing, is a relevant preprocessing step before computing any perturbation measures. However, it is a time-consuming and subjective activity. This has given rise to interest in automatic systems that use objective measures to distinguish among the different signal types. The purpose of this paper is twofold. First, to propose a pattern recognition approach for automatic voice signal typing based on a multi-class linear Support Vector Machine, using well-known parameters such as Jitter, Shimmer, Harmonic-to-Noise Ratio, and Cepstral Peak Prominence in combination with nonlinear dynamics measures. Two novel features are also proposed as objective parameters. Second, to validate this approach on a large number of signals from two well-known corpora, using cross-dataset experiments to assess the generalizability of the system. A total of 1262 signals labeled by professional voice pathologists were used for this purpose. Statistically significant differences between all types were found for all features. Accuracies over 82.71% were estimated in all intra-dataset and inter-dataset experiments using cross-validation. Finally, the use of posterior probabilities is proposed as a measure of the reliability of the assigned type. This could help clinicians to make a more informed decision about the type assigned to a voice. These outcomes suggest that the proposed approach can successfully discriminate among the three voice types, paving the way to a fully automatic tool for voice signal typing in the future.
Affiliation(s)
- J M Miramont
  - Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática, UNER-CONICET, Oro Verde, Entre Ríos, Argentina
- Juan F Restrepo
  - Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática, UNER-CONICET, Oro Verde, Entre Ríos, Argentina
- J Codino
  - Lakeshore Professional Voice Center, Lakeshore Ear, Nose and Throat Center, St. Clair Shores, Michigan
- C Jackson-Menaldi
  - Lakeshore Professional Voice Center, Lakeshore Ear, Nose and Throat Center, St. Clair Shores, Michigan; Department of Otolaryngology, School of Medicine, Wayne State University, Detroit, Michigan
- G Schlotthauer
  - Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática, UNER-CONICET, Oro Verde, Entre Ríos, Argentina
|
19
|
Parkinson’s Disease Detection from Drawing Movements Using Convolutional Neural Networks. Electronics 2019. [DOI: 10.3390/electronics8080907] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Indexed: 11/16/2022]
Abstract
An important research effort in healthcare biometrics is finding accurate biomarkers that allow the development of medical decision-support tools. These tools help to detect and monitor illnesses such as Parkinson’s disease (PD). This paper contributes to this effort by analyzing a convolutional neural network (CNN) for PD detection from drawing movements. This CNN includes two parts: feature extraction (convolutional layers) and classification (fully connected layers). The input to the CNN is the magnitude of the fast Fourier transform in the frequency range between 0 Hz and 25 Hz. We analyzed the discrimination capability of different directions during drawing movements, obtaining the best results for the X and Y directions. This analysis was performed using a public dataset: the Parkinson Disease Spiral Drawings Using Digitized Graphics Tablet dataset. The best results obtained in this work showed an accuracy of 96.5%, an F1-score of 97.7%, and an area under the curve of 99.2%.
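The CNN input described in this abstract, the magnitude of the Fourier transform restricted to 0 to 25 Hz, can be illustrated with a short NumPy sketch. The sampling rate, the synthetic 5 Hz test signal, and the function name are assumptions made for illustration; the abstract does not specify the preprocessing beyond the frequency range.

```python
import numpy as np


def fft_magnitude_band(signal, fs, f_max=25.0):
    """Return frequencies and FFT magnitudes of a real signal up to f_max Hz."""
    signal = np.asarray(signal, dtype=float)
    magnitude = np.abs(np.fft.rfft(signal))           # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)  # bin center frequencies in Hz
    keep = freqs <= f_max                             # retain the 0..f_max band only
    return freqs[keep], magnitude[keep]


# Synthetic pen-coordinate trace: a 5 Hz oscillation sampled at an assumed 100 Hz.
fs = 100.0
t = np.arange(200) / fs
x = np.sin(2 * np.pi * 5.0 * t)

freqs, mag = fft_magnitude_band(x, fs)
peak_hz = freqs[np.argmax(mag)]
```

With 200 samples at 100 Hz the frequency resolution is 0.5 Hz, so the retained band contains 51 bins and the synthetic oscillation falls exactly on one of them.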
|