1
|
Jiang JY, Hsu PM, Pan YA, Yu YH, Chen CK, Hsieh LC. Cepstral Peak Prominence: A Valuable Measure of Voice Outcome Severity in Patients With Unilateral Vocal Fold Paralysis. J Voice 2025:S0892-1997(24)00410-7. [PMID: 39757085 DOI: 10.1016/j.jvoice.2024.11.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 11/14/2024] [Accepted: 11/15/2024] [Indexed: 01/07/2025]
Abstract
OBJECTIVES This study investigated the relationship between the position of the paralyzed vocal fold and voice quality in patients with unilateral vocal fold paralysis (UVFP) and identified a reliable acoustic analysis tool to enhance the accuracy of voice quality assessments in this population. METHODS A retrospective case-control study was conducted with 70 patients with UVFP diagnosed at Mackay Memorial Hospital. Acoustic features (jitter, shimmer, the harmonic-to-noise ratio [HNR], and the smoothed cepstral peak prominence [CPPs]) were analyzed using the Praat software. A speech-language pathologist performed an auditory-perceptual assessment by using a perceptual voice evaluation scale, and a senior laryngologist reviewed the paralyzed fold's position endoscopically. Spearman's linear regression analysis was used to examine correlations between perceptual and acoustic parameters and the position of the paralyzed vocal fold. RESULTS The position of the paralyzed vocal fold exhibited weak correlations with acoustic and auditory-perceptual variables (r = 0.205-0.39). By contrast, moderate-to-strong correlations were discovered between auditory-perceptual variables and acoustic parameters (r = 0.378-0.803). Notably, the CPPs was more strongly associated with overall grade (severity: r = 0.673) and breathiness (r = -0.803) than were jitter, shimmer, and the HNR (r = 0.378-0.614). CONCLUSIONS The position of the paralyzed vocal fold alone is insufficient for predicting voice outcomes in patients with UVFP. The CPPs is a more valuable indicator of perceived dysphonia severity, particularly in cases with audible breathiness, making it superior to jitter, shimmer, and the HNR for perceptual voice assessments in patients with UVFP.
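To make the rank-correlation step in the abstract above concrete, here is a minimal sketch of a Spearman coefficient computed as the Pearson correlation of average ranks. The function names and the CPPs and breathiness values are hypothetical illustrations, not data or code from the study, which does not report how the statistic was computed.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical example: CPPs in dB against a 0-3 breathiness rating.
cpps = [14.2, 11.8, 9.5, 7.1, 4.3]
breathiness = [0, 1, 1, 2, 3]
print(round(spearman_rho(cpps, breathiness), 3))
```

A negative coefficient here would mirror the direction reported for CPPs and breathiness: lower cepstral prominence goes with more perceived breathiness.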
Affiliation(s)
- Jing-Yi Jiang
- Department of Otolaryngology Head and Neck Surgery, Mackay Memorial Hospital, Taipei, Taiwan
| | - Pei-Min Hsu
- Department of Special Education (Master's Program of Speech and Language Pathology), University of Taipei, Taipei, Taiwan
| | - Yi-An Pan
- Department of Otolaryngology Head and Neck Surgery, Mackay Memorial Hospital, Taipei, Taiwan
| | - Yi-Hsuan Yu
- Department of Otolaryngology Head and Neck Surgery, Mackay Memorial Hospital, Taipei, Taiwan
| | - Chin-Kuo Chen
- School of Traditional Chinese Medicine, College of Medicine, Chang Gung University, Taoyuan, Taiwan; Department of Otolaryngology-Head and Neck Surgery and Communication Enhancement Center, Chang Gung Memorial Hospital, Taoyuan, Taiwan; Department of Otolaryngology-Head and Neck Surgery, Chang Gung Memorial Hospital, Keelung, Taiwan
| | - Li-Chun Hsieh
- Department of Otolaryngology Head and Neck Surgery, Mackay Memorial Hospital, Taipei, Taiwan; School of Medicine, Mackay Medical College, New Taipei City, Taiwan; Department of Audiology and Speech Language Pathology, Mackay Medical College, New Taipei City, Taiwan.
| |
|
2
|
Shabnam S, Pushpavathi M, Gopi Sankar R, Sridharan KV, Vasanthalakshmi MS. A Comprehensive Application for Grading Severity of Voice Based on Acoustic Voice Quality Index v.02.03. J Voice 2025; 39:287.e1-287.e9. [PMID: 36192290 DOI: 10.1016/j.jvoice.2022.08.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/28/2022] [Revised: 08/06/2022] [Accepted: 08/08/2022] [Indexed: 10/07/2022]
Abstract
The Acoustic Voice Quality Index (AVQI) is a six-variable multiparametric acoustic model developed by Maryn et al. Studies have provided evidence regarding the practical usefulness, internal consistency, external validity, diagnostic accuracy, and responsiveness to change of the AVQI. Recently, researchers have been exploring the utility of the AVQI in classifying voice severity. The aim of the present study was to determine the diagnostic accuracy of the AVQI v.02.03 in discriminating across the perceptual levels of dysphonia severity in the 18-40 years age range in a Kannada-speaking population, and to develop an application to depict the AVQI-based severity of dysphonia. For the study, 163 individuals in the normophonic group and 134 individuals in the dysphonic group were considered, all in the age range of 18-40 years. All participants were native speakers of Kannada. The sustained vowel /a/ and a reading of a standard Kannada passage were used as stimuli for extracting the AVQI, which was analysed using AVQI script version 02.03. The AVQI cut-off values obtained were 2.50 (AROC=0.894; Sensitivity= 84.7%; Specificity= 83.1%), 4.17 (AROC=0.953; Sensitivity= 84.4%; Specificity= 88.5%) and 6.23 (AROC=1.000; Sensitivity= 100%; Specificity= 100%) for normal vs. mild, mild vs. moderate and moderate vs. severe, respectively. A user-friendly application was developed that provides a simplified output for AVQI cut-off values, comprehensible to patients with voice disorders, non-professionals and health professionals.
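The three cut-off values reported above can be read as a simple grading rule. The sketch below is only an illustration of that idea and is not the application described in the abstract; the function name `grade_avqi` is hypothetical, and assigning a score that falls exactly on a cut-off to the more severe grade is an assumption, since the abstract does not state how boundary values are handled.

```python
# Cut-offs taken from the abstract: normal vs. mild, mild vs. moderate,
# moderate vs. severe. Listed from most to least severe for the lookup.
AVQI_CUTOFFS = [
    (6.23, "severe"),
    (4.17, "moderate"),
    (2.50, "mild"),
]


def grade_avqi(score):
    """Map an AVQI v.02.03 score to a severity label (hypothetical helper).

    Assumption: a score equal to a cut-off belongs to the more severe grade.
    """
    for cutoff, label in AVQI_CUTOFFS:
        if score >= cutoff:
            return label
    return "normal"


for example_score in (1.8, 3.0, 5.1, 7.4):
    print(example_score, grade_avqi(example_score))
```

These thresholds were derived for Kannada speakers aged 18-40 years, so applying them to other languages or age groups would be a further assumption.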
Affiliation(s)
- Srushti Shabnam
- Nitte Institute of Speech and Hearing, Mangalore, Karnataka, India.
| | - M Pushpavathi
- All India Institute of Speech and Hearing, Mysuru, Karnataka, India
| | - R Gopi Sankar
- Department of Clinical Services, All India Institute of Speech and Hearing, Mysuru, Karnataka, India
| | | | - M S Vasanthalakshmi
- Department of Speech Language Pathology, All India Institute of Speech and Hearing, Mysuru, Karnataka, India
| |
|
3
|
Batthyany C, Latoszek BBV, Maryn Y. Meta-Analysis on the Validity of the Acoustic Voice Quality Index. J Voice 2024; 38:1527.e1-1527.e19. [PMID: 35752532 DOI: 10.1016/j.jvoice.2022.04.022] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 01/20/2022] [Revised: 04/27/2022] [Accepted: 04/27/2022] [Indexed: 02/01/2023]
Abstract
BACKGROUND Acoustic measurements are useful tools to objectively measure overall voice quality. The Acoustic Voice Quality Index (AVQI) has been shown to be a valid multiparametric tool to objectify dysphonia severity. The increasing number of studies investigating the AVQI's validity demands a comprehensive synthesis of the available outcomes. OBJECTIVE OF REVIEW The aim of the present meta-analysis is to quantify the evidence for the diagnostic accuracy of the AVQI, including its sensitivity, specificity and likelihood ratio statistics, and its concurrent validity and sensitivity to changes in auditory-perceptual voice quality ratings. TYPE OF REVIEW Meta-analysis. SEARCH STRATEGY MEDLINE, EMBASE, the Cochrane Library and Web of Science were searched from 2010 to April 2021, with an additional manual search, using keywords related to the AVQI and common terminologies of validity outcomes. Studies considering the clinical validity of the AVQI (ie, diagnostic accuracy, concurrent validity and sensitivity to change), using auditory-perceptual voice quality evaluation as reference, were included. EVALUATION METHOD The Preferred Reporting Items for Systematic reviews and Meta-Analyses of Diagnostic Test Accuracy Studies (PRISMA-DTA) guidelines were used. Quality assessment of included studies was conducted using the QUADAS-2 tool. For the diagnostic accuracy of the AVQI, the pooled sensitivity, specificity and likelihood ratio statistics were determined using a summary receiver operating characteristic approach. Weighted mean correlation coefficients (r̄W) were used to assess the concurrent validity and sensitivity to change. RESULTS A total of 198 studies were screened and 33 articles were included. In total, voice samples of 11447, 10272, and 367 different subjects were considered for analysis of diagnostic accuracy, concurrent validity and change responsiveness, respectively. Satisfying diagnostic accuracy results were found with a pooled sensitivity of 0.83 (95% CI: 0.82-0.83), a pooled specificity of 0.89 (95% CI: 0.88-0.90), a pooled positive LR of 7.75 (95% CI: 6.04-9.95), a pooled negative LR of 0.20 (95% CI: 0.16-0.23), and a pooled diagnostic odds ratio of 47.13 (95% CI: 34.82-63.79). Summary receiver operating characteristic curve analysis showed an excellent AUC value of 0.937 and Q* index of 0.874. Strong correlations of r̄W = 0.838 for concurrent validity and r̄W = 0.796 for sensitivity to change were found. CONCLUSIONS Our results confirm the general clinical utility of the AVQI as a robust and valid objective measure for evaluating overall dysphonia severity across languages and study methods.
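For readers less familiar with the statistics listed above, the following sketch shows the textbook relationship between sensitivity, specificity, likelihood ratios and the diagnostic odds ratio for a single 2x2 table. It is not the pooling method of the meta-analysis: the function name is hypothetical, and plugging the pooled sensitivity and specificity into these formulas is expected to give values near, but not identical to, the separately pooled LR and DOR estimates in the abstract.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, DOR) from one sensitivity/specificity pair."""
    lr_pos = sensitivity / (1 - specificity)   # how much a positive test raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative test lowers the odds
    dor = lr_pos / lr_neg                      # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Pooled point estimates quoted in the abstract, used here for illustration only.
lr_pos, lr_neg, dor = likelihood_ratios(0.83, 0.89)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

The gap between these single-point values and the reported pooled ones (7.75, 0.20 and 47.13) is a reminder that each pooled statistic is estimated across studies rather than derived from the other pooled figures.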
Affiliation(s)
- Christina Batthyany
- GZA Sint-Augustinus, Department of Otorhinolaryngology and Head & Neck Surgery, European Institute of ORL-HNS, Antwerp, Belgium
| | - Ben Barsties V Latoszek
- SRH University of Applied Health Sciences, Speech-Language Pathology, Düsseldorf, Germany; University of Münster, University Hospital Münster, Department of Phoniatrics and Pediatric Audiology, Münster, Germany
| | - Youri Maryn
- GZA Sint-Augustinus, Department of Otorhinolaryngology and Head & Neck Surgery, European Institute of ORL-HNS, Antwerp, Belgium; Ghent University, Faculty of Medicine and Health Sciences, Department of Rehabilitation Sciences, Ghent, Belgium; University College Ghent, Department of Speech-Language Therapy and Audiology, Ghent, Belgium; Université Catholique de Louvain, Faculty of Psychology and Pedagogical Sciences, School of Logopedics, Ottignies-Louvain-La-Neuve, Belgium; Phonanium, Lokeren, Belgium.
| |
|
4
|
Saeedi S, Aghajanzadeh M, Khoddami SM, Dabirmoghaddam P, Jalaie S. The Validity of Cepstral Analysis to Distinguish Between Different Levels of Perceptual Dysphonia in the Persian Vocal Tasks. J Voice 2024; 38:1523.e9-1523.e16. [PMID: 35599059 DOI: 10.1016/j.jvoice.2022.04.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/12/2022] [Revised: 04/07/2022] [Accepted: 04/08/2022] [Indexed: 11/20/2022]
Abstract
OBJECTIVES/HYPOTHESIS The validity of cepstral analysis (Cepstral Peak Prominence [CPP] and Cepstral Peak Prominence-Smoothed [CPPS]) as an indicator of perceptual dysphonia was investigated in the Persian language. STUDY DESIGN Cross-sectional study. METHODS A total of 223 participants (159 with and 64 without dysphonia) uttered vowels /a/ and /i/, six standard sentences, and non-standard connected speech. All vocal samples were perceptually evaluated by three raters on a visual analog scale and put into four groups (normal voice, mild, moderate, and severe perceptual dysphonia). CPP and CPPS of sustained vowel /a/, reading the second standard sentence, and a sentence extracted from non-standard connected speech were established using "Praat" software. Statistical analysis involved a one-way factorial analysis of variance (ANOVA), Kruskal-Wallis H, Kendall's Tau-b correlation, t test, and receiver operating characteristics (ROC) curve. RESULTS The results showed that CPP of sustained vowels and reading the standard sentence and CPPS of sustained vowel differed significantly (P < 0.05), except between the normal voice and mild perceptual dysphonia groups (P > 0.05). The CPP of non-standard connected speech, CPPS of reading the standard sentence, and non-standard connected speech differed significantly between all groups (P < 0.05). The mean of cepstral analysis of all tasks, "averaged CPP," and "averaged CPPS" were significantly different between the two groups of normal voice and perceptual dysphonia (P < 0.05). Correlation between the cepstral analysis and the perceptual ratings demonstrated that the correlation coefficients for CPP and CPPS were between 0.4 and 0.6 (P < 0.05). ROC curve analysis revealed that the area under the ROC curve for "averaged CPP" and "averaged CPPS" was greater than 0.8 (P < 0.05). The values of 22.11 and 12.29 were determined as cut-off scores of "averaged CPP" and "averaged CPPS," respectively. CONCLUSIONS Cepstral analysis was found to be a useful clinical tool for diagnosing perceptual dysphonia and determining its severity level in the Persian language.
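The abstract above reports ROC-derived cut-off scores without stating the selection criterion. One common choice is Youden's J (sensitivity + specificity - 1), and the sketch below assumes that criterion purely for illustration. The function name and the CPPS values are hypothetical, and the direction of the rule (lower score means dysphonic) is an assumption based on cepstral values being lower in dysphonic voices.

```python
def youden_cutoff(scores, is_dysphonic):
    """Pick the cut-off maximizing Youden's J.

    Assumption: a score at or below the cut-off is classified as dysphonic.
    Returns (best_cutoff, best_j).
    """
    positives = sum(is_dysphonic)
    negatives = len(is_dysphonic) - positives
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, is_dysphonic) if d and s <= cutoff)
        tn = sum(1 for s, d in zip(scores, is_dysphonic) if not d and s > cutoff)
        j = tp / positives + tn / negatives - 1
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


# Hypothetical averaged CPPS values (dB) for 4 dysphonic and 4 normal voices.
scores = [9.8, 10.5, 11.2, 12.0, 12.9, 13.4, 14.1, 15.0]
labels = [True, True, True, True, False, False, False, False]
cutoff, j = youden_cutoff(scores, labels)
print(cutoff, j)
```

With real, overlapping distributions J would fall below 1, and the chosen cut-off would trade sensitivity against specificity rather than separate the groups perfectly as in this toy data.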
Affiliation(s)
- Saeed Saeedi
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Mahshid Aghajanzadeh
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran.
| | - Seyyedeh Maryam Khoddami
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Payman Dabirmoghaddam
- Otorhinolaryngology Research Center, Tehran University of Medical Sciences, Tehran, Iran
| | - Shohreh Jalaie
- Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| |
|
5
|
Shu M, Zhang Y, Jiang JJ. The Effect of Mandarin Vowels on Acoustic Analysis: A Prospective Observational Study. J Voice 2024; 38:1296-1301. [PMID: 35508424 DOI: 10.1016/j.jvoice.2022.03.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2022] [Revised: 03/30/2022] [Accepted: 03/30/2022] [Indexed: 10/18/2022]
Abstract
OBJECTIVES Although vowels are of interest for acoustic analysis in clinics, there is no consensus regarding the effect of vowel selection on acoustic perturbation parameters. This study aimed to reveal the effects of Mandarin vowels on acoustic measurements. STUDY DESIGN A prospective observational study. METHODS This prospective observational study enrolled normal phonation Mandarin speakers at the Otolaryngology Department of the Eye & ENT Hospital affiliated with Fudan University from December 2020 to August 2021. This study recruited 107 normal-voiced Mandarin speakers (59 women and 49 men) with a median age of 26 (22, 33) years old. The objective measures included traditional acoustic parameters (fundamental frequency, harmonic-to-noise ratio, percent jitter, and percent shimmer) and cepstral analysis (smoothed cepstral peak prominence) of six Mandarin vowels (ɑ /a/, o /o/, e /ɤ/, i /i/, u /u/, ü /y/). RESULTS The acoustic analysis revealed no significant differences in the fundamental frequency among vowels. The low vowel /a/ had the highest values for percent jitter and percent shimmer and the lowest harmonic-to-noise ratio value. The back vowel /u/ had the lowest cepstral measures (P < 0.05). CONCLUSIONS The acoustic analysis significantly varied across the different Mandarin vowels, and these differences must be considered for the effective clinical application of objective evaluations.
Affiliation(s)
- Min Shu
- Eye & ENT Hospital of Fudan University, Department of Otolaryngology Head & Neck Surgery, China.
| | - Yi Zhang
- Eye & ENT Hospital of Fudan University, Department of Otolaryngology Head & Neck Surgery, China
| | - Jack J Jiang
- Eye & ENT Hospital of Fudan University, Department of Otolaryngology Head & Neck Surgery, China; Otolaryngology-Head and Neck Surgery, Department of Surgery, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin
| |
|
6
|
Saeedi S, Aghajanzadeh M, Khoddami SM, Dabirmoghaddam P, Jalaie S, Aghadoost S. The Relationship between Traditional Acoustic Measures and Cepstral Analysis of Voice. Folia Phoniatr Logop 2024:1-13. [PMID: 39419016 DOI: 10.1159/000542063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Accepted: 10/08/2024] [Indexed: 10/19/2024] Open
Abstract
INTRODUCTION In this study, the correlations between traditional acoustic measures (TAMs) and cepstral analysis (CA) were explored in Persian. METHODS This investigation was a cross-sectional study including 179 dysphonic (n = 141) and normophonic (n = 38) speakers. The TAMs (jitter, shimmer, and noise-to-harmonic ratio) and CA (cepstral peak prominence and cepstral peak prominence smoothed) values were obtained during vowel prolongation, reading a standard sentence, and a nonstandard running speech sample using Praat software. The difference of acoustic measures between normophonic and dysphonic speakers and intercorrelation among acoustic measures and correlation between the acoustic measures and perceived dysphonia levels were analyzed with independent t test, Mann-Whitney U test, Pearson, Spearman, and Kendall's Tau-b correlation tests using IBM SPSS Statistics. RESULTS The findings showed that dysphonic speakers had higher TAM values and lower CA values than normophonic speakers (p < 0.05). In dysphonic speakers, a large correlation was discovered among all acoustic measurements (r = 0.52-0.96; p < 0.05), while in various perceived dysphonic speakers, there was a correlation of varying strength (r = 0.25-0.97; p < 0.05). Ultimately, there was a significant small-to-large correlation between the acoustic measures and perceived dysphonia levels (r = 0.34-0.58; p < 0.05). CONCLUSION This research demonstrated that Persian speakers with dysphonia experienced a rise in TAM and a corresponding reduction in CA. In the future, multi-parametric indices can be developed using both TAM and CA to include various aspects of vocal production and yield a single, comprehensive value.
Affiliation(s)
- Saeed Saeedi
- Independent Researcher in Laryngology, Voice Pathology, and Speech-Language Pathology, Tehran, Iran,
| | - Mahshid Aghajanzadeh
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Seyyedeh Maryam Khoddami
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Payman Dabirmoghaddam
- Otorhinolaryngology Research Center, Tehran University of Medical Sciences, Tehran, Iran
| | - Shohreh Jalaie
- Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Samira Aghadoost
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| |
|
7
|
Jamshidpour P, Moradi N, Raiesian S, Shaterzadeh Yazdi MJ, Soltani M, Seyedtabib M, Masoudrad M, Nourbakhsh M. Cepstral Analysis of Voice in Patients With Temporomandibular Disorders. Ann Otol Rhinol Laryngol 2024; 133:848-856. [PMID: 39054799 DOI: 10.1177/00034894241264938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/27/2024]
Abstract
OBJECTIVES This study aimed to assess the voice quality of patients with temporomandibular disorders (TMDs) compared with healthy subjects using cepstral analysis and investigate the relationship between the TMD severity and the values of cepstral analysis. METHODS Subjects who met the inclusion criteria completed a general health questionnaire and the Fonseca Anamnestic Index. Patients who had TMDs with FAI were subjected to an examination based on the Diagnostic Criteria for Temporomandibular Disorders. The final sample included 65 subjects, 31 TMDs patients (with a mean age ± standard deviation of 36.64 ± 13.67 years), and 34 healthy individuals in the control group (with a mean age ± standard deviation of 30.35 ± 7.78 years). Cepstral Peak Prominence (CPP) and Smoothened Cepstral Peak Prominence (CPPS) of a sustained vowel and connected speech were computed using Praat software. RESULTS TMD patients indicated lower cepstral values and lower voice quality compared to the control group. Significant differences were found between TMD and control groups for all cepstral parameters (P < .001) and cepstral measurements showed a moderate to strong negative correlation with TMD severity (P < .001, rho = -0.57 to -0.88). CONCLUSION The outcomes of the present study indicate that cepstral analysis can accurately distinguish the reduced voice quality of TMD patients from normal voice.
Affiliation(s)
- Parizad Jamshidpour
- Department of Speech Therapy, School of Rehabilitation Sciences, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Musculoskeletal Rehabilitation Research Center, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
| | - Negin Moradi
- Department of Communication Sciences and Disorders, University of Wisconsin-River Falls, River Falls, WI, USA
| | - Shahrokh Raiesian
- Department of Oral and Maxillofacial Surgery, School of Dentistry, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
| | | | - Majid Soltani
- Department of Speech Therapy, School of Rehabilitation Sciences, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
- Musculoskeletal Rehabilitation Research Center, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
| | - Maryam Seyedtabib
- Department of Biostatistics and Epidemiology, School of Public Health, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
| | - Mahdis Masoudrad
- Department of Oral and Maxillofacial Surgery, School of Dentistry, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
| | - Mandana Nourbakhsh
- Department of Linguistics, Faculty of Literature, Alzahra University, Tehran, Iran
| |
|
8
|
Nguyen DD, Novakovic D, Madill C. Voice disorder discrimination using vowel acoustic measures in female speakers. INTERNATIONAL JOURNAL OF LANGUAGE & COMMUNICATION DISORDERS 2024; 59:2087-2102. [PMID: 38884559 DOI: 10.1111/1460-6984.13081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Accepted: 05/19/2024] [Indexed: 06/18/2024]
Abstract
BACKGROUND Sustained vowels are important vocal tasks that have been investigated in discriminating voice disorders using acoustic analysis. To date, no study has combined vowel acoustic measures only that evaluate major aspects of the pathological voice signals in voice disorder discrimination. AIMS To investigate the value of vowel acoustic measures that quantify glottal noise, signal stability, signal periodicity, spectral slope and overall voice quality in discriminating female speakers with and without voice disorders. METHODS & PROCEDURES Sustained vowel /ɑ/ samples were extracted from 133 voice-disordered female patients and 97 non-voice disordered female speakers and were signal typed prior to analysis. Praat software was used to measure harmonics-to-noise ratio (HNR), glottal-to-noise excitation ratio (GNE), the standard deviation of fundamental frequency (F0SD) and cepstral peak prominence (CPPp); and the Analysis of Dysphonia in Speech and Voice (ADSV) program was used to measure CPPadsv, low/high spectral ratio (LH) and the cepstral/spectral index of dysphonia (CSID). Outcome measures included sensitivity, specificity, and discrimination accuracy. OUTCOMES & RESULTS As individual acoustic measures, only spectral-based measures showed good (CPPadsv) and acceptable (CSID) discrimination results. The HNR, GNE and CPPp measures had acceptable sensitivity but poor or non-acceptable specificity and discrimination accuracy. Logistic regression models with all Praat measures (F0SD, HNR, GNE, CPPp) plus ADSV measures (CPPadsv, LH or CSID) provided excellent sensitivity, good-to-excellent specificity and excellent discrimination accuracy. ROC analysis for all individual measures showed that CPPadsv, CSID, CPPp, GNE and F0SD had the highest area under the curve (AUC) values. CONCLUSIONS & IMPLICATIONS A combination of acoustic measures that evaluate the major aspects of vocal dysfunction resulted in good to excellent voice discrimination outcomes. 
Individual acoustic measures had lower discrimination ability than combined measures. The findings implied that acoustic measures extracted from a prolonged vowel were useful in voice disorder discrimination. WHAT THIS PAPER ADDS What is already known on this subject Acoustic measures hold great value in discriminating voice disorders from normal voices. However, no study has evaluated discrimination values of a combination of sustained vowel acoustic measures that quantify additive noise, signal stability, signal periodicity, spectral slope and overall voice quality in single-gender cohorts. Previous studies have not used signal typing (the classification of the acoustic signals) for time-based measures, impacting the reliability of discrimination. What this study adds to the existing knowledge This study was the first to implement signal typing to include sustained vowel samples of Types 1 and 2 signals for discrimination statistics. We showed that a combination of vocal acoustic measures using time- and spectral-based extraction from the sustained /ɑ/ vowel evaluating additive noise, signal stability, signal periodicity, spectral slope and overall voice quality resulted in good to excellent sensitivity, specificity and discrimination accuracy. As individual measures, traditional time-based measures such as HNR had rather limited discrimination values whilst spectral-based measures provided higher discrimination values. Measures that are sensitive to signal types have low discrimination ability. What are the potential or actual clinical implications of this work? The sustained vowel /ɑ/ is a relevant, universal vocal task for clinical application using acoustic measures to discriminate female speakers with and without voice disorders if signal typing is implemented. Clinical voice assessment using vowels may not be effective if relying solely on time-based measurements. 
Spectral-based measures perform better in voice disorder discrimination given their insensitivity to signal types. The most effective voice disorder discrimination could only be obtained using a combination of acoustic measures that quantify major phenomena in the signals of disordered voices. Using measures extracted from both programs, Praat and ADSV, is useful given that specific settings in a program may impact on discrimination accuracy.
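The outcome measures named in this abstract (sensitivity, specificity and discrimination accuracy) follow directly from a 2x2 classification table. The sketch below shows that arithmetic on hypothetical labels; the function name and the data are illustrative assumptions and do not reproduce the study's logistic regression models or results.

```python
def discrimination_stats(predicted, actual):
    """Compute sensitivity, specificity and accuracy from boolean labels.

    `predicted` and `actual` are equal-length lists where True means
    "voice disordered". Both classes must be present in `actual`.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # disordered, flagged
    tn = sum(1 for p, a in pairs if not p and not a)  # non-disordered, cleared
    fp = sum(1 for p, a in pairs if p and not a)      # non-disordered, flagged
    fn = sum(1 for p, a in pairs if not p and a)      # disordered, missed
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical classifier output for 4 disordered and 4 non-disordered speakers.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, False, True]
print(discrimination_stats(predicted, actual))
```

A measure can score well on sensitivity while scoring poorly on specificity, which is the pattern the abstract describes for HNR, GNE and CPPp taken individually.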
Affiliation(s)
- Duy Duong Nguyen
- Voice Research Laboratory, Sydney School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
| | - Daniel Novakovic
- Voice Research Laboratory, Sydney School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
| | - Catherine Madill
- Voice Research Laboratory, Sydney School of Health Sciences, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia
| |
|
9
|
Yun EWT, Nguyen DD, Carding P, Hodges NJ, Chacon AM, Madill C. The Relationship Between Pitch Discrimination and Acoustic Voice Measures in a Cohort of Female Speakers. J Voice 2024; 38:1023-1034. [PMID: 35317969 DOI: 10.1016/j.jvoice.2022.02.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 02/13/2022] [Accepted: 02/13/2022] [Indexed: 11/28/2022]
Abstract
BACKGROUND Evidence across a range of musically trained, hearing disordered and voice disordered populations presents conflicting results regarding the relationship between pitch discrimination (PD) and voice quality. PD characteristics of female speakers with and without a musical training background and no self-reported voice disorder, and the relationship between PD and voice quality in this particular population, have not been investigated. AIMS To evaluate PD characteristics in a cohort of female participants without a self-reported voice disorder and the relationship between PD and acoustic voice measures. METHOD One hundred fourteen female participants were studied, all of whom self-reported as being non-voice disordered. All completed the Newcastle Assessment of Pitch Discrimination, which involved a two-tone PD task. Their voices were recorded producing standardized vocal tasks. Voice samples were acoustically analyzed for frequency-domain measures (fundamental frequency and its standard deviation, and harmonics-to-noise ratio) and spectral-domain measures (cepstral peak prominence and the Cepstral/Spectral Index of Dysphonia). Data were analyzed for the whole cohort and for musical and non-musical training backgrounds. RESULTS In the whole cohort, there were no significant correlations between PD and acoustic voice measures. PD accuracy in musically trained speakers was better than in non-trained speakers and correlated with fundamental frequency standard deviation in prolonged vowel tasks. Vocalists demonstrated superior PD accuracy and fundamental frequency standard deviation in prolonged vowels compared to instrumentalists but did not show significant correlations between PD and acoustic measures. The Newcastle Assessment of Pitch Discrimination was a reliable tool, showing moderate-good prediction value in differentiating musical background. 
CONCLUSIONS There was little evidence of a relationship between PD and acoustic measures of voice quality, regardless of musical training background and superior PD accuracy among the musically trained. These data do not support ideas concerning the co-development of perception and action among individuals identified as having voice quality measures within normal ranges. Numerous measures of voice quality, including measures sensitive to pitch, did not distinguish across musically and non-musically trained individuals, despite individual differences in pitch discrimination.
Affiliation(s)
- Emily Wing-Tung Yun
- Discipline of Speech Pathology, Faculty of Medicine and Health, Sydney School of Health Sciences, The University of Sydney, Sydney, Australia; Doctor Liang Voice Program, Faculty of Medicine and Health, Sydney School of Health Sciences, The University of Sydney, Sydney, Australia
- Duy Duong Nguyen
- Discipline of Speech Pathology, Faculty of Medicine and Health, Sydney School of Health Sciences, The University of Sydney, Sydney, Australia; Doctor Liang Voice Program, Faculty of Medicine and Health, Sydney School of Health Sciences, The University of Sydney, Sydney, Australia
- Paul Carding
- Oxford Institute of Nursing, Midwifery and Allied Health Research, Oxford Brookes University, Oxford, England
- Nicola J Hodges
- School of Kinesiology, University of British Columbia, Vancouver, British Columbia, Canada
- Antonia Margarita Chacon
- Discipline of Speech Pathology, Faculty of Medicine and Health, Sydney School of Health Sciences, The University of Sydney, Sydney, Australia; Doctor Liang Voice Program, Faculty of Medicine and Health, Sydney School of Health Sciences, The University of Sydney, Sydney, Australia
- Catherine Madill
- Discipline of Speech Pathology, Faculty of Medicine and Health, Sydney School of Health Sciences, The University of Sydney, Sydney, Australia; Doctor Liang Voice Program, Faculty of Medicine and Health, Sydney School of Health Sciences, The University of Sydney, Sydney, Australia.
|
10
|
Lee YW, Kim GH. Usefulness of Direct Magnitude Estimation (DME) and Acoustic Analysis in Measuring Dysphonia Severity. J Voice 2024:S0892-1997(24)00225-X. [PMID: 39179470 DOI: 10.1016/j.jvoice.2024.07.014] [Received: 05/12/2024] [Revised: 07/11/2024] [Accepted: 07/12/2024] [Indexed: 08/26/2024]
Abstract
OBJECTIVES The purposes of this study were (1) to analyze the reliability of direct magnitude estimation (DME) in auditory perceptual assessments measuring dysphonia severity and (2) to analyze the relationship between DME and four acoustic parameters (cepstral peak prominence [CPP], cepstral peak prominence-smoothed [CPPs], Acoustic Voice Quality Index [AVQI], and Acoustic Breathiness Index [ABI]) and (3) to predict dysphonia severity based on DME using four acoustic parameters. STUDY DESIGN One hundred and sixty-one voice samples for dysphonia patients were used. In this study, we combined sustained vowel samples and connected speech samples using the Praat software to make the concatenated samples for implementing acoustic analysis and auditory perceptual assessments. For acoustic analysis, we analyzed each value of CPP, CPPs, AVQI, and ABI. For auditory perceptual assessments, three speech-language pathologists evaluated dysphonia severity from the concatenated samples. Finally, we performed a stepwise multiple regression analysis to verify which combination of the four acoustic parameters could best predict perceived dysphonia severity based on the DME. RESULTS DME was found to have high reliability for auditory perceptual assessments measuring dysphonia severity, and there was a significant correlation between DME and four acoustic parameters. Finally, a two-variable model (AVQI and ABI) was useful for predicting perceived dysphonia severity based on the DME. CONCLUSIONS We verified the usefulness of DME scales in judging the dysphonia severity of dysphonic patients when used with acoustic analysis. Also, the two-variable model was useful to predict perceived dysphonia severity based on the DME.
Affiliation(s)
- Yeon Woo Lee
- Department of Speech-Language Pathology, Kosin University, Busan, Republic of Korea
- Geun Hyo Kim
- Department of Otorhinolaryngology-Head and Neck Surgery and Biomedical Research Institute, Pusan National University Hospital, Busan, Republic of Korea.
|
11
|
Nanjundaswamy RKB, Jayakumar T. Comparison of Two Multiparameter Acoustic Voice Outcome Indices in the Treatment of Hyperfunctional Voice Disorders: Dysphonia Severity Index and Acoustic Voice Quality Index. J Voice 2024:S0892-1997(24)00174-7. [PMID: 38906742 DOI: 10.1016/j.jvoice.2024.06.002] [Received: 04/29/2024] [Revised: 05/31/2024] [Accepted: 06/01/2024] [Indexed: 06/23/2024]
Abstract
INTRODUCTION The Dysphonia Severity Index (DSI) and Acoustic Voice Quality Index (AVQI) are the two widely used multiparameter acoustic instrumented indices that estimate dysphonia severity and track treatment outcomes. This study compared the performance of these two indices in identifying voice quality changes with eclectic voice therapy in individuals with hyperfunctional voice disorders (HFVD). METHOD Twenty individuals with HFVD including eight males and 13 females in the age range of 20-55 years received an eclectic voice therapy program named the Comprehensive Voice Habilitation Program. All the participants attended 15 sessions of voice therapy. DSI and AVQI measures were obtained at the baseline, immediate post therapy, 15 days post therapy (follow-up 1), and 60 days post therapy (follow-up 2). Repeated measures analysis of variance was performed to verify whether there were any differences between the time points for dependent variables DSI and AVQI. The effect sizes obtained for the DSI and AVQI measures were also noted. RESULTS A significant difference was obtained between the baseline and post therapy, follow-up 1 and follow-up 2 for AVQI measure with a very large effect size, ηp² = 0.451. In contrast, DSI showed a significant difference only between the baseline and follow-up 1 with effect size, ηp² = 0.187. CONCLUSIONS The results of this study confirmed that both DSI and AVQI were effective in tracking the changes in the severity of dysphonia. However, when compared, AVQI appeared to be more sensitive than DSI in potentially reflecting the effect of eclectic voice therapy in HFVD.
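The effect sizes in this abstract are partial eta squared values from a repeated measures analysis of variance. As a minimal sketch of how such a value can be derived for a one-way repeated measures design, the Python function below partitions the sums of squares into time, subject, and error components. The function name and the toy scores are illustrative assumptions; they are not the authors' data or software.

```python
def partial_eta_squared(scores):
    """Partial eta squared for the time effect.

    scores[s][t] is the measurement for subject s at time point t
    (hypothetical layout chosen for this sketch).
    """
    n_subjects = len(scores)
    n_times = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n_subjects * n_times)

    # Mean per time point (across subjects) and per subject (across time points).
    time_means = [sum(row[t] for row in scores) / n_subjects for t in range(n_times)]
    subject_means = [sum(row) / n_times for row in scores]

    # Partition the total variability.
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_time = n_subjects * sum((m - grand_mean) ** 2 for m in time_means)
    ss_subjects = n_times * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_error = ss_total - ss_time - ss_subjects

    # Effect variability relative to effect plus residual variability.
    return ss_time / (ss_time + ss_error)


# Toy data: three subjects measured at two time points.
toy_scores = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
effect = partial_eta_squared(toy_scores)
```

Because between-subject variability is removed before the ratio is formed, partial eta squared can be large even when the raw change is small relative to the spread between individuals.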
Affiliation(s)
- Thirunavukkarasu Jayakumar
- Department of Speech-Language Sciences, All India Institute of Speech and Hearing, Mysuru, Karnataka, India.
|
12
|
Heller Murray E, Yucel R. Longitudinal Evaluation of Cepstral Peak Prominence in Children. J Voice 2024:S0892-1997(24)00138-3. [PMID: 38760251 PMCID: PMC11568071 DOI: 10.1016/j.jvoice.2024.04.019] [Received: 01/30/2024] [Revised: 04/12/2024] [Accepted: 04/15/2024] [Indexed: 05/19/2024]
Abstract
OBJECTIVES To evaluate whether the acoustic measure of cepstral peak prominence changes during typical development in children aged 2-7 years. METHODS Data were retrospectively analyzed from the Arizona Child Acoustic Database Repository in this longitudinal cohort study. The Repository contains longitudinal data recordings from 63 total children between 2 and 7 years of age. Thirty-one children met the inclusion criteria for the current analysis (at least five time points of usable speech data, no history of speech or language difficulties, no significant dysphonia, and monolingual speakers of American English). Cepstral peak prominence measures were calculated in Praat for each child, at each timepoint. Additional acoustic measures of vocal fundamental frequency, vocal intensity, and stimuli length were also calculated. These measures were chosen as previous work has shown they may impact cepstral peak prominence values. RESULTS Linear mixed-effects regression models examined the relationship between cepstral peak prominence and age, after controlling for vocal fundamental frequency, vocal intensity, and stimuli length. Within-participant effects of age were found, indicating a trajectory change in which cepstral peak prominence increases with age in this population. This positive relationship between cepstral peak prominence and age was nonlinear, with a steeper slope between age and cepstral peak prominence after 5 years of age. CONCLUSIONS This is the first study to examine the typical developmental trajectory of cepstral peak prominence in children between 2 and 7 years, a critical period of vocal development. Cepstral peak prominence increased with age, suggesting an increase in periodicity of vocal fold vibration that coincides with the significant vocal fold structural changes occurring during this time.
Outcomes present important normative information on vocal development, essential for effectively understanding the difference between what vocal changes are part of normative development and what changes indicate a voice disorder.
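Cepstral peak prominence recurs throughout these entries as a periodicity measure. A single-frame sketch follows, assuming the commonly described definition: the height of the cepstral peak, within the quefrency range of plausible fundamental frequencies, above a regression line fitted to the cepstrum. This is not Praat's implementation, which differs in windowing, smoothing, and regression details. The function names, the 60-330 Hz search range, and the synthetic test signals are all assumptions made for illustration.

```python
import cmath
import math
import random


def dft(values):
    """Naive O(n^2) discrete Fourier transform; adequate for one short frame."""
    n = len(values)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    return [sum(values[t] * twiddle[(k * t) % n] for t in range(n)) for k in range(n)]


def cepstral_peak_prominence(frame, sample_rate, f0_min=60.0, f0_max=330.0):
    """Return (CPP in dB, estimated f0 in Hz) for one frame of samples."""
    n = len(frame)
    # Hann window to limit spectral leakage.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    # Log-magnitude spectrum in dB (small floor avoids log of zero).
    log_spectrum = [20 * math.log10(abs(c) + 1e-12) for c in dft(windowed)]
    # Cepstrum: transform of the log spectrum, again expressed in dB.
    cepstrum_db = [20 * math.log10(abs(c) / n + 1e-12) for c in dft(log_spectrum)]

    # Quefrencies (in samples) corresponding to the f0 search range.
    q_lo = int(math.ceil(sample_rate / f0_max))
    q_hi = int(math.floor(sample_rate / f0_min))
    quefrencies = list(range(q_lo, q_hi + 1))
    levels = [cepstrum_db[q] for q in quefrencies]

    # Least-squares regression line through the cepstrum over that range.
    mean_q = sum(quefrencies) / len(quefrencies)
    mean_l = sum(levels) / len(levels)
    slope = (sum((q - mean_q) * (l - mean_l) for q, l in zip(quefrencies, levels))
             / sum((q - mean_q) ** 2 for q in quefrencies))
    intercept = mean_l - slope * mean_q

    # Prominence: peak level minus the regression line at the peak quefrency.
    peak_q = max(quefrencies, key=lambda q: cepstrum_db[q])
    cpp = cepstrum_db[peak_q] - (slope * peak_q + intercept)
    return cpp, sample_rate / peak_q


# Synthetic signals: a harmonic-rich 125 Hz tone with faint noise, and pure noise.
rng = random.Random(0)
sample_rate, n_samples, f0 = 8000, 512, 125.0
voiced = [sum(math.sin(2 * math.pi * f0 * h * i / sample_rate) / h for h in range(1, 25))
          + rng.gauss(0.0, 0.01) for i in range(n_samples)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

cpp_voiced, f0_voiced = cepstral_peak_prominence(voiced, sample_rate)
cpp_noise, _ = cepstral_peak_prominence(noise, sample_rate)
```

In this sketch the periodic signal yields a much more prominent cepstral peak than the noise, which is the property that makes the measure sensitive to breathiness and to the periodicity of vocal fold vibration.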
Affiliation(s)
- Elizabeth Heller Murray
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, Pennsylvania.
- Recai Yucel
- Department of Epidemiology and Biostatistics, College of Public Health, Temple University, Philadelphia, Pennsylvania
|
13
|
Şimşek S, Aydinli FE, Taşkin A, Başar K, Yilmaz T, Özcebe E. Exploring the Relationship Between Acoustic Measurements and Self-Perception of Voice in Trans Women. J Voice 2024:S0892-1997(24)00086-9. [PMID: 38677906 DOI: 10.1016/j.jvoice.2024.03.015] [Received: 01/18/2024] [Revised: 03/14/2024] [Accepted: 03/14/2024] [Indexed: 04/29/2024]
Abstract
OBJECTIVE This study aimed to explore the strength and direction of the relationship between spectral cepstral-based, time-based acoustic measures and the self-perception of voice in trans women. METHODS Forty-eight trans women were included in the study. Analysis of the sustained vowel phonation was performed using Multidimensional Voice Profile Analysis (MDVP), and spectral-cepstral analyses of the sustained vowel phonation, all-voiced weighted sentence, and spontaneous speech were made via Analysis of Dysphonia in Speech and Voice (ADSV) software. For self-perceptual evaluations, the Trans Woman Voice Questionnaire (TWVQ) and the Self-perception of Voice Femininity Scale (SPVF) were used. The correlation between MDVP, spectral-cepstral parameters, and TWVQ and SPVF scores was calculated. RESULTS The present study found a positive relationship between F0, SPVF, and TWVQ. Among the perturbation parameters, the jitter was the only one found to correlate with SPVF and TWVQ. The CPPF0 parameter was found to be associated with a more feminine voice perception and a higher voice-related quality of life in all speech samples in the present study. In addition, higher CPP values achieved from vowel phonation were associated with less feminine voice perception and lower voice-related quality of life. The present study also suggests a weak correlation with the SPVF and Cepstral Peak Prominence Standard Deviation (CPPF0 SD) of the spontaneous speech sample in a negative direction. CONCLUSIONS This study found weak and moderate levels of correlations between F0, jitter (%), CPP, CPPF0, CPPF0 SD parameters, and self-perceptual measures. These findings suggested that such a level of relationship is attributable to the fact that these tools evaluate different aspects of voice in accordance with the International Classification of Functioning System. 
According to this pioneering study, it would be beneficial to incorporate spectral-cepstral measures into the objective assessment protocol for trans women's voices.
Affiliation(s)
- Sinem Şimşek
- Hacettepe University, Faculty of Health Science, Department of Speech and Language Therapy, Ankara, Turkey
- Fatma Esen Aydinli
- Hacettepe University, Faculty of Health Science, Department of Speech and Language Therapy, Ankara, Turkey.
- Ayşenur Taşkin
- Hacettepe University, Faculty of Health Science, Department of Speech and Language Therapy, Ankara, Turkey
- Koray Başar
- Hacettepe University, Faculty of Medicine, Department of Psychiatry, Ankara, Turkey
- Taner Yilmaz
- Hacettepe University, Faculty of Medicine, Department of Ear-Nose-Throat, Ankara, Turkey
- Esra Özcebe
- Hacettepe University, Faculty of Health Science, Department of Speech and Language Therapy, Ankara, Turkey
|
14
|
Alves JDN, de Almeida AAF, Yamasaki R, Lopes LW. The influence of listener experience, measurement scale and speech task on the reliability of auditory-perceptual evaluation of vocal quality. Codas 2024; 36:e20230175. [PMID: 38629682 PMCID: PMC11065405 DOI: 10.1590/2317-1782/20232023175] [Received: 07/18/2023] [Accepted: 09/24/2023] [Indexed: 04/19/2024]
Abstract
PURPOSE To assess the influence of the listener experience, measurement scales and the type of speech task on the auditory-perceptual evaluation of the overall severity (OS) of voice deviation and the predominant type of voice (rough, breathy or strain). METHODS 22 listeners, divided into four groups participated in the study: speech-language pathologist specialized in voice (SLP-V), SLP non specialized in voice (SLP-NV), graduate students with auditory-perceptual analysis training (GS-T), and graduate students without auditory-perceptual analysis training (GS-U). The subjects rated the OS of voice deviation and the predominant type of voice of 44 voices by visual analog scale (VAS) and the numerical scale (score "G" from GRBAS), corresponding to six speech tasks such as sustained vowel /a/ and /ɛ/, sentences, number counting, running speech, and all five previous tasks together. RESULTS Sentences obtained the best interrater reliability in each group, using both VAS and GRBAS. SLP-NV group demonstrated the best interrater reliability in OS judgment in different speech tasks using VAS or GRBAS. Sustained vowel (/a/ and /ɛ/) and running speech obtained the best interrater reliability among the groups of listeners in judging the predominant vocal quality. GS-T group got the best result of interrater reliability in judging the predominant vocal quality. CONCLUSION The time of experience in the auditory-perceptual judgment of the voice, the type of training to which they were submitted, and the type of speech task influence the reliability of the auditory-perceptual evaluation of vocal quality.
Affiliation(s)
- Rosiane Yamasaki
- Universidade Federal de São Paulo - UNIFESP - São Paulo (SP), Brasil.
|
15
|
Mehta R, Mat Q, Maniaci A, Lelubre C, Duterme J. Influence of a Surgical Mask on Voice Analysis in Dysphonic Patients During the COVID-19 Pandemic. OTO Open 2024; 8:e102. [PMID: 38229973 PMCID: PMC10790191 DOI: 10.1002/oto2.102] [Received: 07/15/2023] [Revised: 11/27/2023] [Accepted: 12/15/2023] [Indexed: 01/18/2024]
Abstract
Objective COVID-19 has radically changed medical practice. The main objective of this study was to assess the impact of a surgical mask (SM) on voice quality analyses in a group of patients with different common benign organic vocal pathologies. Study Design A cross-over study. Setting A group of 20 patients with different benign organic vocal pathologies was recruited from the ENT consultation of the University Hospital of Charleroi in Belgium. Methods On the day of the assessment, each subject underwent an endonasal laryngeal videostroboscopy followed by a voice analysis (VA) with and without a new SM. The following parameters were analyzed: fundamental frequency, maximum frequency, range in amplitude and frequency of the voice, jitter and maximum phonatory time. Results In this research, we showed that VA can be performed with an SM without changing the measured vocal parameters. These results also suggest that, for the same individual, a VA performed before the pandemic without an SM could be compared to one performed with an SM to follow the evolution of the patient's voice quality. Conclusion The wearing of an SM during VA should always be recommended in case of immunodeficiency, a contagious disease of the patient or during a (new) pandemic.
Affiliation(s)
- Rupal Mehta
- Department of Otorhinolaryngology, C.H.U. Charleroi, Charleroi, Belgium
- Quentin Mat
- Department of Otorhinolaryngology, C.H.U. Charleroi, Charleroi, Belgium
- Faculty of Medicine and Pharmacy, University of Mons (UMons), Mons, Belgium
- COVID‐19 Task Force of the Young Otolaryngologists of the International Federations of Oto‐rhino‐laryngological Society (YO‐IFOS), Paris, France
- Antonino Maniaci
- COVID‐19 Task Force of the Young Otolaryngologists of the International Federations of Oto‐rhino‐laryngological Society (YO‐IFOS), Paris, France
- Faculty of Medicine and Surgery, University of Enna “Kore”, Enna, Italy
- Christophe Lelubre
- Faculty of Medicine and Pharmacy, University of Mons (UMons), Mons, Belgium
- Department of Internal Medicine, C.H.U. Charleroi, Charleroi, Belgium
|
16
|
Cantor-Cutiva LC, Ramani SA, Walden PR, Hunter EJ. Screening of Voice Pathologies: Identifying the Predictive Value of Voice Acoustic Parameters for Common Voice Pathologies. J Voice 2023:S0892-1997(23)00390-9. [PMID: 38143203 PMCID: PMC11193840 DOI: 10.1016/j.jvoice.2023.12.005] [Received: 10/24/2023] [Revised: 12/01/2023] [Accepted: 12/04/2023] [Indexed: 12/26/2023]
Abstract
BACKGROUND Voice acoustic analysis is important for objectively assessing voice production and diagnosing voice disorders. AIM This study aimed to investigate the sensitivity of various voice acoustic parameters in differentiating common voice pathology types. METHODS Data from the publicly available Perceptual Voice Qualities Database were analyzed; the database includes recordings of participants with and without voice disorders. A wide range of acoustic parameters was estimated from the recordings, such as alpha ratio, harmonics-to-noise ratio (HNR), cepstral peak prominence smoothed (CPPS), pitch period entropy (PPE), fundamental frequency, jitter, shimmer, and sound pressure levels. The predictive capabilities of the parameters were evaluated using receiver operating characteristic curves. Linear regression analysis determined the associations between parameters and voice disorders. Principal component analysis was conducted to identify important parameters for distinguishing voice disorders. RESULTS AND CONCLUSION This study has identified significant differences in acoustic parameters between those with and without voice disorders. Notably, the combination of five parameters-namely, PPE, shimmer, jitter, CPPS, and HNR-was identified as a strong predictor in voice disorder screening. These findings contribute substantially to the field of voice disorders, offering valuable insights for screening and diagnosis.
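The abstract evaluates each parameter's predictive capability with receiver operating characteristic curves. As a rough illustration of the summary statistic usually taken from such a curve, the sketch below computes the area under the curve through its rank interpretation: the probability that a randomly chosen disordered voice scores higher than a randomly chosen non-disordered one, with ties counted as one half. The function name and the example scores are hypothetical and unrelated to the Perceptual Voice Qualities Database.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: one acoustic value per recording (higher = more likely disordered).
    labels: 1 for voice disorder, 0 for no voice disorder.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0      # correctly ordered pair
            elif p == q:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four recordings (two without, two with a disorder).
example_auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
perfect_auc = roc_auc([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

For parameters where lower values indicate disorder, as is typically assumed for CPPS and HNR, the scores would be negated before the call so that higher still means more likely disordered.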
Affiliation(s)
- Sai Aishwarya Ramani
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Eric J Hunter
- Department of Communication Sciences and Disorders, University of Iowa, Iowa City, Iowa
|
17
|
Contreras RC, Viana MS, Fonseca ES, Dos Santos FL, Zanin RB, Guido RC. An Experimental Analysis on Multicepstral Projection Representation Strategies for Dysphonia Detection. Sensors (Basel, Switzerland) 2023; 23:s23115196. [PMID: 37299922 DOI: 10.3390/s23115196] [Received: 05/07/2023] [Revised: 05/20/2023] [Accepted: 05/23/2023] [Indexed: 06/12/2023]
Abstract
Biometrics-based authentication has become the most well-established form of user recognition in systems that demand a certain level of security. For example, the most commonplace social activities stand out, such as access to the work environment or to one's own bank account. Among all biometrics, voice receives special attention due to factors such as ease of collection, the low cost of reading devices, and the high quantity of literature and software packages available for use. However, these biometrics may have the ability to represent the individual impaired by the phenomenon known as dysphonia, which consists of a change in the sound signal due to some disease that acts on the vocal apparatus. As a consequence, for example, a user with the flu may not be properly authenticated by the recognition system. Therefore, it is important that automatic voice dysphonia detection techniques be developed. In this work, we propose a new framework based on the representation of the voice signal by the multiple projection of cepstral coefficients to promote the detection of dysphonic alterations in the voice through machine learning techniques. Most of the best-known cepstral coefficient extraction techniques in the literature are mapped and analyzed separately and together with measures related to the fundamental frequency of the voice signal, and its representation capacity is evaluated on three classifiers. Finally, the experiments on a subset of the Saarbruecken Voice Database prove the effectiveness of the proposed material in detecting the presence of dysphonia in the voice.
Affiliation(s)
- Rodrigo Colnago Contreras
- Department of Computer Science and Statistics, Institute of Biosciences, Letters and Exact Sciences, São Paulo State University, São José do Rio Preto 15054-000, SP, Brazil
- Rodrigo Bruno Zanin
- Faculty of Architecture and Engineering, Mato Grosso State University, Cáceres 78217-900, MT, Brazil
- Rodrigo Capobianco Guido
- Department of Computer Science and Statistics, Institute of Biosciences, Letters and Exact Sciences, São Paulo State University, São José do Rio Preto 15054-000, SP, Brazil
|
18
|
Saeedi S, Aghajanzadeh M, Khoddami SM, Dabirmoghaddam P, Jalaie S. Relationship of cepstral analysis with voice self-assessments in dysphonic and normal speakers. Eur Arch Otorhinolaryngol 2023; 280:1803-1813. [PMID: 36229669 DOI: 10.1007/s00405-022-07690-3] [Received: 08/09/2022] [Accepted: 10/04/2022] [Indexed: 11/03/2022]
Abstract
PURPOSE This study aimed to investigate the relationship of cepstral analysis (Cepstral Peak Prominence [CPP] and Cepstral Peak Prominence-Smoothed [CPPS]) with voice self-assessments (the Persian version of the vocal tract discomfort [VTDp] scale and a non-standard hoarseness self-assessment [NHS] questionnaire). METHODS 223 participants (159 with and 64 without dysphonia) were asked to perform three vocal tasks, namely vowels /a/ and /e/, six standard sentences, and a non-standard connected speech sample. CPP and CPPS were calculated in these three vocal tasks using the "Praat" software. The participants were also asked to complete the VTDp scale and the NHS questionnaire. RESULTS The means of frequency and severity of the VTDp and the means of NHS were statistically different between the dysphonic and normal voice groups (P < 0.05), except for tickling (P > 0.05). There was a very low significant correlation between cepstral analysis with aching and in the dysphonic group (P < 0.05). However, there was a very low to low significant correlation between cepstral analysis and burning, tight, aching, tickling, sore, and both frequency and severity subscale scores of the VTDp in the normal voice group (P < 0.05). Moreover, the means of the cepstral analysis did not differ significantly between all scores of the NHS in the dysphonic and the normal voice groups (P > 0.05), except for 1 with 3, 4, and 5 in the dysphonic group (P < 0.05). CONCLUSION The cepstral analysis can provide some information about the status of the vocal tract and a person's perception of his/her own voice quality.
Affiliation(s)
- Saeed Saeedi
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
- Mahshid Aghajanzadeh
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran.
- Seyyedeh Maryam Khoddami
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
- Payman Dabirmoghaddam
- Otorhinolaryngology Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Shohreh Jalaie
- Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
|
19
|
Echternach M, Nusseck M, Strasding M, Richter B. Differences of Electroglottographical Contact Quotients between Connected Speech and Sustained Phonation in Clinical Measurement of Voice. J Voice 2023:S0892-1997(23)00077-2. [PMID: 36941166 DOI: 10.1016/j.jvoice.2023.02.020] [Received: 01/11/2023] [Revised: 02/15/2023] [Accepted: 02/15/2023] [Indexed: 03/23/2023]
Abstract
INTRODUCTION In clinical practice, sustained phonation is mostly used for acoustic voice measurements, while perceptual evaluation is based on connected speech. Since sustained phonation could be associated with the use of the singing voice, and since vocal registers are more relevant for singing rather than speech, it is unclear if vocal registers contribute to observable vocal fold contact differences between sustained phonation and speech. MATERIAL AND METHODS Sustained phonation (vowel [a] on comfortable pitch and loudness) and connected speech (German text: Der Nordwind und die Sonne) were analyzed for 1216 subjects (426 with and 790 without dysphonia) using the Laryngograph system (combining electroglottography and audio recordings). From these samples, fundamental frequency (ƒo), contact quotient (CQ), sound pressure level (SPL) and frequency perturbation (jitter first for sustained and cFx for connected speech) were evaluated. RESULTS Compared to connected speech, the values of ƒo and SPL were higher for sustained phonation. For female voices, ƒo difference was greater than for male voices. At the same time, and only for the females, CQ was lower for the sustained phonation, indicating a register difference. CONCLUSION In order to achieve a better comparability, sustained phonation should be standardized regarding the ƒo and SPL values in correspondence to the ƒo and SPL range of reading a text. This should also reduce the risk of using a different register for different types of phonation.
Affiliation(s)
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany.
- Manfred Nusseck
- Institute of Musicians' Medicine, University of Freiburg Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Malin Strasding
- Division of Fixed Prosthodontics and Biomaterials, Université de Genève, Geneve, Switzerland
- Bernhard Richter
- Institute of Musicians' Medicine, University of Freiburg Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
|
20
|
Anand S. Perceptual and Computational Estimates of Vocal Breathiness and Roughness in Sustained Phonation and Connected Speech. J Voice 2023:S0892-1997(23)00069-3. [PMID: 36933971 DOI: 10.1016/j.jvoice.2023.02.014] [Received: 12/01/2022] [Revised: 02/10/2023] [Accepted: 02/13/2023] [Indexed: 03/18/2023]
Abstract
OBJECTIVE Clinical assessment of voice quality (VQ) often uses a combination of sustained phonations and more prolonged and more complex vocalizations. The purpose of this study was to compare the perceived vocal breathiness and vocal roughness of sustained phonations and connected speech over a wide range of dysphonia severity and to evaluate their relationship with acoustic measures and bioinspired models of breathiness and roughness. METHODS A VQ dimension-specific single-variable matching task (SVMT) was used to index the perceived breathiness or roughness of five male and five female talkers on the basis of a sustained /a/ phonation and the 5th CAPE-V sentence. Acoustic measures of cepstral peak and autocorrelation peak, and psychoacoustic measures of pitch strength and temporal envelope standard deviation (EnvSD), were used to predict perceived breathiness and roughness judgments obtained from 10 listeners, respectively. RESULTS High intra- and inter-listener reliability was observed for sustained phonations and connected speech. Perceived breathiness and roughness of sustained vowels and sentences obtained using SVMT were highly correlated for most dysphonic voices. The pitch strength model of breathiness was able to capture a larger amount of perceptual variance compared to cepstral peak in both vowels and sentences. Autocorrelation peak was strongly correlated to perceived roughness in sentences while EnvSD was strongly correlated to perceived roughness in vowels. CONCLUSIONS Results provide evidence that perception of VQ via SVMT can be successfully extended to connected speech. Computational models of VQ can be easily adapted to connected speech. Such automated models of VQ perception are valuable due to their computational efficiency and their ability to accurately capture the non-linearities of the human auditory system.
Affiliation(s)
- Supraja Anand
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida.
|
21
|
Saeedi S, Khoddami SM, Dabirmoghaddam P, Jalaie S, Aghajanzadeh M. Relationship Between Aerodynamic Measurement of Maximum Phonation Time With Acoustic Analysis and the Effects of Sex and Dysphonia Type. J Voice 2023:S0892-1997(23)00081-4. [PMID: 36990864 DOI: 10.1016/j.jvoice.2023.02.026] [Received: 12/29/2022] [Revised: 02/18/2023] [Accepted: 02/20/2023] [Indexed: 03/29/2023]
Abstract
OBJECTIVES/HYPOTHESIS This study set out to uncover the correlation between maximum phonation time (MPT) with acoustic and cepstral analysis in the dysphonic and control groups, considering the effects of sex and dysphonia type. METHODS For this cross-sectional study, a sample of 179 attendees (141 dysphonic and 38 control) were randomly selected and requested to sustain the vowel /a/ as long as they could with their habitual pitch and loudness. Reading standard sentences and conversational connected speech tasks were obtained too. Using Praat, the MPT, jitter, shimmer, noise-to-harmonic ratio, cepstral peak prominence (CPP), and smoothed cepstral peak prominence (CPPS) were calculated in the target vocal tasks. RESULTS There was a very low to low significant correlation (r = 0.00-0.50) between MPT amounts and acoustic analysis in the dysphonic group (P < 0.05), except for between MPT with shimmer (P > 0.05). In contrast, findings showed no significant correlation between MPT and acoustic analysis in the control group, not even separated by sex (P > 0.05). There was a very low to low correlation between MPT amounts and acoustic analysis in the male dysphonic group (P < 0.05), except for the MPT with shimmer (P > 0.05). There was no significant correlation between MPT and acoustic analysis in the female dysphonic group (P > 0.05), except for MPT with CPP (sustained vowel) (P < 0.05). Finally, very low to high correlations between MPT and some of the acoustic analysis in all the different dysphonia types were observed (P < 0.05). CONCLUSIONS MPT contains some information about the acoustic features of the dysphonic voice, specifically the CPP and smoothed cepstral peak prominence. The data suggested that the observed relationship between MPT and the acoustic analysis has the capacity to be considered for the development of new multiparametric tests of voice assessment in dysphonia, regarding the sex and dysphonia type.
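Jitter and shimmer, listed among the measures in this abstract, quantify cycle-to-cycle perturbation of period and amplitude. The sketch below assumes the commonly used "local" definition: the mean absolute difference between consecutive cycles divided by the mean value, expressed as a percentage. It also assumes that glottal cycle periods and peak amplitudes have already been extracted, which is the harder step in practice and is not shown. The function name and the numbers are illustrative only.

```python
def local_perturbation_percent(values):
    """Mean absolute consecutive difference relative to the mean, in percent.

    Pass cycle periods (seconds) to obtain local jitter, or cycle peak
    amplitudes to obtain local shimmer.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical cycle data: periods alternate between 10 ms and 11 ms.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [0.50, 0.50, 0.50, 0.50]

jitter_percent = local_perturbation_percent(periods)
shimmer_percent = local_perturbation_percent(amplitudes)
```

A perfectly steady signal gives zero for both measures. Severely dysphonic voices can make the cycle extraction itself unreliable, which limits how far perturbation measures can be trusted for such voices.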
Affiliation(s)
- Saeed Saeedi
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Seyyedeh Maryam Khoddami
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Payman Dabirmoghaddam
- Otorhinolaryngology Research Center, Tehran University of Medical Sciences, Tehran, Iran
| | - Shohreh Jalaie
- Department of Physiotherapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran
| | - Mahshid Aghajanzadeh
- Department of Speech Therapy, School of Rehabilitation, Tehran University of Medical Sciences, Tehran, Iran.
| |
|
22
|
Hosokawa K, Iwahashi T, Iwahashi M, Iwaki S, Kato C, Yoshida M, Yoshida D, Kitayama I, Umatani M, Matsushiro N, Ogawa M, Inohara H. The Significant Influence of Hoarseness Levels in Connected Speech on the Voice-Related Disability Evaluated Using Voice Handicap Index-10. J Voice 2023; 37:290.e7-290.e16. [PMID: 33376022 DOI: 10.1016/j.jvoice.2020.11.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2020] [Revised: 11/17/2020] [Accepted: 11/19/2020] [Indexed: 11/18/2022]
Abstract
OBJECTIVES This retrospective study examines the influence of voice quality in connected speech (CS) and sustained vowels (SV) on the voice-related disability in patients' daily living documented by Voice Handicap Index-10 (VHI-10). METHODS A total of 500 voice recordings of CS and SV samples from 338 patients with voice disturbances were included, along with the patients' age, diagnoses, maximum phonation time, and VHI-10. Dataset-1 comprised 338 untreated patients, whereas Dataset-2 included 162 patients before and after phonosurgeries. As a preliminary study, the concurrent and diagnostic validities based on auditory-perceptual judgments were examined for cepstral peak prominence (CPP) and CPP smoothed (CPPS) for CS and SV tasks. Next, simple correlations and multivariate regression analyses (MRA) were performed to identify which of the acoustic measures for the CS or SV tasks significantly influenced the total score or improvement of VHI-10. RESULTS The preliminary study confirmed high correlations with hoarseness levels as well as the excellent diagnostic accuracy of CPP and CPPS for both CS and SV tasks. In Dataset-1, the simple correlations and MRA showed that cepstral measures in both tasks correlated moderately with, and contributed significantly to, the total VHI-10 score. However, in Dataset-2, only the changes in cepstral measures and median pitch after phonosurgery in the CS tasks contributed significantly to the improvement of VHI-10. CONCLUSION The study demonstrated that the hoarseness levels in both the CS and SV tasks equivalently influenced the VHI-10 scores, and that the post-surgical change of voice quality only in the CS tasks influenced the improvement of voice-related disability in daily living.
Affiliation(s)
- Kiyohito Hosokawa
- Department of Otorhinolaryngology and Head & Neck Surgery, Osaka University Graduate School of Medicine, Suita-city, Japan; Department of Otorhinolaryngology, Japan Community Health care Organization (JCHO) Osaka Hospital, Osaka-city, Osaka, Japan; Department of Otorhinolaryngology, Osaka Police Hospital, Osaka-city, Osaka, Japan.
| | - Toshihiko Iwahashi
- Department of Otorhinolaryngology and Head & Neck Surgery, Osaka University Graduate School of Medicine, Suita-city, Japan
| | - Mio Iwahashi
- Nimura ENT Voice Clinic, Osaka-city, Osaka, Japan
| | - Shinobu Iwaki
- Department of Otorhinolaryngology and Head & Neck Surgery, Kobe University Graduate School of Medicine, Kobe-city, Hyogo, Japan
| | - Chieri Kato
- Department of Otorhinolaryngology and Head & Neck Surgery, Osaka University Graduate School of Medicine, Suita-city, Japan
| | - Misao Yoshida
- Department of Rehabilitation, Nishinomiya Kaisei Hospital, Nishinomiya-city, Hyogo, Japan
| | - Daichi Yoshida
- Department of Otorhinolaryngology, Japan Community Health care Organization (JCHO) Osaka Hospital, Osaka-city, Osaka, Japan
| | - Itsuki Kitayama
- Department of Otorhinolaryngology and Head & Neck Surgery, Osaka University Graduate School of Medicine, Suita-city, Japan; Department of Otorhinolaryngology, Japan Community Health care Organization (JCHO) Osaka Hospital, Osaka-city, Osaka, Japan
| | - Masanori Umatani
- Department of Otorhinolaryngology and Head & Neck Surgery, Osaka University Graduate School of Medicine, Suita-city, Japan
| | - Naoki Matsushiro
- Department of Otorhinolaryngology, Osaka Police Hospital, Osaka-city, Osaka, Japan
| | - Makoto Ogawa
- Department of Otorhinolaryngology and Head & Neck Surgery, Osaka University Graduate School of Medicine, Suita-city, Japan; Department of Otorhinolaryngology, Japan Community Health care Organization (JCHO) Osaka Hospital, Osaka-city, Osaka, Japan
| | - Hidenori Inohara
- Department of Otorhinolaryngology and Head & Neck Surgery, Osaka University Graduate School of Medicine, Suita-city, Japan
| |
|
23
|
Moein N, Dehqan A, Scherer RC. Chronic voice disorder after coronavirus disease 2019 infection and its treatment using the cricothyroid visor maneuver: a case report. J Med Case Rep 2023; 17:67. [PMID: 36841775 PMCID: PMC9968215 DOI: 10.1186/s13256-023-03780-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/26/2022] [Accepted: 01/16/2023] [Indexed: 02/27/2023] Open
Abstract
BACKGROUND Given the novelty of the severe acute respiratory syndrome coronavirus 2 pandemic and the consequent lack of studies, the etiology of dysphonia in patients with coronavirus disease 2019 is still unknown and needs to be investigated. The purpose of the current study is to investigate the effect of a new manual therapy technique, the cricothyroid visor maneuver, on muscle tension dysphonia symptoms in a patient who had experienced dysphonia symptoms due to coronavirus disease 2019 infection. CASE PRESENTATION A 55-year-old retired Iranian teacher who was diagnosed with muscle tension dysphonia by an otolaryngologist participated in this study. Fifty days before being referred to an otolaryngologist, he was diagnosed with coronavirus disease 2019 on the basis of the results of a standard laboratory test, namely real-time polymerase chain reaction. Treatment was provided in ten sessions. Pre- and post-treatment audio recordings of sustained vowels, selected sentences, and connected speech samples were submitted for auditory-perceptual and acoustic analysis to assess the effects of the treatment program. Videolaryngostroboscopy and the patient's perceptions of voice quality were also assessed before and after therapy. A reduction in all features of the Consensus Auditory-Perceptual Evaluation of Voice was observed. The results of acoustic assessment showed that jitter (35.13%) and shimmer (20.48%) decreased; moreover, the harmonics-to-noise ratio (1.17%), cepstral peak prominence smoothed (28.53%), and maximum phonation time (15.5%) increased after the treatment sessions. The scores of four parameters of the Stroboscopy Examination Rating Form (SERF) changed after cricothyroid visor maneuver therapy. Also, the visual analog scale score was 40 at the pre-treatment assessment and increased to 90 at the post-treatment assessment.
CONCLUSIONS The effectiveness of cricothyroid visor maneuver therapy on dysphonia associated with coronavirus disease 2019 was investigated in the current study. This case study has highlighted chronic dysphonia after coronavirus disease 2019 infection, and suggests that the cricothyroid visor maneuver therapy approach may have positive outcomes for patients with muscle tension dysphonia with this background.
Affiliation(s)
- Narges Moein
- Department of Speech Language Pathology, School of Rehabilitation Sciences, Iran University of Medical Sciences, Madadkaran St., Shahnazari Ave., Mirdamad Blvd., Madar Sq., Tehran, Iran
| | - Ali Dehqan
- Rehabilitation Sciences Research Center, Zahedan University of Medical Sciences, Zahedan, Iran.
| | - Ronald C. Scherer
- Department of Communication Disorders, Bowling Green State University, Bowling Green, OH, USA
| |
|
24
|
Dehqan A. Outcomes of Cricothyroid Visor Maneuver (CVM) for Treatment of Vocal Polyp: A Case Report. J Voice 2023; 37:144.e1-144.e7. [PMID: 33199079 DOI: 10.1016/j.jvoice.2020.11.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/09/2020] [Revised: 10/30/2020] [Accepted: 11/02/2020] [Indexed: 01/11/2023]
Abstract
OBJECTIVE The aim of the study is the clinical investigation of a patient with a vocal fold polyp, and the visual, acoustical, perceptual, and self-report changes before and after using the cricothyroid visor manoeuvre (CVM). DESIGN A 48-year-old female university professor, gynecologist, and obstetrician with a history of laryngopharyngeal reflux and a left vocal polyp participated. Treatment was provided in 10 sessions. Pre- and post-treatment audio recordings of sustained vowels, selected sentences, and connected speech samples were submitted to auditory-perceptual and acoustical analysis to assess the effects of the two-treatment program. Laryngoscopic images and the patient's perceptions of her voice quality and quality of life were also assessed before and after therapy. RESULTS Improvements in acoustic parameters were obtained, especially in perturbation and CPPS parameters. The overall voice quality scores on the CAPE-V were moderate before therapy and became mild after therapy. Laryngoscopy images demonstrated improvement in the glottal closure configuration in two phases (open and closed) between pre- and post-CVM therapy and a decrease in polyp size. The patient showed improvement in VAS, IVQLP, and VRQOL scores. CONCLUSION The CVM therapy used in the study resulted in positive changes in acoustic and perceptual-auditory aspects of voice production, self-report, and QOL for the patient. The CVM approach appears to have been effective for this case in decreasing the polyp size or its regression or for vocal adaptation.
Affiliation(s)
- Ali Dehqan
- Cellular and Molecular Research Center, Zahedan University of Medical Sciences, Zahedan, Iran; Department of Speech therapy, School of Rehabilitation, Zahedan University of Medical Sciences, Zahedan, Iran.
| |
|
25
|
Shrivas A, Deshpande S, Gidaye G, Nirmal J, Ezzine K, Frikha M, Desai K, Shinde S, Oza AD, Burduhos-Nergis DD, Burduhos-Nergis DP. Employing Energy and Statistical Features for Automatic Diagnosis of Voice Disorders. Diagnostics (Basel) 2022; 12:diagnostics12112758. [PMID: 36428819 PMCID: PMC9689977 DOI: 10.3390/diagnostics12112758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 10/30/2022] [Accepted: 11/09/2022] [Indexed: 11/16/2022] Open
Abstract
The presence of laryngeal disease affects vocal fold dynamics and thus causes changes in pitch, loudness, and other characteristics of the human voice. Many frameworks based on the acoustic analysis of speech signals have been created in recent years; however, they are evaluated on just one or two corpora and are not independent of voice illnesses and human bias. In this article, a unified wavelet-based paradigm for evaluating voice diseases is presented. This approach is independent of voice diseases, human bias, or dialect. The vocal folds' dynamics are impacted by the voice disorder, and this further modifies the sound source. Therefore, inverse filtering is used to capture the modified voice source. Furthermore, fundamental-frequency-independent statistical and energy metrics are derived from each spectral sub-band to characterize the retrieved voice source. Speech recordings of the sustained vowel /a/ were collected from four different datasets in German, Spanish, English, and Arabic to run several intra- and inter-dataset experiments. The classifiers' performance indicators show that energy and statistical features uncover vital information on a variety of clinical voices, and therefore the suggested approach can be used as a complementary means for the automatic medical assessment of voice diseases.
Affiliation(s)
- Avinash Shrivas
- Department of Computer Science & Technology, Degree College of Physical Education, Sant Gadge Baba Amravati University, Amravati 444605, India
- Correspondence: (A.S.); (D.P.B.-N.); Tel.: +91-9819261821 (A.S.)
| | - Shrinivas Deshpande
- Department of Computer Science & Technology, Degree College of Physical Education, Sant Gadge Baba Amravati University, Amravati 444605, India
| | - Girish Gidaye
- Department of Electronics and Computer Science, Vidyalankar Institute of Technology, Mumbai University, Mumbai 400037, India
| | - Jagannath Nirmal
- Department of Electronics Engineering, Somaiya Vidyavihar University, Mumbai 400077, India
| | - Kadria Ezzine
- ATISP, ENET’COM, Sfax University, Sfax 3000, Tunisia
| | | | - Kamalakar Desai
- Department of Electronics and Telecommunication Engineering, Bharati Vidyapeeth’s College of Engineering, Shivaji University, Kolhapur 416013, India
| | - Sachin Shinde
- Department of Mechanical Engineering, Datta Meghe College of Engineering, Mumbai University, Airoli, Navi Mumbai 400708, India
| | - Ankit D. Oza
- Department of Computer Sciences and Engineering, Institute of Advanced Research, The University for Innovation, Gandhianagar 382426, India
| | - Dumitru Doru Burduhos-Nergis
- Faculty of Materials Science and Engineering, Gheorghe Asachi Technical University of Iasi, 700050 Iasi, Romania
| | - Diana Petronela Burduhos-Nergis
- Faculty of Materials Science and Engineering, Gheorghe Asachi Technical University of Iasi, 700050 Iasi, Romania
- Correspondence: (A.S.); (D.P.B.-N.); Tel.: +91-9819261821 (A.S.)
| |
|
26
|
Zainaee S, Khadivi E, Jamali J, Sobhani-Rad D, Maryn Y, Ghaemi H. The acoustic voice quality index, version 2.06 and 3.01, for the Persian-speaking population. JOURNAL OF COMMUNICATION DISORDERS 2022; 100:106279. [PMID: 36399989 DOI: 10.1016/j.jcomdis.2022.106279] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/31/2022] [Revised: 11/05/2022] [Accepted: 11/07/2022] [Indexed: 06/16/2023]
Abstract
INTRODUCTION Dysphonia assessment includes approaches like acoustic analysis, which is non-invasive and easy to use and provides an understandable numerical output. The Acoustic Voice Quality Index (AVQI) is an acoustic model that can detect dysphonia. The Persian language is spoken by around 70,000,000 native speakers. Since AVQI versions 2.06 and 3.01 have not yet been validated for Persian, this study investigated their concurrent validity and diagnostic accuracy among the Persian-speaking population. METHODS This scale development study was conducted from 2020 to 2021 on 180 normophonic and dysphonic native Persian-speaking residents of Mashhad, Iran. Five raters rated the samples by auditory-perceptual judgment, using Grade from the Grade-Rough-Breathy-Asthenic-Strained scale (an ordinal scale) and the overall dysphonia severity from the Persian version of the Consensus Auditory Perceptual Evaluation of Voice (a continuous scale), to investigate both versions' concurrent validity. The intra- and inter-rater reliability and concurrent validity were evaluated for both scales. Both versions' diagnostic accuracy was assessed by the receiver operating characteristic, and the optimal thresholds were determined. RESULTS AVQI-version-2-Persian thresholds of 3.47 and 4.04 provided sensitivity of 88.30% and 85.53% and specificity of 79.07% and 85.58% by the ordinal and continuous scales, respectively. AVQI-version-3-Persian thresholds of 3.07 and 3.03 rendered sensitivity of 74.47% and 85.53% and specificity of 97.67% and 91.35% by the ordinal and continuous scales, respectively. CONCLUSION The significant values of concurrent validity and diagnostic accuracy of both versions of AVQI-Persian confirmed that it can discriminate between normal and pathological voices among the Persian-speaking population. Hence, it can be used for screening or diagnosis purposes.
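To make the threshold-based figures above concrete, the following is a minimal sketch of how sensitivity and specificity follow from a cut-off score. It assumes that a higher AVQI indicates more severe dysphonia, so a score at or above the threshold is flagged as dysphonic. The function name and the six scores are hypothetical illustrations, not data from the study; only the threshold of 3.47 is taken from the abstract, and the ROC procedure used to choose it is not reproduced here.

```python
def sensitivity_specificity(scores, is_dysphonic, threshold):
    """Flag score >= threshold as dysphonic and compare with the reference labels."""
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_dysphonic):
        predicted = score >= threshold
        if truth and predicted:
            tp += 1          # dysphonic voice correctly flagged
        elif truth:
            fn += 1          # dysphonic voice missed
        elif predicted:
            fp += 1          # normal voice wrongly flagged
        else:
            tn += 1          # normal voice correctly passed
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and perceptual labels (illustration only).
scores = [1.2, 2.8, 3.5, 4.1, 5.0, 3.2]
labels = [False, False, False, True, True, True]
sens, spec = sensitivity_specificity(scores, labels, threshold=3.47)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sweeping the threshold over the observed scores and plotting sensitivity against 1 - specificity would trace out an ROC curve of the kind the abstract refers to.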
Affiliation(s)
- Shahryar Zainaee
- Department of Speech Therapy, School of Paramedical sciences, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Ehsan Khadivi
- Sinus and Surgical Endoscopic Research Center, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Jamshid Jamali
- Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Davood Sobhani-Rad
- Department of Speech Therapy, School of Paramedical sciences, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Youri Maryn
- Department of Speech, Language and Hearing Sciences, Faculty of Medicine and Health Sciences, University of Ghent, Ghent, Belgium
| | - Hamide Ghaemi
- Department of Speech Therapy, School of Paramedical sciences, Mashhad University of Medical Sciences, Mashhad, Iran.
| |
|
27
|
Heller Murray ES, Chao A, Colletti L. A Practical Guide to Calculating Cepstral Peak Prominence in Praat. J Voice 2022:S0892-1997(22)00275-2. [PMID: 36210224 DOI: 10.1016/j.jvoice.2022.09.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/19/2022] [Revised: 09/01/2022] [Accepted: 09/02/2022] [Indexed: 11/05/2022]
Abstract
The acoustic measure of cepstral peak prominence (CPP) is recommended for the analysis of dysphonia. Yet, clinical use of this measure is not universal, as clinicians and researchers are still learning the strengths and limitations of this measure. Furthermore, affordable access to specialized acoustic software is a significant barrier to universal CPP use. This article will provide a guide on how to calculate CPP in Praat, a free software program, using a new CPP plugin. Important external factors that could influence CPP measures are discussed, and suggestions for clinical use are provided. As CPP becomes more widely used by clinicians and researchers, it is important to consider external factors that may inadvertently influence CPP values. Controlling for these external factors will aid in reducing variability across CPP values, which will make CPP a valuable tool for both clinical and research purposes.
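For readers unfamiliar with the measure, the sketch below illustrates the general idea behind CPP on a single frame: take the spectrum of the log power spectrum (the cepstrum), find the peak in the quefrency range of plausible fundamental frequencies, and measure how far it rises above a straight trend line fitted to the cepstrum. This is a simplified, standard-library-only illustration and not the algorithm used by Praat or by the plugin described in the article. The function names, the 60-330 Hz search range, the 1 ms trend-line start, and the synthetic test signals are all assumptions made for the example.

```python
import cmath
import math
import random


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=330.0):
    """Return (CPP in dB, peak quefrency in seconds) for one frame."""
    n = len(frame)
    # Hann window to limit spectral leakage.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    # Log power spectrum in dB (small constant avoids log of zero).
    log_power = [10 * math.log10(abs(c) ** 2 + 1e-12) for c in fft(windowed)]
    # Cepstrum: magnitude of the spectrum of the log spectrum, in dB.
    ceps_db = [20 * math.log10(abs(c) + 1e-12) for c in fft(log_power)]
    # Search for the peak at quefrencies matching plausible F0 values.
    k_lo, k_hi = math.ceil(fs / f0_max), int(fs / f0_min)
    k_peak = max(range(k_lo, k_hi + 1), key=lambda k: ceps_db[k])
    # Least-squares trend line over quefrencies from 1 ms upward.
    ks = list(range(round(0.001 * fs), k_hi + 1))
    mean_k = sum(ks) / len(ks)
    mean_y = sum(ceps_db[k] for k in ks) / len(ks)
    slope = (sum((k - mean_k) * (ceps_db[k] - mean_y) for k in ks)
             / sum((k - mean_k) ** 2 for k in ks))
    trend_at_peak = mean_y + slope * (k_peak - mean_k)
    return ceps_db[k_peak] - trend_at_peak, k_peak / fs


# Synthetic demo: a 125 Hz pulse train with faint noise versus white noise.
rng = random.Random(0)
fs, n = 8000, 1024
voiced = [(1.0 if i % 64 == 0 else 0.0) + rng.gauss(0, 0.001) for i in range(n)]
noise = [rng.gauss(0, 1.0) for _ in range(n)]
cpp_voiced, q_voiced = cepstral_peak_prominence(voiced, fs)
cpp_noise, _ = cepstral_peak_prominence(noise, fs)
print(f"pulse train: {cpp_voiced:.1f} dB at {q_voiced * 1000:.1f} ms")
print(f"white noise: {cpp_noise:.1f} dB")
```

The strongly periodic signal should yield a clearly larger prominence than the noise, which is the property that makes CPP sensitive to breathy or aperiodic voices. Absolute values from this sketch are not comparable with values reported by clinical software, since windowing, smoothing, and trend-line settings differ.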
Affiliation(s)
- Elizabeth S Heller Murray
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, Pennsylvania.
| | - Andie Chao
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, Pennsylvania
| | - Lauren Colletti
- Department of Communication Sciences and Disorders, College of Public Health, Temple University, Philadelphia, Pennsylvania
| |
|
28
|
Mehta R, Mat Q, Lelubre C, Lechien JR, Duterme JP. Influence of a surgical mask on voice analysis in healthy subjects in the COVID-19 pandemic: A cross-over study. Clin Otolaryngol 2022; 47:692-695. [PMID: 35836337 PMCID: PMC9349984 DOI: 10.1111/coa.13964] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/17/2021] [Revised: 05/17/2022] [Accepted: 06/18/2022] [Indexed: 11/26/2022]
Affiliation(s)
- Rupal Mehta
- Department of Otorhinolaryngology, C.H.U. Charleroi, Charleroi, Belgium
| | - Quentin Mat
- Department of Otorhinolaryngology, C.H.U. Charleroi, Charleroi, Belgium; Faculty of Medicine and Pharmacy, University of Mons (UMons), Mons, Belgium; COVID-19 Task Force of the Young Otolaryngologists of the International Federations of Oto-rhino-laryngological Society (YO-IFOS), Paris, France
| | - Christophe Lelubre
- Faculty of Medicine and Pharmacy, University of Mons (UMons), Mons, Belgium; Department of Internal Medicine, C.H.U. Charleroi, Charleroi, Belgium
| | - Jerome René Lechien
- Faculty of Medicine and Pharmacy, University of Mons (UMons), Mons, Belgium; COVID-19 Task Force of the Young Otolaryngologists of the International Federations of Oto-rhino-laryngological Society (YO-IFOS), Paris, France; Department of Otolaryngology-Head and Neck Surgery, Foch Hospital, School of Medicine, UFR Simone Veil, Université Versailles Saint-Quentin-en-Yvelines (Paris Saclay University), Paris, France
| | | |
|
29
|
Ghasemzadeh H, Doyle PC, Searl J. Image representation of the acoustic signal: An effective tool for modeling spectral and temporal dynamics of connected speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:580. [PMID: 35931551 PMCID: PMC9458292 DOI: 10.1121/10.0012734] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/10/2022] [Revised: 06/09/2022] [Accepted: 06/30/2022] [Indexed: 06/15/2023]
Abstract
Recent studies have advocated for the use of connected speech in clinical voice and speech assessment. This suggestion is based on the presence of clinically relevant information within the onset, offset, and variation in connected speech. Existing works on connected speech utilize methods originally designed for analysis of sustained vowels and, hence, cannot properly quantify the transient behavior of connected speech. This study presents a non-parametric approach to analysis based on a two-dimensional, temporal-spectral representation of speech. Variations along horizontal and vertical axes corresponding to the temporal and spectral dynamics of speech were quantified using two statistical models. The first, a spectral model, was defined as the probability of changes between the energy of two consecutive frequency sub-bands at a fixed time segment. The second, a temporal model, was defined as the probability of changes in the energy of a sub-band between consecutive time segments. As the first step of demonstrating the efficacy and utility of the proposed method, a diagnostic framework was adopted in this study. Data obtained revealed that the proposed method has (at minimum) significant discriminatory power over the existing alternative approaches.
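The two statistical models described above can be pictured with a small sketch. The abstract does not give implementation details, so what follows is only one possible reading: sub-band energies are assumed to be quantized into a few discrete levels, and each model is the empirical probability of observing a given pair of levels in neighbouring cells, either along frequency (spectral model) or along time (temporal model). The function name, the quantization, and the toy matrix are hypothetical.

```python
from collections import Counter


def transition_probabilities(energy, axis):
    """Empirical probability of each (level, next level) pair.

    energy: list of time segments, each a list of quantized sub-band energies.
    axis:   "spectral" pairs adjacent sub-bands within one time segment;
            "temporal" pairs the same sub-band in consecutive time segments.
    """
    n_time, n_bands = len(energy), len(energy[0])
    pairs = Counter()
    if axis == "spectral":
        for t in range(n_time):
            for f in range(n_bands - 1):
                pairs[(energy[t][f], energy[t][f + 1])] += 1
    elif axis == "temporal":
        for t in range(n_time - 1):
            for f in range(n_bands):
                pairs[(energy[t][f], energy[t + 1][f])] += 1
    else:
        raise ValueError("axis must be 'spectral' or 'temporal'")
    total = sum(pairs.values())
    return {pair: count / total for pair, count in pairs.items()}


# Toy example: 2 time segments x 3 sub-bands, energies quantized to {0, 1}.
toy = [[0, 1, 1],
       [0, 0, 1]]
spectral_model = transition_probabilities(toy, "spectral")
temporal_model = transition_probabilities(toy, "temporal")
print(spectral_model)
print(temporal_model)
```

In a diagnostic setting of the kind the abstract mentions, such probability tables could serve as feature vectors for a classifier, although the features and classifier actually used in the study are not stated here.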
Affiliation(s)
- Hamzeh Ghasemzadeh
- Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, One Bowdoin Square, 11th Floor, Boston, Massachusetts 02114, USA
| | - Philip C Doyle
- Department of Otolaryngology Head and Neck Surgery, Division of Laryngology, Stanford University School of Medicine, Stanford University, 801 Welch Road, Stanford, California. 94305, USA
| | - Jeff Searl
- Department of Communicative Sciences and Disorders, Michigan State University, 1026 Red Cedar Road, Oyer Speech & Hearing Building, East Lansing, Michigan 48824, USA
| |
|
30
|
Verde L, De Pietro G, Sannino G. Artificial Intelligence Techniques for the Non-invasive Detection of COVID-19 Through the Analysis of Voice Signals. ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING 2021; 48:1-11. [PMID: 34642613 PMCID: PMC8500467 DOI: 10.1007/s13369-021-06041-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/17/2021] [Accepted: 07/30/2021] [Indexed: 11/17/2022]
Abstract
Healthcare sensors represent a valid and non-invasive instrument to capture and analyse physiological data. Several vital signals, such as voice signals, can be acquired anytime and anywhere, achieved with the least possible discomfort to the patient thanks to the development of increasingly advanced devices. The integration of sensors with artificial intelligence techniques contributes to the realization of faster and easier solutions aimed at improving early diagnosis, personalized treatment, remote patient monitoring and better decision making, all tasks vital in a critical situation such as that of the COVID-19 pandemic. This paper presents a study about the possibility to support the early and non-invasive detection of COVID-19 through the analysis of voice signals by means of the main machine learning algorithms. If demonstrated, this detection capacity could be embedded in a powerful mobile screening application. To perform this important study, the Coswara dataset is considered. The aim of this investigation is not only to evaluate which machine learning technique best distinguishes a healthy voice from a pathological one, but also to identify which vowel sound is most seriously affected by COVID-19 and is, therefore, most reliable in detecting the pathology. The results show that Random Forest is the technique that classifies most accurately healthy and pathological voices. Moreover, the evaluation of the vowel /e/ allows the detection of the effects of COVID-19 on voice quality with a better accuracy than the other vowels.
Affiliation(s)
- Laura Verde
- Department of Mathematics and Physics, University of Campania “Luigi Vanvitelli”, viale Lincoln, 5, 81100 Caserta Italy
| | - Giuseppe De Pietro
- Institute of High–Performance Computing and Networking (ICAR) - National Research Council of Italy (CNR), via Pietro Castellino, 111, 80131 Naples Italy
| | - Giovanna Sannino
- Institute of High–Performance Computing and Networking (ICAR) - National Research Council of Italy (CNR), via Pietro Castellino, 111, 80131 Naples Italy
| |
|
31
|
Vásquez-Correa JC, Rios-Urrego CD, Arias-Vergara T, Schuster M, Rusz J, Nöth E, Orozco-Arroyave JR. Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different languages. Pattern Recognit Lett 2021. [DOI: 10.1016/j.patrec.2021.04.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
|
32
|
Demirci AN, Köse A, Aydinli FE, İncebay Ö, Yilmaz T. Investigating the cepstral acoustic characteristics of voice in healthy children. Int J Pediatr Otorhinolaryngol 2021; 148:110815. [PMID: 34217000 DOI: 10.1016/j.ijporl.2021.110815] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/20/2021] [Revised: 06/16/2021] [Accepted: 06/24/2021] [Indexed: 10/21/2022]
Abstract
OBJECTIVES This study aimed to determine the cepstral acoustic parameters that vary depending on age and gender in vocally healthy children, and to establish normative data for cepstral analysis. BACKGROUND Cepstral measurements are among the strongest predictors of auditory-perceptual evaluation of voice and differentiate between healthy and dysphonic voices. More specifically, cepstral peak prominence is accepted as a strong acoustic predictor of breathiness and overall severity of dysphonia. Cepstral measures determine voice quality reliably not only in sustained vowel samples but also in running speech samples. Determining the parameters related to the acoustic profile of children with normal voices can lead to a better understanding of the effect of changes in the larynx and vocal fold structure during growth and development. There is a limited number of normative studies examining the cepstral acoustic properties of the pediatric voice. Determining norm-specific values and clinical guidelines for cepstral acoustics according to age and gender in vocally healthy children is of utmost importance. METHODS A total of 160 vocally healthy children were divided into the following four age groups: Group-I included children within the age range of 4-7 years, Group-II included 7-11 years, Group-III 11-14 years, and Group-IV included children within the age range of 14-18 years. An equal number of male and female participants were assigned to each group. PENTAX Medical CSL Model 4500 was used for recording all tasks. For acoustic analysis, Multi-Dimensional Voice Program and Analysis of Dysphonia in Speech and Voice were used. RESULTS Cepstral Peak Prominence (CPP), Cepstral Peak Prominence Standard Deviation (CPP SD), and Low-To-High Spectral Ratio (L/H Ratio) increased with age. The CPP parameter of all-voiced sentences and nasal-weighted sentences increased with age in boys, while no significant pattern was observed in any sample for girls.
For L/H ratio, it can be said that there is a general increase with age in all speech samples, except for the vowel-weighted and voiceless plosive sentence samples, evident especially in the group above the age of 15 years. This study concluded that the CPP SD parameter in the vowel-weighted sentences increased with age in boys. It was also noticed in this study that CPP F0 standard deviation (SD) intervals were narrower in vowel-weighted, easy onset, and voiceless plosive sentence samples than in all-voiced, hard glottal attack and nasal-weighted sentence samples. CONCLUSION This study established cepstral acoustic normative values for a wide age range of the pediatric population. It is thought that age and gender specific cepstral acoustic findings presented in this study contributed to the related literature. In addition, to our knowledge, this is the first study that provides a normative cepstral acoustic database of the CAPE-V/Turkish sentences in the pediatric population.
Affiliation(s)
- Ayşe Nur Demirci
- Department of Speech and Language Therapy, Hacettepe University Faculty of Health Sciences, Hacettepe, Ankara, Turkey.
| | - Ayşen Köse
- Department of Speech and Language Therapy, Hacettepe University Faculty of Health Sciences, Hacettepe, Ankara, Turkey
| | - Fatma Esen Aydinli
- Department of Speech and Language Therapy, Hacettepe University Faculty of Health Sciences, Hacettepe, Ankara, Turkey
| | - Önal İncebay
- Department of Speech and Language Therapy, Hacettepe University Faculty of Health Sciences, Hacettepe, Ankara, Turkey
| | - Taner Yilmaz
- Department of Otolaryngology-Head and Neck Surgery, Hacettepe University Faculty of Medicine, Hacettepe, Ankara, Turkey
| |
|
33
|
Nip ISB, Garellek M. Voice Quality of Children With Cerebral Palsy. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:3051-3059. [PMID: 34260269 PMCID: PMC8740668 DOI: 10.1044/2021_jslhr-20-00633] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/30/2020] [Revised: 02/08/2021] [Accepted: 03/29/2021] [Indexed: 05/19/2023]
Abstract
Purpose Many children with cerebral palsy (CP) are described as having altered vocal quality. The current study utilizes psychoacoustic measures, namely, low-amplitude (H1*-H2*) and high-amplitude (H1*-A2*) spectral tilt and cepstral peak prominence (CPP), to identify the vocal fold articulation characteristics in this population. Method Eight children with CP and eight typically developing (TD) peers produced vowel singletons [i, ɑ, u] and a story retell task with the same vowels in the words "beets, Bobby, boots." H1*-H2*, H1*-A2*, and CPP were extracted from each vowel. Results were analyzed with mixed linear models to identify the effect of Group (CP, TD), Task (vowel singleton, story retell), and Vowel [i, ɑ, u] on the dependent variables. Results Children with CP have lower spectral tilt values (H1*-H2* and H1*-A2*) and lower CPP values than their TD peers. For both groups, vowel singletons were associated with lower CPP values as compared to story retell. Finally, the vowel [ɑ] was associated with higher spectral tilt and higher CPP values as compared to [i, u]. Conclusions Children with CP have more constricted and creaky vocal quality due to lower spectral tilt and greater noise. Unlike adults, children demonstrate poorer vocal fold articulation when producing vowel singletons as compared to story retell. Finally, low vowels like [ɑ] seem to be produced with less constriction and noise as compared to high vowels.
|
34
|
Meghraoui D, Boudraa B, Merazi-Meksen T, Gómez Vilda P. A novel pre-processing technique in pathologic voice detection: Application to Parkinson’s disease phonation. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2021.102604] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
|
35
|
Validation of Acoustic Voice Quality Index Version 3.01 and Acoustic Breathiness Index in Korean Population. J Voice 2021; 35:660.e9-660.e18. [DOI: 10.1016/j.jvoice.2019.10.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/24/2019] [Revised: 10/08/2019] [Accepted: 10/10/2019] [Indexed: 11/21/2022]
|
36
|
Pützer M, Wokurek W. Electroglottographic and Acoustic Parametrization of Phonatory Quality Provide Voice Profiles of Pathological Speakers. J Voice 2021:S0892-1997(21)00121-1. [PMID: 34049759 DOI: 10.1016/j.jvoice.2021.03.024] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/16/2020] [Revised: 03/16/2021] [Accepted: 03/18/2021] [Indexed: 11/18/2022]
Abstract
OBJECTIVES The present study first seeks to identify subgroups of pathological male and female phonation using data from 534 pathological speakers. Second, this subgroup classification provides a basis for deriving voice profiles of pathological phonatory quality. METHODS Using complementary electroglottographic (EGG) and acoustic parametrization of phonatory quality, sustained vowel productions of 267 male and 267 female speakers were considered. RESULTS In a first step, a clustering technique differentiates three subgroups within each gender on the basis of the EGG parameters and three subgroups on the basis of the acoustic parameters. In a second step, this subgroup definition allows one to present voice profiles of pathological speakers by combining the parameter means of the electroglottographically determined subgroups with those of the acoustically determined subgroups. CONCLUSIONS The presented voice profiles provide a finer reference basis for the classification of different pathological phonation types as well as for the evaluation of shifts in individual phonatory behavior due to therapy or spontaneous recovery.
Affiliation(s)
- Manfred Pützer
- Language Science and Technology; Neurophonetics & Clinical Phonetics, Saarland University, Saarbrücken, Germany.
- Wolfgang Wokurek
- Institute for Natural Language Processing, University of Stuttgart, Stuttgart, Germany
|
37
|
Yücelbaş C. A new approach: information gain algorithm-based k-nearest neighbors hybrid diagnostic system for Parkinson's disease. Phys Eng Sci Med 2021; 44:511-524. [PMID: 33852120 DOI: 10.1007/s13246-021-01001-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2021] [Accepted: 04/09/2021] [Indexed: 11/28/2022]
Abstract
Parkinson's disease (PD) is a slow and insidiously progressive neurological brain disorder. The development of expert systems capable of automatically and highly accurately diagnosing early stages of PD based on speech signals would provide an important contribution to the health sector. For this purpose, the Information Gain Algorithm-based K-Nearest Neighbors (IGKNN) model was developed. This approach was applied to the feature data sets formed using the Tunable Q-factor Wavelet Transform (TQWT) method. First, 12 sub-feature data sets forming the TQWT feature group were analyzed separately after which the one with the best performance was selected, and the IGKNN model was applied to this sub-feature data set. Finally, it was observed that the performance results provided with the IGKNN system for this sub-feature data set were better than those for the complete set of data. According to the results, values of receiver operating characteristic and precision-recall curves exceeded 0.95, and a classification accuracy of almost 98% was obtained with the 22 features selected from this sub-group. In addition, the kappa coefficient was 0.933 and showed a perfect agreement between actual and predicted values. The performance of the IGKNN system was also compared with results from other studies in the literature in which the same data were used, and the approach proposed in this study far outperformed any approaches reported in the literature. Also, as in this IGKNN approach, an expert system that can diagnose PD and achieve maximum performance with fewer features from the audio signals has not been previously encountered.
Affiliation(s)
- Cüneyt Yücelbaş
- Electrical-Electronics Engineering Department, Hakkari University, 30000, Hakkari, Turkey.
|
38
|
Pierce JL, Tanner K, Merrill RM, Shnowske L, Roy N. A Field-Based Approach to Establish Normative Acoustic Data for Healthy Female Voices. J Speech Lang Hear Res 2021; 64:691-706. [PMID: 33561361 DOI: 10.1044/2020_jslhr-20-00490] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/12/2023]
Abstract
Purpose The primary aim of this study was to obtain high-quality acoustic normative data in natural field environments for female voices. A secondary aim was to examine acoustic measurement variability in field environments. Method This study employed a within-subject repeated-measures experimental design that included 45 young female adults with normal voices. Participants were stratified by age (18-23, 24-29, and 30-35 years). After initial evaluation and instruction, participants completed voice recordings during seven consecutive days using a standard protocol, including both connected speech and sustained vowels. Thirty-two cepstral-, spectral-, and time-based acoustic measures were acquired using Praat and the Analysis of Dysphonia in Speech and Voice. Results Among the 958 total recordings, greater than 90% satisfied inclusion criteria based on protocol compliance, peak clipping, and signal-to-noise ratio. Significant differences were observed for age (p < .05). For 19 acoustic measures, values improved significantly as signal-to-noise ratio increased. Cepstral- and spectral-based measures demonstrated less measurement variability as compared with time-based measures. Conclusions With adequate training, field audio recordings represent a viable option for clinical voice management. The significant age effects observed in this study support the need for more specific criteria when collecting and applying normative data. Cepstral- and spectral-based measures demonstrated the least measurement variability. This study provides additional evidence for multiparameter acoustic voice measurement, specifically toward ecologically valid sampling in natural environments. Future studies should expand on these findings in other populations with normal and disordered voices.
Affiliation(s)
- Jenny L Pierce
- Department of Surgery, The University of Utah, Salt Lake City
- Department of Communication Sciences & Disorders, The University of Utah, Salt Lake City
- Kristine Tanner
- Department of Communication Disorders, Brigham Young University, Provo, UT
- Ray M Merrill
- Department of Public Health, Brigham Young University, Provo, UT
- Lauren Shnowske
- Department of Communication Sciences & Disorders, The University of Utah, Salt Lake City
- Department of Communication Sciences and Disorders, University of Kentucky, Lexington
- Nelson Roy
- Department of Communication Sciences & Disorders, The University of Utah, Salt Lake City
|
39
|
Albert G, Arnocky S, Puts DA, Hodges-Simeon CR. Can listeners assess men's self-reported health from their voice? Evol Hum Behav 2021. [DOI: 10.1016/j.evolhumbehav.2020.08.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/18/2022]
|
40
|
Alves M, Silva G, Bispo BC, Dajer ME, Rodrigues PM. Voice Disorders Detection Through Multiband Cepstral Features of Sustained Vowel. J Voice 2021; 37:322-331. [PMID: 33663909 DOI: 10.1016/j.jvoice.2021.01.018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/30/2020] [Revised: 01/18/2021] [Accepted: 01/21/2021] [Indexed: 11/28/2022]
Abstract
This study aims to detect voice disorders related to vocal fold nodules, Reinke's edema, and neurological pathologies through multiband cepstral features of the sustained vowel /a/. Detection is performed between pairs of study groups, and multiband analysis is accomplished using the wavelet transform. For each pair of groups, parameter selection is carried out. Time series of the selected parameters are used as input for four classifiers with leave-one-out cross-validation. Classification accuracies of 100% are achieved for all pairs including the control group, surpassing the state-of-the-art methods based on cepstral features, while accuracies higher than 88.50% are obtained for the pathological pairs. The results indicate that the method may be adequate to assist in the diagnosis of the voice disorders addressed. The results must be updated in the future with a larger population to ensure generalization.
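The leave-one-out cross-validation mentioned in this abstract can be sketched roughly as follows. This is only an illustration: the abstract does not name its four classifiers, so a 1-nearest-neighbor rule stands in as an assumption, and the function names and toy feature vectors are hypothetical.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """Hypothetical stand-in classifier: label of the closest training point."""
    best_label, best_dist = None, float("inf")
    for x, y in zip(train_x, train_y):
        # Squared Euclidean distance between feature vectors.
        dist = sum((a - b) ** 2 for a, b in zip(x, query))
        if dist < best_dist:
            best_dist, best_label = dist, y
    return best_label


def leave_one_out_accuracy(features, labels, predict=nearest_neighbor_predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy, made-up data: two well-separated groups of one-dimensional features.
toy_features = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
toy_labels = ["control", "control", "control", "disorder", "disorder", "disorder"]
toy_accuracy = leave_one_out_accuracy(toy_features, toy_labels)
```

Because every sample is held out exactly once, the estimate uses all available recordings, which is one reason the scheme is common with small clinical data sets.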
Affiliation(s)
- Marco Alves
- Universidade Católica Portuguesa, CBQF - Centro de Biotecnologia e Química Fina - Laboratório Associado, Escola Superior de Biotecnologia, Porto, Portugal.
- Gabriel Silva
- Universidade Católica Portuguesa, CBQF - Centro de Biotecnologia e Química Fina - Laboratório Associado, Escola Superior de Biotecnologia, Porto, Portugal.
- Bruno C Bispo
- Department of Electrical and Electronic Engineering, Federal University of Santa Catarina, Florianópolis-SC, Brazil.
- María E Dajer
- Department of Electrical Engineering, Federal University of Technology - Paraná, Cornélio Procópio-PR, Brazil.
- Pedro M Rodrigues
- Universidade Católica Portuguesa, CBQF - Centro de Biotecnologia e Química Fina - Laboratório Associado, Escola Superior de Biotecnologia, Porto, Portugal.
|
41
|
Englert M, Barsties v. Latoszek B, Maryn Y, Behlau M. Validation of the Acoustic Voice Quality Index, Version 03.01, to the Brazilian Portuguese Language. J Voice 2021; 35:160.e15-160.e21. [DOI: 10.1016/j.jvoice.2019.07.024] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/03/2019] [Revised: 07/23/2019] [Accepted: 07/26/2019] [Indexed: 10/26/2022]
|
42
|
Soumya M, Narasimhan SV. Correlation Between Subjective and Objective Parameters of Voice in Elderly Male Speakers. J Voice 2020; 36:823-831. [PMID: 33092948 DOI: 10.1016/j.jvoice.2020.10.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/18/2020] [Revised: 08/31/2020] [Accepted: 10/06/2020] [Indexed: 11/30/2022]
Abstract
INTRODUCTION The literature suggests that the analysis of acoustic, cepstral, and spectral parameters of voice offers excellent discrimination between normal and pathological voices and correlates strongly with the perception of voice quality. Although the correlation between subjective and objective voice measures can help the clinician distinguish pathological voices from normal voices, only a handful of investigations have examined the relationship between these measures in aging voices. OBJECTIVES To investigate the differences in the subjective and objective parameters (acoustic, spectral, and cepstral parameters) of the voice in elderly male speakers with and without symptoms of dysphonia, and to document the correlation between the subjective and objective parameters in the voice of elderly male speakers. STUDY DESIGN Retrospective standard group comparison study. METHODS Phonation and speech samples were collected from 30 elderly male participants having no vocal symptoms related to dysphonia and 30 elderly male participants with the self-reported presence of vocal symptoms related to dysphonia. The subjective, acoustic, spectral, and cepstral parameters were analyzed from all the voice samples. RESULTS Results revealed significant differences in subjective, acoustic, cepstral, and spectral parameters of voice between the voice samples of the elderly individuals with and without dysphonic symptoms. Perceptual parameters showed weak-to-moderate correlations with acoustic parameters and strong correlations with spectral and cepstral parameters of voice. CONCLUSION Further studies on the correlation between the subjective and objective parameters of voice in elderly male speakers with various types of laryngeal pathologies would shed light on distinguishing the voice of normal aging from the impact of any associated laryngeal pathology to make diagnostic distinctions.
Affiliation(s)
- Mahesh Soumya
- MASLP, JSS Institute of Speech & Hearing, Mysore, Karnataka, India
|
43
|
Asiaee M, Vahedian-Azimi A, Atashi SS, Keramatfar A, Nourbakhsh M. Voice Quality Evaluation in Patients With COVID-19: An Acoustic Analysis. J Voice 2020; 36:879.e13-879.e19. [PMID: 33051108 PMCID: PMC7528943 DOI: 10.1016/j.jvoice.2020.09.024] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 08/02/2020] [Revised: 09/26/2020] [Accepted: 09/29/2020] [Indexed: 01/19/2023]
Abstract
Objectives With the COVID-19 outbreak around the globe and its potential effect on infected patients’ voice, this study set out to evaluate and compare the acoustic parameters of voice between healthy and infected people in an objective manner. Methods Voice samples of 64 COVID-19 patients and 70 healthy Persian speakers who produced a sustained vowel /a/ were evaluated. Between-group comparisons of the data were performed using the two-way ANOVA and Wilcoxon's rank-sum test. Results The results revealed significant differences in CPP, HNR, H1H2, F0SD, jitter, shimmer, and MPT values between COVID-19 patients and the healthy participants. There were also significant differences between the male and female participants in all the acoustic parameters, except jitter, shimmer and MPT. No interaction was observed between gender and health status in any of the acoustic parameters. Conclusion The statistical analysis of the data revealed significant differences between the experimental and control groups in this study. Changes in the acoustic parameters of voice are caused by the insufficient airflow, and increased aperiodicity, irregularity, signal perturbation and level of noise, which are the consequences of pulmonary and laryngological involvements in patients with COVID-19.
Affiliation(s)
- Maral Asiaee
- Department of Linguistics, Faculty of Literature, Alzahra University, Tehran, Iran
- Amir Vahedian-Azimi
- Trauma Research Center, Nursing Faculty, Baqiyatallah University of Medical Sciences, Tehran, Iran
- Seyed Shahab Atashi
- Department of Food and Drug Control, Jundishapour University of Medical Sciences, Ahvaz, Iran
- Mandana Nourbakhsh
- Department of Linguistics, Faculty of Literature, Alzahra University, Tehran, Iran
|
44
|
Validation and Test-Retest Reliability of Acoustic Voice Quality Index Version 02.06 in the Turkish Language. J Voice 2020; 36:736.e25-736.e32. [PMID: 32962941 DOI: 10.1016/j.jvoice.2020.08.021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/10/2020] [Revised: 08/13/2020] [Accepted: 08/14/2020] [Indexed: 11/21/2022]
Abstract
OBJECTIVES The aim of the present study was to investigate the validity (both concurrent and diagnostic) and test-retest reliability of the Acoustic Voice Quality Index (AVQI) version 2 (AVQI 02.06) in the Turkish-speaking population. MATERIALS AND METHODS Two hundred fifty-five native Turkish subjects with normal voices (n = 128) and with voice disorders (n = 127) were asked to sustain the vowel [a:] and read aloud the Turkish phonetically balanced text. To determine the test-retest reliability of AVQI, 20 dysphonic and 20 normophonic subjects (ie, around 15% of each group) were reassessed 15 minutes after the first AVQI determination. The middle three seconds of the sustained vowel [a:] and a sentence with 25 syllables were concatenated, and AVQI analysis was conducted. The auditory-perceptual evaluation was performed by five experienced raters using the Grade (G) parameter of the GRBAS protocol. RESULTS There was a statistically significant correlation between AVQI scores and auditory-perceptual evaluation of overall voice quality (rs = 0.717, P < 0.001). AVQI gave a threshold of 2.98 for the dysphonic voice. The values of the intraclass correlation coefficient with a two-way mixed-effects model, single-measures type, and absolute agreement definition showed excellent test-retest reliability for AVQI in the Turkish language (intraclass correlation coefficient = 0.986). CONCLUSION AVQI v.02.06 is a valid and robust tool for differentiating dysphonic and normal voices and has excellent test-retest reliability in the Turkish language.
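The sample preparation described in this abstract (middle three seconds of the sustained vowel joined to a read sentence) might be sketched as below. This is a minimal illustration under stated assumptions, not the AVQI script itself: signals are plain lists of samples, the order of concatenation is assumed, and the function names and toy sample rate are hypothetical.

```python
def middle_segment(samples, sample_rate, seconds=3.0):
    """Return the central `seconds` of a signal (hypothetical helper)."""
    wanted = int(round(seconds * sample_rate))
    if wanted >= len(samples):
        # Signal is shorter than the requested window: keep all of it.
        return list(samples)
    start = (len(samples) - wanted) // 2
    return list(samples[start:start + wanted])


def concatenate_for_analysis(vowel, speech, sample_rate):
    """Join the trimmed vowel and the speech segment (order is an assumption)."""
    return middle_segment(vowel, sample_rate) + list(speech)


# Toy example: a 5-second "vowel" at an artificial rate of 10 samples per second.
toy_vowel = list(range(50))
toy_speech = [100, 101]
toy_middle = middle_segment(toy_vowel, sample_rate=10)
toy_joined = concatenate_for_analysis(toy_vowel, toy_speech, sample_rate=10)
```

Taking the middle of the vowel avoids onset and offset effects; a real pipeline would read audio files and apply the same indexing to their sample arrays.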
|
45
|
Altay EV, Alatas B. Association analysis of Parkinson disease with vocal change characteristics using multi-objective metaheuristic optimization. Med Hypotheses 2020; 141:109722. [DOI: 10.1016/j.mehy.2020.109722] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/23/2020] [Revised: 04/08/2020] [Accepted: 04/08/2020] [Indexed: 11/26/2022]
|
46
|
Saki N, Bayat A, Nikakhlagh S, Zamani P, Khaleghi A, Karimi M, Dastoorpoor M. Acoustic Voice Analysis in Postlingual Deaf Adult Cochlear Implant Users: A Within-Group Comparison Study. J Voice 2020; 36:439.e1-439.e8. [DOI: 10.1016/j.jvoice.2020.06.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/09/2020] [Revised: 06/10/2020] [Accepted: 06/10/2020] [Indexed: 10/23/2022]
|
47
|
Sampaio M, Vaz Masson ML, de Paula Soares MF, Bohlender JE, Brockmann-Bauser M. Effects of Fundamental Frequency, Vocal Intensity, Sample Duration, and Vowel Context in Cepstral and Spectral Measures of Dysphonic Voices. J Speech Lang Hear Res 2020; 63:1326-1339. [PMID: 32348195 DOI: 10.1044/2020_jslhr-19-00049] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 06/11/2023]
Abstract
Purpose Smoothed cepstral peak prominence (CPPS) and harmonics-to-noise ratio (HNR) are acoustic measures related to the periodicity, harmonicity, and noise components of an acoustic signal. To date, there is little evidence about the advantages of CPPS over HNR in voice diagnostics. Recent studies indicate that voice fundamental frequency (F0) and intensity (sound pressure level [SPL]), sample duration (DUR), vowel context (speech vs. sustained phonation), and syllable stress (SS) may influence CPPS and HNR results. The scope of this work was to investigate the effects of voice F0 and SPL, DUR, SS, and token on CPPS and HNR in dysphonic voices. Method In this retrospective study, 27 Brazilian Portuguese speakers with voice disorders were investigated. Recordings of sustained vowels (SVs) /a:/ and manually extracted vowels (EVs) /a/ from Consensus Auditory-Perceptual Evaluation of Voice sentences were acoustically analyzed with the Praat program. Results There was a highly significant effect of F0, SPL, and DUR on both CPPS and HNR (p < .001), whereas SS and vowel context significantly affected CPPS only (p < .05). Higher SPL, F0, and lower DUR were related to higher CPPS and HNR. SVs moderately-to-highly correlated with EVs for CPPS, whereas HNR had few and moderate correlations. In addition, CPPS and HNR highly correlated in SVs and seven EVs (p < .05). Conclusion Speaking prosodic variations of F0, SPL, and DUR influenced both CPPS and HNR measures and led to acoustic differences between sustained and excised vowels, especially in CPPS. Vowel context, prosodic factors, and token type should be controlled for in clinical acoustic voice assessment.
Affiliation(s)
- Marília Sampaio
- Department of Speech, Language and Hearing Sciences, Institute of Health Sciences, Federal University of Bahia, Salvador, Brazil
- Department of Phoniatrics and Speech Pathology, Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, Switzerland
- Maria Lúcia Vaz Masson
- Department of Speech, Language and Hearing Sciences, Institute of Health Sciences, Federal University of Bahia, Salvador, Brazil
- Maria Francisca de Paula Soares
- Department of Speech, Language and Hearing Sciences, Institute of Health Sciences, Federal University of Bahia, Salvador, Brazil
- Jörg Edgar Bohlender
- Department of Phoniatrics and Speech Pathology, Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, Switzerland
- University of Zurich, Switzerland
- Meike Brockmann-Bauser
- Department of Phoniatrics and Speech Pathology, Clinic for Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, Switzerland
- University of Zurich, Switzerland
|
48
|
Kitayama I, Hosokawa K, Iwahashi T, Iwahashi M, Iwaki S, Kato C, Yoshida M, Umatani M, Matsushiro N, Ogawa M, Inohara H. Intertext Variability of Smoothed Cepstral Peak Prominence, Methods to Control It, and Its Diagnostic Properties. J Voice 2020; 34:305-319. [DOI: 10.1016/j.jvoice.2018.09.021] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/05/2018] [Revised: 09/24/2018] [Accepted: 09/25/2018] [Indexed: 11/30/2022]
|
49
|
Lopes LW, Vieira VJD, Costa SLDNC, Correia SÉN, Behlau M. Effectiveness of Recurrence Quantification Measures in Discriminating Subjects With and Without Voice Disorders. J Voice 2020; 34:208-220. [DOI: 10.1016/j.jvoice.2018.09.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/18/2018] [Revised: 09/05/2018] [Accepted: 09/06/2018] [Indexed: 10/28/2022]
|
50
|
Guan H, Lerch A. Evaluation of Feature Learning Methods for Voice Disorder Detection. Int J Semant Comput 2019. [DOI: 10.1142/s1793351x19400191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Voice disorder is a frequently encountered health issue. Many people, however, either cannot afford to visit a professional doctor or neglect to take good care of their voice. In order to give a patient a preliminary diagnosis without using professional medical devices, previous research has shown that the detection of voice disorders can be carried out by utilizing machine learning and acoustic features extracted from voice recordings. Considering the increasing popularity of deep learning, feature learning and transfer learning, this study explores the possibilities of using these methods to assign voice recordings into one of two classes—Normal and Pathological. While the results show the general viability of deep learning and feature learning for the automatic recognition of voice disorders, they also lead to discussions on how to choose a pre-trained model when using transfer learning for this task. Furthermore, the results demonstrate the shortcomings of the existing datasets for voice disorder detection such as insufficient dataset size and lack of generality.
Affiliation(s)
- Hongzhao Guan
- Center for Music Technology, Georgia Institute of Technology, Atlanta, GA 30332, USA
- Alexander Lerch
- Center for Music Technology, Georgia Institute of Technology, Atlanta, GA 30332, USA
|