1
Keung LC, Richardson K, Sharp Matheron D, Martel-Sauvageau V. A Comparison of Healthy and Disordered Voices Using Multi-Dimensional Voice Program, Praat, and TF32. J Voice 2024; 38:963.e23-963.e38. [PMID: 35246346 DOI: 10.1016/j.jvoice.2022.01.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Revised: 01/06/2022] [Accepted: 01/10/2022] [Indexed: 10/19/2022]
Abstract
PURPOSE Instrumental voice assessment plays a critical role in identifying vocal issues and documenting treatment outcomes. The reported voice data, however, are sensitive to the algorithm that each acoustic analysis software program (AASP) uses to analyze the corresponding waveform. In the present study, five acoustic measures were compared across healthy speakers and speakers with dysphonia for three AASPs commonly used in research, education, and clinical practice: Multidimensional Voice Program (MDVP) by Computerized Speech Lab, Praat, and TF32.
MATERIALS AND METHODS Sustained vowel phonations for the quantal vowels /ɑ/, /i/, and /u/ were analyzed for 80 speakers with organic dysphonia and 60 age- and sex-matched healthy controls. Descriptive, inferential, and correlation data are reported for mean fundamental frequency (mean F0), standard deviation of fundamental frequency (SD F0), the short-term perturbation measures jitter and shimmer, and harmonic-to-noise ratio (HNR).
RESULTS The present study replicated previous findings of interprogram differences for healthy speakers, with MDVP consistently yielding higher values than Praat and TF32 for SD F0, jitter, and shimmer, and lower values for HNR. Similar but magnified patterns of results were observed for speakers with dysphonia.
CONCLUSION The variation observed across programs calls into question the validity of comparing voice outcomes reported by one AASP with those previously obtained by another, particularly for acoustic signals with aperiodic components, which are commonly present in disordered voices. It is advised that waveforms be visually inspected before acoustic analysis is conducted, and that voice outcomes not be combined or compared across AASPs.
Affiliation(s)
- Lap-Ching Keung
  - Department of Communication Disorders, University of Massachusetts Amherst, Amherst, Massachusetts
- Kelly Richardson
  - Department of Communication Disorders, University of Massachusetts Amherst, Amherst, Massachusetts
- Deborah Sharp Matheron
  - Communication Disorders and Sciences Department, State University of New York College at Cortland, Cortland, New York
- Vincent Martel-Sauvageau
  - Rehabilitation Department, Speech-Language Pathology Program, Université Laval, Quebec City, Quebec, Canada
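The perturbation measures named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used "local" definitions (mean absolute difference between consecutive cycles, divided by the mean) and takes already-extracted cycle periods and peak amplitudes as input. The function names are hypothetical, and this is not the algorithm of MDVP, Praat, or TF32: those programs differ in how they locate cycles in the waveform, which is one plausible source of the interprogram differences the study reports.

```python
from statistics import mean


def local_perturbation_percent(values):
    """Mean absolute difference between consecutive cycle values,
    expressed as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def local_jitter_percent(periods_s):
    """Cycle-to-cycle period perturbation (input: periods in seconds)."""
    return local_perturbation_percent(periods_s)


def local_shimmer_percent(peak_amplitudes):
    """Cycle-to-cycle amplitude perturbation (input: linear peak amplitudes)."""
    return local_perturbation_percent(peak_amplitudes)


if __name__ == "__main__":
    # Hypothetical cycle data for a voice near 100 Hz.
    periods = [0.0100, 0.0101, 0.0099, 0.0100, 0.0102]
    amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]
    print(local_jitter_percent(periods))
    print(local_shimmer_percent(amplitudes))
```

Because both measures depend entirely on the extracted cycle boundaries, two programs that mark cycles differently in an aperiodic signal will generally return different values from the same recording.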
2
Yücel Ekici N, Demet Akbaş E, Kadir Arslan A. Voice aspects in children with precocious puberty. Int J Pediatr Otorhinolaryngol 2024; 180:111962. [PMID: 38657429 DOI: 10.1016/j.ijporl.2024.111962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 04/18/2024] [Accepted: 04/22/2024] [Indexed: 04/26/2024]
Abstract
PURPOSE In this prospective study, we aimed to investigate the difference in voice acoustic parameters between girls with idiopathic central precocious puberty (ICPP) and prepubertal girls with normal development.
MATERIALS AND METHODS Our study recruited 54 girls diagnosed with ICPP and randomly sampled 51 healthy prepubertal girls as controls. Tanner stages, circulating hormone levels, and bone ages of the girls with ICPP, and the age and body mass index (BMI) of all participants, were recorded. Acoustic analyses were performed using the PRAAT computer-based voice analysis software, and the mean pitch (F0), jitter, shimmer, noise-to-harmonic ratio (NHR), and harmonic-to-noise ratio (HNR) values were compared between the patient and control groups.
RESULTS The two groups did not differ significantly in age or BMI. The F0 and jitter values were lower in the control group than in the patient group, but the difference was not statistically significant. The mean shimmer values of the patient group were significantly higher than those of the control group. In addition, a statistically significant difference was noted for the mean HNR and NHR values (P < 0.001). A moderate negative correlation was found between shimmer and hormone levels in the patient group.
CONCLUSIONS Voice acoustic parameters are one of the defining features of girls with ICPP. Changes in voice acoustic parameters could reflect hormonal changes during puberty. Clinicians should suspect ICPP when there is a change in the voice.
Affiliation(s)
- Nur Yücel Ekici
  - Department of Otorhinolaryngology, University of Health Sciences Adana City Training and Research Hospital, Adana, Turkey
- Emine Demet Akbaş
  - Department of Pediatric Endocrinology, University of Health Sciences Adana City Training and Research Hospital, Adana, Turkey
- Ahmet Kadir Arslan
  - Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inönü University, Malatya, Turkey
3
Ikuma T, McWhorter AJ, Oral E, Kunduk M. Formant-Aware Spectral Analysis of Sustained Vowels of Pathological Breathy Voice. J Voice 2023:S0892-1997(23)00154-6. [PMID: 37302909 DOI: 10.1016/j.jvoice.2023.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 05/07/2023] [Accepted: 05/08/2023] [Indexed: 06/13/2023]
Abstract
OBJECTIVES This paper reports the effectiveness of formant-aware spectral parameters in predicting the perceptual breathiness rating. A breathy voice has a steeper spectral slope and higher turbulent noise than a normal voice. Measuring spectral parameters of acoustic signals over lower formant regions is a known approach to capturing the properties related to breathiness. This study examines this approach by testing the contemporary spectral parameters and algorithms within the framework, alternate frequency band designs, and vowel effects.
METHODS Sustained vowel recordings (/a/, /i/, and /u/) of speakers with voice disorders in the German Saarbruecken Voice Database were considered (n = 367). Recordings with signal irregularities, such as subharmonics, or with perceived roughness were excluded from the study. Four speech language pathologists perceptually rated the recordings for breathiness on a 100-point scale, and their averages were used in the analysis. The acoustic spectra were segmented into four frequency bands according to the vowel formant structures. Five spectral parameters (intraband harmonics-to-noise ratio, HNR; interband harmonics ratio, HHR; interband noise ratio, NNR; and interband glottal-to-noise energy, GNE, ratio) were evaluated in each band to predict the perceptual breathiness rating. Four HNR algorithms were tested.
RESULTS Multiple linear regression models of spectral parameters, led by the HNRs, were shown to explain up to 85% of the variance in perceptual breathiness ratings. This performance exceeded that of the acoustic breathiness index (82%). Individually, the HNR over the first two formants best explained the variance in breathiness (78%), exceeding the smoothed cepstrum peak prominence (74%). The performance of HNR was highly algorithm dependent (10% spread). Some vowel effects were observed in the perceptual rating (higher for /u/), predictability (5% lower for /u/), and model parameter selections.
CONCLUSIONS Strong per-vowel breathiness acoustic models were found by segmenting the spectrum to isolate the portion most affected by breathiness.
Affiliation(s)
- Takeshi Ikuma
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana
- Andrew J McWhorter
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana
- Evrim Oral
  - Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana
- Melda Kunduk
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana; Dept. of Communication Sciences & Disorders, Louisiana State University, Baton Rouge, Louisiana
4
Ikuma T, Story B, McWhorter AJ, Adkins L, Kunduk M. Harmonics-to-noise ratio estimation with deterministically time-varying harmonic model for pathological voice signals. The Journal of the Acoustical Society of America 2022; 152:1783. [PMID: 36182331 DOI: 10.1121/10.0014177] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/04/2022] [Accepted: 09/01/2022] [Indexed: 06/16/2023]
Abstract
The harmonics-to-noise ratio (HNR) and other spectral noise parameters are important in clinical objective voice assessment because they can indicate the presence of nonharmonic phenomena, which are tied to the perception of hoarseness or breathiness. Existing HNR estimators assume that voice signals are nearly periodic (fixed over a short period), although voice pathology can induce involuntary slow modulation that voids this assumption. This paper proposes the use of a deterministically time-varying harmonic model to improve HNR measurements. To estimate the time-varying model, a two-stage iterative least squares algorithm is proposed to reduce model overfitting. The efficacy of the proposed HNR estimator is demonstrated with synthetic signals, simulated tremor signals, and recorded acoustic signals. Results indicate that the proposed algorithm can produce consistent HNR measures as the extent and rate of tremor are varied.
Affiliation(s)
- Takeshi Ikuma
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Brad Story
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Andrew J McWhorter
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Lacey Adkins
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Melda Kunduk
  - Department of Communication Disorders, Louisiana State University, Baton Rouge, Louisiana 70803, USA
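For orientation, the fixed-period assumption that the abstract above attributes to existing HNR estimators can be sketched as follows. This is a simplified stationary baseline, not the time-varying model or the two-stage least squares algorithm proposed in the paper. It assumes a known, constant F0 and a frame that spans a whole number of periods, so that projecting onto each harmonic's cosine and sine is equivalent to a least squares fit. The function name and parameters are hypothetical.

```python
import math


def harmonic_hnr_db(signal, fs, f0, n_harmonics):
    """HNR in dB from a stationary harmonic fit with known, constant f0.

    Assumes len(signal) * f0 / fs is an integer (whole periods in the frame)
    and that every harmonic lies below the Nyquist frequency.
    """
    n = len(signal)
    if n_harmonics * f0 >= fs / 2:
        raise ValueError("harmonics must lie below the Nyquist frequency")
    model = [0.0] * n
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / fs
        cos_k = [math.cos(w * t) for t in range(n)]
        sin_k = [math.sin(w * t) for t in range(n)]
        # Projection coefficients of the k-th harmonic.
        a = 2.0 / n * sum(x * c for x, c in zip(signal, cos_k))
        b = 2.0 / n * sum(x * s for x, s in zip(signal, sin_k))
        model = [m + a * c + b * s for m, c, s in zip(model, cos_k, sin_k)]
    harmonic_energy = sum(m * m for m in model)
    noise_energy = sum((x - m) ** 2 for x, m in zip(signal, model))
    if noise_energy == 0.0:
        return math.inf
    return 10.0 * math.log10(harmonic_energy / noise_energy)


if __name__ == "__main__":
    fs, f0, n = 8000, 100.0, 800  # 10 whole periods in the frame
    # Two harmonics plus a small non-harmonic component standing in for noise.
    frame = [
        math.sin(2 * math.pi * f0 * t / fs)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / fs)
        + 0.1 * math.sin(2 * math.pi * 150.0 * t / fs)
        for t in range(n)
    ]
    print(harmonic_hnr_db(frame, fs, f0, 2))
```

If F0 or the harmonic amplitudes drift within the frame, as with tremor, the fixed harmonics no longer match the signal and part of the voiced energy is counted as noise. That is the limitation the abstract says the time-varying model is meant to address.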
5
Mohammed AA, Nagy A. Fundamental Frequency and Jitter Percent in MDVP and PRAAT. J Voice 2021:S0892-1997(21)00107-7. [PMID: 33926765 DOI: 10.1016/j.jvoice.2021.03.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/04/2021] [Revised: 02/12/2021] [Accepted: 03/01/2021] [Indexed: 11/26/2022]
Abstract
PURPOSE This study first investigated the relationship between fundamental frequency and jitter percent across and within MDVP and PRAAT. It then explored whether the length of the measured acoustic signal or the selection of the temporal segment for analysis affects the correlation between the tools' measures.
METHODS We collected 42 maximum phonation time acoustic signals from 10 participants with healthy voices in a standardized setting. We excluded any potential participants who had a history of voice disorders or showed an abnormality in a pre-study assessment.
RESULTS There was no correlation between jitter percent and fundamental frequency within either tool in our healthy voice samples. The length of the acoustic signal and the temporal analysis selection affected the correlation between jitter percent measurements across the two tools; the correlation between fundamental frequency measurements across the tools was not affected. Mean fundamental frequency did not differ across the two tools but showed a persistent pattern of greater values in MDVP. Jitter percent measurements were significantly higher in MDVP.
CONCLUSIONS Clinicians using PRAAT assessments in the clinic may be able to make inferences from research that used MDVP as an analysis tool. Further work is needed in patients with voice disorders to explore that possibility.
Affiliation(s)
- Ahmed A Mohammed
  - Department of ENT, Ain Shams University; Assistant professor of Phoniatrics, Cairo, Egypt
- Ahmed Nagy
  - Communicative Disorders and Sciences Department, University at Buffalo, Buffalo, NY, USA; Faculty of Medicine, Fayoum University, Fayoum, Egypt
6
Kent RD, Eichhorn JT, Vorperian HK. Acoustic parameters of voice in typically developing children ages 4-19 years. Int J Pediatr Otorhinolaryngol 2021; 142:110614. [PMID: 33450527 PMCID: PMC7902385 DOI: 10.1016/j.ijporl.2021.110614] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/19/2020] [Revised: 12/31/2020] [Accepted: 12/31/2020] [Indexed: 11/26/2022]
Abstract
OBJECTIVES To report data on acoustic measures of voice in sustained vowels produced by typically developing children, aged 4-19 years, and so add to the cross-sectional reference values in a pediatric database.
METHODS Recordings of sustained vowel /ɑ/ phonation were obtained from 158 children (80 males, 78 females) aged 4-19 years who were judged to be typically developing with respect to speech and voice. Acoustic analyses were performed with the Multidimensional Voice Program (MDVP™) and the Analysis of Dysphonia in Speech and Voice (ADSV™), both from Pentax Medical.
RESULTS Values from both MDVP and ADSV are reported for children in the following age cohorts: 4-6 years, 7-9 years, 10-12 years, 13-15 years, and 16-19 years.
CONCLUSION The data in this study complement previously published data and contribute to a pediatric reference database useful for research and for clinical practice related to children's voice. Acoustic parameters most sensitive to age and sex are identified.
Affiliation(s)
- Raymond D. Kent
  - Waisman Center, University of Wisconsin-Madison, 1500 Highland Ave., Madison, WI 53705
- Julie T. Eichhorn
  - Waisman Center, University of Wisconsin-Madison, 1500 Highland Ave., Madison, WI 53705
- Houri K. Vorperian
  - Waisman Center, University of Wisconsin-Madison, 1500 Highland Ave., Madison, WI 53705
7
van der Woerd B, Wu M, Parsa V, Doyle PC, Fung K. Evaluation of Acoustic Analyses of Voice in Nonoptimized Conditions. Journal of Speech, Language, and Hearing Research: JSLHR 2020; 63:3991-3999. [PMID: 33186510 DOI: 10.1044/2020_jslhr-20-00212] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/11/2023]
Abstract
Objectives This study aimed to evaluate the effect of a smartphone microphone and the recording environment on the fidelity and accuracy of acoustic measurements of voice.
Method This was a prospective cohort proof-of-concept study. Two sets of prerecorded samples, (a) sustained vowels (/a/) and (b) a Rainbow Passage sentence, were played for recording via the internal iPhone microphone and the Blue Yeti USB microphone in two recording environments: a sound-treated booth and a quiet office setting. Recordings were presented using a calibrated mannequin speaker with a fixed signal intensity (69 dBA) at a fixed distance (15 in.). Each set of recordings (iPhone-audio booth, Blue Yeti-audio booth, iPhone-office, and Blue Yeti-office) was time-windowed to ensure that the same signal was evaluated for each condition. Acoustic measures of voice, including fundamental frequency (fo), jitter, shimmer, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP), were generated using a widely used analysis program (Praat Version 6.0.50). The data gathered were compared using a repeated measures analysis of variance. Two separate data sets were used. The set of vowel samples included both pathologic (n = 10) and normal (n = 10), male (n = 5) and female (n = 15) speakers. The set of sentence stimuli ranged in perceived voice quality from normal to severely disordered, with an equal number of male (n = 12) and female (n = 12) speakers evaluated.
Results The vowel analyses indicated that jitter, shimmer, HNR, and CPP differed significantly by microphone choice, and that shimmer, HNR, and CPP differed significantly by recording environment. Analysis of sentences revealed a statistically significant impact of recording environment and microphone type on HNR and CPP. Although statistically significant, the differences across the experimental conditions for a subset of the acoustic measures (viz., jitter and CPP) fell within their respective normative ranges.
Conclusions Both microphone and recording setting resulted in significant differences across several acoustic measurements. However, a subset of the acoustic measures that were statistically significant across the recording conditions showed small overall differences that are unlikely to have clinical significance in interpretation. For these acoustic measures, the present data suggest that, although a sound-treated setting is ideal for voice sample collection, a smartphone microphone can capture acceptable recordings for acoustic signal analysis.
Affiliation(s)
- Benjamin van der Woerd
  - Department of Otolaryngology-Head and Neck Surgery, Western University, London, Ontario, Canada
- Min Wu
  - School of Communication Sciences and Disorders, Western University, London, Ontario, Canada
- Vijay Parsa
  - School of Communication Sciences and Disorders, Western University, London, Ontario, Canada
  - Department of Electrical and Computer Engineering, Western University, London, Ontario, Canada
- Philip C Doyle
  - Department of Otolaryngology-Head and Neck Surgery, Western University, London, Ontario, Canada
  - School of Communication Sciences and Disorders, Western University, London, Ontario, Canada
  - Department of Otolaryngology-Head and Neck Surgery, Stanford University School of Medicine, CA
- Kevin Fung
  - Department of Otolaryngology-Head and Neck Surgery, Western University, London, Ontario, Canada