1
Ikuma T, McWhorter AJ, Oral E, Kunduk M. Formant-Aware Spectral Analysis of Sustained Vowels of Pathological Breathy Voice. J Voice 2023:S0892-1997(23)00154-6. [PMID: 37302909 DOI: 10.1016/j.jvoice.2023.05.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 05/07/2023] [Accepted: 05/08/2023] [Indexed: 06/13/2023]
Abstract
OBJECTIVES This paper reports the effectiveness of formant-aware spectral parameters in predicting the perceptual breathiness rating. A breathy voice has a steeper spectral slope and higher turbulent noise than a normal voice. Measuring spectral parameters of acoustic signals over lower formant regions is a known approach to capturing the properties related to breathiness. This study examines this approach by testing the contemporary spectral parameters and algorithms within the framework, alternate frequency band designs, and vowel effects.
METHODS Sustained vowel recordings (/a/, /i/, and /u/) of speakers with voice disorders in the German Saarbruecken Voice Database were considered (n = 367). Recordings with signal irregularities, such as subharmonics, or with perceived roughness were excluded from the study. Four speech language pathologists perceptually rated the recordings for breathiness on a 100-point scale, and their averages were used in the analysis. The acoustic spectra were segmented into four frequency bands according to the vowel formant structures. Five spectral parameters (intraband harmonics-to-noise ratio, HNR; interband harmonics ratio, HHR; interband noise ratio, NNR; and interband glottal-to-noise energy, GNE, ratio) were evaluated in each band to predict the perceptual breathiness rating. Four HNR algorithms were tested.
RESULTS Multiple linear regression models of spectral parameters, led by the HNRs, were shown to explain up to 85% of the variance in perceptual breathiness ratings. This performance exceeded that of the acoustic breathiness index (82%). Individually, the HNR over the first two formants best explained the variance in breathiness (78%), exceeding the smoothed cepstrum peak prominence (74%). The performance of HNR was highly algorithm dependent (10% spread). Some vowel effects were observed in the perceptual rating (higher for /u/), predictability (5% lower for /u/), and model parameter selections.
CONCLUSIONS Strong per-vowel breathiness acoustic models were found by segmenting the spectrum to isolate the portion most affected by breathiness.
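The "variance explained" figures quoted in this abstract are coefficients of determination. As a minimal sketch of how such a figure is obtained for a single predictor (for example, one band HNR against the mean breathiness rating), the following fits a least-squares line and reports R². The function name `r_squared` and the sample values are hypothetical and are not taken from the study, which used multiple linear regression.

```python
def r_squared(x, y):
    """Coefficient of determination for a one-predictor least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form least-squares slope and intercept
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical values for illustration only (not data from the study)
hnr_values_db = [5.0, 8.0, 12.0, 15.0, 20.0]
breathiness = [80.0, 62.0, 45.0, 30.0, 12.0]
print(round(r_squared(hnr_values_db, breathiness), 3))
```

An R² of 0.78 would correspond to the 78% reported for the best single HNR parameter.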
Affiliation(s)
- Takeshi Ikuma
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana
- Andrew J McWhorter
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana
- Evrim Oral
  - Biostatistics Program, School of Public Health, Louisiana State University Health Sciences Center, New Orleans, Louisiana
- Melda Kunduk
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana; Voice Center, The Our Lady of The Lake Regional Medical Center, Baton Rouge, Louisiana; Dept. of Communication Sciences & Disorders, Louisiana State University, Baton Rouge, Louisiana
2
Ikuma T, Story B, McWhorter AJ, Adkins L, Kunduk M. Harmonics-to-noise ratio estimation with deterministically time-varying harmonic model for pathological voice signals. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:1783. [PMID: 36182331 DOI: 10.1121/10.0014177] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/04/2022] [Accepted: 09/01/2022] [Indexed: 06/16/2023]
Abstract
The harmonics-to-noise ratio (HNR) and other spectral noise parameters are important in clinical objective voice assessment because they can indicate the presence of nonharmonic phenomena, which are tied to the perception of hoarseness or breathiness. Existing HNR estimators assume that voice signals are nearly periodic (fixed over a short period), although voice pathology can induce involuntary slow modulation that violates this assumption. This paper proposes the use of a deterministically time-varying harmonic model to improve the HNR measurements. To estimate the time-varying model, a two-stage iterative least squares algorithm is proposed to reduce model overfitting. The efficacy of the proposed HNR estimator is demonstrated with synthetic signals, simulated tremor signals, and recorded acoustic signals. Results indicate that the proposed algorithm can produce consistent HNR measures as the extent and rate of tremor are varied.
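For orientation, the conventional fixed-period assumption that this abstract contrasts itself with can be sketched as follows. This is not the time-varying estimator proposed in the paper: it assumes a known, constant fundamental frequency and a frame spanning a whole number of periods, in which case projecting the frame onto harmonic sinusoids coincides with a least-squares fit. The function name `hnr_db` and all signal parameters are illustrative assumptions.

```python
import math
import random


def hnr_db(x, fs, f0, n_harmonics):
    """HNR (dB) of a frame holding a whole number of periods of a fixed f0."""
    n = len(x)
    harmonic = [0.0] * n
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k * f0 / fs
        # Projection onto cos/sin of the k-th harmonic; this equals the
        # least-squares fit only because the frame spans whole periods.
        a = 2.0 / n * sum(x[i] * math.cos(w * i) for i in range(n))
        b = 2.0 / n * sum(x[i] * math.sin(w * i) for i in range(n))
        for i in range(n):
            harmonic[i] += a * math.cos(w * i) + b * math.sin(w * i)
    energy_harmonic = sum(h * h for h in harmonic)
    energy_noise = sum((xi - hi) ** 2 for xi, hi in zip(x, harmonic))
    return 10.0 * math.log10(energy_harmonic / energy_noise)


# Assumed test signal: a 100 Hz sinusoid plus white Gaussian noise,
# 800 samples at 8 kHz (exactly 10 periods).
rng = random.Random(0)
fs, f0 = 8000, 100
frame = [math.sin(2.0 * math.pi * f0 * i / fs) + rng.gauss(0.0, 0.1)
         for i in range(800)]
print(round(hnr_db(frame, fs, f0, 5), 1))
```

With a sinusoid power of 0.5 and a noise variance of 0.01, the nominal HNR of this test signal is about 17 dB. When the fundamental frequency drifts within the frame, as with tremor, the fixed-frequency projection leaks harmonic energy into the residual, which is the situation the abstract's time-varying model is meant to handle.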
Affiliation(s)
- Takeshi Ikuma
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Brad Story
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Andrew J McWhorter
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Lacey Adkins
  - Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Melda Kunduk
  - Department of Communication Disorders, Louisiana State University, Baton Rouge, Louisiana 70803, USA
3
Aichinger P, Pernkopf F, Schoentgen J. Detection of extra pulses in synthesized glottal area waveforms of dysphonic voices. Biomed Signal Process Control 2019; 50:158-167. [PMID: 30996730 PMCID: PMC6464090 DOI: 10.1016/j.bspc.2019.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/03/2022]
Abstract
Background and objectives The description of production kinematics of dysphonic voices plays an important role in the clinical care of voice disorders. However, high-speed videolaryngoscopy is not routinely used in clinical practice, partly because there is a lack of diagnostic markers that may be obtained from high-speed videos automatically. The aim of the study is to propose and test a procedure that automatically detects extra pulses, which may occur in voiced source signals of pathological voices in addition to cyclic pulses.
Material and methods Glottal area waveforms (GAW) are synthesized and used to test a detector for extra pulses. Regarding synthesis, for each GAW a cyclic pulse train is mixed with an extra pulse train and additive noise. The cyclic pulse trains are varied across GAWs in terms of fundamental frequency, pulse shape, and modulation noise, i.e., jitter and shimmer. The extra pulse trains are varied across GAWs in terms of the height of the extra pulses and their rates of occurrence. The energy level of the additive noise is also varied. Regarding detection, first, the fundamental frequency is estimated jointly with the cyclic pulse train waveform; second, the modulation noise is estimated; and finally, the extra pulse train waveform is estimated. Two versions of the detector are compared, i.e., one that parameterizes the shapes of the cyclic pulses, and one that uses unparameterized pulse shape estimates. Two corpora are used for testing, i.e., one with 100 GAWs containing random extra pulses, and one with 25 GAWs containing extra pulses in the closed phases of each glottal phase representing subharmonic voices.
Results and discussion With pulse shape parameterization (PSP), a maximum mean accuracy of 88.3% is achieved when detecting random extra pulses. Without PSP, the maximum mean accuracy reduces to 82.9%. Detection performance decreases if the energy level of additive noise is higher than −25 dB with respect to the energy of the cyclic pulse train, and if the irregularity strength exceeds 0.1. For bicyclic, i.e., subharmonic voices, the approach fails without PSP, whereas with PSP, a mean sensitivity of 87.4% is achieved for subharmonic voices.
Conclusion A synthesizer for GAWs containing extra pulses and a detector for extra pulses are proposed. With PSP, favorable detector performance is observed when the levels of additive noise and the irregularity strengths are not too high. In signals with high noise levels, the detector without PSP outperforms the other one. Detection of extra pulses fails if irregularity strength is large. For subharmonic voices PSP must be used.
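The synthesis step described in this abstract (a cyclic pulse train mixed with an extra pulse train and additive noise) can be sketched roughly as below. This is a simplified illustration, not the authors' synthesizer: the raised-cosine pulse shape, the 50% open phase, the placement of extra pulses in the closed phase, and every name and parameter (`synth_gaw`, `extra_prob`, and so on) are assumptions, and jitter and shimmer are omitted.

```python
import math
import random


def synth_gaw(fs, f0, n_cycles, extra_height, extra_prob, noise_db, seed=0):
    """Return (gaw, cyclic, extra, noise) for a toy glottal area waveform."""
    rng = random.Random(seed)
    period = int(round(fs / f0))
    open_len = period // 2        # assumed: glottis open for half the cycle
    extra_len = open_len // 4     # assumed: extra pulses are short
    n = period * n_cycles
    cyclic = [0.0] * n
    extra = [0.0] * n
    # Raised-cosine pulse shapes (an assumption, chosen for simplicity)
    pulse = [0.5 * (1.0 - math.cos(2.0 * math.pi * j / open_len))
             for j in range(open_len)]
    small = [extra_height * 0.5 * (1.0 - math.cos(2.0 * math.pi * j / extra_len))
             for j in range(extra_len)]
    for c in range(n_cycles):
        start = c * period
        for j, v in enumerate(pulse):
            cyclic[start + j] = v
        if rng.random() < extra_prob:
            # Place the extra pulse at a random position in the closed phase
            offset = rng.randrange(period - open_len - extra_len + 1)
            s = start + open_len + offset
            for j, v in enumerate(small):
                extra[s + j] = v
    # Scale white noise so its energy sits noise_db relative to the cyclic train
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    e_cyclic = sum(v * v for v in cyclic)
    e_noise = sum(v * v for v in noise)
    gain = math.sqrt(e_cyclic * 10.0 ** (noise_db / 10.0) / e_noise)
    noise = [gain * v for v in noise]
    gaw = [c + e + z for c, e, z in zip(cyclic, extra, noise)]
    return gaw, cyclic, extra, noise


gaw, cyclic, extra, noise = synth_gaw(4000, 100, 50, 0.3, 0.5, -25.0, seed=1)
print(len(gaw))
```

Returning the three components separately gives a ground truth against which a detector's estimate of the extra pulse train could be scored.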
Affiliation(s)
- P Aichinger
  - Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria
- F Pernkopf
  - Signal Processing and Speech Communication Laboratory, Graz University of Technology, Inffeldgasse 16c/EG, 8010, Graz, Austria
- J Schoentgen
  - Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna, Waehringer Guertel 18-20, 1090, Vienna, Austria; BEAMS (Bio-, Electro- And Mechanical Systems), Faculty of Applied Sciences, Université Libre de Bruxelles, 50, Av. F. D. Roosevelt, B-1050, Brussels, Belgium
4
Voice-Vibratory Assessment With Laryngeal Imaging (VALI) Form: Reliability of Rating Stroboscopy and High-speed Videoendoscopy. J Voice 2017; 31:513.e1-513.e14. [DOI: 10.1016/j.jvoice.2016.12.003] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 09/07/2016] [Revised: 11/29/2016] [Accepted: 12/02/2016] [Indexed: 11/19/2022]
5
High-speed Videolaryngoscopy: Quantitative Parameters of Glottal Area Waveforms and High-speed Kymography in Healthy Individuals. J Voice 2017; 31:282-290. [DOI: 10.1016/j.jvoice.2016.09.026] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/13/2016] [Revised: 09/22/2016] [Accepted: 09/23/2016] [Indexed: 11/21/2022]
6
Towards Objective Voice Assessment: The Diplophonia Diagram. J Voice 2017; 31:253.e17-253.e26. [DOI: 10.1016/j.jvoice.2016.06.021] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Accepted: 11/25/2014] [Indexed: 11/19/2022]
7
Laryngeal High-Speed Videoendoscopy: Sensitivity of Objective Parameters towards Recording Frame Rate. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4575437. [PMID: 27990428 PMCID: PMC5136634 DOI: 10.1155/2016/4575437] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 07/06/2016] [Accepted: 10/10/2016] [Indexed: 11/29/2022]
Abstract
The current use of laryngeal high-speed videoendoscopy in clinical settings involves subjective visual assessment of vocal fold vibratory characteristics. However, objective quantification of vocal fold vibrations for evidence-based diagnosis and therapy is desired, and objective parameters assessing laryngeal dynamics have therefore been suggested. This study investigated the sensitivity of the objective parameters and their dependence on recording frame rate. A total of 300 endoscopic high-speed videos with recording frame rates between 1000 and 15 000 fps were analyzed for a vocally healthy female subject during sustained phonation. Twenty parameters, representing laryngeal dynamics, were computed. Four different parameter characteristics were found: parameters showing no change with increasing frame rate; parameters changing up to a certain frame rate, but then remaining constant; parameters remaining constant within a particular range of recording frame rates; and parameters changing with nearly every frame rate. The results suggest that (1) parameter values are influenced by recording frame rates and different parameters have varying sensitivities to recording frame rate; (2) normative values should be determined based on recording frame rates; and (3) the typically used recording frame rate of 4000 fps seems to be too low to accurately distinguish certain characteristics of the human phonation process in detail.
8
Patel RR. Vibratory onset and offset times in children: A laryngeal imaging study. Int J Pediatr Otorhinolaryngol 2016; 87:11-7. [PMID: 27368436 PMCID: PMC4930831 DOI: 10.1016/j.ijporl.2016.05.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/05/2016] [Revised: 05/10/2016] [Accepted: 05/12/2016] [Indexed: 10/21/2022]
Abstract
OBJECTIVES The aim of the study was to evaluate the differences in vibratory onset and offset times across age (adult males, adult females, and children) and waveform types (total glottal area waveform, left glottal area waveform, and right glottal area waveform) using high-speed videoendoscopy. METHODS In this prospective study, vibratory onset and offset times were evaluated in a total of 86 participants. Forty-three children (23 girls, 18 boys) between 5 and 11 years and 43 gender-matched vocally normal young adults (23 females and 18 males) aged 21-45 years were recruited. Vibratory onset and offset times were calculated in milliseconds from the total, left, and right Glottal Area Waveform (GAW). A two-factor analysis of variance was used to compare the means among the subject groups (children, adult male, and adult female) and waveform type (total GAW, left GAW, right GAW) for onset and offset variables. Post hoc analyses were performed using Fisher's Least Significant Difference test with Bonferroni correction for multiple comparisons. RESULTS Children exhibited significantly shorter vibratory onset and offset times compared to adult males and females. Differences in vibratory onset and offset times were not statistically significant between adult males and females. Across all waveform types (i.e. total GAW, left GAW, and right GAW), no statistical significance was observed among the subject groups. CONCLUSION This is the first study reporting vibratory onset and offset times in the pediatric population. The study findings lay the foundation for the development of a large age- and gender-based database of the pediatric population to aid the study of the effects of maturation of vocal fold vibration in adulthood. The findings from this study may also provide the basis for evaluating the impact of numerous lesions on tissue pliability, and thereby have potential utility for the clinical differentiation of various lesions.
Affiliation(s)
- Rita R. Patel
  - Department of Speech and Hearing Sciences, Indiana University
9
Deliyski DD, Hillman RE, Mehta DD. Laryngeal High-Speed Videoendoscopy: Rationale and Recommendation for Accurate and Consistent Terminology. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2015; 58:1488-92. [PMID: 26375398 PMCID: PMC4686309 DOI: 10.1044/2015_jslhr-s-14-0253] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/10/2014] [Revised: 02/25/2015] [Accepted: 06/09/2015] [Indexed: 05/24/2023]
Abstract
PURPOSE The authors discuss the rationale behind the term laryngeal high-speed videoendoscopy to describe the application of high-speed endoscopic imaging techniques to the visualization of vocal fold vibration. METHOD Commentary on the advantages of using accurate and consistent terminology in the field of voice research is provided. Specific justification is described for each component of the term high-speed videoendoscopy, which is compared and contrasted with alternative terminologies in the literature. RESULTS In addition to the ubiquitous high-speed descriptor, the term endoscopy is necessary to specify the appropriate imaging technology and distinguish among modalities such as ultrasound, magnetic resonance imaging, and nonendoscopic optical imaging. Furthermore, the term video critically indicates the electronic recording of a sequence of optical still images representing scenes in motion, in contrast to strobed images using high-speed photography and non-optical high-speed magnetic resonance imaging. High-speed videoendoscopy thus concisely describes the technology and can be appended by the desired anatomical nomenclature such as laryngeal. CONCLUSIONS Laryngeal high-speed videoendoscopy strikes a balance between conciseness and specificity when referring to the typical high-speed imaging method performed on human participants. Guidance for the creation of future terminology provides clarity and context for current and future experiments and the dissemination of results among researchers.
Affiliation(s)
- Dimitar D. Deliyski
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, OH
  - University of Cincinnati, OH
- Robert E. Hillman
  - Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
  - Massachusetts General Hospital Institute of Health Professions, Charlestown, MA
- Daryush D. Mehta
  - Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
  - Massachusetts General Hospital Institute of Health Professions, Charlestown, MA
10
Use Videostrobokymography to Quantitatively Analyze the Vibratory Characteristics Before and After Conservative Medical Treatment of Vocal Fold Leukoplakia. J Voice 2015; 30:215-20. [PMID: 26001502 DOI: 10.1016/j.jvoice.2015.04.015] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/31/2015] [Accepted: 04/22/2015] [Indexed: 11/21/2022]
Abstract
OBJECTIVES To quantitatively analyze the vibratory characteristics of vocal folds before and after conservative treatments to evaluate the outcomes of conservative treatments for vocal fold leukoplakia using videostrobokymography (VSK). STUDY DESIGN This is a prospective study. METHODS Twenty patients and 20 controls were enrolled into the study. All patients received conservative treatments for 3 weeks and received VSK examination before and 3 weeks after the treatments. All controls only received VSK examination once. Vocal fold lengths of 25%, 50%, and 75% were chosen as the line-scan positions to evaluate the vocal fold vibration. Open quotient (OQ) and asymmetry index (AI) were obtained using VSK. RESULTS Significant improvements in the main symptoms including voice hoarseness were found. Videostroboscopic findings showed that the white lesions on the vocal folds almost completely disappeared in all patients, and the vocal fold flexibility returned to normal. All OQs and AIs at each line-scan position in patients before the treatments were larger than those in controls (P < 0.017), whereas all OQs and AIs at each line-scan position decreased 3 weeks after conservative treatments (P < 0.017). No significant differences in OQs and AIs at each line-scan position were detected between patients after the treatments and controls (P > 0.017). CONCLUSIONS VSK could quantitatively evaluate the vibratory characteristics of vocal folds before and after the treatments, and conservative treatment could improve VSK measurements to normal control values, suggesting that VSK is a tool to assess the outcomes of the conservative treatments for vocal fold leukoplakia.
11
Kojima T, Mitchell JR, Garrett CG, Rousseau B. Recovery of vibratory function after vocal fold microflap in a rabbit model. Laryngoscope 2013; 124:481-6. [PMID: 23901003 DOI: 10.1002/lary.24324] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Revised: 05/19/2013] [Accepted: 07/03/2013] [Indexed: 11/09/2022]
Abstract
OBJECTIVES/HYPOTHESIS The purpose of this study was to evaluate the return of vibratory function and restoration of vibration amplitude and symmetry after vocal fold microflap surgery. STUDY DESIGN Prospective in vivo animal model. METHODS Microflap surgery was performed on 30 New Zealand white breeder rabbits. The left vocal fold received a 3-mm epithelial incision and mucosal elevation, while the contralateral vocal fold was left intact to serve as an internal control. Amplitude ratio and lateral phase difference were measured using high-speed laryngeal imaging at a frame rate of 10,000 frames per second from animals undergoing evoked phonation on postoperative days 0, 1, 3, 5, and 7. RESULTS Quantitative measures revealed a significantly reduced amplitude ratio and lateral phase difference on day 0 after microflap. These impairments of vibratory function on day 0 were associated with separation of the vocal fold's body-cover layer. Amplitude ratio increased significantly by day 3 after microflap, with further increases in vibration amplitude on days 5 and 7. While the amplitude ratio improved significantly on day 3, lateral phase difference decreased significantly on day 3, and returned to normal on days 5 and 7. CONCLUSIONS High-speed laryngeal imaging was used to investigate the natural time course of postmicroflap recovery of vibratory function. Results revealed the restoration of vibration amplitude and lateral phase difference by days 3 to 7 after microflap. The time period of improved vibratory function observed in this study coincides with the end of the well-documented inflammatory phase of vocal fold wound repair. LEVEL OF EVIDENCE N/A.
Affiliation(s)
- Tsuyoshi Kojima
  - Department of Otolaryngology, Vanderbilt Bill Wilkerson Center, Vanderbilt University School of Medicine, Nashville, Tennessee, U.S.A