1
Mohd Khairuddin KA, Ahmad K, Proehoeman SC, Mohd Ibrahim H, Yan Y. Preliminary Findings of Vocal Fold Vibratory Characteristics of Singers Analyzed by Laryngeal High-Speed Videoendoscopy. J Voice 2024:S0892-1997(24)00173-5. [PMID: 38902142 DOI: 10.1016/j.jvoice.2024.06.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/01/2024] [Revised: 05/31/2024] [Accepted: 06/01/2024] [Indexed: 06/22/2024]
Abstract
OBJECTIVES This study investigates the vocal fold vibratory dynamics of singers, which are postulated to differ from those of normal speakers due to the singers' regular vocal training. Vocal fold vibration was measured using laryngeal high-speed videoendoscopy (LHSV) and subsequent LHSV-based analysis. The focus of the present study is to characterize and compare the LHSV-based measures derived from the glottal area waveform (GAW), namely fundamental frequency (F0GAW), glottal perturbation (jitterGAW and shimmerGAW), open quotient (OQGAW), and Nyquist plots, between singers and normal speakers across genders. METHODS Participants comprised 13 singers from a local cultural and heritage academy and 56 normal speakers from a local university, all of whom were evaluated to have normal voices. Each participant underwent LHSV procedures to capture images of vocal fold vibration, which were subsequently analyzed to generate the LHSV-based measures. RESULTS Male singers exhibited lower F0GAW, jitterGAW, shimmerGAW, and OQGAW than female singers. When compared to normal speakers, male singers demonstrated higher F0GAW, and lower jitterGAW and shimmerGAW. No difference in OQGAW was found between male singers and normal speakers. Female singers exhibited lower jitterGAW compared to normal speakers, but no differences were observed in shimmerGAW and OQGAW. The results of Nyquist plots indicated no gender-related associations between types of rim width and among singers. However, for rim pattern, male singers were associated with a higher percentage of clustered rim, suggesting more regular vocal fold vibration, compared to female singers and normal male speakers. CONCLUSIONS Singers, particularly male singers, demonstrate distinct and potentially superior vocal fold vibrations compared to normal speakers, likely attributable to their regular vocal training, resulting in refined vocal fold configurations even during speaking. 
Despite the limited sample of singers, the study offers valuable insights into the vocal fold vibratory behaviors in singers analyzed using LHSV.
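The GAW-derived measures named in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas the authors used, so the definitions below are assumptions: jitter and shimmer are computed as the common "local" perturbation (mean absolute difference between consecutive cycles, relative to the mean), and the open quotient as open-phase duration divided by cycle duration. All function names and cycle values are hypothetical.

```python
from statistics import mean

def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, in percent of the mean.

    Assumed definition (common 'local' jitter/shimmer), not taken from the paper.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)

def gaw_measures(periods_s, peak_areas, open_durations_s):
    """Summarize per-cycle data extracted from a glottal area waveform.

    periods_s:        duration of each glottal cycle, in seconds
    peak_areas:       peak glottal area of each cycle (arbitrary units)
    open_durations_s: duration of the open phase of each cycle, in seconds
    """
    return {
        "f0_hz": 1.0 / mean(periods_s),
        "jitter_pct": local_perturbation(periods_s),
        "shimmer_pct": local_perturbation(peak_areas),
        "open_quotient": mean(o / p for o, p in zip(open_durations_s, periods_s)),
    }

# Hypothetical cycle-by-cycle values for a voice near 200 Hz.
periods = [0.0050, 0.0051, 0.0049, 0.0050]
areas = [100.0, 102.0, 98.0, 100.0]
open_durs = [0.0030, 0.0031, 0.0029, 0.0030]
measures = gaw_measures(periods, areas, open_durs)
```

Under these assumed definitions, lower jitter and shimmer values correspond to more regular cycle-to-cycle vibration, which is the sense in which the abstract compares singers and normal speakers.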
Affiliation(s)
- Khairy Anuar Mohd Khairuddin
- Speech Pathology Program, School of Health Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia; Speech Sciences Program, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia.
- Kartini Ahmad
- Speech Sciences Program, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Hasherah Mohd Ibrahim
- Speech Sciences Program, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Yuling Yan
- Department of Bioengineering, School of Engineering, Santa Clara University, Santa Clara, California
2
Malinowski J, Pietruszewska W, Kowalczyk M, Niebudek-Bogusz E. Value of high-speed videoendoscopy as an auxiliary tool in differentiation of benign and malignant unilateral vocal lesions. J Cancer Res Clin Oncol 2024; 150:10. [PMID: 38216796 PMCID: PMC10786956 DOI: 10.1007/s00432-023-05543-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 12/13/2023] [Indexed: 01/14/2024]
Abstract
PURPOSE The study aimed to assess the relevance of objective vibratory parameters derived from high-speed videolaryngoscopy (HSV) as a supporting tool to assist clinicians in establishing the initial diagnosis of benign and malignant glottal organic lesions. METHODS The HSV examinations were conducted in 175 subjects: 50 normophonic, 85 subjects with benign vocal fold lesions, and 40 with early glottic cancer; organic lesions were confirmed by histopathologic examination. The parameters derived from HSV kymography (amplitude, symmetry, and glottal dynamic characteristics) were compared statistically between the groups, followed by ROC analysis. RESULTS Among 14 calculated parameters, 10 differed significantly between the groups. Four of them, the average resultant amplitude of the involved vocal fold (AmpInvolvedAvg), average amplitude asymmetry for the whole glottis and its middle third part (AmplAsymAvg; AmplAsymAvg_2/3), and absolute average phase difference (AbsPhaseDiffAvg), showed significant differences between benign and malignant lesions. Amplitude values decreased, while asymmetry and phase difference values increased, with the risk of malignancy. In ROC analysis, the highest AUC was observed for AmpAsymAvg (0.719; p < 0.0001), and next in order was AmpInvolvedAvg (0.70; p = 0.0002). CONCLUSION The gold standard in the diagnosis of organic lesions of the glottis remains clinical examination with videolaryngoscopy, confirmed by histopathological examination. Our results showed that measurements of amplitude, asymmetry, and phase of vibrations in malignant vocal fold masses deteriorate significantly in comparison to benign vocal lesions. High-speed videolaryngoscopy could aid their preliminary differentiation noninvasively before histopathological examination; however, further research on larger groups is needed.
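As a rough illustration of what the reported AUC values mean, the AUC of a single parameter equals the probability that a randomly chosen malignant case has a higher parameter value than a randomly chosen benign case, with ties counted as one half. The sketch below computes this directly by pairwise comparison; the function name and the asymmetry values are hypothetical and are not data from the study.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5, which matches the Mann-Whitney U formulation of the AUC.
    """
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Hypothetical amplitude-asymmetry values (higher = more asymmetric).
malignant = [0.9, 0.7, 0.6, 0.4]
benign = [0.5, 0.3, 0.2, 0.1]
auc = roc_auc(malignant, benign)  # 15 of 16 pairs are ranked correctly
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, so values near 0.7, as reported in the abstract, indicate moderate discrimination by a single parameter.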
Affiliation(s)
- Jakub Malinowski
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland.
- Wioletta Pietruszewska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
- Magdalena Kowalczyk
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
- Ewa Niebudek-Bogusz
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, Lodz, Poland
3
Malinowski J, Pietruszewska W, Stawiski K, Kowalczyk M, Barańska M, Rycerz A, Niebudek-Bogusz E. High-Speed Videoendoscopy Enhances the Objective Assessment of Glottic Organic Lesions: A Case-Control Study with Multivariable Data-Mining Model Development. Cancers (Basel) 2023; 15:3716. [PMID: 37509377 PMCID: PMC10378075 DOI: 10.3390/cancers15143716] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 05/20/2023] [Revised: 07/13/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023] Open
Abstract
The aim of the study was to utilize a quantitative assessment of the vibratory characteristics of vocal folds in diagnosing benign and malignant lesions of the glottis using high-speed videolaryngoscopy (HSV). METHODS Case-control study including 100 patients with unilateral vocal fold lesions in comparison to 38 normophonic subjects. Quantitative assessment with the determination of vocal fold oscillation parameters was performed based on HSV kymography. Machine-learning predictive models were developed and validated. RESULTS All calculated parameters differed significantly between healthy subjects and patients with organic lesions. The first predictive model distinguishing any organic lesion patients from healthy subjects reached an area under the curve (AUC) equal to 0.983 and presented with 89.3% accuracy, 97.0% sensitivity, and 71.4% specificity on the testing set. The second model identifying malignancy among organic lesions reached an AUC equal to 0.85 and presented with 80.6% accuracy, 100% sensitivity, and 71.1% specificity on the training set. Important predictive factors for the models were frequency perturbation measures. CONCLUSIONS The standard protocol for distinguishing between benign and malignant lesions continues to be clinical evaluation by an experienced ENT specialist and confirmed by histopathological examination. Our findings did suggest that advanced machine learning models, which consider the complex interactions present in HSV data, could potentially indicate a heightened risk of malignancy. Therefore, this technology could prove pivotal in aiding in early cancer detection, thereby emphasizing the need for further investigation and validation.
Affiliation(s)
- Jakub Malinowski
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Wioletta Pietruszewska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Konrad Stawiski
- Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
- Department of Biostatistics and Translational Medicine, Medical University of Lodz, 90-419 Lodz, Poland
- Magdalena Kowalczyk
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Magda Barańska
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
- Aleksander Rycerz
- Department of Biostatistics and Translational Medicine, Medical University of Lodz, 90-419 Lodz, Poland
- Ewa Niebudek-Bogusz
- Department of Otolaryngology, Head and Neck Oncology, Medical University of Lodz, 90-419 Lodz, Poland
4
Buckley DP, Abur D, Stepp CE. Normative Values of Cepstral Peak Prominence Measures in Typical Speakers by Sex, Speech Stimuli, and Software Type Across the Life Span. American Journal of Speech-Language Pathology 2023; 32:1565-1577. [PMID: 37257202 PMCID: PMC10473385 DOI: 10.1044/2023_ajslp-22-00264] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/22/2022] [Revised: 12/15/2022] [Accepted: 03/16/2023] [Indexed: 06/02/2023]
Abstract
PURPOSE The purpose of this study was to determine normative values for cepstral peak prominence measures across the life span as a function of sex using clinically relevant stimuli (/ɑ/, /i/, and two sentences of The Rainbow Passage) and two commonly used software types: Praat (Version 6.0.50) and Analysis of Dysphonia in Speech and Voice (ADSV). METHOD One hundred fifty speakers (75 males, 75 females; evenly distributed into three age groups) without voice disorders aged 18-91 years were recorded via headset microphone in a sound-treated booth. Cepstral measures were analyzed using common analysis methods in Praat and ADSV by sex, stimuli, and software type. Kruskal-Wallis tests and post hoc Mood's Median tests for significant factors were performed on cepstral measures to assess the effects of age group, sex, stimuli, and software type. RESULTS The results revealed statistically significant effects of sex, stimuli, and software type on cepstral measures, but no statistical effect of age group on cepstral values. Females had lower average cepstral values compared to males. Across stimuli, the highest average cepstral measure was found for sustained /ɑ/, followed by sustained /i/, and then the two sentences of The Rainbow Passage. Average cepstral measures in Praat were higher than those from ADSV. CONCLUSIONS The current work did not find a statistical effect of age group on cepstral values; thus, normative cepstral values were reported by sex, stimuli, and software type. Future work should examine the applicability of these normative values for discriminating speakers with and without voice disorders.
Affiliation(s)
- Daniel P. Buckley
- Department of Speech, Language, and Hearing Sciences, Boston University, MA
- Department of Otolaryngology – Head and Neck Surgery, Boston University School of Medicine, MA
- Defne Abur
- Department of Speech, Language, and Hearing Sciences, Boston University, MA
- Department of Computational Linguistics, University of Groningen, the Netherlands
- Research School of Behavioral and Cognitive Neurosciences, University of Groningen, the Netherlands
- Cara E. Stepp
- Department of Speech, Language, and Hearing Sciences, Boston University, MA
- Department of Otolaryngology – Head and Neck Surgery, Boston University School of Medicine, MA
- Department of Biomedical Engineering, Boston University, MA
5
Fujiki RB, Croegaert-Koch CK, Thibeault SL. Videostroboscopy Versus High-Speed Videoendoscopy: Factors Influencing Ratings of Laryngeal Oscillation. Journal of Speech, Language, and Hearing Research: JSLHR 2023; 66:1496-1510. [PMID: 37040690 PMCID: PMC10457078 DOI: 10.1044/2023_jslhr-22-00649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 01/16/2023] [Accepted: 01/23/2023] [Indexed: 05/11/2023]
Abstract
PURPOSE The purpose of this study was to determine whether patient voice-related diagnosis, severity of dysphonia, and rater's experience influence the relationship between laryngeal oscillation ratings made from videostroboscopic and high-speed videoendoscopic (HSV) exams. METHOD Stroboscopy and HSV exams from 15 patients with adductor spasmodic dysphonia (ADSD) and 15 with benign vocal fold lesions were rated for laryngeal oscillation and closure by 10 licensed speech-language pathologists (SLPs). Raters were divided into low- (< 5 years) and high-experience (> 5 years) groups. Ratings of vocal fold amplitude, mucosal wave, periodicity, phase symmetry, nonvibrating portion of the vocal fold, and glottal closure were examined using an online form adapted from the Voice Vibratory Assessment of Laryngeal Imaging (VALI). RESULTS Stroboscopy and HSV ratings were more strongly positively correlated for patients with benign vocal fold lesions (r between .43 and .75) than for those with ADSD (r between .40 and .68). Differences between stroboscopy and HSV exams were significantly greater for ratings of amplitude, mucosal wave, and periodicity in patients with ADSD than for patients with benign vocal fold lesions. Raters with < 5 years of experience showed significantly greater differences between stroboscopy and HSV ratings of amplitude and nonvibrating portion of the vocal fold for patients with ADSD only. Significantly greater differences between ratings of periodicity and phase symmetry were observed in patients with more severe dysphonia. CONCLUSIONS Differences in laryngeal ratings made between HSV and stroboscopy exams may be influenced by patient diagnosis, severity of dysphonia, and rater experience. Future study is warranted to determine how the differences observed influence clinical diagnosis and outcomes.
6
Differences Among Mixed, Chest, and Falsetto Registers: A Multiparametric Study. J Voice 2023; 37:298.e11-298.e29. [PMID: 33518476 DOI: 10.1016/j.jvoice.2020.12.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2020] [Revised: 12/23/2020] [Accepted: 12/28/2020] [Indexed: 11/23/2022]
Abstract
INTRODUCTION Typical singing registers are the chest and falsetto; however, trained singers have an additional register, namely, the mixed register. The mixed register, which is also called "mixed voice" or "mix," is an important technique for singers, as it can help bridge from the chest voice to falsetto without noticeable voice breaks. OBJECTIVE The present study aims to reveal the nature of the voice-production mechanism of the different registers (chest, mix, and falsetto) using high-speed digital imaging (HSDI), electroglottography (EGG), and acoustic and aerodynamic measurements. STUDY DESIGN Cross-sectional study. METHODS Aerodynamic measurements were acquired for twelve healthy singers (six men and six women) during the phonation of a variety of pitches using three registers. HSDI and EGG devices were simultaneously used on three healthy singers (two men and one woman), from which the open quotient (OQ) and speed quotient (SQ) were determined. Audio signals were recorded for five sustained vowels, and a spectral analysis was conducted to determine the amplitude of each harmonic component. Furthermore, the absolute (not relative) value of the glottal volume flow was estimated by integrating data obtained from the HSDI and aerodynamic studies. RESULTS For all singers, the subglottal pressure (PSub) was the highest for the chest among the three registers, and the mean flow rate (MFR) was the highest for the falsetto. Conversely, the PSub of the mix was as low as the falsetto, and the MFR of the mix was as low as the chest. The HSDI analysis showed that the OQ differed significantly among the registers, even when the fundamental frequency was the same; the OQ of the mix was higher than that of the chest but lower than that of the falsetto. The acoustic analysis showed that, for the mix, the harmonic structure was intermediate between the chest and falsetto. 
The results of the glottal volume-flow analysis revealed that the maximum volume velocity was the least for the mix register at every fundamental frequency. The first and second harmonic (H1-H2) difference of the voice source spectrum was the greatest for the falsetto, then the mix, and finally, the chest. CONCLUSIONS We found differences in the registers in terms of the aeromechanical mechanisms and vibration patterns of the vocal folds. The mixed register proved to have a distinct voice-production mechanism, which can be differentiated from those of the chest or falsetto registers.
7
Deus ABD, Quinino RDC, Santos MAR, Gama ACC. Videokymographic index of glottic function: an analysis of diagnostic accuracy. Codas 2023. [DOI: 10.1590/2317-1782/20212021214en] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
Purpose To develop the Videokymographic Index of Glottic Function (VIGF), a composite indicator from digital videokymography parameters, captured by high-speed videolaryngoscopy exams of women with and without laryngeal alterations of behavioral etiology. Methods The sample consisted of 92 women aged between 18 and 45 years. Fifty-five (55) women with behavioral dysphonia, presenting with laryngeal and voice alterations, and thirty-seven (37) women without any laryngeal and voice alterations. Voice evaluation was performed by consensus via an auditory-perceptual analysis of the sustained vowel /a/ at a habitual pitch and loudness. Voice classification was obtained by means of a general degree of dysphonia, where G0 indicated neutral voice quality and G1 to G3 indicated altered voice quality. Laryngeal images were captured via digital videokymography analysis of a sustained vowel /i/ at a habitual pitch and loudness. The VIGF was based on the midpoint of the glottal region for analysis. Logistic regression was performed using the MINITAB 19 program. Results Logistic regression was composed of two stages: Stage 1 consisted of the analysis of all variables, where the maximum opening and closed quotient variables showed statistical significance (p-value <0.05) and the model was well adjusted according to the Hosmer-Lemeshow test (p-value=0.794). Stage 2 consisted of the re-analysis of the selected variables, also showing a well-adjusted model (p-value=0.198). The VIGF was defined as follows: $\mathrm{VIGF} = \dfrac{e^{8.1318 - 0.2941\,\mathrm{AbMax} - 0.0703\,\mathrm{FechGlo}}}{1 + e^{8.1318 - 0.2941\,\mathrm{AbMax} - 0.0703\,\mathrm{FechGlo}}}$. Conclusion The VIGF demonstrated a cut-off value equal to 0.71. The probability of success was 81.5%, sensitivity 76.4%, and specificity 89.2%.
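The VIGF formula in this abstract has the form of a standard logistic function, so it can be evaluated with a few lines of code. The sketch below uses the coefficients and the 0.71 cut-off exactly as stated in the abstract; the function name and the input values are hypothetical, the units of AbMax (maximum opening) and FechGlo (closed quotient) are not given in the abstract, and the abstract does not state which side of the cut-off corresponds to which group, so the code only reports the comparison.

```python
import math

# Coefficients as reported in the abstract.
INTERCEPT = 8.1318
COEF_AB_MAX = -0.2941    # maximum opening (AbMax)
COEF_FECH_GLO = -0.0703  # closed quotient (FechGlo)
CUT_OFF = 0.71           # cut-off value reported in the abstract

def vigf(ab_max, fech_glo):
    """Evaluate VIGF = e^z / (1 + e^z) with z linear in AbMax and FechGlo."""
    z = INTERCEPT + COEF_AB_MAX * ab_max + COEF_FECH_GLO * fech_glo
    # 1 / (1 + e^-z) is algebraically identical to e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical input values, chosen only to exercise the formula.
index = vigf(ab_max=10.0, fech_glo=50.0)
above_cut_off = index >= CUT_OFF
```

Because both coefficients are negative, the index decreases as either the maximum opening or the closed quotient increases.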
8
Deus ABD, Quinino RDC, Santos MAR, Gama ACC. Videokymographic index of glottic function: an analysis of diagnostic accuracy. Codas 2022; 35:e20210214. [PMID: 36259820 PMCID: PMC10010432 DOI: 10.1590/2317-1782/20212021214pt] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 02/02/2022] [Indexed: 11/06/2022] Open
Abstract
PURPOSE To develop the Videokymographic Index of Glottic Function (VIGF), a composite indicator from digital videokymography parameters, captured by high-speed videolaryngoscopy exams of women with and without laryngeal alterations of behavioral etiology. METHODS The sample consisted of 92 women aged between 18 and 45 years. Fifty-five (55) women with behavioral dysphonia, presenting with laryngeal and voice alterations, and thirty-seven (37) women without any laryngeal and voice alterations. Voice evaluation was performed by consensus via an auditory-perceptual analysis of the sustained vowel /a/ at a habitual pitch and loudness. Voice classification was obtained by means of a general degree of dysphonia, where G0 indicated neutral voice quality and G1 to G3 indicated altered voice quality. Laryngeal images were captured via digital videokymography analysis of a sustained vowel /i/ at a habitual pitch and loudness. The VIGF was based on the midpoint of the glottal region for analysis. Logistic regression was performed using the MINITAB 19 program. RESULTS Logistic regression was composed of two stages: Stage 1 consisted of the analysis of all variables, where the maximum opening and closed quotient variables showed statistical significance (p-value <0.05) and the model was well adjusted according to the Hosmer-Lemeshow test (p-value=0.794). Stage 2 consisted of the re-analysis of the selected variables, also showing a well-adjusted model (p-value=0.198). The VIGF was defined as follows: $\mathrm{VIGF} = \dfrac{e^{8.1318 - 0.2941\,\mathrm{AbMax} - 0.0703\,\mathrm{FechGlo}}}{1 + e^{8.1318 - 0.2941\,\mathrm{AbMax} - 0.0703\,\mathrm{FechGlo}}}$. CONCLUSION The VIGF demonstrated a cut-off value equal to 0.71. The probability of success was 81.5%, sensitivity 76.4%, and specificity 89.2%.
Affiliation(s)
- Alice Braga de Deus
- Programa de Pós-graduação em Ciências Fonoaudiológicas, Faculdade de Medicina, Universidade Federal de Minas Gerais - UFMG - Belo Horizonte (MG), Brasil
- Roberto da Costa Quinino
- Departamento de Estatística, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais - UFMG - Belo Horizonte (MG), Brasil
- Ana Cristina Côrtes Gama
- Departamento de Fonoaudiologia, Faculdade de Medicina, Universidade Federal de Minas Gerais - UFMG - Belo Horizonte (MG), Brasil
9
Comparative analysis of high-speed videolaryngoscopy images and sound data simultaneously acquired from rigid and flexible laryngoscope: a pilot study. Sci Rep 2021; 11:20480. [PMID: 34650174 PMCID: PMC8516923 DOI: 10.1038/s41598-021-99948-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2020] [Accepted: 10/04/2021] [Indexed: 12/03/2022] Open
Abstract
High-Speed Videoendoscopy (HSV) is becoming a robust tool for the assessment of vocal fold vibration in laboratory investigation and clinical practice. We describe the first successful application of flexible High-Speed Videoendoscopy with an innovative laser light source conducted in clinical settings. The acquired image and simultaneously recorded audio data are compared to the results obtained by means of a rigid endoscope. We demonstrated that HSV recordings with a fiber-optic laryngoscope yield consistently bright, color images suitable for parametrization of vocal fold oscillation, as do HSV data obtained from a rigid laryngoscope. The comparison of period and amplitude perturbation parameters calculated on the basis of image and audio data acquired from flexible and rigid HSV recordings objectively confirms that flexible High-Speed Videoendoscopy is a more suitable method for examination of natural phonation. The HSV-based measures generated from this kymographic analysis are arguably a better representation of the vocal fold vibrations than the acoustic analysis because their quantification is independent of vocal tract influences. This experimental study has several implications for further research on the application of HSV in the clinical assessment of the nature of glottal pathologies and their effect on vocal fold vibration.
10
Oliveira RCCD, Gama ACC, Genilhú PDFL, Santos MAR. High speed digital videolaringoscopy: evaluation of vocal nodules and cysts in women. Codas 2021; 33:e20200095. [PMID: 34008770 DOI: 10.1590/2317-1782/20202020095] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/14/2020] [Accepted: 07/04/2020] [Indexed: 11/22/2022] Open
Abstract
PURPOSE To evaluate and compare the parameters of digital kymography obtained through high-speed videolaryngoscopy of women without laryngeal disorders, women with vocal fold nodules, and women with vocal cysts. METHODS A cross-sectional observational study in which 60 women aged between 18 and 45 years were selected. Three study groups were formed: 20 women without laryngeal disorder forming the control group (Group 1), 20 women with a diagnosis of vocal fold nodules forming Group 2, and 20 women with a diagnosis of vocal cysts forming Group 3. The participants were then evaluated by high-speed videolaryngoscopy for analysis and comparison of laryngeal images using digital kymography. The laryngeal parameters processed by the program KIPS® were: minimum, maximum and mean opening; dominant amplitude of the left and right vocal folds; dominant frequency of the right and left vocal folds; and close. RESULTS The analysis of digital kymography suggests that the presence of vocal fold nodules and vocal cysts tends to restrict the maximum and minimum opening of the vocal folds and the dominant amplitude of the opening variation in the middle region of the glottis. CONCLUSION Digital kymography parameters were similar in the presence of vocal fold nodules and vocal cysts.
Affiliation(s)
- Renata Cristina Cordeiro Diniz Oliveira
- Programa de Pós-graduação em Ciências Fonoaudiológicas, Departamento de Fonoaudiologia, Faculdade de Medicina, Universidade Federal de Minas Gerais - UFMG - Belo Horizonte (MG), Brasil
- Ana Cristina Côrtes Gama
- Departamento de Fonoaudiologia, Faculdade de Medicina, Universidade Federal de Minas Gerais - UFMG - Belo Horizonte (MG), Brasil
- Patrícia de Freitas Lopes Genilhú
- Programa de Pós-graduação em Ciências Fonoaudiológicas, Departamento de Fonoaudiologia, Faculdade de Medicina, Universidade Federal de Minas Gerais - UFMG - Belo Horizonte (MG), Brasil
- Marco Aurélio Rocha Santos
- Programa de Pós-graduação em Ciências Fonoaudiológicas, Departamento de Fonoaudiologia, Faculdade de Medicina, Universidade Federal de Minas Gerais - UFMG - Belo Horizonte (MG), Brasil
11
Mohd Khairuddin KA, Ahmad K, Ibrahim HM, Yan Y. Effects of Using Laryngeal High-Speed Videoendoscopy Images Visualizing Partial Views of The Glottis on Measurement Outcomes. J Voice 2020; 36:106-112. [PMID: 32456835 DOI: 10.1016/j.jvoice.2020.04.027] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/23/2020] [Revised: 04/21/2020] [Accepted: 04/22/2020] [Indexed: 11/29/2022]
Abstract
Ideally, an analysis method for laryngeal high-speed videoendoscopy (LHSV) based on the glottal area waveform (GAW) requires images of a complete view of the glottis to ensure findings that are representative of the vibratory behaviors of the whole vocal folds. However, in practice, the preferred images may not be obtained at all times. Often, the only available images that a clinician has to work with consist of a partial view of the glottis. This study aims to examine the effects of using images of a partial view of the glottis (ie, posterior-middle, anterior-middle, or middle) on the LHSV-based measures (ie, fundamental frequency (F0GAW), frequency perturbation (jitterGAW), amplitude perturbation (shimmerGAW), open quotient (OQGAW), and Nyquist plot). The participants consisted of 9 young normophonic females. The procedures involved LHSV recording of the vibration of the vocal folds. The images of the complete view of the glottis were analyzed to obtain the LHSV-based measures. The same images were used to simulate the images of partial views of the glottis by changing the outline of the region of interest to include only the posterior-middle, anterior-middle, or middle parts of the glottis. The LHSV-based measures from the images of the partial views were then compared to those from the complete view. The results showed that all LHSV-based measures from the images of the posterior-middle view were similar to those of the complete view. However, only the F0GAW, jitterGAW, and shimmerGAW from the images of the anterior-middle and middle views were similar to those of the complete view. Lower OQGAW and different Nyquist plots than those of the complete view were generated by the images of the anterior-middle and middle views. 
In conclusion, all LHSV-based measures from the images of the posterior-middle view of the glottis, and only the F0GAW, jitterGAW, and shimmerGAW from the images of the anterior-middle and middle views of the glottis reflect the vibratory behaviors of the whole vocal folds. The same conclusion could not be applied to the OQGAW and Nyquist plots of the images of the anterior-middle and middle views of the glottis. A possible effect of the presence or absence of a posterior glottal gap on the findings warrants further confirmation.
Affiliation(s)
- Khairy Anuar Mohd Khairuddin
- Speech Sciences Program, Centre for Rehabilitation and Special Needs, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia; Speech Pathology Program, School of Health Sciences, Universiti Sains Malaysia, Kelantan, Malaysia.
- Kartini Ahmad
- Speech Sciences Program, Centre for Rehabilitation and Special Needs, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Hasherah Mohd Ibrahim
- Speech Sciences Program, Centre for Rehabilitation and Special Needs, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Yuling Yan
- Department of Bioengineering, School of Engineering, Santa Clara University, California
12
Abstract
This review provides a comprehensive compilation, from a digital image processing point of view, of the most important techniques currently developed to characterize and quantify the vibration behaviour of the vocal folds, along with a detailed description of the laryngeal imaging modalities currently used in the clinic. The review presents an overview of the most significant glottal-gap segmentation and facilitative playback techniques used in the literature for this purpose, and shows the drawbacks and challenges that remain unsolved in developing robust tools, based on digital image processing, for analyzing vocal fold vibration.
13
Fehling MK, Grosch F, Schuster ME, Schick B, Lohscheller J. Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network. PLoS One 2020; 15:e0227791. [PMID: 32040514 PMCID: PMC7010264 DOI: 10.1371/journal.pone.0227791] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 02/21/2019] [Accepted: 12/25/2019] [Indexed: 01/22/2023] Open
Abstract
The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and further quantitative analysis of laryngeal high-speed video (HSV). Quantification of the vocal fold vibration patterns requires, as a first step, the segmentation of the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation process. In this work we propose, for the first time, a procedure to fully automatically segment not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best-performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences comprising 100 consecutive images each. The Dice Coefficient (DC) and the precision of four anatomical landmark positions were used as performance measures. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold (VF), respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The method proposed here requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. 
Thus, it also allows for the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be provided freely to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
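For readers unfamiliar with the Dice Coefficient reported above, the following is a minimal sketch of how such an overlap score can be computed for one predicted and one ground-truth binary mask. It is an illustration only, not the authors' implementation: the function name `dice_coefficient`, the nested-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made here.

```python
def dice_coefficient(pred, truth):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    # Flatten both masks into lists of booleans (one entry per pixel).
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)

    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]     # toy segmentation output
    ground_truth = [[1, 0], [0, 0]]  # toy expert annotation
    print(dice_coefficient(predicted, ground_truth))
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no foreground pixels, so the reported means of 0.85 to 0.91 indicate substantial but not perfect overlap with the manual segmentations.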
Affiliation(s)
- Mona Kirstin Fehling
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
- Fabian Grosch
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
- Maria Elke Schuster
- Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, München, Germany
- Bernhard Schick
- Department of Otorhinolaryngology, Saarland University Hospital, Homburg/Saar, Germany
- Jörg Lohscheller
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
|
14
|
Mohd Khairuddin KA, Ahmad K, Mohd Ibrahim H, Yan Y. Analysis Method for Laryngeal High-Speed Videoendoscopy: Development of the Criteria for the Measurement Input. J Voice 2019; 35:636-645. [PMID: 31864891 DOI: 10.1016/j.jvoice.2019.12.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/02/2019] [Revised: 12/03/2019] [Accepted: 12/03/2019] [Indexed: 10/25/2022]
Abstract
Despite its clear advantages, laryngeal high-speed videoendoscopy (LHSV) has not yet been accepted as a routine imaging tool for the evaluation of vocal fold vibration because methods to effectively analyze the huge number of images from an LHSV recording have been unavailable. Recently, a promising LHSV-based analysis method has been introduced. The ability of this analysis method to study vocal fold vibratory behaviors has been substantially demonstrated. However, some practical aspects of its clinical application still require further attention. Most fundamentally, the criteria for the measurement input, ie, a segment of interest (SOI), have not been fully defined. In particular, the length of the SOI and the location along the sample where it needs to be selected require further confirmation. Meanwhile, the analysis using any options of a well-delineated glottal area demands verification. Without clear criteria for the SOI, it is difficult to demonstrate the relevance of this analysis method in clinical voice assessment. Therefore, the aim of the present study is to establish the criteria for the SOI, which involved investigating the length of the SOI and the location along the sample where it needs to be selected, as well as the use of any options of a well-delineated glottal area for analysis. The participants in the present study were 36 young normophonic females. The methods involved LHSV recording of images of the vibrating vocal folds. The captured images were then analyzed using the method. The LHSV-based measures from the analyses were compared according to the specified procedures of each investigation. Results indicated that 2000 frames should be used as the SOI length. The SOI could be selected at any location along the sample as long as well-delineated glottal areas were observed. With the current findings, a more conclusive measurement protocol is available to ensure reliable LHSV-based measures. 
The findings further support this analysis method for clinical application, which in turn promotes LHSV as a reliable laryngeal imaging tool in the clinical setting.
Affiliation(s)
- Khairy Anuar Mohd Khairuddin
- Speech Sciences Program, Centre for Rehabilitation and Special Needs, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia; Speech Pathology Program, School of Health Sciences, Universiti Sains Malaysia, Kelantan, Malaysia.
- Kartini Ahmad
- Speech Sciences Program, Centre for Rehabilitation and Special Needs, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Hasherah Mohd Ibrahim
- Speech Sciences Program, Centre for Rehabilitation and Special Needs, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Yuling Yan
- Department of Bioengineering, School of Engineering, Santa Clara University, California
|
15
|
Pediatric dysphonia: It's not about the nodules. Int J Pediatr Otorhinolaryngol 2019; 125:147-152. [PMID: 31323352 DOI: 10.1016/j.ijporl.2019.06.031] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 02/23/2019] [Revised: 06/27/2019] [Accepted: 06/28/2019] [Indexed: 11/21/2022]
Abstract
OBJECTIVE Despite the fact that vocal nodules are the most common cause of chronic dysphonia in children, uncertainty and lack of consensus complicate practically every diagnostic and management decision. Selecting an optimal staging system is fundamental to understanding a disease process, mandatory for uniform reporting, and crucial to predicting natural history and treatment outcomes. The ideal prognostic model for vocal nodules is under intense debate. The purpose of this study was to analyze how well vocal nodule grade predicts the severity of voice metrics in children. METHODS Seventy-nine patients diagnosed with vocal cord nodules between 2006 and 2012 were drawn from the UPMC Children's Hospital of Pittsburgh Voice, Resonance and Swallowing Center Research Registry. Subject age at time of diagnosis, nodule grade, relevant co-morbidities, scores on the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V), parent-reported Pediatric Voice Handicap Index (pVHI), the phonotraumatic behaviors profile, habitual speaking pitch fundamental frequencies, pitch range, volume intensity, and s/z ratio were recorded and compiled into a de-identified database for analysis. RESULTS Based on the Kruskal-Wallis H Test, there was no statistically significant correlation between nodule grade and total pitch range (p = .21), s/z ratio (p = .50), volume intensity (p = .33), overall CAPE-V Scores (p = .15), or pVHI Scores (p = .29). Chi-squared tests also revealed no significant associations between nodule grade and abnormality in habitual speaking pitch (p = .14 for fundamental frequency while sustaining a vowel sound, p = .37 for fundamental frequency while speaking structured tasks, i.e., counting, and p = .76 while speaking in conversation). CONCLUSION The current "gold-standard" for grading vocal nodule size suggests that the nodules themselves are not driving the standard dysphonic metrics that are most commonly collected and monitored in such children. 
This outcome is consistent with other studies reporting similar findings and was expected based on the inconsistencies in the reported literature to date. By extension, the conventional wisdom of avoiding surgical treatment of vocal nodules in children seems prudent as there is little evidence to suggest that the nodules themselves are "driving" the severity of the dysphonia. Ultimately identifying the true "drivers" of dysphonia in children will suggest alternative therapies that are more specific and directed to the pathophysiology. Most pediatric voice care professionals will welcome such discoveries as those in the front line of patient care are often rendered helpless and frustrated.
|
16
|
Li S, Scherer RC, Wan M, Wang S, Song B. Intraglottal Pressure: A Comparison Between Male and Female Larynxes. J Voice 2019; 34:813-822. [PMID: 31311664 DOI: 10.1016/j.jvoice.2019.06.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/20/2019] [Revised: 06/08/2019] [Accepted: 06/10/2019] [Indexed: 11/25/2022]
Abstract
Acoustic differences in the phonated sounds made by men and women are related to laryngeal and vocal tract structural differences. This model-based study explored how typical vocal fold differences between males and females affect intraglottal pressure distributions under conditions of different glottal angles and transglottal pressures, and thus how they may affect phonation. The computational code ANSYS Fluent 6.3 was used to obtain the pressure distributions and other aerodynamic parameters for laminar, incompressible flow. Typical values of the vocal fold length, the vertical glottal duct length, and the lateral vocal fold tissue depth were selected for both males and females under conditions of nine typical convergent/divergent glottal angles and three transglottal pressures. There was no coupling of the upstream or downstream vocal tracts, and also no vocal fold contact in these two-dimensional static glottal geometries. Results suggest that males tend to have greater intraglottal pressures for the convergent glottal shape that occurs during glottal opening, and the male glottis offers less flow resistance than the female glottis. These results suggest that the male vocal folds may vibrate more easily (ie, with lower transglottal pressure), but the tissue differences may nullify such a hypothesis. Also, the peak velocities in the glottis were dependent on the transglottal pressure driving the flow and the minimal glottal diameter, which were the same for both the male and female larynxes, rather than on the inferior-superior length of the glottis or the anterior-posterior glottal length. In addition, the tangential forces for larger glottal convergent angles were significantly greater in the female larynx. The entrance loss coefficients, however, were similar between the male and female larynxes, except for the uniform glottis, for which the values were larger for the male larynx. 
The results suggest that the structural differences between male and female vocal folds should be well specified when building computational and physical models of the larynx.
Affiliation(s)
- Sheng Li
- College of Science, Xijing University, Xi'an, People's Republic of China
- Ronald C Scherer
- Department of Communication Sciences and Disorders, Bowling Green State University, Bowling Green, Ohio
- MingXi Wan
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China.
- SuPin Wang
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Bo Song
- College of Aeronautical Engineering, Air Force Engineering University, Xi'an, People's Republic of China
|
17
|
Birk V, Kniesburges S, Semmler M, Berry DA, Bohr C, Döllinger M, Schützenberger A. Influence of glottal closure on the phonatory process in ex vivo porcine larynges. The Journal of the Acoustical Society of America 2017; 142:2197. [PMID: 29092569 PMCID: PMC6909995 DOI: 10.1121/1.5007952] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Indexed: 05/11/2023]
Abstract
Many cases of disturbed voice signals can be attributed to incomplete glottal closure, vocal fold oscillation asymmetries, and aperiodicity. Often these phenomena occur simultaneously and interact with each other, making a systematic, isolated investigation challenging. Therefore, ex vivo porcine experiments were performed which enable direct control of glottal configurations. Different pre-phonatory glottal gap sizes, adduction levels, and flow rates were adjusted. The resulting glottal closure types were identified in a post-processing step. Finally, the acoustic quality, aerodynamic parameters, and the characteristics of vocal fold oscillation were analyzed in reference to the glottal closure types. Results show that complete glottal closure stabilizes the phonation process, as indicated by a reduced left-right phase asymmetry, increased amplitude and time periodicity, and an increase in acoustic quality. Although asymmetry and periodicity parameter variation covers only a small range of absolute values, these small variations have a remarkable influence on the acoustic quality. Because these parameters cannot be influenced directly, the authors suggest that (surgical) reduction of the glottal gap is a promising method to stabilize the phonatory process, which has to be confirmed in future studies.
Affiliation(s)
- Veronika Birk
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstr. 1, 91054 Erlangen, Germany
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstr. 1, 91054 Erlangen, Germany
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstr. 1, 91054 Erlangen, Germany
- David A Berry
- Laryngeal Dynamics Laboratory, Division of Head and Neck Surgery, David Geffen School of Medicine at UCLA, 10833 Le Conte Avenue, Los Angeles, California 90095-1624, USA
- Christopher Bohr
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstr. 1, 91054 Erlangen, Germany
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstr. 1, 91054 Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstr. 1, 91054 Erlangen, Germany
|