1. Patel RR, Döllinger M, Jakubaß B, Pinhack H, Katz U, Semmler M. Analyzing Vocal Fold Frequency Dynamics Using High-Speed 3D Laser Video Endoscopy. Laryngoscope 2024; 134:3267-3276. [PMID: 38481073; PMCID: PMC11182720; DOI: 10.1002/lary.31394]
Abstract
OBJECTIVE: To examine changes in lateral and vertical vibratory motion along the anterior, middle, and posterior sections of the vocal folds as a function of vocal frequency variations.
METHODS: Absolute measurements of vocal fold surface dynamics from high-speed videoendoscopy with a custom laser endoscope were made on 23 vocally healthy adults during sustained /i:/ production at 10%, 20%, and 80% of pitch range. The 3D parameters of amplitude (mm), maximum opening/closing velocity (mm/s), and mean opening/closing velocity (mm/s) were computed for the lateral and vertical vibratory motion along the anterior, middle, and posterior sections of the vocal folds. Linear mixed model analysis was conducted to evaluate differences across (a) vocal frequency level (high vs. normal vs. low pitch), (b) axis (vertical vs. lateral), (c) position (anterior vs. middle vs. posterior), and (d) gender (male vs. female).
RESULTS: Overall, vertical motion of the superior surface of the vocal fold is greater than the lateral motion, especially in males. Along the superior surface, the mean and maximum closing velocities are greater posteriorly for low pitch. The location (anterior, middle, posterior) along the superior surface is relevant only for vocal fold closing, not opening, as the closing dynamics differ across locations.
CONCLUSIONS: The study highlights the significance of assessing the vertical motion of the superior surface of the vocal fold to understand the complex dynamics of voice production.
LEVEL OF EVIDENCE: NA.
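The amplitude and opening/closing velocity parameters named in the abstract can be illustrated with a minimal numerical sketch. This is an assumption-laden toy, not the authors' implementation: the synthetic 250 Hz sinusoidal trace, the 4000 fps frame rate, and the sign convention (positive velocity = opening) are all invented for illustration.

```python
# Sketch: per-cycle amplitude and opening/closing velocities from a sampled
# vocal fold displacement trace. Trace, frame rate, and names are illustrative.
import math

def vibratory_parameters(displacement_mm, fs_hz):
    """Amplitude (mm) and max/mean opening and closing velocities (mm/s)
    from a displacement signal sampled at fs_hz frames per second."""
    dt = 1.0 / fs_hz
    # Finite-difference velocity between consecutive frames.
    velocity = [(b - a) / dt for a, b in zip(displacement_mm, displacement_mm[1:])]
    opening = [v for v in velocity if v > 0]   # moving laterally (opening)
    closing = [-v for v in velocity if v < 0]  # moving medially (closing)
    return {
        "amplitude_mm": (max(displacement_mm) - min(displacement_mm)) / 2.0,
        "max_opening": max(opening) if opening else 0.0,
        "mean_opening": sum(opening) / len(opening) if opening else 0.0,
        "max_closing": max(closing) if closing else 0.0,
        "mean_closing": sum(closing) / len(closing) if closing else 0.0,
    }

# One 250 Hz sinusoidal cycle sampled at 4000 fps, 0.5 mm amplitude.
fs = 4000.0
trace = [0.5 * math.sin(2 * math.pi * 250 * n / fs) for n in range(16)]
params = vibratory_parameters(trace, fs)
```

In practice these quantities would be computed per vibratory cycle and per laser-calibrated measurement point, which the abstract's anterior/middle/posterior and lateral/vertical breakdown implies.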
Affiliation(s)
- Rita R. Patel
- Department of Otolaryngology Head and Neck Surgery, Indiana University, Indianapolis, Indiana, United States
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Bernhard Jakubaß
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Hanna Pinhack
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Ute Katz
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
2. Patel RR, Lulich SM, Francisco P. Laryngeal, Respiratory, and Acoustic Characteristics of Vocal Trillo With Simultaneous High-Speed Videoendoscopy, Inductive Plethysmography, and Acoustic Recordings. J Voice 2023:S0892-1997(23)00362-4. [PMID: 38008677; DOI: 10.1016/j.jvoice.2023.11.003]
Abstract
OBJECTIVE: This study aimed to examine the characteristics of formed and unformed trillo, an essential ornament found in 17th-century Italian vocal music, using simultaneous multimodality voice measurements.
PARTICIPANT AND METHODS: A 28-year-old female with 12 years of classical voice training and 7 years of advanced training in historical performance produced formed trillo, unformed trillo, oscillating trill, vibrato, and straight tone on the vowel /i/. Simultaneous high-speed videoendoscopy, inductive plethysmography, and acoustic recordings were conducted to examine laryngeal motion, respiratory kinematics, and output sound characteristics.
RESULTS: In this single participant, trillo was produced not only by periodic adduction/abduction of the vocal folds but also with underlying differences in oscillatory mechanisms and increased glottal flow (percent vital capacity used), controlled by increased activation of the abdominal muscles and/or decreased activation (inspiratory braking) of the diaphragm relative to tidal breathing, when compared with straight tone, vibrato, and oscillating trill. The formed trillo differs from the unformed trillo in oscillatory mechanism and glottal airflow utilization.
CONCLUSIONS: The physiological mechanism responsible for trillo is more complex than simple adduction and abduction. Future studies with more participants are needed to evaluate the mechanisms responsible for the formation of, and the auditory-perceptual differences between, formed versus unformed trillo.
Affiliation(s)
- Rita R Patel
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana
- Steven M Lulich
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana
- Paulina Francisco
- Historical Performance Department, Jacobs School of Music, Indiana University, Bloomington, Indiana
3. Patel RR, Sandage MJ, Golzarri-Arroyo L. High-Speed Videoendoscopic and Acoustic Characteristics of Inspiratory Phonation. J Speech Lang Hear Res 2023; 66:1192-1207. [PMID: 36917802; DOI: 10.1044/2022_jslhr-22-00502]
Abstract
PURPOSE: Given the importance of inspiratory phonation for assessment of vocal fold structure, the aim of this investigation was to evaluate and describe the vocal fold vibratory characteristics of inspiratory phonation using high-speed videoendoscopy in healthy volunteers. The study also examined the empirical relationship between cepstral peak prominence (CPP) and glottal area waveform measurements derived from simultaneous high-speed videoendoscopy and audio recordings.
METHOD: Vocally healthy adults (33 women, 28 men) volunteered for this investigation and completed high-speed videoendoscopic assessment of vocal fold function for two trials of an expiratory/inspiratory phonation task at normal pitch and normal loudness. Twelve glottal area waveform measures and acoustic CPP values were extracted for analysis.
RESULTS: Inspiratory phonation resulted in a shorter closing time, a longer opening phase, and a faster closing-phase velocity compared with expiratory phonation. Sex differences were elucidated. CPP changes for inspiratory phonation were predicted by changes in the glottal area index and waveform symmetry index, whereas changes in CPP during expiratory phonation were predicted by changes in the asymmetry quotient, glottal area index, and amplitude periodicity.
CONCLUSIONS: Vocal fold vibratory differences were identified for inspiratory phonation when compared with expiratory phonation, the latter of which has been studied more extensively. This investigation provides important basic inspiratory phonation data to better understand laryngeal physiology in vivo and a basic model from which to further study inspiratory phonation in a larger population representing a broader age range.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.22223812
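CPP, the acoustic measure used here, is conventionally defined as the height of the cepstral peak (in the expected pitch range) above a regression line fitted through the cepstrum. A minimal pure-Python sketch follows; the naive O(N²) DFT, the frame length, the pitch search range, and the dB conventions are simplifying assumptions for self-containment, not this study's exact analysis pipeline.

```python
# Sketch of cepstral peak prominence (CPP): the cepstrum is the spectrum of
# the log magnitude spectrum; CPP is the peak height above a regression line.
import cmath
import math

def dft_mag(x):
    """Magnitude of a naive O(N^2) discrete Fourier transform."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) for k in range(N)]

def cpp_db(frame, fs, f_lo=60.0, f_hi=300.0):
    """CPP (dB) and peak quefrency bin of one frame, pitch range f_lo..f_hi Hz."""
    N = len(frame)
    log_spec = [20.0 * math.log10(m + 1e-12) for m in dft_mag(frame)]
    cep_db = [20.0 * math.log10(c + 1e-12) for c in dft_mag(log_spec)]
    # Quefrency bins covering the expected pitch range.
    q_lo, q_hi = int(fs / f_hi), min(int(fs / f_lo), N // 2)
    qs = list(range(q_lo, q_hi))
    # Linear regression of the cepstrum over the search range.
    mq = sum(qs) / len(qs)
    mc = sum(cep_db[q] for q in qs) / len(qs)
    slope = (sum((q - mq) * (cep_db[q] - mc) for q in qs)
             / sum((q - mq) ** 2 for q in qs))
    peak_q = max(qs, key=lambda q: cep_db[q])
    prominence = cep_db[peak_q] - (mc + slope * (peak_q - mq))
    return prominence, peak_q

# Harmonic-rich test frame: impulse train with period 40 samples = 200 Hz at 8 kHz.
fs = 8000
frame = [1.0 if n % 40 == 0 else 0.0 for n in range(256)]
prom, peak_q = cpp_db(frame, fs)
```

For a strongly periodic frame the cepstral peak lands at the quefrency bin corresponding to the fundamental period (here 40 samples, i.e. 200 Hz), which is what makes CPP useful as a periodicity measure.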
Affiliation(s)
- Rita R Patel
- Department of Speech, Language and Hearing Sciences, Indiana University Bloomington
- Mary J Sandage
- Department of Speech, Language & Hearing Sciences, Auburn University, AL
4. Differences Among Mixed, Chest, and Falsetto Registers: A Multiparametric Study. J Voice 2023; 37:298.e11-298.e29. [PMID: 33518476; DOI: 10.1016/j.jvoice.2020.12.028]
Abstract
INTRODUCTION: Typical singing registers are the chest and falsetto; trained singers, however, have an additional register, the mixed register. The mixed register, also called "mixed voice" or "mix," is an important technique for singers, as it can help bridge from the chest voice to falsetto without noticeable voice breaks.
OBJECTIVE: The present study aims to reveal the voice-production mechanism of the different registers (chest, mix, and falsetto) using high-speed digital imaging (HSDI), electroglottography (EGG), and acoustic and aerodynamic measurements.
STUDY DESIGN: Cross-sectional study.
METHODS: Aerodynamic measurements were acquired for twelve healthy singers (six men and six women) during phonation of a variety of pitches in the three registers. HSDI and EGG devices were used simultaneously on three healthy singers (two men and one woman), from whose recordings the open quotient (OQ) and speed quotient (SQ) were extracted. Audio signals were recorded for five sustained vowels, and a spectral analysis was conducted to determine the amplitude of each harmonic component. Furthermore, the absolute (not relative) value of the glottal volume flow was estimated by integrating data obtained from the HSDI and aerodynamic studies.
RESULTS: For all singers, the subglottal pressure (PSub) was the highest of the three registers for the chest, and the mean flow rate (MFR) was the highest for the falsetto. Conversely, the PSub of the mix was as low as that of the falsetto, and the MFR of the mix was as low as that of the chest. The HSDI analysis showed that the OQ differed significantly among the registers even at the same fundamental frequency; the OQ of the mix was higher than that of the chest but lower than that of the falsetto. The acoustic analysis showed that, for the mix, the harmonic structure was intermediate between the chest and falsetto. The glottal volume-flow analysis revealed that the maximum volume velocity was lowest for the mix register at every fundamental frequency. The first-to-second harmonic (H1-H2) difference of the voice source spectrum was greatest for the falsetto, then the mix, and finally the chest.
CONCLUSIONS: We found differences among the registers in the aeromechanical mechanisms and vibration patterns of the vocal folds. The mixed register proved to have a distinct voice-production mechanism that can be differentiated from those of the chest and falsetto registers.
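The open quotient and speed quotient extracted from HSDI/EGG data have conventional definitions (open time / cycle period, and opening time / closing time, respectively). A minimal sketch, assuming one cycle of a glottal area waveform given as per-frame samples; the function and the synthetic skewed cycle are illustrative, not the study's extraction procedure.

```python
# Sketch: open quotient (OQ) and speed quotient (SQ) from one glottal cycle.
def quotients(area, closed_level=0.0):
    """OQ and SQ for one glottal cycle given per-frame area samples."""
    period = len(area)
    open_idx = [i for i, a in enumerate(area) if a > closed_level]
    if not open_idx:
        return 0.0, 0.0  # fully closed cycle
    peak = max(range(len(area)), key=lambda i: area[i])
    opening = peak - open_idx[0] + 1   # frames from opening instant to peak
    closing = open_idx[-1] - peak + 1  # frames from peak to closure
    oq = len(open_idx) / period        # open time / period
    sq = opening / closing             # opening time / closing time
    return oq, sq

# Skewed synthetic cycle: slow opening, fast closing, then a closed phase.
cycle = [0, 1, 2, 3, 4, 5, 6, 3, 0, 0]  # arbitrary area units
oq, sq = quotients(cycle)
```

A skewed pulse like this one (SQ > 1, i.e. closing faster than opening) is the typical chest-register shape; the abstract's finding that the mix OQ sits between chest and falsetto would show up directly in this quotient.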
5. Kruse E, Döllinger M, Schützenberger A, Kist AM. GlottisNetV2: Temporal Glottal Midline Detection Using Deep Convolutional Neural Networks. IEEE J Transl Eng Health Med 2023; 11:137-144. [PMID: 36816097; PMCID: PMC9933989; DOI: 10.1109/jtehm.2023.3237859]
Abstract
High-speed videoendoscopy is a major tool for quantitative laryngology. Glottis segmentation and glottal midline detection are crucial for computing vocal fold-specific, quantitative parameters. However, fully automated solutions show limited clinical applicability, and unbiased glottal midline detection in particular remains a challenging problem. We developed a multitask deep neural network for glottis segmentation and glottal midline detection, using techniques from pose estimation to estimate the anterior and posterior points in endoscopy images. Neural networks were set up in TensorFlow/Keras and trained and evaluated with the BAGLS dataset. We found that a dual-decoder deep neural network, termed GlottisNetV2, outperforms the previously proposed GlottisNet in terms of MAPE on the test dataset (1.85% vs. 6.3%) while converging faster. Various hyperparameter tunings allow fast and directed training. Using temporally variant data from an additional dataset designed for this task, we improve the median prediction accuracy from 2.1% to 1.76% when using 12 consecutive frames and additional temporal filtering. Temporal glottal midline detection using a dual-decoder architecture together with keypoint estimation thus allows accurate midline prediction. We show that the proposed architecture provides stable and reliable glottal midline predictions, ready for clinical use and for analysis of symmetry measures.
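The MAPE figures quoted above (1.85% vs. 6.3%) are mean absolute percentage errors between predicted and annotated keypoint coordinates. A minimal sketch of that metric, with invented coordinates; the flattened (x, y) layout is an assumption for illustration, not the paper's evaluation code.

```python
# Sketch: mean absolute percentage error (MAPE) between predicted and
# ground-truth keypoint coordinates.
def mape(pred, true):
    """MAPE (%) over paired scalar values; true values must be nonzero."""
    terms = [abs(p - t) / abs(t) for p, t in zip(pred, true)]
    return 100.0 * sum(terms) / len(terms)

# Flattened (x, y) coordinates of predicted vs. annotated anterior/posterior
# points, in pixels (invented values).
predicted = [98.0, 51.0, 102.0, 148.5]
annotated = [100.0, 50.0, 100.0, 150.0]
err = mape(predicted, annotated)
```

Because MAPE normalizes by the ground-truth magnitude, it lets midline accuracy be compared across recordings with different image resolutions.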
Affiliation(s)
- Elina Kruse
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen–Nürnberg (FAU), 91052 Erlangen, Germany
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen–Nürnberg (FAU), 91054 Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen–Nürnberg (FAU), 91054 Erlangen, Germany
- Andreas M. Kist
- Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen–Nürnberg (FAU), 91052 Erlangen, Germany
6. Yousef AM, Deliyski DD, Zacharias SRC, de Alarcon A, Orlikoff RF, Naghibolhosseini M. Spatial Segmentation for Laryngeal High-Speed Videoendoscopy in Connected Speech. J Voice 2023; 37:26-36. [PMID: 33257208; PMCID: PMC8411982; DOI: 10.1016/j.jvoice.2020.10.017]
Abstract
OBJECTIVE: This study proposes a new computational framework for automated spatial segmentation of the vocal fold edges in high-speed videoendoscopy (HSV) data during connected speech. This spatio-temporal analytic representation of the vocal folds enables HSV-based measurement of the glottal area waveform and other vibratory characteristics in the context of running speech.
METHODS: HSV data were obtained from a vocally normal adult during production of the "Rainbow Passage." An algorithm based on an active contour modeling approach was developed for the analysis of the HSV data. The algorithm was applied to a series of HSV kymograms at different intersections of the vocal folds to detect the edges of the vibrating vocal folds across frames. This edge detection method follows a set of deformation rules for the active contours to capture the vocal fold edges through an energy optimization procedure. The detected edges in the kymograms were then registered back to the HSV frames. Subsequently, the glottal area waveform was calculated from the area of the glottis enclosed by the vocal fold edges in each frame.
RESULTS: The developed algorithm successfully captured the edges of the vocal folds in the HSV kymograms, leading to an automated measurement of the glottal area waveform from the HSV frames during vocalizations in connected speech.
CONCLUSION: The proposed algorithm serves as an automated method for spatial segmentation of the vocal folds in HSV data in connected speech. This study is one of the initial steps toward developing HSV-based measures to study vocal fold vibratory characteristics and voice production mechanisms in normal and disordered voices in the context of connected speech.
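The final step described above, turning per-frame vocal fold edges into a glottal area waveform, reduces to counting the enclosed glottis pixels in each frame. A minimal sketch with toy binary masks; real HSV frames and the active-contour edge detection are of course far more involved.

```python
# Sketch: glottal area waveform (GAW) as the per-frame pixel count of the
# glottis region enclosed by the detected vocal fold edges. Toy masks only.
def glottal_area_waveform(masks):
    """Pixel-count GAW from a sequence of binary glottis masks."""
    return [sum(sum(row) for row in frame) for frame in masks]

frames = [
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],  # closed
    [[0, 1, 0], [0, 1, 0], [0, 0, 0]],  # opening
    [[0, 1, 0], [1, 1, 1], [0, 1, 0]],  # maximum opening
    [[0, 1, 0], [0, 1, 0], [0, 0, 0]],  # closing
]
gaw = glottal_area_waveform(frames)
```

The resulting waveform (here rising to a peak and falling back to closure) is the signal from which cycle-level vibratory measures are then derived.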
Affiliation(s)
- Ahmed M Yousef
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Dimitar D Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
- Head and Neck Regenerative Medicine Program, Center for Regenerative Medicine, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Alessandro de Alarcon
- Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio; Department of Otolaryngology Head and Neck Surgery, University of Cincinnati, Ohio
- Robert F Orlikoff
- College of Allied Health Sciences, East Carolina University, Greenville, North Carolina
- Maryam Naghibolhosseini
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
7. Fast JF, Oltmann A, Spindeldreier S, Ptok M. Computational Analysis of the Droplet-Stimulated Laryngeal Adductor Reflex in High-Speed Sequences. Laryngoscope 2022; 132:2412-2419. [PMID: 35133015; DOI: 10.1002/lary.30041]
Abstract
OBJECTIVES/HYPOTHESIS: The laryngeal adductor reflex (LAR) is an important protective mechanism of the airways whose physiology is still not completely understood. The available methods for LAR evaluation offer limited reproducibility and/or rely on subjective interpretation. A new approach, termed Microdroplet Impulse Testing of the LAR (MIT-LAR), was recently introduced: the LAR is elicited by a droplet while a laryngoscopic high-speed recording is acquired simultaneously. In the present work, image-processing algorithms for autonomous MIT-LAR sequence analysis were developed, allowing automated approximation of kinematic LAR parameters in humans.
STUDY DESIGN: Development and testing of computational methods.
METHODS: Computational image processing enabled autonomous estimation of the glottal area, the glottal angle, and the vocal fold edge distance in MIT-LAR sequences. A suitable analytical representation of these glottal parameters allowed the extraction of seven relevant LAR parameters. The obtained values were compared to the literature.
RESULTS: A generalized logistic function showed the highest average goodness of fit among four different analytical approaches for each of the glottal parameters. Autonomous sequence analysis yielded bilateral LAR response latencies of (229 ± 116) ms and (182 ± 60) ms for cases of complete and incomplete glottal closure, respectively. The initial/average/maximum angular vocal fold adduction velocities were estimated at (157 ± 115)/(891 ± 516)/(929 ± 583) °/s for complete and (88 ± 53)/(421 ± 221)/(520 ± 238) °/s for incomplete glottal closure.
CONCLUSION: The automated extraction of LAR parameters from laryngoscopic high-speed sequences can potentially increase the objectivity of optical LAR characterization and reduce the associated workload. The proposed methods may thus be helpful for future research on this vital reflex.
LEVEL OF EVIDENCE: NA.
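The idea of fitting a generalized logistic function to the glottal angle and then reading kinematic parameters off the fitted curve can be sketched as follows. All parameter values (40° open angle, growth rate, midpoint time) are invented for illustration; the study's actual fitting procedure and parameter definitions are not reproduced here.

```python
# Sketch: a generalized logistic model of the glottal angle during the LAR,
# from which a maximum angular adduction velocity can be derived numerically.
import math

def gen_logistic(t, a, k, b, t0, nu=1.0):
    """Generalized logistic curve running from a (t -> -inf) to k (t -> +inf)."""
    return a + (k - a) / (1.0 + math.exp(-b * (t - t0))) ** (1.0 / nu)

def max_abs_velocity(f, t_start, t_end, steps=1000):
    """Peak |df/dt| estimated by dense finite differences over [t_start, t_end]."""
    dt = (t_end - t_start) / steps
    ts = [t_start + i * dt for i in range(steps + 1)]
    return max(abs(f(ts[i + 1]) - f(ts[i])) / dt for i in range(steps))

def angle(t):
    # Glottal angle (degrees) collapsing from 40 deg to 0 deg around t0 = 0.2 s.
    return gen_logistic(t, 40.0, 0.0, 60.0, 0.2)

v_max = max_abs_velocity(angle, 0.0, 0.4)
```

For the plain logistic case (nu = 1) the peak slope is (k - a) * b / 4, so the numerical estimate here should land near 40 * 60 / 4 = 600 °/s, which is the kind of maximum adduction velocity the abstract reports.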
Affiliation(s)
- Jacob Friedemann Fast
- Department of Phoniatrics and Pediatric Audiology, Hannover Medical School, Hanover, Germany; Institute of Mechatronic Systems, Leibniz Universität Hannover, Hanover, Germany
- Andra Oltmann
- Institute of Mechatronic Systems, Leibniz Universität Hannover, Hanover, Germany; Department of Modeling and Simulation, Fraunhofer Research Institution for Individualized and Cell-Based Medical Engineering, Lübeck, Germany
- Svenja Spindeldreier
- Institute of Mechatronic Systems, Leibniz Universität Hannover, Hanover, Germany
- Martin Ptok
- Department of Phoniatrics and Pediatric Audiology, Hannover Medical School, Hanover, Germany
8. Yousef AM, Deliyski DD, Zacharias SRC, Naghibolhosseini M. Deep-Learning-Based Representation of Vocal Fold Dynamics in Adductor Spasmodic Dysphonia during Connected Speech in High-Speed Videoendoscopy. J Voice 2022:S0892-1997(22)00263-6. [PMID: 36154973; PMCID: PMC10030376; DOI: 10.1016/j.jvoice.2022.08.022]
Abstract
OBJECTIVE: Adductor spasmodic dysphonia (AdSD) is a neurogenic dystonia that causes spasms of the laryngeal muscles and mainly affects the production of connected speech. To understand how AdSD affects vocal fold (VF) movements, and hence the speech signal, it is necessary to study VF kinematics during running speech. This paper introduces an automated method for analysis of VF vibrations in AdSD using laryngeal high-speed videoendoscopy (HSV) in running speech.
METHODS: A monochrome HSV system was used to obtain video recordings from vocally normal individuals and AdSD patients during production of the six CAPE-V sentences and the "Rainbow Passage." A deep neural network based on the UNet architecture was developed for glottal area segmentation in HSV data, providing a tool for quantitative analysis of VF vibrations in both normal and AdSD voices. The network was trained and validated using manually labeled HSV frames. After training, segmentation quality was quantitatively evaluated against visual analysis of a test dataset comprising segregated HSV frames and a short sequence of VF vibrations in consecutive frames.
RESULTS: The convolutional network was successfully trained and demonstrated accurate segmentation on the testing dataset, with a mean Intersection over Union (IoU) of 0.81 and a mean Boundary-F1 score of 0.93. Moreover, visual assessment of the automated technique showed accurate detection of the glottal edges/area in the HSV data even with challenging image quality and the excessive laryngeal maneuvers of AdSD patients during running speech.
CONCLUSION: The introduced automated approach provides an accurate representation of the glottal edges/area during connected speech in HSV data for normal and AdSD patients. This method facilitates the development of HSV-based measures to quantify VF dynamics in AdSD, which can help in understanding AdSD vocal mechanisms and characteristics.
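The IoU figure used to grade the segmentation above has a simple definition: intersection pixel count over union pixel count between the predicted and manually labeled masks. A minimal sketch with toy masks; the real evaluation runs over full HSV frames.

```python
# Sketch: Intersection over Union (IoU) between a predicted and a manually
# labeled binary glottis mask (lists of 0/1 rows). Toy masks only.
def iou(pred, true):
    """IoU between two equally sized binary masks."""
    inter = union = 0
    for pred_row, true_row in zip(pred, true):
        for p, t in zip(pred_row, true_row):
            inter += p & t
            union += p | t
    return inter / union if union else 1.0  # both empty: perfect agreement

pred = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
true = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
score = iou(pred, true)
```

IoU penalizes both over- and under-segmentation, which is why it is paired with a boundary-sensitive metric (Boundary-F1) in the paper's evaluation.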
Affiliation(s)
- Ahmed M Yousef
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Dimitar D Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
- Stephanie R C Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, Arizona; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, Arizona
- Maryam Naghibolhosseini
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan
9. Analysis of Laryngeal High-Speed Videoendoscopy recordings – ROI detection. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103854]
10. Zita A, Novozámský A, Zitová B, Šorel M, Herbst CT, Vydrová J, Švec JG. Videokymogram Analyzer Tool: Human–computer comparison. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103878]
11. A single latent channel is sufficient for biomedical glottis segmentation. Sci Rep 2022; 12:14292. [PMID: 35995933; PMCID: PMC9395348; DOI: 10.1038/s41598-022-17764-1]
Abstract
Glottis segmentation is a crucial step in quantifying endoscopic footage from laryngeal high-speed videoendoscopy. Recent advances in deep neural networks for glottis segmentation allow for a fully automatic workflow. However, the inner workings of these deep segmentation networks remain largely opaque, and understanding them is crucial for acceptance in clinical practice. Here, we show through systematic ablations that a single latent channel as a bottleneck layer is sufficient for glottal area segmentation. We further demonstrate that the latent space is an abstraction of the glottal area segmentation relying on three spatially defined pixel subtypes, allowing a transparent interpretation. We also provide evidence that the latent space is highly correlated with the glottal area waveform, can be encoded with four bits, and can be decoded using lean decoders while maintaining high reconstruction accuracy. Our findings suggest that glottis segmentation is a task that can be highly optimized to yield very efficient and explainable deep neural networks, important for application in the clinic. In the future, we believe that online deep learning-assisted monitoring could be a game-changer in laryngeal examinations.
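The claim that the latent bottleneck "can be encoded with four bits" amounts to uniform quantization to 16 levels. A minimal sketch of such a quantize/dequantize round trip; the value range [-1, 1] and the sample latent values are invented, and this is not the paper's encoding scheme, just the generic technique.

```python
# Sketch: 4-bit (16-level) uniform quantization of a latent value, as an
# illustration of encoding a bottleneck channel with four bits per element.
def quantize4(x, lo=-1.0, hi=1.0):
    """Map a float in [lo, hi] to the nearest of 16 levels and decode it back."""
    levels = 15  # 2**4 - 1 intervals between 16 representable values
    q = round((min(max(x, lo), hi) - lo) / (hi - lo) * levels)  # 4-bit code, 0..15
    return lo + q * (hi - lo) / levels                          # decoded value

latent = [-0.93, -0.2, 0.0, 0.41, 0.99]
decoded = [quantize4(v) for v in latent]
max_err = max(abs(d - v) for d, v in zip(decoded, latent))
```

The worst-case round-trip error of such a scheme is half a quantization step (here (2/15)/2 ≈ 0.067), which is the kind of bound that makes a lean decoder's reconstruction accuracy plausible.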
12. Yousef AM, Deliyski DD, Zacharias SRC, de Alarcon A, Orlikoff RF, Naghibolhosseini M. A Deep Learning Approach for Quantifying Vocal Fold Dynamics During Connected Speech Using Laryngeal High-Speed Videoendoscopy. J Speech Lang Hear Res 2022; 65:2098-2113. [PMID: 35605603; PMCID: PMC9567340; DOI: 10.1044/2022_jslhr-21-00540]
Abstract
PURPOSE: Voice disorders are best assessed by examining vocal fold dynamics in connected speech. This can be achieved using flexible laryngeal high-speed videoendoscopy (HSV), which enables the study of vocal fold mechanics with high temporal detail. Analysis of vocal fold vibration using HSV requires accurate segmentation of the vocal fold edges. This article presents an automated deep-learning scheme to segment the glottal area in HSV, from which the glottal edges are derived, during connected speech.
METHOD: Using a custom-built HSV system, data were obtained from a vocally healthy participant reciting the "Rainbow Passage." A deep neural network was designed for glottal area segmentation in the HSV data. A hybrid approach recently introduced by the authors was utilized as an automated labeling tool to train the network on a set of HSV frames in which the glottis region was automatically annotated during vocal fold vibration. The network was then tested against manually segmented frames using the intersection over union (IoU) and Boundary F1 (BF) metrics, and its performance was assessed on various phonatory events in the HSV sequence.
RESULTS: The designed network was successfully trained using the hybrid approach, without the need for manual labeling, and tested on the manually labeled data. The performance metrics showed a mean IoU of 0.82 and a mean BF score of 0.96. In addition, the evaluation of the network's performance demonstrated accurate segmentation of the glottal edges/area even during complex nonstationary phonatory events and when the vocal folds were not vibrating, thus overcoming the limitation of the previous hybrid approach, which could only be applied to vibrating vocal folds.
CONCLUSIONS: The introduced automated scheme guarantees accurate glottis representation in challenging color HSV data with lower image quality and excessive laryngeal maneuvers during all instances of connected speech. This facilitates the future development of HSV-based measures to assess the running vibratory characteristics of the vocal folds in speakers with and without voice disorders.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.19798864
Affiliation(s)
- Ahmed M. Yousef
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing
- Dimitar D. Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing
- Stephanie R. C. Zacharias
- Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, AZ; Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, AZ
- Alessandro de Alarcon
- Division of Pediatric Otolaryngology, Cincinnati Children's Hospital Medical Center, OH; Department of Otolaryngology-Head and Neck Surgery, University of Cincinnati, OH
- Robert F. Orlikoff
- College of Allied Health Sciences, East Carolina University, Greenville, NC
13. Yao P, Usman M, Chen YH, German A, Andreadis K, Mages K, Rameau A. Applications of Artificial Intelligence to Office Laryngoscopy: A Scoping Review. Laryngoscope 2021; 132:1993-2016. [PMID: 34582043; DOI: 10.1002/lary.29886]
Abstract
OBJECTIVES/HYPOTHESIS: This scoping review aims to provide a broad overview of the applications of artificial intelligence (AI) to office laryngoscopy, to identify gaps in knowledge, and to guide future research.
STUDY DESIGN: Scoping review.
METHODS: Searches for studies on AI and office laryngoscopy were conducted in five databases. Title-and-abstract and then full-text screening were performed. Primary research studies published in English of any date were included. Studies were summarized by AI application, targeted condition, imaging modality, author affiliation, and dataset characteristics.
RESULTS: Studies focused on vocal fold vibration analysis (43%), lesion recognition (24%), and vocal fold movement determination (19%). The most frequently automated tasks were recognition of vocal fold nodules (19%), polyps (14%), paralysis (11%), paresis (8%), and cysts (7%). Imaging modalities included high-speed laryngeal videos (45%), stroboscopy (29%), and narrow band imaging endoscopy (7%). The body of literature was primarily authored by science, technology, engineering, and math (STEM) specialists (76%), with only 30 studies (31%) involving co-authorship by STEM specialists and otolaryngologists. Datasets were mostly from a single institution (84%) and most commonly originated from Germany (23%), the USA (16%), Spain (9%), Italy (8%), and China (8%). Demographic information was reported in only 39 studies (40%), with age and sex being the most commonly reported items; race/ethnicity and gender were not reported in any studies.
CONCLUSION: More interdisciplinary collaboration between STEM and otolaryngology research teams, improved demographic reporting (especially of race and ethnicity) to ensure broad representation, and larger, more geographically diverse datasets will be crucial to future research on AI in office laryngoscopy.
LEVEL OF EVIDENCE: N/A.
Affiliation(s)
- Peter Yao, Moon Usman, Yu H Chen, Alexander German, Katerina Andreadis, Keith Mages, Anaïs Rameau: Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, New York, New York, U.S.A.
14
Kist AM, Gómez P, Dubrovskiy D, Schlegel P, Kunduk M, Echternach M, Patel R, Semmler M, Bohr C, Dürr S, Schützenberger A, Döllinger M. A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis. J Speech Lang Hear Res 2021; 64:1889-1903. PMID: 34000199. DOI: 10.1044/2021_jslhr-20-00498.
Abstract
Purpose High-speed videoendoscopy (HSV) is an emerging endoscopy technique for assessing and diagnosing voice disorders, but it is barely used in the clinic because dedicated software to analyze the data has been lacking. HSV allows the vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed a user-friendly software tool in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software Glottis Analysis Tools (GAT). GAT offers a general threshold-based region-growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for use by untrained personnel. GAT further evaluates video and audio data in parallel and can extract various features from the video data, among them the glottal area waveform (GAW), that is, the glottal area as it changes over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533.
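The threshold-based region growing that GAT builds on can be sketched in a few lines (a generic Python illustration, not GAT's actual C# implementation; function and parameter names are ours):

```python
import numpy as np
from collections import deque

def region_grow(frame, seed, threshold):
    """Grow a region from `seed` over 4-connected pixels whose intensity
    differs from the seed intensity by at most `threshold`."""
    h, w = frame.shape
    seed_val = float(frame[seed])
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not mask[nr, nc]
                    and abs(float(frame[nr, nc]) - seed_val) <= threshold):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask

# The glottal area waveform (GAW) then falls out as the mask area per frame:
# gaw = [region_grow(f, seed, thr).sum() for f in video_frames]
```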
Affiliation(s)
- Andreas M Kist, Pablo Gómez, Denis Dubrovskiy, Patrick Schlegel, Marion Semmler, Stephan Dürr, Anne Schützenberger, Michael Döllinger: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Melda Kunduk: Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge
- Matthias Echternach: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Germany
- Rita Patel: Department of Speech, Language and Hearing Sciences, College of Arts and Sciences, Indiana University, Bloomington
- Christopher Bohr: Klinik und Poliklinik für Hals-Nasen-Ohren-Heilkunde, Universitätsklinikum Regensburg, Germany
15
Patel RR, Sandage MJ, Kluess H, Plexico LW. High-Speed Characterization of Vocal Fold Vibrations in Normally Cycling and Postmenopausal Women: Randomized Double-Blind Analyses. J Speech Lang Hear Res 2021; 64:1869-1888. PMID: 33971105. PMCID: PMC8740695. DOI: 10.1044/2021_jslhr-20-00706.
Abstract
Purpose The aim of this study was to examine the influence of menstrual cycle phases (follicular, ovulatory, luteal, and ischemic) and hormone levels (estradiol, testosterone, progesterone, and neuropeptide Y) on vocal fold vibrations in reproductive and postmenopausal women. Method Glottal area waveforms were extracted from high-speed videoendoscopy during sustained phonation, inhalation phonation, and voice onset/offset in the reproductive (n = 15) and postmenopausal (n = 13) groups. Linear mixed-model analysis was conducted to evaluate hormone levels and high-speed videoendoscopy outcome variables between the reproductive and postmenopausal groups. In the reproductive group, simple linear regression and multiple regression were conducted to determine the effects of hormones on the dependent variables. Results Group differences between reproductive and postmenopausal women were identified for stiffness index, oscillatory onset time, and oscillatory offset time. Neuropeptide Y hormone in the ischemic phase significantly predicted changes in the reproductive group for some dependent variables; however, the relationship varied for sustained phonation and inhalation phonation. Conclusion These findings provide preliminary evidence that vocal fold vibrations in the reproductive group are different predominantly in the ischemic phase due to neuropeptide Y changes.
Affiliation(s)
- Rita R. Patel: Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington
- Mary J. Sandage, Laura W. Plexico: Department of Speech, Language, and Hearing Sciences, Auburn University, AL
16
Yousef AM, Deliyski DD, Zacharias SRC, de Alarcon A, Orlikoff RF, Naghibolhosseini M. A Hybrid Machine-Learning-Based Method for Analytic Representation of the Vocal Fold Edges during Connected Speech. Appl Sci (Basel) 2021; 11. PMID: 33717604. PMCID: PMC7954580. DOI: 10.3390/app11031179.
Abstract
Investigating the phonatory processes in connected speech from high-speed videoendoscopy (HSV) demands the accurate detection of the vocal fold edges during vibration. The present paper proposes a new spatio-temporal technique to automatically segment vocal fold edges in HSV data during running speech. The HSV data were recorded from a vocally normal adult during a reading of the “Rainbow Passage.” The introduced technique was based on an unsupervised machine-learning (ML) approach combined with an active contour modeling (ACM) technique (also known as a hybrid approach). The hybrid method was implemented to capture the edges of vocal folds on different HSV kymograms, extracted at various cross-sections of vocal folds during vibration. The k-means clustering method, an ML approach, was first applied to cluster the kymograms to identify the clustered glottal area and consequently provided an initialized contour for the ACM. The ACM algorithm was then used to precisely detect the glottal edges of the vibrating vocal folds. The developed algorithm was able to accurately track the vocal fold edges across frames with low computational cost and high robustness against image noise. This algorithm offers a fully automated tool for analyzing the vibratory features of vocal folds in connected speech.
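The k-means initialization step described above can be illustrated with a minimal sketch (our simplified Python illustration, not the authors' code; in the paper, the darker cluster then seeds an active contour model):

```python
import numpy as np

def kmeans_1d(values, k=2, iters=20):
    """Plain k-means on scalar intensities, initialized at evenly spaced
    quantiles so the clusters start at the dark and bright extremes."""
    centers = np.quantile(values, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        centers = np.array([values[labels == j].mean() if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

def init_glottis_mask(kymogram):
    """Cluster kymogram pixels into two intensity groups; the darker cluster
    approximates the glottal area and can initialize an active contour."""
    flat = kymogram.ravel().astype(float)
    labels, centers = kmeans_1d(flat)
    dark = int(np.argmin(centers))
    return (labels == dark).reshape(kymogram.shape)
```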
Affiliation(s)
- Ahmed M. Yousef, Dimitar D. Deliyski, Maryam Naghibolhosseini: Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI 48824, USA
- Stephanie R. C. Zacharias: Head and Neck Regenerative Medicine Program, Mayo Clinic, Scottsdale, AZ 85259, and Department of Otolaryngology-Head and Neck Surgery, Mayo Clinic, Phoenix, AZ 85054, USA
- Alessandro de Alarcon: Division of Pediatric Otolaryngology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229, and Department of Otolaryngology—Head and Neck Surgery, University of Cincinnati College of Medicine, Cincinnati, OH 45267, USA
- Robert F. Orlikoff: College of Allied Health Sciences, East Carolina University, Greenville, NC 27834, USA
- Correspondence: Maryam Naghibolhosseini; Tel.: +1-517-884-2256
17
Abstract
A healthy voice is crucial for verbal communication and hence for daily and professional life. The basis for a healthy voice is the sound-producing vocal folds in the larynx. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal fold. Clinically, videoendoscopy is applied to assess the symmetry of the oscillation, which is evaluated subjectively. High-speed videoendoscopy, an emerging method that allows quantification of the vocal fold oscillation, is more commonly employed in research because of the amount of data and the complex, semi-automatic analysis. In this study, we provide a comprehensive evaluation of methods that detect the glottal midline fully automatically. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset with manual annotations, used both simulations and annotated endoscopic images to train deep neural networks at different stages of the analysis workflow, and compared these to established computer vision algorithms. We found that classical computer vision methods perform well in detecting the glottal midline in glottis segmentation data, but are outperformed by deep neural networks on this task. We further propose GlottisNet, a multi-task neural architecture that simultaneously predicts the opening between the vocal folds and the symmetry axis. By fully automating segmentation and midline detection, GlottisNet is a major step toward the clinical applicability of quantitative, deep-learning-assisted laryngeal endoscopy.
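A classical computer-vision baseline for midline detection, of the kind the study compares against, can be sketched as the principal axis of a segmented glottis mask (our illustration; GlottisNet itself is a neural network):

```python
import numpy as np

def glottal_midline(mask):
    """Fit the symmetry axis of a binary glottis mask as the principal
    axis (largest-variance PCA component) of its pixel coordinates.
    Returns (centroid, unit direction vector) in (x, y) order."""
    ys, xs = np.nonzero(mask)
    pts = np.column_stack([xs, ys]).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)          # 2x2 coordinate covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    direction = eigvecs[:, np.argmax(eigvals)]  # axis of largest variance
    return centroid, direction
```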
18
Turkmen HI, Karsligil ME, Kocak I. Visible Vessels of Vocal Folds: Can They Have a Diagnostic Role? Curr Med Imaging 2020; 15:785-795. PMID: 32008546. DOI: 10.2174/1573405614666180604083854.
Abstract
BACKGROUND Challenges in the visual identification of laryngeal disorders lead researchers to investigate new opportunities to support clinical examination. This paper presents an efficient and simple method that extracts and assesses blood vessels on vocal fold tissue to aid medical diagnosis. METHODS The proposed vessel segmentation approach was designed to overcome difficulties raised by the design specifications of videolaryngostroboscopy and the anatomic structure of the vocal fold vasculature. The limited number of medical studies on vocal fold vasculature indicates that the direction of blood vessels and the amount of vasculature are discriminative features for vocal fold disorders. Therefore, we extracted vessel features on the basis of these studies. We represent vessels as vascular vectors and propose a vector-field-based measurement that quantifies the orientation pattern of blood vessels in relation to vocal fold pathologies. RESULTS To demonstrate the relationship between vessel structure and vocal fold disorders, we classified vocal fold disorders using vessel features alone. A binary tree of Support Vector Machines (SVM) was used for classification. The average recall of the proposed vessel extraction method was 0.82, and an accuracy of 0.75 was achieved for classifying healthy, sulcus vocalis, and laryngitis cases. CONCLUSION The obtained success rates show that vocal fold vessels can serve as an indicator of laryngeal diseases.
Affiliation(s)
- Hafiza Irem Turkmen, Mine Elif Karsligil: Computer Engineering Department, Faculty of Electrical & Electronics Engineering, Yildiz Technical University, Istanbul, Turkey
- Ismail Kocak: Otorhinolaryngology Department, Faculty of Medicine, Okan University, Istanbul, Turkey
19
Patel RR, Sundberg J, Gill B, Lã FMB. Glottal Airflow and Glottal Area Waveform Characteristics of Flow Phonation in Untrained Vocally Healthy Adults. J Voice 2020; 36:140.e1-140.e21. PMID: 32868146. DOI: 10.1016/j.jvoice.2020.07.037.
Abstract
OBJECTIVE To examine flow phonation characteristics with regard to vocal fold vibration and voice source properties in vocally healthy adults using multimodality voice measurements across various phonation types (breathy, neutral, flow, and pressed) and loudness conditions (typical, loud, and soft). PARTICIPANTS AND METHODS Vocal fold vibration, airflow, acoustic, and subglottal pressure signals were analyzed in 13 untrained voices (six female and seven male). Participants repeated the syllable /pæ:/ using breathy, neutral, flow, and pressed phonation during typical, loud, and soft loudness conditions. Glottal area (GA) waveforms were extracted from high-speed videoendoscopy; glottal flow was derived by inverse filtering the airflow or the audio signal; and subglottal pressure was measured as the intraoral pressure during /p/ occlusion. RESULTS Changes in phonation type and loudness conditions resulted in systematic variations in the relative peak closing velocity derived from the GA waveform for both males and females. The amplitude quotient derived from the flow glottogram varied across phonation types for males. CONCLUSION Multimodality evaluation using the GA waveform and the inverse-filtered waveforms revealed a complex pattern that varied as a function of phonation type and loudness condition across males and females. Emerging findings from this study suggest that future large-scale studies should focus on spatial and temporal features of closing speed and closing duration for differentiating flow phonation from other phonation types in untrained adults with and without voice disorders.
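For illustration, one plausible way to compute a relative peak closing velocity from a glottal area waveform is sketched below (our simplified formulation, not necessarily the paper's exact definition):

```python
import numpy as np

def relative_peak_closing_velocity(gaw, fs):
    """One plausible formulation (ours): the steepest negative slope of the
    glottal area waveform, normalized by its peak-to-peak amplitude, so the
    result is in 1/s and independent of absolute area calibration."""
    d = np.gradient(gaw) * fs        # area change per second
    closing_peak = -d.min()          # steepest decrease = fastest closing
    amplitude = gaw.max() - gaw.min()
    return closing_peak / amplitude
```

For a sinusoidal GAW of frequency f, this evaluates to approximately pi*f, which is a quick sanity check on any implementation.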
Affiliation(s)
- Rita R Patel: Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana
- Johan Sundberg: Division of Speech, Music, and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
- Brian Gill: Voice Department, Indiana University, Bloomington, Indiana
- Filipa M B Lã: Department of Didactics, School Organization and Special Didactics, Faculty of Education, The National Distance Education University (UNED), Madrid, Spain
20
Mohd Khairuddin KA, Ahmad K, Mohd Ibrahim H, Yan Y. Description of the Features and Vibratory Behaviors of the Nyquist Plot Analyzed From Laryngeal High-Speed Videoendoscopy Images. J Voice 2020; 36:582.e11-582.e22. PMID: 32861565. DOI: 10.1016/j.jvoice.2020.07.036.
Abstract
Facilitative playback-based subjective measures offer a more reliable evaluation of vocal fold vibration than measures derived from direct inspection of video playback. One such measure is the Nyquist plot, which presents the analyzed cycle-to-cycle vibratory information in graphical form. While its potential is evident, information on the features of the Nyquist plot, on which the evaluation is based, is still incomplete. The currently identified features and their vibratory behaviors may be inadequate to guarantee accurate interpretation of the findings. The present study addresses this issue by examining the features of the Nyquist plot and their vibratory behaviors. A total of 56 young normophonic speakers (20 males and 36 females) were recruited as participants. Each underwent laryngeal high-speed videoendoscopy to record images of the vocal fold vibration, which were then analyzed to generate Nyquist plots. Features were identified by inspecting the properties of the points forming the Nyquist plots, and the vibratory behaviors of each identified feature were examined. The results revealed four features: rim contour, depicting the longitudinal phase difference; left edge shape, signifying the glottal configuration, phase closure, and closed phase duration; and rim width and rim pattern, visualizing the regularity of glottal areas and the regularity of intracycle variations, respectively. The findings present a more complete reference of the features and their vibratory behaviors, pertinent for Nyquist plot interpretation.
Affiliation(s)
- Khairy Anuar Mohd Khairuddin: Speech Sciences Program, Centre for Rehabilitation and Special Needs, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia; Speech Pathology Program, School of Health Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
- Kartini Ahmad, Hasherah Mohd Ibrahim: Speech Sciences Program, Centre for Rehabilitation and Special Needs, Faculty of Health Sciences, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia
- Yuling Yan: Department of Bioengineering, School of Engineering, Santa Clara University, California, USA
21
Murtola T, Alku P. Indicators of anterior-posterior phase difference in glottal opening measured from natural production of vowels. J Acoust Soc Am 2020; 148:EL141. PMID: 32873022. DOI: 10.1121/10.0001722.
Abstract
Voiced speech is generated by the glottal flow interacting with vocal fold vibrations. However, the details of vibrations in the anterior-posterior direction (the so-called zipper-effect) and their correspondence with speech and other glottal signals are not fully understood due to challenges in direct measurements of vocal fold vibrations. In this proof-of-concept study, the potential of four parameters extracted from high-speed videoendoscopy (HSV), electroglottography, and speech signals to indicate the presence of a zipper-type glottal opening is investigated. Comparison with manual labeling of the HSV videos highlighted the importance of multiple parameter-signal pairs in indicating the presence of a zipper-type glottal opening.
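One simple way to quantify an anterior-posterior delay of the kind discussed is the lag that maximizes the cross-correlation between glottal opening signals measured at two positions along the folds (our sketch; not necessarily one of the paper's four parameters):

```python
import numpy as np

def phase_lag_samples(anterior, posterior, max_lag):
    """Lag (in samples) by which `posterior` trails `anterior`, found by
    scanning a normalized cross-correlation over lags in [-max_lag, max_lag]."""
    a = (anterior - anterior.mean()) / anterior.std()
    p = (posterior - posterior.mean()) / posterior.std()
    n = len(a)
    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            s = float(np.dot(a[:n - lag], p[lag:]))   # a[t] vs p[t+lag]
        else:
            s = float(np.dot(a[-lag:], p[:n + lag]))
        if s > best_score:
            best_lag, best_score = lag, s
    return best_lag
```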
Affiliation(s)
- Tiina Murtola, Paavo Alku: Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland
22
Belagali V, Rao M V A, Gopikishore P, Krishnamurthy R, Ghosh PK. Two step convolutional neural network for automatic glottis localization and segmentation in stroboscopic videos. Biomed Opt Express 2020; 11:4695-4713. PMID: 32923072. PMCID: PMC7449707. DOI: 10.1364/boe.396252.
Abstract
Precise analysis of the vocal fold vibratory pattern in a stroboscopic video plays a key role in the evaluation of voice disorders. Automatic glottis segmentation is one of the preliminary steps in such analysis. In this work, it is divided into two subproblems: glottis localization and glottis segmentation. A two-step convolutional neural network (CNN) approach is proposed for automatic glottis segmentation. Data augmentation is carried out using two techniques: (1) blind rotation (WB) and (2) rotation with respect to glottis orientation (WO). The dataset used in this study contains stroboscopic videos of 18 subjects with sulcus vocalis, in which the glottis region was annotated by three speech-language pathologists (SLPs). The proposed two-step CNN approach achieves an average localization accuracy of 90.08% and a mean Dice score of 0.65.
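Rotation with respect to glottis orientation (WO) requires estimating the glottis angle first; a standard way to do this (our sketch, not necessarily the authors' method) uses the second-order central moments of the annotated mask:

```python
import numpy as np

def glottis_orientation_deg(mask):
    """Angle (degrees, relative to the image x-axis) of the principal axis
    of a binary glottis mask, from second-order central image moments."""
    ys, xs = np.nonzero(mask)
    x = xs - xs.mean()
    y = ys - ys.mean()
    mu20, mu02, mu11 = (x * x).mean(), (y * y).mean(), (x * y).mean()
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)  # principal-axis angle
    return np.degrees(theta)
```

The frame and its mask can then be rotated by this angle (e.g., with scipy.ndimage.rotate) so augmentation jitters the pose around the glottal axis rather than blindly.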
Affiliation(s)
- Varun Belagali: Computer Science and Engineering, RV College of Engineering, Bangalore 560059, India
- Achuth Rao M V: Electrical Engineering, Indian Institute of Science, Bangalore 560012, India
- Rahul Krishnamurthy: Department of Audiology and Speech Language Pathology, Kasturba Medical College, Mangalore, Manipal Academy of Higher Education, Manipal, India
23
Schlegel P, Kniesburges S, Dürr S, Schützenberger A, Döllinger M. Machine learning based identification of relevant parameters for functional voice disorders derived from endoscopic high-speed recordings. Sci Rep 2020; 10:10517. PMID: 32601277. PMCID: PMC7324600. DOI: 10.1038/s41598-020-66405-y.
Abstract
In voice research and clinical assessment, many objective parameters are in use. However, there is no commonly used set of parameters that reflects certain voice disorders, such as functional dysphonia (FD), i.e. disorders with no visible anatomical changes. Hence, 358 high-speed videoendoscopy (HSV) recordings (159 normal females (NF), 101 FD females (FDF), 66 normal males (NM), 32 FD males (FDM)) were analyzed. We investigated 91 quantitative HSV parameters with respect to their significance. First, 25 highly correlated parameters were discarded. Second, a further 54 parameters were discarded using a LogitBoost decision stumps approach. This yielded a subset of 12 parameters sufficient to reflect functional dysphonia. These parameters separated the groups NF vs. FDF and NM vs. FDM with fair accuracy (0.745 and 0.768, respectively). Parameters computed solely from the changing glottal area waveform (a 1D function called the GAW) between the vocal folds were less important than parameters describing the oscillation characteristics along the vocal folds (a 2D function called the Phonovibrogram). Regularity of GAW phases and peak shape, harmonic structure, and Phonovibrogram-based vocal fold opening and closing angles were the most important. This study showed the high degree of redundancy among HSV voice parameters but also affirms the need for multidimensional assessment of clinical data.
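The boosting-based parameter screening can be sketched in plain NumPy; since a reference LogitBoost is less commonly available, we use discrete AdaBoost with decision stumps as a stand-in (all names and the toy ranking criterion are ours):

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) decision stump
    with minimal weighted classification error; y is in {-1, +1}."""
    best = (0, 0.0, 1, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost_feature_ranking(X, y, rounds=10):
    """Rank features by their accumulated stump weights over boosting rounds;
    features never picked by a stump get zero importance."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    importance = np.zeros(d)
    for _ in range(rounds):
        j, thr, pol, err = fit_stump(X, y, w)
        err = max(err, 1e-10)                 # avoid log(0) on perfect stumps
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = np.where(pol * (X[:, j] - thr) > 0, 1, -1)
        w = w * np.exp(-alpha * y * pred)     # upweight misclassified samples
        w = w / w.sum()
        importance[j] += alpha
    return importance
```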
Affiliation(s)
- Patrick Schlegel, Stefan Kniesburges, Stephan Dürr, Anne Schützenberger, Michael Döllinger: Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
24
Gómez P, Kist AM, Schlegel P, Berry DA, Chhetri DK, Dürr S, Echternach M, Johnson AM, Kniesburges S, Kunduk M, Maryn Y, Schützenberger A, Verguts M, Döllinger M. BAGLS, a multihospital Benchmark for Automatic Glottis Segmentation. Sci Data 2020; 7:186. PMID: 32561845. PMCID: PMC7305104. DOI: 10.1038/s41597-020-0526-3.
Abstract
Laryngeal videoendoscopy is one of the main tools in clinical examinations for voice disorders and voice research. Using high-speed videoendoscopy, it is possible to fully capture the vocal fold oscillations; however, processing the recordings typically involves a time-consuming segmentation of the glottal area by trained experts. Even though automatic methods have been proposed and the task is particularly suited for deep learning methods, no public datasets and benchmarks are available to compare methods and to allow training of generalizing deep learning models. In an international collaboration of researchers from seven institutions in the EU and USA, we have created BAGLS, a large, multihospital dataset of 59,250 high-speed videoendoscopy frames with individually annotated segmentation masks. The frames are based on 640 recordings of healthy and disordered subjects recorded with varying technical equipment by numerous clinicians. The BAGLS dataset will allow an objective comparison of glottis segmentation methods and will enable interested researchers to train their own models and compare their methods.
Affiliation(s)
- Pablo Gómez, Andreas M Kist, Patrick Schlegel, Stephan Dürr, Stefan Kniesburges, Anne Schützenberger, Michael Döllinger: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054 Erlangen, Germany
- David A Berry, Dinesh K Chhetri: Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California, USA
- Matthias Echternach: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
- Aaron M Johnson: NYU Voice Center, Department of Otolaryngology - Head and Neck Surgery, New York University School of Medicine, New York, New York, USA
- Melda Kunduk: Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, USA
- Youri Maryn: European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium; Department of Speech, Language and Hearing Sciences, University of Ghent, Ghent, Belgium; Faculty of Education, Health and Social Work, University College Ghent, Ghent, Belgium; Faculty of Psychology and Educational Sciences, School of Logopedics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium; Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
- Monique Verguts: European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium; Department of Otorhinolaryngology and Voice Disorders, Diest General Hospital, Diest, Belgium
25
Abstract
This review provides a comprehensive compilation, from a digital image processing point of view, of the most important techniques currently developed to characterize and quantify the vibration behaviour of the vocal folds, along with a detailed description of the laryngeal image modalities currently used in the clinic. The review presents an overview of the most significant glottal-gap segmentation and facilitative playback techniques used in the literature for this purpose, and shows the drawbacks and challenges that remain unsolved in developing robust vocal fold vibration analysis tools based on digital image processing.
|
26
|
Fehling MK, Grosch F, Schuster ME, Schick B, Lohscheller J. Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network. PLoS One 2020; 15:e0227791. [PMID: 32040514] [PMCID: PMC7010264] [DOI: 10.1371/journal.pone.0227791]
Abstract
The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and further quantitative analysis of laryngeal high-speed video (HSV). Quantification of the vocal fold vibration patterns requires as a first step the segmentation of the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation process. In this work we propose for the first time a procedure to fully automatically segment not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences comprising 100 consecutive images each. As performance measures, the Dice Coefficient (DC) and the precision of four anatomical landmark positions were used. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold (VF), respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The method proposed here requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. 
Thus, it also allows for the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be provided freely to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
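The Dice Coefficient (DC) used above as a performance measure is a standard overlap score for binary segmentation masks. A minimal sketch in Python/NumPy; the toy masks and values are illustrative, not from the study:

```python
import numpy as np

def dice_coefficient(mask_a, mask_b):
    """Dice Coefficient DC = 2*|A ∩ B| / (|A| + |B|) for binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total

# Toy 4x4 frame: ground-truth glottis (3 px) vs. a prediction (3 px)
truth = np.zeros((4, 4), dtype=bool)
truth[1, 1] = truth[2, 1] = truth[1, 2] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1, 1] = pred[2, 1] = pred[2, 2] = True
print(dice_coefficient(truth, pred))  # 2*2/(3+3) ≈ 0.667
```

A DC of 1.0 means pixel-perfect agreement with the reference mask; the 0.85-0.91 values reported above indicate strong but imperfect overlap.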
Affiliation(s)
- Mona Kirstin Fehling
  - Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
- Fabian Grosch
  - Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
- Maria Elke Schuster
  - Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, München, Germany
- Bernhard Schick
  - Department of Otorhinolaryngology, Saarland University Hospital, Homburg/Saar, Germany
- Jörg Lohscheller
  - Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
|
27
|
Passive Upper Airway Thermoregulation and High-Speed Assessment for Conventional versus Menthol Cigarette: Implications for Laryngeal Physiology. J Voice 2020; 34:25-32. [DOI: 10.1016/j.jvoice.2018.07.022]
|
28
|
Drioli C, Foresti GL. Fitting a biomechanical model of the folds to high-speed video data through Bayesian estimation. Informatics in Medicine Unlocked 2020. [DOI: 10.1016/j.imu.2020.100373]
|
29
|
Jeffrey Kuo CF, Li YC, Weng WH, Pinos Leon KB, Chu YH. Applied image processing techniques in video laryngoscope for occult tumor detection. Biomed Signal Process Control 2020. [DOI: 10.1016/j.bspc.2019.101633]
|
30
|
Maryn Y, Verguts M, Demarsin H, van Dinther J, Gomez P, Schlegel P, Döllinger M. Intersegmenter Variability in High-Speed Laryngoscopy-Based Glottal Area Waveform Measures. Laryngoscope 2019; 130:E654-E661. [PMID: 31840827] [DOI: 10.1002/lary.28475]
Abstract
OBJECTIVES/HYPOTHESIS High-speed videoendoscopy (HSV) has the potential to objectively quantify vibratory vocal fold characteristics during phonation. Glottal Analysis Tools (GAT) version 2018, developed in Erlangen, Germany, is software for determining various glottal area waveform (GAW) quantities. Before GAT can analyze HSV videos, segmenters have to define the glottis manually across videos in a semiautomatic segmentation protocol. Such interventions are hypothesized to induce variability in subsequent GAW measure computation across segmenters and may attenuate the reliability of GAT measures to a certain point. This study explored intersegmenter variability in GAT's GAW measures based on semiautomatic image processing. STUDY DESIGN Cohort study of rater reliability. METHODS In total, 20 HSV videos from normophonic and dysphonic subjects with various laryngeal disorders were selected for this study and segmented by three trained segmenters. They separately segmented glottis areas in the same frame sets of the videos. Upon analysis of the GAW, GAT offers 46 measures related to topologic GAW dynamic characteristics, GAW periodicity and perturbation characteristics, and GAW harmonic components. To address GAT's reliability, intersegmenter variability in these measures was examined with the intraclass correlation coefficient (ICC). RESULTS In general, ICC behavior of the 46 GAW measures across the three raters was highly acceptable. The ICC was moderate for one parameter (0.5 < ICC < 0.75), good for seven parameters (0.75 < ICC < 0.9), and excellent for 38 parameters (ICC > 0.9). CONCLUSIONS Overall, high ICC values confirm the clinical applicability of GAT for objective and quantitative assessment of HSV. Small intersegmenter differences with correspondingly small parameter differences suggest that manual or semiautomatic segmentation in GAT does not noticeably influence clinical assessment outcomes. 
To guarantee the software's performance, we suggest segmentation training before clinical application. LEVEL OF EVIDENCE 2b Laryngoscope, 130:E654-E661, 2020.
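The intersegmenter reliability analysis above can be sketched with a two-way random-effects, single-rater ICC (ICC(2,1)) plus the reliability bands quoted in the abstract. This is a generic illustration (the exact formulation and software used in the study may differ), and the ratings matrix is invented:

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.
    x is an (n targets) x (k raters) matrix of measurements."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ms_rows = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # targets
    ms_cols = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # raters
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def icc_band(icc):
    """Reliability bands as applied in the study."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"

# Three segmenters in perfect agreement on five videos -> ICC = 1.0
ratings = np.array([[1.0, 1.0, 1.0],
                    [2.0, 2.0, 2.0],
                    [3.0, 3.0, 3.0],
                    [4.0, 4.0, 4.0],
                    [5.0, 5.0, 5.0]])
print(icc_2_1(ratings), icc_band(icc_2_1(ratings)))
```

In the study's terms, each row would be one HSV video and each column one segmenter's value for a given GAW parameter.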
Affiliation(s)
- Youri Maryn
  - Department of Otorhinolaryngology-Head and Neck Surgery, European Institute for Otorhinolaryngology-Head and Neck Surgery, GasthuisZusters Antwerpen Sint-Augustinus, Wilrijk/Antwerp, Belgium
  - Department of Speech, Language, and Hearing Sciences, University of Ghent, Ghent, Belgium
  - Faculty of Education, Health, and Social Work, University College of Ghent, Ghent, Belgium
  - Faculty of Psychology and Educational Sciences, School of Logopedics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
  - Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
  - Phonanium, Lokeren, Belgium
- Monique Verguts
  - Department of Otorhinolaryngology-Head and Neck Surgery, European Institute for Otorhinolaryngology-Head and Neck Surgery, GasthuisZusters Antwerpen Sint-Augustinus, Wilrijk/Antwerp, Belgium
  - Department of Otorhinolaryngology and Voice Disorders, Diest General Hospital, Diest, Belgium
- Hannelore Demarsin
  - Department of Otorhinolaryngology-Head and Neck Surgery, European Institute for Otorhinolaryngology-Head and Neck Surgery, GasthuisZusters Antwerpen Sint-Augustinus, Wilrijk/Antwerp, Belgium
- Joost van Dinther
  - Department of Otorhinolaryngology-Head and Neck Surgery, European Institute for Otorhinolaryngology-Head and Neck Surgery, GasthuisZusters Antwerpen Sint-Augustinus, Wilrijk/Antwerp, Belgium
- Pablo Gomez
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Patrick Schlegel
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Michael Döllinger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
|
31
|
Alku P, Murtola T, Malinen J, Geneid A, Vilkman E. Skewing of the glottal flow with respect to the glottal area measured in natural production of vowels. J Acoust Soc Am 2019; 146:2501. [PMID: 31671985] [DOI: 10.1121/1.5129121]
Abstract
In the production of voiced speech, glottal flow skewing refers to the tilting of the glottal flow pulses to the right, often characterized as a delay of the peak, compared to the glottal area. In the past four decades, several studies have addressed this phenomenon by modeling voice production with analog circuits and computer simulations. However, previous studies measuring flow skewing in natural speech production are sparse, and they contain little quantitative data about the degree of skewing between flow and area. In the current study, flow skewing was measured from the natural production of 40 vowel utterances produced by 10 speakers. Glottal flow was measured from speech using glottal inverse filtering, and glottal area was captured with high-speed videoendoscopy. The estimated glottal flow and area waveforms were parameterized with four robust parameters that measure pulse skewness quantitatively. Statistical tests obtained for all four parameters showed that the flow pulse was significantly more skewed to the right than the area pulse. Hence, this study corroborates the existence of flow skewing using measurements from natural speech production. In addition, the study yields quantitative data about pulse skewness in simultaneously measured glottal flow and area in natural speech production.
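One simple, robust way to quantify pulse skewness of the kind described above is the normalized peak position within the cycle: a value above 0.5 means the pulse is skewed to the right (delayed peak). This is a generic illustration with synthetic pulses; the four parameters actually used in the paper may differ:

```python
import numpy as np

def peak_position(pulse):
    """Normalized peak location in [0, 1]; > 0.5 means the pulse
    is skewed to the right (peak delayed within the cycle)."""
    pulse = np.asarray(pulse, dtype=float)
    return np.argmax(pulse) / (len(pulse) - 1)

t = np.linspace(0.0, 1.0, 1001)      # one normalized glottal cycle
area = np.sin(np.pi * t) ** 2        # symmetric "area" pulse, peak at t = 0.5
flow = t ** 2 * (1.0 - t)            # right-skewed "flow" pulse, peak at t = 2/3

print(peak_position(area))  # 0.5
print(peak_position(flow))  # ~0.667: flow peak lags the area peak
```

Comparing this parameter between simultaneously measured flow and area pulses captures the peak delay that defines flow skewing.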
Affiliation(s)
- Paavo Alku
  - Department of Signal Processing and Acoustics, Aalto University, Espoo, FI-00076, Finland
- Tiina Murtola
  - Department of Signal Processing and Acoustics, Aalto University, Espoo, FI-00076, Finland
- Jarmo Malinen
  - Department of Mathematics and Systems Analysis, Aalto University, Espoo, FI-00076, Finland
- Ahmed Geneid
  - Department of Otorhinolaryngology and Phoniatrics-Head and Neck Surgery, Helsinki University Hospital and University of Helsinki, Helsinki, FI-00240, Finland
- Erkki Vilkman
  - Department of Otorhinolaryngology and Phoniatrics-Head and Neck Surgery, Helsinki University Hospital and University of Helsinki, Helsinki, FI-00240, Finland
|
32
|
Turkmen HI, Karsligil ME. Advanced computing solutions for analysis of laryngeal disorders. Med Biol Eng Comput 2019; 57:2535-2552. [DOI: 10.1007/s11517-019-02031-9]
|
34
|
Deng JJ, Hadwin PJ, Peterson SD. The effect of high-speed videoendoscopy configuration on reduced-order model parameter estimates by Bayesian inference. J Acoust Soc Am 2019; 146:1492. [PMID: 31472542] [PMCID: PMC6715443] [DOI: 10.1121/1.5124256]
Abstract
Bayesian inference has been previously demonstrated as a viable inverse analysis tool for estimating subject-specific reduced-order model parameters and uncertainties. However, previous studies have relied upon simulated glottal area waveforms with superimposed random noise as the measurement. In practice, high-speed videoendoscopy is used to measure glottal area, which introduces practical imaging effects not captured in simulated data, such as viewing angle, frame rate, and camera resolution. Herein, high-speed videos of the vocal folds were approximated by recording the trajectories of physical vocal fold models controlled by a symmetric body-cover model. Twenty videos were recorded, varying subglottal pressure, cricothyroid activation, and viewing angle, with frame rate and video resolution varied by digital video manipulation. Bayesian inference was used to estimate subglottal pressure and cricothyroid activation from glottal area waveforms extracted from the videos. The resulting estimates show off-axis viewing of 10° can lead to a 10% bias in the estimated subglottal pressure. A viewing model is introduced such that viewing angle can be included as an estimated parameter, which alleviates estimate bias. Frame rate and pixel resolution were found to primarily affect uncertainty of parameter estimates up to a limit where spatial and temporal resolutions were too poor to resolve the glottal area. Since many high-speed cameras have the ability to sacrifice spatial for temporal resolution, the findings herein suggest that Bayesian inference studies employing high-speed video should increase temporal resolutions at the expense of spatial resolution for reduced estimate uncertainties.
Affiliation(s)
- Jonathan J Deng
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
- Paul J Hadwin
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
- Sean D Peterson
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
|
35
|
Diaz-Cadiz M, McKenna VS, Vojtech JM, Stepp CE. Adductory Vocal Fold Kinematic Trajectories During Conventional Versus High-Speed Videoendoscopy. J Speech Lang Hear Res 2019; 62:1685-1706. [PMID: 31181175] [PMCID: PMC6808372] [DOI: 10.1044/2019_jslhr-s-18-0405]
Abstract
Objective Prephonatory vocal fold angle trajectories may supply useful information about the laryngeal system but were examined in previous studies using sigmoidal curves fit to data collected at 30 frames per second (fps). Here, high-speed videoendoscopy (HSV) was used to investigate the impacts of video frame rate and sigmoidal fitting strategy on vocal fold adductory patterns for voicing onsets. Method Twenty-five participants with healthy voices performed /ifi/ sequences under flexible nasendoscopy at 1,000 fps. Glottic angles were extracted during adduction for voicing onset; resulting vocal fold trajectories (i.e., changes in glottic angle over time) were down-sampled to simulate different frame rate conditions (30-1,000 fps). Vocal fold adduction data were fit with asymmetric sigmoids using 5 fitting strategies with varying parameter restrictions. Adduction trajectories and maximum adduction velocities were compared between the fits and the actual HSV data. Adduction trajectory errors between HSV data and fits were evaluated using root-mean-square error and maximum angular velocity error. Results Simulated data were generally well fit by sigmoid models; however, when compared to the actual 1,000-fps data, sigmoid fits were found to overestimate maximum angle velocities. Errors decreased as frame rate increased, reaching a plateau by 120 fps. Conclusion In healthy adults, vocal fold kinematic behavior during adduction is generally sigmoidal, although such fits can produce substantial errors when data are acquired at frame rates lower than 120 fps.
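The frame-rate effect described above can be illustrated with finite-difference velocity estimates on a synthetic logistic (sigmoidal) adduction trajectory. This is a hedged toy example: the study fit asymmetric sigmoids to real glottic-angle data rather than using raw finite differences, and the rate constant below is an invented, plausible value:

```python
import numpy as np

def max_velocity(fps, k=50.0):
    """Maximum angular velocity (per second) of a logistic trajectory
    f(t) = 1 / (1 + exp(-k t)), estimated by finite differences at a
    given frame rate. The true maximum slope is k/4."""
    t = np.arange(-0.2, 0.2, 1.0 / fps)       # sampling grid includes t ≈ 0
    f = 1.0 / (1.0 + np.exp(-k * t))
    return np.max(np.diff(f)) * fps

print(max_velocity(1000))  # ≈ 12.5, close to the true peak k/4
print(max_velocity(30))    # clearly underestimates the peak velocity
```

This mirrors the abstract's finding that kinematic estimates degrade as the frame rate falls, with errors plateauing only once sampling is fast enough to resolve the adduction gesture.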
Affiliation(s)
- Manuel Diaz-Cadiz
  - Department of Speech, Language, and Hearing Sciences, Boston University, MA
- Jennifer M. Vojtech
  - Department of Speech, Language, and Hearing Sciences, Boston University, MA
  - Department of Biomedical Engineering, Boston University, MA
- Cara E. Stepp
  - Department of Speech, Language, and Hearing Sciences, Boston University, MA
  - Department of Biomedical Engineering, Boston University, MA
  - Department of Otolaryngology–Head and Neck Surgery, Boston University School of Medicine, MA
|
36
|
Influence of spatial camera resolution in high-speed videoendoscopy on laryngeal parameters. PLoS One 2019; 14:e0215168. [PMID: 31009488] [PMCID: PMC6476512] [DOI: 10.1371/journal.pone.0215168]
Abstract
In laryngeal high-speed videoendoscopy (HSV), the area between the vibrating vocal folds during phonation is of interest, referred to as the glottal area waveform (GAW). Varying camera resolution may influence parameters computed on the GAW and hence hinder comparability between examinations. This study investigates the influence of spatial camera resolution on quantitative vocal fold vibratory function parameters obtained from the GAW. In total, 40 HSV recordings during sustained phonation (20 healthy males and 20 healthy females) were investigated. A clinically used Photron Fastcam MC2 camera with a frame rate of 4000 fps and a spatial resolution of 512×256 pixels was applied. This initial resolution was reduced by pixel averaging (1) to a resolution of 256×128 pixels and (2) to a resolution of 128×64 pixels, yielding three sets of recordings. The GAW was extracted and in total 50 vocal fold vibratory parameters representing different features of the GAW were computed. Statistical analyses were performed using SPSS Statistics, version 21. Fifteen parameters showing strong mathematical dependencies with other parameters were excluded from the main analysis but are given in the Supporting Information. Data analysis revealed a clear influence of spatial resolution on GAW parameters. Fundamental period measures and period perturbation measures were the least affected. Amplitude perturbation measures and mechanical measures were most strongly influenced. Most glottal dynamic characteristics and symmetry measures deviated significantly. Most energy perturbation measures changed significantly in males but were mostly unaffected in females. In females, 18 of the 35 remaining parameters (51%) and in males 22 parameters (63%) changed significantly between spatial resolutions. This work represents the first step in studying the impact of video resolution on quantitative HSV parameters. Clear influences of spatial camera resolution on computed parameters were found. 
The study results suggest avoiding the use of the most strongly affected parameters. Further, the use of cameras with high resolution is recommended for analyzing GAW measures in HSV data.
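The pixel-averaging reduction described above, and its effect on a thresholded glottal area measurement, can be sketched as follows. The toy frame and the 0.5 threshold are illustrative assumptions, not the study's actual data or segmentation method:

```python
import numpy as np

def reduce_resolution(frame, factor=2):
    """Reduce spatial resolution by block (pixel) averaging."""
    h, w = frame.shape
    blocks = frame[:h - h % factor, :w - w % factor].reshape(
        h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def glottal_area_px(frame, threshold=0.5):
    """Glottal area as the count of above-threshold pixels."""
    return int((frame > threshold).sum())

# Toy 4x4 frame with a 3x3 bright 'glottis'
frame = np.zeros((4, 4))
frame[:3, :3] = 1.0

full_area = glottal_area_px(frame)                # 9 px at full resolution
low_res = reduce_resolution(frame, factor=2)
coarse_area = glottal_area_px(low_res) * 4        # rescaled to original pixel units
print(full_area, coarse_area)  # the measured area changes with resolution
```

Even this trivial case shows how averaging boundary pixels shifts the thresholded area, which is why GAW parameters sensitive to small area fluctuations degrade most at lower resolutions.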
|
37
|
Lin J, Walsted ES, Backer V, Hull JH, Elson DS. Quantification and Analysis of Laryngeal Closure From Endoscopic Videos. IEEE Trans Biomed Eng 2019; 66:1127-1136. [DOI: 10.1109/tbme.2018.2867636]
|
38
|
Gómez P, Semmler M, Schützenberger A, Bohr C, Döllinger M. Low-light image enhancement of high-speed endoscopic videos using a convolutional neural network. Med Biol Eng Comput 2019; 57:1451-1463. [DOI: 10.1007/s11517-019-01965-4]
|
39
|
Bilal N, Selcuk T, Sarica S, Alkan A, Orhan İ, Doganer A, Sagiroglu S, Kılıc MA. Voice Acoustic Analysis of Pediatric Vocal Nodule Patients Using Ratios Calculated With Biomedical Image Segmentation. J Voice 2019; 33:195-203. [DOI: 10.1016/j.jvoice.2017.11.010]
|
40
|
Gómez P, Schützenberger A, Kniesburges S, Bohr C, Döllinger M. Physical parameter estimation from porcine ex vivo vocal fold dynamics in an inverse problem framework. Biomech Model Mechanobiol 2017; 17:777-792. [DOI: 10.1007/s10237-017-0992-5]
|
41
|
Arbeiter M, Petermann S, Hoppe U, Bohr C, Doellinger M, Ziethe A. Analysis of the Auditory Feedback and Phonation in Normal Voices. Ann Otol Rhinol Laryngol 2017; 127:89-98. [DOI: 10.1177/0003489417744567]
Affiliation(s)
- Mareike Arbeiter
  - Department of Phoniatrics and Pediatric Audiology, ENT Clinic, University Hospital Erlangen, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Germany
- Simon Petermann
  - Department of Phoniatrics and Pediatric Audiology, ENT Clinic, University Hospital Erlangen, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Germany
- Ulrich Hoppe
  - Department of Audiology, ENT Clinic, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Germany
- Christopher Bohr
  - Department of Phoniatrics and Pediatric Audiology, ENT Clinic, University Hospital Erlangen, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Germany
- Michael Doellinger
  - Department of Phoniatrics and Pediatric Audiology, ENT Clinic, University Hospital Erlangen, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Germany
- Anke Ziethe
  - Department of Phoniatrics and Pediatric Audiology, ENT Clinic, University Hospital Erlangen, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Germany
|
42
|
Herbst CT, Hampala V, Garcia M, Hofer R, Svec JG. Hemi-laryngeal Setup for Studying Vocal Fold Vibration in Three Dimensions. J Vis Exp 2017. [PMID: 29286438] [DOI: 10.3791/55303]
Abstract
The voice of humans and most non-human mammals is generated in the larynx through self-sustaining oscillation of the vocal folds. Direct visual documentation of vocal fold vibration is challenging, particularly in non-human mammals. As an alternative, excised larynx experiments provide the opportunity to investigate vocal fold vibration under controlled physiological and physical conditions. However, the use of a full larynx merely provides a top view of the vocal folds, excluding crucial portions of the oscillating structures from observation during their interaction with aerodynamic forces. This limitation can be overcome by utilizing a hemi-larynx setup where one half of the larynx is mid-sagittally removed, providing both a superior and a lateral view of the remaining vocal fold during self-sustained oscillation. Here, a step-by-step guide for the anatomical preparation of hemi-laryngeal structures and their mounting on the laboratory bench is given. Exemplary phonation of the hemi-larynx preparation is documented with high-speed video data captured by two synchronized cameras (superior and lateral views), showing three-dimensional vocal fold motion and corresponding time-varying contact area. The documentation of the hemi-larynx setup in this publication will facilitate application and reliable repeatability in experimental research, providing voice scientists with the potential to better understand the biomechanics of voice production.
Affiliation(s)
- Christian T Herbst
  - Voice Research Lab, Department of Biophysics, Faculty of Science, Palacky University Olomouc
  - Laboratory of Bio-Acoustics, Dept. of Cognitive Biology, University of Vienna
- Vit Hampala
  - Voice Research Lab, Department of Biophysics, Faculty of Science, Palacky University Olomouc
- Maxime Garcia
  - Laboratory of Bio-Acoustics, Dept. of Cognitive Biology, University of Vienna
  - ENES Lab, NEURO-PSI, CNRS UMR 9197, Université Lyon/Saint-Etienne, France
- Riccardo Hofer
  - Laboratory of Bio-Acoustics, Dept. of Cognitive Biology, University of Vienna
- Jan G Svec
  - Voice Research Lab, Department of Biophysics, Faculty of Science, Palacky University Olomouc
|
43
|
Döllinger M, Gómez P, Patel RR, Alexiou C, Bohr C, Schützenberger A. Biomechanical simulation of vocal fold dynamics in adults based on laryngeal high-speed videoendoscopy. PLoS One 2017; 12:e0187486. [PMID: 29121085] [PMCID: PMC5679561] [DOI: 10.1371/journal.pone.0187486]
Abstract
MOTIVATION Human voice is generated in the larynx by the two oscillating vocal folds. Owing to the limited space and accessibility of the larynx, detailed endoscopic investigation of the actual phonatory process is challenging. Hence, the biomechanics of the human phonatory process are not yet fully understood. Therefore, we adapt a mathematical model of the vocal folds to recorded vocal fold oscillations to quantify gender- and age-related differences expressed by the computed biomechanical model parameters. METHODS The vocal fold dynamics are visualized by laryngeal high-speed videoendoscopy (4000 fps). A total of 33 healthy young subjects (16 females, 17 males) and 11 elderly subjects (5 females, 6 males) were recorded. A numerical two-mass model is adapted to the recorded vocal fold oscillations by varying model masses, stiffness, and subglottal pressure. For adapting the model to the recorded vocal fold dynamics, three different optimization algorithms (Nelder-Mead, Particle Swarm Optimization, and Simulated Bee Colony) in combination with three cost functions were considered for applicability. Gender differences and age-related kinematic differences reflected by the model parameters were analyzed. RESULTS AND CONCLUSION The biomechanical model in combination with numerical optimization techniques allowed phonatory behavior to be simulated and the laryngeal parameters involved to be quantified. All three optimization algorithms showed promising results. However, only one cost function proved suitable for this optimization task. The obtained model parameters reflect the phonatory biomechanics for men and women well and show quantitative age- and gender-specific differences. The model parameters for younger females and males showed lower subglottal pressures, lower stiffness, and higher masses than the corresponding elderly groups. 
Females exhibited higher subglottal pressures, smaller oscillation masses, and larger stiffness than the corresponding similarly aged male groups. Optimizing numerical models toward vocal fold oscillations is useful for identifying the underlying laryngeal components controlling the phonatory process.
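The inverse-problem loop described above (vary model parameters, minimize a cost between simulated and recorded dynamics) can be sketched with a toy one-parameter-pair oscillation model and a coarse grid search standing in for Nelder-Mead, PSO, or Simulated Bee Colony. All signals and values here are invented for illustration, not the study's two-mass model:

```python
import numpy as np

t = np.linspace(0.0, 0.1, 200)                     # 100 ms "recording"
recorded = 2.0 * np.sin(2 * np.pi * 120.0 * t)     # stand-in for a measured trajectory

def cost(params):
    """Mean squared error between a simple sinusoidal 'model' and the recording."""
    amp, f0 = params
    model = amp * np.sin(2 * np.pi * f0 * t)
    return np.mean((model - recorded) ** 2)

# Coarse grid search over the two model parameters (amplitude, frequency)
amps = np.linspace(0.5, 3.0, 26)
freqs = np.linspace(80.0, 160.0, 81)
best = min(((a, f) for a in amps for f in freqs), key=cost)
print(best)  # ~(2.0, 120.0): the generating parameters are recovered
```

In the actual study, the "model" is a numerical two-mass vocal fold model, the free parameters are masses, stiffnesses, and subglottal pressure, and the optimizer is far more sophisticated, but the structure of the fit is the same.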
Affiliation(s)
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Rita R. Patel
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana, Indiana, United States of America
| | - Christoph Alexiou
- Section of Experimental Oncology and Nanomedicine (SEON), Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Else Kröner-Fresenius-Stiftung-Professorship, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Christopher Bohr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| |
44
Voice-Vibratory Assessment With Laryngeal Imaging (VALI) Form: Reliability of Rating Stroboscopy and High-speed Videoendoscopy. J Voice 2017; 31:513.e1-513.e14. [DOI: 10.1016/j.jvoice.2016.12.003] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2016] [Revised: 11/29/2016] [Accepted: 12/02/2016] [Indexed: 11/19/2022]
45
High-speed Videolaryngoscopy: Quantitative Parameters of Glottal Area Waveforms and High-speed Kymography in Healthy Individuals. J Voice 2017; 31:282-290. [DOI: 10.1016/j.jvoice.2016.09.026] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/13/2016] [Revised: 09/22/2016] [Accepted: 09/23/2016] [Indexed: 11/21/2022]
46
Andrade-Miranda G, Henrich Bernardoni N, Godino-Llorente JI. Synthesizing the motion of the vocal folds using optical flow based techniques. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2017.01.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]
47
Volgger V, Felicio A, Lohscheller J, Englhard AS, Al-Muzaini H, Betz CS, Schuster ME. Evaluation of the combined use of narrow band imaging and high-speed imaging to discriminate laryngeal lesions. Lasers Surg Med 2017; 49:609-618. [PMID: 28231400 DOI: 10.1002/lsm.22652] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/04/2017] [Indexed: 02/05/2023]
Abstract
BACKGROUND AND OBJECTIVE Laryngeal lesions are usually investigated by microlaryngoscopy, biopsy, and histopathology. This study aimed to evaluate the combined use of Narrow Band Imaging (NBI) and High-Speed Imaging (HSI) in the differentiation of glottic lesions in awake patients. STUDY DESIGN Prospective diagnostic study. MATERIALS AND METHODS Thirty-six awake patients with 41 glottic lesions were investigated with both NBI and HSI, and the suspected diagnoses were compared with the histopathological results of tissue biopsies taken during subsequent microlaryngoscopies. Of the 41 lesions, 28 were primary lesions and 13 were recurrences after previous laryngeal pathologies. RESULTS Sensitivity, specificity, positive predictive value, and negative predictive value in the differentiation between benign/premalignant and malignant lesions with combined NBI and HSI were 100.0%, 79.4%, 50.0%, and 100.0%, respectively. Sensitivities and specificities were 100.0% and 85.7% for HSI alone, and 100.0% and 79.4% for NBI alone. For primary lesions only, the results were generally better, with sensitivities and specificities of 100% and 81% for NBI, 100% and 84.2% for HSI, and 100% and 85.7% for the combination of both methods, respectively. CONCLUSION NBI and HSI both appear to be promising adjunct tools, with high sensitivities, in the differentiation of various laryngeal lesions in awake patients. Specificities, however, were moderate, but could be increased by using NBI and HSI in combination in the subgroup of patients with only primary lesions. Although both methods still have limitations, they might improve the evaluation of suspicious laryngeal lesions in the future and could possibly spare patients repeated invasive tissue biopsies. Lasers Surg. Med. 49:609-618, 2017. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Veronika Volgger
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Axelle Felicio
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Jörg Lohscheller
- Department of Informatics, Trier University of Applied Sciences, Schneidershof, 54208, Trier, Germany
- Anna S Englhard
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Hanan Al-Muzaini
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Christian S Betz
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Maria E Schuster
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
48
Oscillatory Onset and Offset in Young Vocally Healthy Adults Across Various Measurement Methods. J Voice 2017; 31:512.e17-512.e24. [PMID: 28169095 DOI: 10.1016/j.jvoice.2016.12.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2016] [Revised: 12/01/2016] [Accepted: 12/02/2016] [Indexed: 11/20/2022]
Abstract
OBJECTIVE This study aimed to investigate the relationship between (1) oscillatory onset-offset times across various approaches that use different measurement criteria and (2) oscillatory onset and offset times in vocally healthy young adults. METHOD Oscillatory onset-offset times were obtained from 71 vocally normal adults using high-speed videoendoscopy. Comparisons between the different onset methods involved measurement of the oscillatory onset time (OOT), voice initiation period (VIP), and phonation onset time (POT); comparisons between offset methods involved computation of the oscillatory offset time (OOToff) and the phonation offset time. RESULTS The correlation of the OOT with the VIP was 0.240 (P = 0.04) and with the POT from the glottal area waveform was 0.248 (P = 0.04); however, the correlation between the VIP and the POT from the glottal area waveform was 0.661 (P < 0.001). For offset, there was a moderate correlation (rS = 0.503, P < 0.001) between the OOToff and the vocal offset period. The onset time was longest for the OOT, followed by the VIP and the POT. There was no correlation between onset and offset for any of the methods. CONCLUSIONS A framework for quantification of oscillatory onset-offset time was developed for /hi/ tasks, which can be used for future measurements of disordered voice. A positive relationship was observed between the VIP and the POT and between the OOToff and the vocal offset period. There was a nonlinear relationship between the OOT, VIP, and POT measures. Onset-offset times are strongly influenced by the calculation method used, the pros and cons of which are discussed in this paper. Vibratory onset and offset represent physiologically different phenomena.
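The rank correlations reported in this abstract (e.g., rS = 0.503 between offset measures) are Spearman coefficients, which capture monotonic rather than strictly linear relationships. A minimal sketch with hypothetical data — the variable names and the simulated relationship are invented for illustration and are not the study's data, though n = 71 mirrors its sample size:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical paired timing measures (seconds) per subject, standing in for
# e.g. oscillatory offset time vs. vocal offset period.
n = 71
offset_a = rng.uniform(0.05, 0.30, n)
offset_b = 0.6 * offset_a + rng.normal(0.0, 0.04, n)  # moderately related

rho, p = spearmanr(offset_a, offset_b)
print(f"Spearman rho = {rho:.3f}, p = {p:.3g}")
```

Spearman's coefficient is a natural choice here because, as the abstract notes, the relationship between the onset measures was nonlinear.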
49
Aichinger P, Roesner I, Leonhard M, Schneider-Stickler B, Denk-Linnert DM, Bigenzahn W, Fuchs AK, Hagmüller M, Kubin G. Comparison of an audio-based and a video-based approach for detecting diplophonia. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2014.10.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
50
Laryngeal High-Speed Videoendoscopy: Sensitivity of Objective Parameters towards Recording Frame Rate. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4575437. [PMID: 27990428 PMCID: PMC5136634 DOI: 10.1155/2016/4575437] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/06/2016] [Accepted: 10/10/2016] [Indexed: 11/29/2022]
Abstract
The current use of laryngeal high-speed videoendoscopy in clinical settings involves subjective visual assessment of vocal fold vibratory characteristics. However, objective quantification of vocal fold vibrations is desired for evidence-based diagnosis and therapy, and objective parameters assessing laryngeal dynamics have therefore been suggested. This study investigated the sensitivity of these objective parameters and their dependence on recording frame rate. A total of 300 endoscopic high-speed videos with recording frame rates between 1000 and 15 000 fps were analyzed for a vocally healthy female subject during sustained phonation. Twenty parameters representing laryngeal dynamics were computed. Four different parameter characteristics were found: parameters showing no change with increasing frame rate; parameters changing up to a certain frame rate, but then remaining constant; parameters remaining constant within a particular range of recording frame rates; and parameters changing with nearly every frame rate. The results suggest that (1) parameter values are influenced by recording frame rates, and different parameters have varying sensitivities to recording frame rate; (2) normative values should be determined based on recording frame rates; and (3) the typically used recording frame rate of 4000 fps seems to be too low to accurately distinguish certain characteristics of the human phonation process in detail.
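The frame-rate dependence described above can be illustrated with a toy experiment: compute the same objective parameter from the same underlying waveform sampled at different frame rates. The snippet is a deliberately simplified sketch — a half-rectified sinusoid stands in for a real glottal area waveform, and the hypothetical `open_quotient` estimator simply counts the fraction of frames in which the glottis is open.

```python
import numpy as np

def glottal_area(t, f0=220.0):
    """Toy glottal area waveform: half-rectified sinusoid (area 0 = closed)."""
    return np.maximum(np.sin(2 * np.pi * f0 * t), 0.0)

def open_quotient(fps, duration=0.25, f0=220.0):
    """Fraction of frames with a non-zero (open) glottis at a given frame rate."""
    t = np.arange(0.0, duration, 1.0 / fps)
    area = glottal_area(t, f0)
    return float(np.mean(area > 0.0))

# The same waveform, the same parameter, different recording frame rates.
for fps in (1000, 2000, 4000, 8000, 15000):
    print(f"{fps:>6} fps: open quotient = {open_quotient(fps):.3f}")
```

For this idealized waveform the true open quotient is 0.5; the estimate approaches it as the frame rate rises, while at low frame rates the sparse sampling of each glottal cycle biases the value — the same mechanism by which the study's parameters depend on recording frame rate.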