1
Dadras AA, Aichinger P. Deep Learning-Based Detection of Glottis Segmentation Failures. Bioengineering (Basel) 2024; 11:443. [PMID: 38790311 PMCID: PMC11118004 DOI: 10.3390/bioengineering11050443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Revised: 04/23/2024] [Accepted: 04/26/2024] [Indexed: 05/26/2024] Open
Abstract
Medical image segmentation is crucial for clinical applications, but challenges persist due to noise and variability. In particular, accurate glottis segmentation from high-speed videos is vital for voice research and diagnostics. Manual searching for failed segmentations is labor-intensive, prompting interest in automated methods. This paper proposes the first deep learning approach for detecting faulty glottis segmentations. For this purpose, faulty segmentations are generated by applying both a poorly performing neural network and perturbation procedures to three public datasets. Heavy data augmentations are added to the input until the neural network's performance decreases to the desired mean intersection over union (IoU). Likewise, the perturbation procedure involves a series of image transformations to the original ground truth segmentations in a randomized manner. These data are then used to train a ResNet18 neural network with custom loss functions to predict the IoU scores of faulty segmentations. This value is then thresholded with a fixed IoU of 0.6 for classification, thereby achieving 88.27% classification accuracy with 91.54% specificity. Experimental results demonstrate the effectiveness of the presented approach. Contributions include: (i) a knowledge-driven perturbation procedure, (ii) a deep learning framework for scoring and detecting faulty glottis segmentations, and (iii) an evaluation of custom loss functions.
Affiliation(s)
- Philipp Aichinger
- Speech and Hearing Science Lab, Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Medical University of Vienna, Währinger Gürtel 18-20, 1090 Vienna, Austria
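The scoring and thresholding step described in the abstract above can be illustrated with a small sketch. It assumes binary masks given as nested lists and treats "faulty" as the positive class when computing specificity; the function names and the empty-mask convention are illustrative choices, not taken from the paper.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of equal shape."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    intersection = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    if union == 0:
        # Both masks empty: scored as full agreement (a convention chosen here).
        return 1.0
    return intersection / union


def is_faulty(predicted_iou, threshold=0.6):
    """Flag a segmentation as faulty when its IoU score is below the threshold."""
    return predicted_iou < threshold


def accuracy_and_specificity(true_faulty, pred_faulty):
    """Accuracy and specificity, assuming 'faulty' is the positive class."""
    pairs = list(zip(true_faulty, pred_faulty))
    correct = sum(t == p for t, p in pairs)
    true_negatives = sum((not t) and (not p) for t, p in pairs)
    false_positives = sum((not t) and p for t, p in pairs)
    negatives = true_negatives + false_positives
    accuracy = correct / len(pairs)
    specificity = true_negatives / negatives if negatives else float("nan")
    return accuracy, specificity
```

In the paper the IoU score is predicted by a network rather than computed against ground truth; the `iou` helper here only shows what the predicted quantity measures.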
2
Näger C, Kniesburges S, Tur B, Schoder S, Becker S. An Investigation of Acoustic Back-Coupling in Human Phonation on a Synthetic Larynx Model. Bioengineering (Basel) 2023; 10:1343. [PMID: 38135934 PMCID: PMC10740801 DOI: 10.3390/bioengineering10121343] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/28/2023] [Revised: 11/12/2023] [Accepted: 11/19/2023] [Indexed: 12/24/2023] Open
Abstract
In the human phonation process, acoustic standing waves in the vocal tract can influence the fluid flow through the glottis as well as vocal fold oscillation. To investigate the amount of acoustic back-coupling, the supraglottal flow field has been recorded via high-speed particle image velocimetry (PIV) in a synthetic larynx model for several configurations with different vocal tract lengths. Based on the obtained velocity fields, acoustic source terms were computed. Additionally, the sound radiation into the far field was recorded via microphone measurements and the vocal fold oscillation via high-speed camera recordings. The PIV measurements revealed that near a vocal tract resonance frequency fR, the vocal fold oscillation frequency fo (and therefore also the flow field's fundamental frequency) jumps onto fR. This is accompanied by a substantial relative increase in aeroacoustic sound generation efficiency. Furthermore, the measurements show that fo-fR-coupling increases vocal efficiency, signal-to-noise ratio, harmonics-to-noise ratio and cepstral peak prominence. At the same time, the glottal volume flow needed for stable vocal fold oscillation decreases strongly. All of this results in an improved voice quality and phonation efficiency so that a person phonating with fo-fR-coupling can phonate longer and with better voice quality.
Affiliation(s)
- Christoph Näger
- Institute of Fluid Mechanics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 4, 91058 Erlangen, Germany
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Bogac Tur
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Stefan Schoder
- Aeroacoustics and Vibroacoustics Group, Institute of Fundamentals and Theory in Electrical Engineering, Graz University of Technology, Inffeldgasse 16, 8010 Graz, Austria
- Stefan Becker
- Institute of Fluid Mechanics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 4, 91058 Erlangen, Germany
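To give a feel for how vocal tract length sets the resonance frequency fR mentioned in the abstract above, the following sketch uses the idealized quarter-wavelength resonator (a uniform tube closed at the glottis and open at the lips). This textbook model, the default speed of sound, and the function name are assumptions for illustration; the study itself relies on measurements in a synthetic larynx model, not on this formula.

```python
def quarter_wave_resonances(length_m, n_modes=3, speed_of_sound=350.0):
    """Resonances (Hz) of an ideal uniform tube closed at one end, open at the other.

    f_k = (2k - 1) * c / (4 * L) for k = 1, 2, ...
    The speed of sound default (350 m/s, warm humid air) is an assumed value.
    """
    if length_m <= 0:
        raise ValueError("length_m must be positive")
    return [
        (2 * k - 1) * speed_of_sound / (4.0 * length_m)
        for k in range(1, n_modes + 1)
    ]
```

For a 0.175 m tube this gives a first resonance of about 500 Hz, and shortening the tube raises every resonance proportionally, which is why changing the vocal tract length shifts the frequency at which coupling with fo can occur.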
3
Tur B, Gühring L, Wendler O, Schlicht S, Drummer D, Kniesburges S. Effect of Ligament Fibers on Dynamics of Synthetic, Self-Oscillating Vocal Folds in a Biomimetic Larynx Model. Bioengineering (Basel) 2023; 10:1130. [PMID: 37892860 PMCID: PMC10604794 DOI: 10.3390/bioengineering10101130] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/08/2023] [Revised: 09/13/2023] [Accepted: 09/25/2023] [Indexed: 10/29/2023] Open
Abstract
Synthetic silicone larynx models are essential for understanding the biomechanics of physiological and pathological vocal fold vibrations. The aim of this study is to investigate the effects of artificial ligament fibers on vocal fold vibrations in a synthetic larynx model, which is capable of replicating physiological laryngeal functions such as elongation, abduction, and adduction. A multi-layer silicone model with different mechanical properties for the musculus vocalis and the lamina propria consisting of ligament and mucosa was used. Ligament fibers of various diameters and break resistances were cast into the vocal folds and tested at different tension levels. An electromechanical setup was developed to mimic laryngeal physiology. The measurements included high-speed video recordings of vocal fold vibrations, subglottal pressure, and acoustics. To characterize the vibrations, all measured values were evaluated and compared with parameters from ex vivo and in vivo studies. The fundamental frequency of the synthetic larynx model was found to be approximately 200-520 Hz depending on integrated fiber types and tension levels. This range corresponds to a female normal and singing voice range. The investigated voice parameters from vocal fold vibration, acoustics, and subglottal pressure were within normal value ranges from ex vivo and in vivo studies. With ligament fibers integrated, the fundamental frequency increases with increasing airflow while the fiber tension remains constant. In addition, an increase in fiber tension also raises the fundamental frequency, which matches the physiological expectation of the dynamic behavior of vocal folds.
Affiliation(s)
- Bogac Tur
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Lucia Gühring
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Olaf Wendler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Samuel Schlicht
- Institute of Polymer Technology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Am Weichselgarten 10, 91058 Erlangen, Germany
- Dietmar Drummer
- Institute of Polymer Technology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Am Weichselgarten 10, 91058 Erlangen, Germany
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
4
Semmler M, Lasar S, Kremer F, Reinwald L, Wittig F, Peters G, Schraut T, Wendler O, Seyferth S, Schützenberger A, Dürr S. Extent and Effect of Covering Laryngeal Structures with Synthetic Laryngeal Mucus via Two Different Administration Techniques. J Voice 2023:S0892-1997(23)00228-X. [PMID: 37648625 DOI: 10.1016/j.jvoice.2023.07.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 07/20/2023] [Accepted: 07/21/2023] [Indexed: 09/01/2023]
Abstract
OBJECTIVE: The first goal of this study was to investigate the coverage of laryngeal structures using two potential administration techniques for synthetic mucus: inhalation and lozenge ingestion. As a second research question, the study investigated the potential effects of these techniques on standardized voice assessment parameters. METHODS: Fluorescein was added to throat lozenges and to an inhalation solution to visualize the coverage of laryngeal structures through blue light imaging. The study included 70 vocally healthy subjects. Fifty subjects underwent administration via lozenge ingestion and 20 subjects performed the inhalation process. For the first research question, the recordings from the blue light imaging system were categorized to compare the extent of coverage on individual laryngeal structures objectively. Secondly, a standardized voice evaluation protocol was performed before and after each administration to determine any measurable effects on typical voice parameters. RESULTS: The administration via inhalation demonstrated complete coverage of all laryngeal structures, including the vocal folds, ventricular folds, and arytenoid cartilages, as visualized by the fluorescent dye. In contrast, the application of the lozenge predominantly covered the pharynx and laryngeal surface toward the aryepiglottic fold, but not the inferior structures. Overall, the comparison before and after administration showed no clear effect, although a minor deterioration of the acoustic signal was noted in the shimmer and cepstral peak prominence after the inhalation. CONCLUSIONS: Our findings indicate that the inhalation process is a more effective technique for covering deeper laryngeal structures such as the vocal folds and ventricular folds with synthetic mucus. This knowledge enables further in vivo studies on the role of laryngeal mucus in phonation in general, and how it can be substituted or supplemented for patients with reduced glandular activity as well as for heavy voice users.
Affiliation(s)
- Marion Semmler
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Sarina Lasar
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Franziska Kremer
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Laura Reinwald
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Fiori Wittig
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Gregor Peters
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Tobias Schraut
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Olaf Wendler
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Stefan Seyferth
- Department of Chemistry and Pharmacy, Chair of Pharmaceutics, Friedrich-Alexander-University Erlangen-Nürnberg, Cauerstr. 4, 91058 Erlangen, Germany.
- Anne Schützenberger
- University Hospital Erlangen, Medical School, Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head & Neck Surgery, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany.
- Stephan Dürr
- University Hospital Regensburg, Department of Otorhinolaryngology, Division of Phoniatrics and Pediatric Audiology, Franz-Josef-Strauß-Allee 11, 93053 Regensburg, Germany.
5
Pelka F, Ensthaler M, Wendler O, Kniesburges S, Schützenberger A, Semmler M. Mechanical Parameters Based on High-Speed Videoendoscopy of the Vocal Folds in Patients With Ectodermal Dysplasia. J Voice 2023:S0892-1997(23)00084-X. [PMID: 36973131 DOI: 10.1016/j.jvoice.2023.02.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 02/21/2023] [Accepted: 02/21/2023] [Indexed: 03/29/2023]
Abstract
OBJECTIVE: Patients suffering from ectodermal dysplasia (ED), which is an inherited disorder in the development of the ectodermal structures, have a significantly reduced expression of teeth, hair, sweat glands, and salivary glands in the respiratory tract including the larynx. Previous studies within the framework of the present project showed a significantly reduced saliva production and an impairment of the acoustic outcome in ED patients compared to the control group. However, until now, no statistically significant difference between ED patients and controls could be found regarding vocal fold dynamics in the high-speed videoendoscopy (HSV) recordings using representative parameters on closure, symmetry, and periodicity. The aim of this study is to examine the role of tissue characteristics by means of objective mechanical parameters derived from HSV recordings. METHODS: This study includes 28 ED patients and 42 controls (no ED, healthy voice). The vocal fold oscillations were recorded by high-speed videoendoscopy (HSV at 4 kHz). Based on the dynamical measures of the glottal area waveform (GAW), objective glottal dynamic parameters associated with tissue properties like flexibility and stiffness were computed. RESULTS: The present evaluation displays a significant difference between male ED patients and male controls concerning the HSV-based mechanical parameters indicating reduced stiffness and increased deformability for the vocal folds of male ED patients. In contrast to strongly amplitude-dependent parameters, the primarily velocity-based parameters showed no statistically significant deviation. CONCLUSIONS: The presented data provide the first promising indication toward the underlying causes on the laryngeal level leading to the voice conspicuities in ED patients. The significant difference concerning the mechanical parameters suggests a different composition of the extracellular matrix of the tissue of the vocal folds of ED patients compared to controls.
Affiliation(s)
- Franziska Pelka
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
- Maria Ensthaler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
- Olaf Wendler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Medical School, Erlangen, Germany.
6
Peters G, Jakubaß B, Weidenfeller K, Kniesburges S, Böhringer D, Wendler O, Mueller SK, Gostian AO, Berry DA, Döllinger M, Semmler M. Synthetic mucus for an ex vivo phonation setup: Creation, application, and effect on excised porcine larynges. The Journal of the Acoustical Society of America 2022; 152:3245. [PMID: 36586828 PMCID: PMC9729017 DOI: 10.1121/10.0015364] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/07/2022] [Revised: 09/23/2022] [Accepted: 11/06/2022] [Indexed: 06/17/2023]
Abstract
Laryngeal mucus hydrates and lubricates the deformable tissue of the vocal folds and acts as a boundary layer with the airflow from the lungs. However, the effects of the mucus' viscoelasticity on phonation remain widely unknown and mucus has not yet been established in experimental procedures of voice research. In this study, four synthetic mucus samples were created on the basis of xanthan with focus on physiological frequency-dependent viscoelastic properties, which cover viscosities and elasticities over 2 orders of magnitude. An established ex vivo experimental setup was expanded by a reproducible and controllable application method of synthetic mucus. The application method and the suitability of the synthetic mucus samples were successfully verified by fluorescence evidence on the vocal folds even after oscillation experiments. Subsequently, the impact of mucus viscoelasticity on the oscillatory dynamics of the vocal folds, the subglottal pressure, and acoustic signal was investigated with 24 porcine larynges (2304 datasets). Despite the large differences of viscoelasticity, the phonatory characteristics remained stable with only minor statistically significant differences. Overall, this study increased the level of realism in the experimental setup for replication of the phonatory process enabling further research on pathological mucus and exploration of therapeutic options.
Affiliation(s)
- Gregor Peters
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Bernhard Jakubaß
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Katrin Weidenfeller
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- David Böhringer
- Biophysics Group, Department of Physics, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91052 Erlangen, Germany
- Olaf Wendler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Sarina K Mueller
- Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Antoniu-Oreste Gostian
- Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- David A Berry
- Department of Head and Neck Surgery, David Geffen School of Medicine at University of California Los Angeles, Los Angeles, California 90024, USA
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054 Erlangen, Germany
7
Döllinger M, Schraut T, Henrich LA, Chhetri D, Echternach M, Johnson AM, Kunduk M, Maryn Y, Patel RR, Samlan R, Semmler M, Schützenberger A. Re-Training of Convolutional Neural Networks for Glottis Segmentation in Endoscopic High-Speed Videos. Applied Sciences (Basel, Switzerland) 2022; 12:9791. [PMID: 37583544 PMCID: PMC10427138 DOI: 10.3390/app12199791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2023]
Abstract
Endoscopic high-speed video (HSV) systems for visualization and assessment of vocal fold dynamics in the larynx are diverse and technically advancing. To consider resulting "concept shifts" for neural network (NN)-based image processing, re-training of already trained and used NNs is necessary to allow for sufficiently accurate image processing for new recording modalities. We propose and discuss several re-training approaches for convolutional neural networks (CNN) being used for HSV image segmentation. Our baseline CNN was trained on the BAGLS data set (58,750 images). The new BAGLS-RT data set consists of an additional 21,050 images from previously unused HSV systems, light sources, and different spatial resolutions. Results showed that increasing data diversity by means of preprocessing already improves the segmentation accuracy (mIoU +6.35%). Subsequent re-training further increases segmentation performance (mIoU +2.81%). For re-training, finetuning with dynamic knowledge distillation showed the most promising results. Data variety for training and additional re-training is a helpful tool to boost HSV image segmentation quality. However, when performing re-training, the phenomenon of catastrophic forgetting should be kept in mind, i.e., adaptation to new data while forgetting already learned knowledge.
Affiliation(s)
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Tobias Schraut
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Lea A. Henrich
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Dinesh Chhetri
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, CA 90095, USA
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), 80331 Munich, Germany
- Aaron M. Johnson
- NYU Voice Center, Department of Otolaryngology–Head and Neck Surgery, New York University, Grossman School of Medicine, New York, NY 10001, USA
- Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, LA 70801, USA
- Youri Maryn
- Department of Speech, Language and Hearing Sciences, University of Ghent, 9000 Ghent, Belgium
- Rita R. Patel
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN 47401, USA
- Robin Samlan
- Department of Speech, Language, & Hearing Sciences, University of Arizona, Tucson, AZ 85641, USA
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology Head & Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, 91054 Erlangen, Germany
8
Zita A, Novozámský A, Zitová B, Šorel M, Herbst CT, Vydrová J, Švec JG. Videokymogram Analyzer Tool: Human–computer comparison. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2022.103878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
9
Lehoux S, Popeil L, Švec JG. Laryngeal and Acoustic Analysis of Chest and Head Registers Extended Across a Three-Octave Range: A Case Study. J Voice 2022:S0892-1997(22)00053-4. [PMID: 35504793 DOI: 10.1016/j.jvoice.2022.02.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/14/2021] [Revised: 02/09/2022] [Accepted: 02/11/2022] [Indexed: 10/18/2022]
Abstract
Voice registers are assumed to be related to different laryngeal adjustments, but objective evidence has been insufficient. While chest register is usually associated with the lower pitch range, and head register with the higher pitch range, here we investigated a professional singer who claimed an ability to produce both these registers at every pitch, throughout her entire singing range. The singer performed separated phonations alternating between the two registers (further called chest-like and head-like) at all pitches from C3 (131 Hz) to C6 (1047 Hz). We monitored the vocal fold vibrations using high-speed video endoscopy and electroglottography. The microphone sound was recorded and used for blind listening tests performed by the three authors (insiders) and by six "naive" participants (outsiders). The outsiders correctly identified the registers in 64% of the cases, and the insiders in 89% of the cases. Objective analysis revealed larger closed quotient and vertical phase differences for the chest-like register within the lower range below G4 (<392 Hz), and also a larger closed quotient at the membranous glottis within the higher range above Bb4 (>466 Hz), but not between Ab4-A4 (415-440 Hz). The normalized amplitude quotient was consistently lower in the chest-like register throughout the entire range. The results indicate that the singer employed subtle laryngeal control mechanisms for the chest-like and head-like phonations on top of the traditionally recognized low-pitched chest and high-pitched head register phenomena. Across all pitches, the chest-like register was produced with more rapid glottal closure that was usually, but not necessarily, accompanied by stronger adduction of the membranous glottis. These register changes were not always easily perceivable by listeners, however.
Affiliation(s)
- Sarah Lehoux
- Voice Research Lab, Department of Experimental Physics, Faculty of Science, Palacký University, Olomouc, Czech Republic
- Jan G Švec
- Voice Research Lab, Department of Experimental Physics, Faculty of Science, Palacký University, Olomouc, Czech Republic.
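The pitch-to-frequency pairs quoted in the abstract above (for example C3 at 131 Hz and C6 at 1047 Hz) are consistent with twelve-tone equal temperament tuned to A4 = 440 Hz. The sketch below shows that conversion; the tuning reference and the helper name `note_to_hz` are assumptions, since the abstract does not state which tuning was used.

```python
# Semitone offsets of the natural notes within an octave (C = 0).
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name, a4_hz=440.0):
    """Convert a note name such as 'C3', 'Bb4' or 'F#5' to a frequency in Hz.

    Uses scientific pitch notation and equal temperament:
    f = a4_hz * 2 ** ((midi - 69) / 12), where MIDI note 69 is A4.
    """
    letter = name[0].upper()
    accidental = name[1:-1]          # '', 'b', '#', ...
    octave = int(name[-1])           # single-digit octaves only in this sketch
    semitone = NOTE_OFFSETS[letter] + accidental.count("#") - accidental.count("b")
    midi = 12 * (octave + 1) + semitone
    return a4_hz * 2 ** ((midi - 69) / 12)
```

Rounding the results for C3, G4, Ab4, A4, Bb4 and C6 to whole hertz reproduces the values given in the abstract.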
10
Comparative analysis of high-speed videolaryngoscopy images and sound data simultaneously acquired from rigid and flexible laryngoscope: a pilot study. Sci Rep 2021; 11:20480. [PMID: 34650174 PMCID: PMC8516923 DOI: 10.1038/s41598-021-99948-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2020] [Accepted: 10/04/2021] [Indexed: 12/03/2022] Open
Abstract
High-Speed Videoendoscopy (HSV) is becoming a robust tool for the assessment of vocal fold vibration in laboratory investigation and clinical practice. We describe the first successful application of flexible High-Speed Videoendoscopy with an innovative laser light source conducted in clinical settings. The acquired image and simultaneously recorded audio data are compared to the results obtained by means of a rigid endoscope. We demonstrated that HSV recordings with a fiber-optic laryngoscope yield consistently bright color images that are suitable for parametrization of vocal fold oscillation, as is the case for HSV data obtained from a rigid laryngoscope. The comparison of period and amplitude perturbation parameters calculated on the basis of image and audio data acquired from flexible and rigid HSV recordings objectively confirms that flexible High-Speed Videoendoscopy is a more suitable method for examination of natural phonation. The HSV-based measures generated from this kymographic analysis are arguably a superior representation of the vocal fold vibrations than the acoustic analysis because their quantification is independent of the vocal tract influences. This experimental study has several implications for further research on the application of HSV in the clinical assessment of the nature of glottal pathologies and their effect on vocal fold vibrations.
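Period and amplitude perturbation parameters of the kind mentioned above are commonly reported as local jitter and local shimmer. The sketch below uses one widespread definition (mean absolute difference between consecutive cycles, divided by the mean value); whether the study used exactly this variant is not stated in the abstract, so the definition and the function name are assumptions.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference divided by the mean value.

    Pass cycle periods to obtain local jitter, or cycle peak amplitudes
    to obtain local shimmer. The result is a ratio (multiply by 100 for %).
    """
    values = [float(v) for v in values]
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value
```

The same function applies to cycles extracted from a kymogram or glottal area signal and to cycles extracted from the audio signal, which is what makes the image-based and audio-based values directly comparable.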
11
Kist AM, Dürr S, Schützenberger A, Döllinger M. OpenHSV: an open platform for laryngeal high-speed videoendoscopy. Sci Rep 2021; 11:13760. [PMID: 34215788 PMCID: PMC8253769 DOI: 10.1038/s41598-021-93149-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/04/2020] [Accepted: 06/03/2021] [Indexed: 11/22/2022] Open
Abstract
High-speed videoendoscopy is an important tool to study laryngeal dynamics, to quantify vocal fold oscillations, to diagnose voice impairments at the laryngeal level, and to monitor treatment progress. However, there is a significant lack of an open-source, expandable research tool that features the latest hardware and data analysis. In this work, we propose an open research platform termed OpenHSV that is based on state-of-the-art, commercially available equipment and features a fully automatic data analysis pipeline. A publicly available, user-friendly graphical user interface implemented in Python is used to interface with the hardware. Video and audio data are recorded in synchrony and are subsequently analyzed fully automatically. Video segmentation of the glottal area is performed using efficient deep neural networks to derive the glottal area waveform and glottal midline. Established quantitative, clinically relevant video and audio parameters were implemented and computed. In a preliminary clinical study, we recorded video and audio data from 28 healthy subjects. Analyzing these data in terms of image quality and derived quantitative parameters, we show the applicability, performance, and usefulness of OpenHSV. Therefore, OpenHSV provides valid, standardized access to high-speed videoendoscopy data acquisition and analysis for voice scientists, highlighting its use as a valuable research tool in understanding voice physiology. We envision that OpenHSV will serve as a basis for the next generation of clinical HSV systems.
Affiliation(s)
- Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nürnberg, Henkestr. 91, 91054, Erlangen, Germany
- Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Waldstr. 1, 91054, Erlangen, Germany
|
12
|
Echternach M, Herbst CT, Köberlein M, Story B, Döllinger M, Gellrich D. Are source-filter interactions detectable in classical singing during vowel glides? The Journal of the Acoustical Society of America 2021; 149:4565. [PMID: 34241428 DOI: 10.1121/10.0005432] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/04/2020] [Accepted: 06/03/2021] [Indexed: 06/13/2023]
Abstract
In recent studies, it has been assumed that vocal tract formants (Fn) and the voice source could interact. However, only a few studies have analyzed this assumption in vivo. Here, the vowel transition /i/-/a/-/u/-/i/ of 12 professional classical singers (6 females, 6 males) phonating on the pitch D4 [fundamental frequency (ƒo) ca. 294 Hz] was analyzed using transnasal high-speed videoendoscopy (20,000 fps), electroglottography (EGG), and audio recordings. Fn data were calculated using a cepstral method. Source-filter interaction candidates (SFICs) were determined by (a) algorithmic detection of major intersections of Fn/nƒo and (b) perceptual assessment of the EGG signal. Although the open quotient showed some increase for the /i-a/ and /u-i/ transitions, there were no clear effects at the expected Fn/nƒo intersections. In contrast, ƒo adjustments and changes in the phonovibrogram occurred at perceptually derived SFICs, suggesting level-two interactions. In some cases, these were constituted by intersections between higher nƒo and Fn. The presented data partially corroborate that vowel transitions may result in level-two interactions also in professional singers. However, the lack of systematically detectable effects suggests either the absence of a strong interaction or the existence of confounding factors, which may potentially counterbalance the level-two interactions.
Affiliation(s)
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistrasse 15, Munich, 81377, Germany
- Christian T Herbst
- Antonio Salieri Department of Vocal Studies and Vocal Research in Music Education, University of Music and Performing Arts Vienna, Vienna, Austria
- Marie Köberlein
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistrasse 15, Munich, 81377, Germany
- Brad Story
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology Head and Neck Surgery, University Hospital Erlangen, Medical School Waldstrasse 1, Erlangen, 91054, Germany
- Donata Gellrich
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Marchioninistrasse 15, Munich, 81377, Germany
|
13
|
Semmler M, Berry DA, Schützenberger A, Döllinger M. Fluid-structure-acoustic interactions in an ex vivo porcine phonation model. The Journal of the Acoustical Society of America 2021; 149:1657. [PMID: 33765793 PMCID: PMC7952141 DOI: 10.1121/10.0003602] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/28/2020] [Revised: 01/29/2021] [Accepted: 02/07/2021] [Indexed: 05/02/2023]
Abstract
In the clinic, many diagnostic and therapeutic procedures focus on the oscillation patterns of the vocal folds (VF). Dynamic characteristics of the VFs, such as symmetry, periodicity, and full glottal closure, are considered essential features for healthy phonation. However, the relevance of these individual factors in the complex interaction between the airflow, laryngeal structures, and the resulting acoustics has not yet been quantified. Sustained phonation was induced in nine excised porcine larynges without vocal tract (supraglottal structures had been removed above the ventricular folds). The multimodal setup was designed to simultaneously control and monitor key aspects of phonation in the three essential parts of the larynx. More specifically, measurements comprised (1) the subglottal pressure signal, (2) high-speed recordings in the glottal plane, and (3) the acoustic signal in the supraglottal region. The automated setup regulated glottal airflow, asymmetric arytenoid adduction, and the pre-phonatory glottal gap. Statistical analysis revealed a beneficial influence of VF periodicity and glottal closure on the signal quality of the subglottal pressure and the supraglottal acoustics, whereas VF symmetry only had a negligible influence. Strong correlations were found between the subglottal and supraglottal signal quality, with significant improvement of the acoustic quality for high levels of periodicity and glottal closure.
Affiliation(s)
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- David A Berry
- Laryngeal Dynamics Laboratory, Department of Head and Neck Surgery, David Geffen School of Medicine, UCLA, Los Angeles, California 90024, USA
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology at the Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Medical School at Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
|
14
|
Abstract
A healthy voice is crucial for verbal communication and hence in daily as well as professional life. The basis for a healthy voice is the sound-producing vocal folds in the larynx. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal folds. Clinically, videoendoscopy is applied to assess the symmetry of the oscillation, which is evaluated subjectively. High-speed videoendoscopy, an emerging method that allows quantification of the vocal fold oscillation, is more commonly employed in research due to the amount of data and the complex, semi-automatic analysis. In this study, we provide a comprehensive evaluation of methods that detect the glottal midline fully automatically. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset using manual annotations, utilized both simulations and annotated endoscopic images to train deep neural networks at different stages of the analysis workflow, and compared these to established computer vision algorithms. We found that classical computer vision algorithms perform well on detecting the glottal midline in glottis segmentation data, but are outperformed by deep neural networks on this task. We further suggest GlottisNet, a multi-task neural architecture featuring the simultaneous prediction of both the opening between the vocal folds and the symmetry axis, leading to a huge step forward towards clinical applicability of quantitative, deep learning-assisted laryngeal endoscopy by fully automating segmentation and midline detection.
|
15
|
Gómez P, Kist AM, Schlegel P, Berry DA, Chhetri DK, Dürr S, Echternach M, Johnson AM, Kniesburges S, Kunduk M, Maryn Y, Schützenberger A, Verguts M, Döllinger M. BAGLS, a multihospital Benchmark for Automatic Glottis Segmentation. Sci Data 2020; 7:186. [PMID: 32561845 PMCID: PMC7305104 DOI: 10.1038/s41597-020-0526-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 09/09/2019] [Accepted: 05/15/2020] [Indexed: 02/06/2023] Open
Abstract
Laryngeal videoendoscopy is one of the main tools in clinical examinations for voice disorders and voice research. Using high-speed videoendoscopy, it is possible to fully capture the vocal fold oscillations; however, processing the recordings typically involves a time-consuming segmentation of the glottal area by trained experts. Even though automatic methods have been proposed and the task is particularly suited for deep learning methods, there are no public datasets and benchmarks available to compare methods and to allow training of generalizing deep learning models. In an international collaboration of researchers from seven institutions from the EU and USA, we have created BAGLS, a large, multihospital dataset of 59,250 high-speed videoendoscopy frames with individually annotated segmentation masks. The frames are based on 640 recordings of healthy and disordered subjects that were recorded with varying technical equipment by numerous clinicians. The BAGLS dataset will allow an objective comparison of glottis segmentation methods and will enable interested researchers to train their own models and compare their methods.
Affiliation(s)
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany.
- Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany.
- Patrick Schlegel
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- David A Berry
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California, USA
- Dinesh K Chhetri
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California, USA
- Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
- Aaron M Johnson
- NYU Voice Center, Department of Otolaryngology - Head and Neck Surgery, New York University School of Medicine, New York, New York, USA
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, USA
- Youri Maryn
- European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium
- Department of Speech, Language and Hearing sciences, University of Ghent, Ghent, Belgium
- Faculty of Education, Health and Social Work, University College Ghent, Ghent, Belgium
- Faculty of Psychology and Educational Sciences, School of Logopedics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
- Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Monique Verguts
- European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium
- Department of Otorhinolaryngology and Voice Disorders, Diest General Hospital, Diest, Belgium
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
|