1
Darvish M, Kist AM. A Generative Method for a Laryngeal Biosignal. J Voice 2024:S0892-1997(24)00019-5. [PMID: 38395653 DOI: 10.1016/j.jvoice.2024.01.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 01/26/2024] [Accepted: 01/26/2024] [Indexed: 02/25/2024]
Abstract
The Glottal Area Waveform (GAW) is an important component in quantitative clinical voice assessment, providing valuable insights into vocal fold function. In this study, we introduce a novel method employing Variational Autoencoders (VAEs) to generate synthetic GAWs. Our approach enables the creation of synthetic GAWs that closely replicate real-world data, offering a versatile tool for researchers and clinicians. We elucidate the process of manipulating the VAE latent space using the Glottal Opening Vector (GlOVe). The GlOVe allows precise control over the synthetic closure and opening of the vocal folds. By utilizing the GlOVe, we generate synthetic laryngeal biosignals. These biosignals accurately reflect vocal fold behavior, allowing for the emulation of realistic glottal opening changes. This manipulation extends to the introduction of arbitrary oscillations in the vocal folds, closely resembling real vocal fold oscillations. The range of factor coefficient values enables the generation of diverse biosignals with varying frequencies and amplitudes. Our results demonstrate that this approach yields highly accurate laryngeal biosignals, with Normalized Mean Absolute Error values ranging from 9.6 ⋅ 10⁻³ to 1.20 ⋅ 10⁻² across the tested frequencies, alongside a remarkable training effectiveness, reflected in reductions of up to approximately 89.52% in key loss components. This proposed method may have implications for downstream speech synthesis and phonetics research, offering the potential for advanced and natural-sounding speech technologies.
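The latent-space manipulation summarized in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define how the GlOVe is computed or how the error is normalized, so the sketch assumes a direction given by the difference between mean "open" and mean "closed" latent codes, a sinusoidal factor coefficient, and normalization by the reference range. All function names (`opening_direction`, `oscillate_latent`, `nmae`) are hypothetical, and a trained VAE decoder would still be needed to turn the latent codes into images or area values.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length latent codes."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def opening_direction(open_codes, closed_codes):
    """Assumed direction: mean 'open' code minus mean 'closed' code."""
    mean_open = mean_vector(open_codes)
    mean_closed = mean_vector(closed_codes)
    return [o - c for o, c in zip(mean_open, mean_closed)]


def oscillate_latent(z0, direction, amplitude, freq_hz, fps, n_frames):
    """Shift z0 along `direction` with a sinusoidal factor coefficient."""
    codes = []
    for k in range(n_frames):
        coeff = amplitude * math.sin(2.0 * math.pi * freq_hz * k / fps)
        codes.append([z + coeff * d for z, d in zip(z0, direction)])
    return codes


def nmae(reference, generated):
    """Mean absolute error divided by the reference range (assumed normalization)."""
    span = max(reference) - min(reference)
    mae = sum(abs(r - g) for r, g in zip(reference, generated)) / len(reference)
    return mae / span
```

In this sketch the amplitude and frequency of the coefficient play the role of the "factor coefficient values" mentioned in the abstract; the mapping to the paper's actual parameters is an assumption.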
Affiliation(s)
- Mahdi Darvish
  - Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Andreas M Kist
  - Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
2
Peterson QA, Fei T, Sy LE, Froeschke LL, Mendelsohn AH, Berke GS, Peterson DA. Correlating Perceptual Voice Quality in Adductor Spasmodic Dysphonia With Computer Vision Assessment of Glottal Geometry Dynamics. J Speech Lang Hear Res 2022; 65:3695-3708. [PMID: 36130065 PMCID: PMC9927624 DOI: 10.1044/2022_jslhr-22-00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
PURPOSE This study examined the relationship between voice quality and glottal geometry dynamics in patients with adductor spasmodic dysphonia (ADSD). METHOD An objective computer vision and machine learning system was developed to extract glottal geometry dynamics from nasolaryngoscopic video recordings for 78 patients with ADSD. General regression models were used to examine the relationship between overall voice quality and 15 variables that capture glottal geometry dynamics derived from the computer vision system. Two experts in ADSD independently rated voice quality for two separate voice tasks for every patient, yielding four different voice quality rating models. RESULTS All four of the regression models exhibited positive correlations with clinical assessments of voice quality (R²s = .30-.34, Spearman rho = .55-.61, all with p < .001). Seven to 10 variables were included in each model. There was high overlap in the variables included between the four models, and the sign of the correlation with voice quality was consistent for each variable across all four regression models. CONCLUSION We found specific glottal geometry dynamics that correspond to voice quality in ADSD.
Affiliation(s)
- Quinn A. Peterson
  - Department of Computer Science and Software Engineering, California Polytechnic State University, San Luis Obispo
- Teng Fei
  - Department of Cognitive Science, University of California, San Diego, La Jolla
- Lauren E. Sy
  - Department of Cognitive Science, University of California, San Diego, La Jolla
- Abie H. Mendelsohn
  - Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles
- Gerald S. Berke
  - Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles
- David A. Peterson
  - Institute for Neural Computation, University of California, San Diego, La Jolla
3
Schlegel P, Kist AM, Kunduk M, Dürr S, Döllinger M, Schützenberger A. Interdependencies between acoustic and high-speed videoendoscopy parameters. PLoS One 2021; 16:e0246136. [PMID: 33529244 PMCID: PMC7853476 DOI: 10.1371/journal.pone.0246136] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/08/2020] [Accepted: 01/13/2021] [Indexed: 02/06/2023]
Abstract
In voice research, uncovering relations between the oscillating vocal folds, being the sound source of phonation, and the resulting perceived acoustic signal are of great interest. This is especially the case in the context of voice disorders, such as functional dysphonia (FD). We investigated 250 high-speed videoendoscopy (HSV) recordings with simultaneously recorded acoustic signals (124 healthy females, 60 FD females, 44 healthy males, 22 FD males). 35 glottal area waveform (GAW) parameters and 14 acoustic parameters were calculated for each recording. Linear and non-linear relations between GAW and acoustic parameters were investigated using Pearson correlation coefficients (PCC) and distance correlation coefficients (DCC). Further, norm values for parameters obtained from 250 ms long sustained phonation data (vowel /i/) were provided. 26 PCCs in females (5.3%) and 8 in males (1.6%) were found to be statistically significant (|corr.| ≥ 0.3). Only minor differences were found between PCCs and DCCs, indicating presence of weak non-linear dependencies between parameters. Fundamental frequency was involved in the majority of all relevant PCCs between GAW and acoustic parameters (19 in females and 7 in males). The most distinct difference between correlations in females and males was found for the parameter Period Variability Index. The study shows only weak relations between investigated acoustic and GAW-parameters. This indicates that the reduction of the complex 3D glottal dynamics to the 1D-GAW may erase laryngeal dynamic characteristics that are reflected within the acoustic signal. Hence, other GAW parameters, 2D-, 3D-laryngeal dynamics and vocal tract parameters should be further investigated towards potential correlations to the acoustic signal.
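To make the contrast between the two coefficients named in this abstract concrete, here is a minimal pure-Python sketch of the Pearson correlation coefficient and the sample distance correlation (the standard double-centred distance-matrix formulation). The function names and the toy data are illustrative assumptions, not taken from the study. The last lines also check that the reported percentages are consistent with 35 × 14 = 490 parameter pairs.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Pairwise |vi - vj| matrix, double-centred by row, column and grand mean."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]  # equals the column means (d is symmetric)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation; detects non-linear dependence as well."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


# Toy data: a purely quadratic relation has zero linear correlation.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]

# Share of significant PCCs among all GAW/acoustic parameter pairs.
n_pairs = 35 * 14
share_females = round(100 * 26 / n_pairs, 1)
share_males = round(100 * 8 / n_pairs, 1)
```

On the toy data the Pearson coefficient vanishes while the distance correlation stays clearly positive, which is the kind of gap the study looked for and largely did not find between its PCCs and DCCs.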
Affiliation(s)
- Patrick Schlegel
  - Department of Head & Neck Surgery, David Geffen School of Medicine, University of California Los Angeles (UCLA), Los Angeles, California, United States of America
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Andreas M. Kist
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Melda Kunduk
  - Dep. of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, United States of America
- Stephan Dürr
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Michael Döllinger
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Anne Schützenberger
  - Dep. of Otorhinolaryngology, Div. of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
4
Fitting synthetic to clinical kymographic images for deriving kinematic vocal fold parameters: Application to left-right vibratory phase differences. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
5
Abstract
A healthy voice is crucial for verbal communication and hence in daily as well as professional life. The basis for a healthy voice are the sound producing vocal folds in the larynx. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal fold. Clinically, videoendoscopy is applied to assess the symmetry of the oscillation and evaluated subjectively. High-speed videoendoscopy, an emerging method that allows quantification of the vocal fold oscillation, is more commonly employed in research due to the amount of data and the complex, semi-automatic analysis. In this study, we provide a comprehensive evaluation of methods that detect fully automatically the glottal midline. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset using manual annotations, utilized both, simulations and annotated endoscopic images, to train deep neural networks at different stages of the analysis workflow, and compared these to established computer vision algorithms. We found that classical computer vision perform well on detecting the glottal midline in glottis segmentation data, but are outperformed by deep neural networks on this task. We further suggest GlottisNet, a multi-task neural architecture featuring the simultaneous prediction of both, the opening between the vocal folds and the symmetry axis, leading to a huge step forward towards clinical applicability of quantitative, deep learning-assisted laryngeal endoscopy, by fully automating segmentation and midline detection.
6
Deng JJ, Hadwin PJ, Peterson SD. The effect of high-speed videoendoscopy configuration on reduced-order model parameter estimates by Bayesian inference. J Acoust Soc Am 2019; 146. [PMID: 31472542 PMCID: PMC6715443 DOI: 10.1121/1.5124256#suppl] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/09/2023]
Abstract
Bayesian inference has been previously demonstrated as a viable inverse analysis tool for estimating subject-specific reduced-order model parameters and uncertainties. However, previous studies have relied upon simulated glottal area waveforms with superimposed random noise as the measurement. In practice, high-speed videoendoscopy is used to measure glottal area, which introduces practical imaging effects not captured in simulated data, such as viewing angle, frame rate, and camera resolution. Herein, high-speed videos of the vocal folds were approximated by recording the trajectories of physical vocal fold models controlled by a symmetric body-cover model. Twenty videos were recorded, varying subglottal pressure, cricothyroid activation, and viewing angle, with frame rate and video resolution varied by digital video manipulation. Bayesian inference was used to estimate subglottal pressure and cricothyroid activation from glottal area waveforms extracted from the videos. The resulting estimates show off-axis viewing of 10° can lead to a 10% bias in the estimated subglottal pressure. A viewing model is introduced such that viewing angle can be included as an estimated parameter, which alleviates estimate bias. Frame rate and pixel resolution were found to primarily affect uncertainty of parameter estimates up to a limit where spatial and temporal resolutions were too poor to resolve the glottal area. Since many high-speed cameras have the ability to sacrifice spatial for temporal resolution, the findings herein suggest that Bayesian inference studies employing high-speed video should increase temporal resolutions at the expense of spatial resolution for reduced estimate uncertainties.
Affiliation(s)
- Jonathan J Deng
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
- Paul J Hadwin
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
- Sean D Peterson
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
7
Deng JJ, Hadwin PJ, Peterson SD. The effect of high-speed videoendoscopy configuration on reduced-order model parameter estimates by Bayesian inference. J Acoust Soc Am 2019; 146:1492. [PMID: 31472542 PMCID: PMC6715443 DOI: 10.1121/1.5124256] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/13/2019] [Revised: 08/07/2019] [Accepted: 08/09/2019] [Indexed: 06/10/2023]
8
Döllinger M, Gómez P, Patel RR, Alexiou C, Bohr C, Schützenberger A. Biomechanical simulation of vocal fold dynamics in adults based on laryngeal high-speed videoendoscopy. PLoS One 2017; 12:e0187486. [PMID: 29121085 PMCID: PMC5679561 DOI: 10.1371/journal.pone.0187486] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/15/2016] [Accepted: 10/18/2017] [Indexed: 12/18/2022]
Abstract
MOTIVATION Human voice is generated in the larynx by the two oscillating vocal folds. Owing to the limited space and accessibility of the larynx, endoscopic investigation of the actual phonatory process in detail is challenging. Hence the biomechanics of the human phonatory process are still not yet fully understood. Therefore, we adapt a mathematical model of the vocal folds towards vocal fold oscillations to quantify gender and age related differences expressed by computed biomechanical model parameters. METHODS The vocal fold dynamics are visualized by laryngeal high-speed videoendoscopy (4000 fps). A total of 33 healthy young subjects (16 females, 17 males) and 11 elderly subjects (5 females, 6 males) were recorded. A numerical two-mass model is adapted to the recorded vocal fold oscillations by varying model masses, stiffness and subglottal pressure. For adapting the model towards the recorded vocal fold dynamics, three different optimization algorithms (Nelder-Mead, Particle Swarm Optimization and Simulated Bee Colony) in combination with three cost functions were considered for applicability. Gender differences and age-related kinematic differences reflected by the model parameters were analyzed. RESULTS AND CONCLUSION The biomechanical model in combination with numerical optimization techniques allowed phonatory behavior to be simulated and laryngeal parameters involved to be quantified. All three optimization algorithms showed promising results. However, only one cost function seems to be suitable for this optimization task. The gained model parameters reflect the phonatory biomechanics for men and women well and show quantitative age- and gender-specific differences. The model parameters for younger females and males showed lower subglottal pressures, lower stiffness and higher masses than the corresponding elderly groups. 
Females exhibited higher subglottal pressures, smaller oscillation masses and larger stiffness than the corresponding similar aged male groups. Optimizing numerical models towards vocal fold oscillations is useful to identify underlying laryngeal components controlling the phonatory process.
Affiliation(s)
- Michael Döllinger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Pablo Gómez
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Rita R. Patel
  - Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana, United States of America
- Christoph Alexiou
  - Section of Experimental Oncology and Nanomedicine (SEON), Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Else Kröner-Fresenius-Stiftung-Professorship, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Christopher Bohr
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Anne Schützenberger
  - Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
9
Hadwin PJ, Peterson SD. An extended Kalman filter approach to non-stationary Bayesian estimation of reduced-order vocal fold model parameters. J Acoust Soc Am 2017; 141:2909. [PMID: 28464670 DOI: 10.1121/1.4981240] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 05/09/2023]
Abstract
The Bayesian framework for parameter inference provides a basis from which subject-specific reduced-order vocal fold models can be generated. Previously, it has been shown that a particle filter technique is capable of producing estimates and associated credibility intervals of time-varying reduced-order vocal fold model parameters. However, the particle filter approach is difficult to implement and has a high computational cost, which can be barriers to clinical adoption. This work presents an alternative estimation strategy based upon Kalman filtering aimed at reducing the computational cost of subject-specific model development. The robustness of this approach to Gaussian and non-Gaussian noise is discussed. The extended Kalman filter (EKF) approach is found to perform very well in comparison with the particle filter technique at dramatically lower computational cost. Based upon the test cases explored, the EKF is comparable in terms of accuracy to the particle filter technique when greater than 6000 particles are employed; if less particles are employed, the EKF actually performs better. For comparable levels of accuracy, the solution time is reduced by 2 orders of magnitude when employing the EKF. By virtue of the approximations used in the EKF, however, the credibility intervals tend to be slightly underpredicted.
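For readers unfamiliar with the extended Kalman filter mentioned in this abstract, the following is a minimal scalar sketch of one predict/update cycle. It is a generic textbook EKF, not the authors' vocal fold model: the random-walk state model, the quadratic measurement function `h(x) = x²`, the noise variances, and the names `ekf_step` and `track_parameter` are all assumptions chosen only to show how the measurement is linearized around the current estimate.

```python
def ekf_step(x, p, z, f, f_jac, h, h_jac, q, r):
    """One scalar EKF cycle: predict with f, then correct with measurement z."""
    # Predict: propagate the estimate and its variance through the state model.
    x_pred = f(x)
    fj = f_jac(x)
    p_pred = fj * p * fj + q
    # Update: linearize the measurement model around the predicted state.
    hj = h_jac(x_pred)
    s = hj * p_pred * hj + r          # innovation variance
    k = p_pred * hj / s               # Kalman gain
    x_new = x_pred + k * (z - h(x_pred))
    p_new = (1.0 - k * hj) * p_pred
    return x_new, p_new


def track_parameter(measurements, x0=1.0, p0=1.0, q=1e-4, r=1e-2):
    """Track a slowly varying parameter observed through h(x) = x**2 (toy model)."""
    x, p = x0, p0
    history = []
    for z in measurements:
        x, p = ekf_step(
            x, p, z,
            f=lambda s: s, f_jac=lambda s: 1.0,          # random-walk state model
            h=lambda s: s * s, h_jac=lambda s: 2.0 * s,  # non-linear measurement
            q=q, r=r,
        )
        history.append((x, p))
    return history


# Noise-free measurements of a true parameter value of 2.0 (so z = 4.0).
estimates = track_parameter([4.0] * 50)
```

Each cycle costs a handful of arithmetic operations, whereas a particle filter must propagate and weight every particle; that difference is the source of the cost reduction the abstract reports. The variance returned here comes from the linearized model, which is also why EKF credibility intervals can be too narrow.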
Affiliation(s)
- Paul J Hadwin
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1 Canada
- Sean D Peterson
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1 Canada
10
Aichinger P, Roesner I, Leonhard M, Schneider-Stickler B, Denk-Linnert DM, Bigenzahn W, Fuchs AK, Hagmüller M, Kubin G. Comparison of an audio-based and a video-based approach for detecting diplophonia. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2014.10.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 11/25/2022]
11
Sorokin VN, Leonov AS. Determination of a vocal source by the spectral ratio method. Pattern Recognit Image Anal 2017. [DOI: 10.1134/s105466181701014x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
12
Hadwin PJ, Galindo GE, Daun KJ, Zañartu M, Erath BD, Cataldo E, Peterson SD. Non-stationary Bayesian estimation of parameters from a body cover model of the vocal folds. J Acoust Soc Am 2016; 139:2683. [PMID: 27250162 PMCID: PMC10423076 DOI: 10.1121/1.4948755] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/10/2015] [Revised: 04/15/2016] [Accepted: 04/22/2016] [Indexed: 05/09/2023]
Abstract
The evolution of reduced-order vocal fold models into clinically useful tools for subject-specific diagnosis and treatment hinges upon successfully and accurately representing an individual patient in the modeling framework. This, in turn, requires inference of model parameters from clinical measurements in order to tune a model to the given individual. Bayesian analysis is a powerful tool for estimating model parameter probabilities based upon a set of observed data. In this work, a Bayesian particle filter sampling technique capable of estimating time-varying model parameters, as occur in complex vocal gestures, is introduced. The technique is compared with time-invariant Bayesian estimation and least squares methods for determining both stationary and non-stationary parameters. The current technique accurately estimates the time-varying unknown model parameter and maintains tight credibility bounds. The credibility bounds are particularly relevant from a clinical perspective, as they provide insight into the confidence a clinician should have in the model predictions.
Affiliation(s)
- Paul J Hadwin
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Gabriel E Galindo
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Kyle J Daun
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Matías Zañartu
  - Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Byron D Erath
  - Department of Mechanical and Aeronautical Engineering, Clarkson University, Potsdam, New York 13699, USA
- Edson Cataldo
  - Applied Mathematics Department, Graduate Program in Electrical and Telecommunications Engineering (PPGEET), Universidade Federal Fluminense, Niteroi, Rio de Janeiro, CEP24020-140, Brazil
- Sean D Peterson
  - Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
13
14
Unger J, Schuster M, Hecker DJ, Schick B, Lohscheller J. A generalized procedure for analyzing sustained and dynamic vocal fold vibrations from laryngeal high-speed videos using phonovibrograms. Artif Intell Med 2015; 66:15-28. [PMID: 26597002 DOI: 10.1016/j.artmed.2015.10.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 01/16/2015] [Revised: 09/28/2015] [Accepted: 10/20/2015] [Indexed: 12/01/2022]
Abstract
OBJECTIVE This work presents a computer-based approach to analyze the two-dimensional vocal fold dynamics of endoscopic high-speed videos, and constitutes an extension and generalization of a previously proposed wavelet-based procedure. While most approaches aim for analyzing sustained phonation conditions, the proposed method allows for a clinically adequate analysis of both dynamic as well as sustained phonation paradigms. MATERIALS AND METHODS The analysis procedure is based on a spatio-temporal visualization technique, the phonovibrogram, that facilitates the documentation of the visible laryngeal dynamics. From the phonovibrogram, a low-dimensional set of features is computed using a principle component analysis strategy that quantifies the type of vibration patterns, irregularity, lateral symmetry and synchronicity, as a function of time. Two different test bench data sets are used to validate the approach: (I) 150 healthy and pathologic subjects examined during sustained phonation. (II) 20 healthy and pathologic subjects that were examined twice: during sustained phonation and a glissando from a low to a higher fundamental frequency. In order to assess the discriminative power of the extracted features, a Support Vector Machine is trained to distinguish between physiologic and pathologic vibrations. The results for sustained phonation sequences are compared to the previous approach. Finally, the classification performance of the stationary analyzing procedure is compared to the transient analysis of the glissando maneuver. RESULTS For the first test bench the proposed procedure outperformed the previous approach (proposed feature set: accuracy: 91.3%, sensitivity: 80%, specificity: 97%, previous approach: accuracy: 89.3%, sensitivity: 76%, specificity: 96%). 
Comparing the classification performance of the second test bench further corroborates that analyzing transient paradigms provides clear additional diagnostic value (glissando maneuver: accuracy: 90%, sensitivity: 100%, specificity: 80%, sustained phonation: accuracy: 75%, sensitivity: 80%, specificity: 70%). CONCLUSIONS The incorporation of parameters describing the temporal evolvement of vocal fold vibration clearly improves the automatic identification of pathologic vibration patterns. Furthermore, incorporating a dynamic phonation paradigm provides additional valuable information about the underlying laryngeal dynamics that cannot be derived from sustained conditions. The proposed generalized approach provides a better overall classification performance than the previous approach, and hence constitutes a new advantageous tool for an improved clinical diagnosis of voice disorders.
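The accuracy, sensitivity and specificity figures reported for the second test bench can be reproduced from a confusion matrix. The abstract does not give the counts; the sketch below assumes the 20 subjects split into 10 pathologic and 10 healthy, which is the only split for which the three reported percentages agree with each other. The helper `classification_metrics` and the reconstructed counts are illustrative, not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # pathologic cases correctly flagged
        "specificity": tn / (tn + fp),   # healthy cases correctly cleared
    }


# Reconstructed counts, assuming 10 pathologic and 10 healthy subjects.
glissando = classification_metrics(tp=10, fn=0, tn=8, fp=2)
sustained = classification_metrics(tp=8, fn=2, tn=7, fp=3)
```

Under that assumption, the glissando maneuver misses no pathologic subject and misclassifies two healthy ones, while sustained phonation misses two pathologic subjects and misclassifies three healthy ones.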
Affiliation(s)
- Jakob Unger
  - Department of Computer Science, Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany
- Maria Schuster
  - Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, Marchioninistr. 13, 81366 München, Germany
- Dietmar J Hecker
  - Department of Otorhinolaryngology, Saarland University Hospital, Kirrbergerstr., 66424 Homburg/Saar, Germany
- Bernhard Schick
  - Department of Otorhinolaryngology, Saarland University Hospital, Kirrbergerstr., 66424 Homburg/Saar, Germany
- Jörg Lohscheller
  - Department of Computer Science, Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany
15
Unger J, Lohscheller J, Reiter M, Eder K, Betz CS, Schuster M. A Noninvasive Procedure for Early-Stage Discrimination of Malignant and Precancerous Vocal Fold Lesions Based on Laryngeal Dynamics Analysis. Cancer Res 2014; 75:31-9. [DOI: 10.1158/0008-5472.can-14-1458] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Indexed: 11/16/2022]
16
Fulcher LP, Scherer RC, Anderson NV. Entrance loss coefficients and exit coefficients for a physical model of the glottis with convergent angles. J Acoust Soc Am 2014; 136:1312. [PMID: 25190404 PMCID: PMC4165224 DOI: 10.1121/1.4887477] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/26/2013] [Revised: 06/23/2014] [Accepted: 06/26/2014] [Indexed: 06/03/2023]
Abstract
Pressure distributions were obtained for 5°, 10°, and 20° convergent angles with a static physical model (M5) of the glottis. Measurements were made for minimal glottal diameters from d = 0.005-0.32 cm with a range of transglottal pressures of interest for phonation. Entrance loss coefficients were calculated at the glottal entrance for each minimal diameter and transglottal pressure to measure how far the flows in this region deviate from Bernoulli flow. Exit coefficients were also calculated to determine the presence and magnitude of pressure recovery near the glottal exit. The entrance loss coefficients for the three convergent angles vary from values near 2.3-3.4 for d = 0.005 cm to values near 0.6 for d = 0.32 cm. These coefficients extend the tables of entrance loss and exit coefficients obtained for the uniform glottis according to Fulcher, Scherer, and Powell [J. Acoust. Soc. Am. 129, 1548-1553 (2011)].
Affiliation(s)
- Lewis P Fulcher
  - Department of Physics and Astronomy, Bowling Green State University, Bowling Green, Ohio 43403
- Ronald C Scherer
  - Department of Communication Sciences and Disorders, Bowling Green State University, Bowling Green, Ohio 43403
- Nicholas V Anderson
  - Department of Physics and Astronomy, Bowling Green State University, Bowling Green, Ohio 43403
17
Hüttner B, Luegmair G, Patel RR, Ziethe A, Eysholdt U, Bohr C, Sebova I, Semmler M, Döllinger M. Development of a time-dependent numerical model for the assessment of non-stationary pharyngoesophageal tissue vibrations after total laryngectomy. Biomech Model Mechanobiol 2014; 14:169-84. [PMID: 24861998 DOI: 10.1007/s10237-014-0597-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/01/2013] [Accepted: 05/14/2014] [Indexed: 11/29/2022]
Abstract
Laryngeal cancer due to, e.g., extensive smoking and/or alcohol consumption can necessitate the excision of the entire larynx. After such a total laryngectomy, the voice-generating structures are lost, and with them the quality of life of the affected patients is drastically reduced. However, the vibrations of the remaining tissue in the so-called pharyngoesophageal (PE) segment can be used as an alternative sound generator. Tissue, scar, and geometric aspects of the PE-segment determine the postoperative substitute voice characteristic, which is highly important for the future life of the patient. So far, PE-dynamics have been simulated by a biomechanical model that is restricted to stationary vibrations, i.e., variations in pitch and amplitude cannot be handled. In order to investigate the dynamical range of PE-vibrations, knowledge about the temporal processes during substitute voice production is of crucial interest. Thus, time-dependent model parameters are suggested in order to quantify non-stationary PE-vibrations and to draw conclusions on the temporal characteristics of tissue stiffness, oscillating mass, pressure, and geometric distributions within the PE-segment. To adapt the numerical model to the PE-vibrations, an automatic, block-based optimization procedure is applied, comprising a combined global and local optimization approach. The suggested optimization procedure is validated with 75 synthetic data sets, simulating non-stationary oscillations of differently shaped PE-segments. The application to four high-speed recordings is shown and discussed. The correlation between model and PE-dynamics is ≥ 97%.
Affiliation(s)
- Björn Hüttner
- Department of Phoniatrics and Pediatric Audiology, Medical School, University Hospital Erlangen, Bohlenplatz 21, 91054 Erlangen, Germany
|
18
|
Patel R, Dubrovskiy D, Döllinger M. Characterizing vibratory kinematics in children and adults with high-speed digital imaging. Journal of Speech, Language, and Hearing Research: JSLHR 2014; 57:S674-86. [PMID: 24686982 PMCID: PMC7315516 DOI: 10.1044/2014_jslhr-s-12-0278] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 05/11/2023]
Abstract
PURPOSE The aim of this study is to quantify and identify characteristic vibratory motion in typically developing prepubertal children and young adults using high-speed digital imaging. METHOD The vibrations of the vocal folds were recorded from 27 children (ages 5-9 years) and 35 adults (ages 21-45 years) with high-speed imaging at 4,000 frames per second during sustained phonation. Kinematic features of amplitude periodicity, time periodicity, phase asymmetry, spatial symmetry, and glottal gap index were analyzed from the glottal area waveform across mean and standard deviation (i.e., intercycle variability) for each measure. RESULTS Children exhibited lower mean amplitude periodicity compared to men and women and lower time periodicity compared to men. Children and women exhibited greater variability in amplitude periodicity, time periodicity, phase asymmetry, and glottal gap index compared to men. Women had lower mean values of amplitude periodicity and time periodicity compared to men. CONCLUSION Children differed spatially and, to a greater extent, temporally in vocal fold motion, suggesting the need for the development of children-specific kinematic norms. Results suggest more uncontrolled vibratory motion in children, reflecting changes in the vocal fold layered structure and aero-acoustic source mechanisms.
|
19
|
Bohr C, Kraeck A, Eysholdt U, Ziethe A, Döllinger M. Quantitative analysis of organic vocal fold pathologies in females by high-speed endoscopy. Laryngoscope 2013; 123:1686-93. [PMID: 23649746 DOI: 10.1002/lary.23783] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 03/12/2012] [Revised: 08/23/2012] [Accepted: 09/18/2012] [Indexed: 11/11/2022]
Abstract
OBJECTIVES/HYPOTHESIS Quantitative analysis of endoscopic high-speed video recordings of vocal fold vibrations has been growing in importance in recent years. The videos have mainly been analyzed using subjective evaluation, but this is examiner dependent, and the results show inadequate interobserver agreement. The aims of this study were therefore to identify appropriate objective parameters for analyzing high-speed recordings to differentiate healthy voice production from organic disorders. STUDY DESIGN METHODS A total of 152 females were examined, divided into 77 healthy and 75 with four different pathological conditions: laryngeal epithelial thickening, Reinke edema, vocal fold polyps, and vocal fold cysts. Vocal fold vibrations were recorded with a high-speed camera (4,000 Hz, 256 × 256 pixels) during sustained phonation. Parameters computed from the glottal area waveform (GAW) and from phonovibrogram (PVG) were analyzed. Multiparametric linear discriminant analysis was performed to classify pathological conditions versus the healthy group. RESULTS Twenty of 44 parameters were identified that are capable of distinguishing between the individual types of pathology. PVG parameters showed better performance than GAW parameters. Parameters representing vibrational periodicity via standard deviation showed better performance than absolute parameters. In addition, linear discriminant analysis achieved reliable differentiation between healthy and pathological vocal fold vibrations: 72% for the five-class problem (all groups separately) and 88% for the two-class problem (healthy vs. all pathologies taken as one class). CONCLUSIONS The study succeeded in defining objective parameters for analyzing endoscopic high-speed videos and suggesting first parameters for differentiation between healthy dynamics and dynamics of organic pathologies.
Affiliation(s)
- Christopher Bohr
- Department of Otorhinolaryngology, Head and Neck Surgery, Erlangen University Hospital, Erlangen, Germany.
|
20
|
Inwald EC, Döllinger M, Schuster M, Eysholdt U, Bohr C. Multiparametric Analysis of Vocal Fold Vibrations in Healthy and Disordered Voices in High-Speed Imaging. J Voice 2011; 25:576-90. [DOI: 10.1016/j.jvoice.2010.04.004] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 01/14/2010] [Accepted: 04/07/2010] [Indexed: 10/19/2022]
|
21
|
Yang A, Stingl M, Berry DA, Lohscheller J, Voigt D, Eysholdt U, Döllinger M. Computation of physiological human vocal fold parameters by mathematical optimization of a biomechanical model. The Journal of the Acoustical Society of America 2011; 130:948-64. [PMID: 21877808 PMCID: PMC3195891 DOI: 10.1121/1.3605551] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/09/2023]
Abstract
With the use of an endoscopic, high-speed camera, vocal fold dynamics may be observed clinically during phonation. However, observation and subjective judgment alone may be insufficient for clinical diagnosis and documentation of improved vocal function, especially when the laryngeal disease lacks any clear morphological presentation. In this study, biomechanical parameters of the vocal folds are computed by adjusting the corresponding parameters of a three-dimensional model until the dynamics of both systems are similar. First, a mathematical optimization method is presented. Next, model parameters (such as pressure, tension and masses) are adjusted to reproduce vocal fold dynamics, and the deduced parameters are physiologically interpreted. Various combinations of global and local optimization techniques are attempted. Evaluation of the optimization procedure is performed using 50 synthetically generated data sets. The results show sufficient reliability, including 0.07 normalized error, 96% correlation, and 91% accuracy. The technique is also demonstrated on data from human hemilarynx experiments, in which a low normalized error (0.16) and high correlation (84%) values were achieved. In the future, this technique may be applied to clinical high-speed images, yielding objective measures with which to document improved vocal function of patients with voice disorders.
Affiliation(s)
- Anxiong Yang
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Erlangen, Germany.
|
22
|
Fulcher LP, Scherer RC, Powell T. Pressure distributions in a static physical model of the uniform glottis: entrance and exit coefficients. The Journal of the Acoustical Society of America 2011; 129:1548-1553. [PMID: 21428518 PMCID: PMC3078031 DOI: 10.1121/1.3514424] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/04/2010] [Revised: 06/10/2010] [Accepted: 10/11/2010] [Indexed: 05/28/2023]
Abstract
Pressure distributions for the uniform glottis were obtained with a static physical model (M5). Glottal diameters of d = 0.005, 0.0075, 0.01, 0.02, 0.04, 0.08, 0.16, and 0.32 cm were used with a range of phonatory transglottal pressures. At each pressure and diameter, entrance loss and exit coefficients were determined. In general, both coefficients decreased in value as the transglottal pressure or the diameter increased. Entrance loss coefficients ranged from 0.69 to 17.6. Use of these coefficients with the measured flow rates in straightforward equations accurately reproduced the pressure distributions within the glottis and along the inferior vocal fold surface.
Affiliation(s)
- Lewis P Fulcher
- Department of Physics and Astronomy, Bowling Green State University, Bowling Green, Ohio 43403, USA.
|
23
|
Zhang Y, Regner MF, Jiang JJ. Theoretical modeling and experimental high-speed imaging of elongated vocal folds. IEEE Trans Biomed Eng 2010; 58:2725-31. [PMID: 21118763 DOI: 10.1109/tbme.2010.2095012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
In this paper, the role of vocal fold elongation in governing glottal movement dynamics was theoretically and experimentally investigated. A theoretical model was first proposed to incorporate vocal fold elongation into the two-mass model. This model predicted the direct and nondirect components of the glottal time series as a function of vocal fold elongation. Furthermore, high-speed digital imaging was applied in excised larynx experiments to visualize vocal fold vibrations with variable vocal fold elongation from -10% to 50% and subglottal pressures of 18 and 24 cm H₂O. Comparison between theoretical model simulations and experimental observations showed good agreement. A relative maximum was seen in the nondirect component of glottal area, suggesting that an optimal elongation could maximize the vocal fold vibratory power. However, sufficiently large vocal fold elongations caused the nondirect component to approach zero and the direct component to approach a constant. These results showed that vocal fold elongation plays an important role in governing the dynamics of glottal area movement and validated the applicability of the proposed theoretical model and high-speed imaging to investigate laryngeal activity.
Affiliation(s)
- Yu Zhang
- Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, College of Oceanography and Environmental Science, Xiamen University, Xiamen 361005, China.
|
24
|
Classification of functional voice disorders based on phonovibrograms. Artif Intell Med 2010; 49:51-9. [DOI: 10.1016/j.artmed.2010.01.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 11/12/2008] [Revised: 08/20/2009] [Accepted: 01/10/2010] [Indexed: 11/17/2022]
|
25
|
Yang A, Lohscheller J, Berry DA, Becker S, Eysholdt U, Voigt D, Döllinger M. Biomechanical modeling of the three-dimensional aspects of human vocal fold dynamics. The Journal of the Acoustical Society of America 2010; 127:1014-31. [PMID: 20136223 PMCID: PMC3137461 DOI: 10.1121/1.3277165] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 05/14/2009] [Revised: 10/15/2009] [Accepted: 11/24/2009] [Indexed: 05/23/2023]
Abstract
Human voice originates from the three-dimensional (3D) oscillations of the vocal folds. In previous studies, biomechanical properties of vocal fold tissues have been predicted by optimizing the parameters of simple two-mass models to fit their dynamics to high-speed imaging data from the clinic. However, only lateral and longitudinal displacements of the vocal folds were considered. To extend previous studies, a 3D mass-spring cover model is developed, which predicts the 3D vibrations of the entire medial surface of the vocal fold. The model consists of five mass planes arranged in the vertical direction. Each plane contains five longitudinal, coupled mass-spring oscillators. Feasibility of the model is assessed using a large body of dynamical data previously obtained from excised human larynx experiments, in vivo canine larynx experiments, physical models, and numerical models. Typical model output was found to be similar to existing findings. The resulting model enables visualization of the 3D dynamics of the human vocal folds during phonation for both symmetric and asymmetric vibrations.
Affiliation(s)
- Anxiong Yang
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Bohlenplatz 21, 91054 Erlangen, Germany.
|
26
|
Zhang Y, Krausert CR, Kelly MP, Jiang JJ. Typing vocal fold vibratory patterns in excised larynx experiments via digital kymography. Ann Otol Rhinol Laryngol 2009; 118:598-605. [PMID: 19746760 DOI: 10.1177/000348940911800812] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/15/2022]
Abstract
OBJECTIVES Signal typing is central to the understanding of vocal fold vibratory patterns. Digital kymography (DKG) allows the direct observation of vocal fold vibratory patterns, and therefore, using DKG for vibratory signal typing may provide a useful complement to traditional signal typing techniques. METHODS Video data collected from 20 larynges excised from mongrel dogs were observed with DKG in order to find examples of type 1 (nearly periodic), type 2 (subharmonic), and type 3 (aperiodic) vibratory patterns. The time series, frequency spectra, and correlation dimensions were calculated for each signal type. RESULTS The type 1 pattern showed a periodic time series of glottal edges and a discrete frequency spectrum. The type 2 vibratory pattern displayed a time series of alternating high- and low-amplitude waves and a frequency spectrum that included a subharmonic (F0/2) frequency component. Regular and symmetric vibratory patterns were observed in the type 1 and type 2 patterns. The type 3 vibratory pattern was characterized by an aperiodic time series of glottal edges, a broadband frequency spectrum, and irregular and asymmetric vibratory patterns. The correlation dimension estimates increased from type 1 to type 2 to type 3. CONCLUSIONS Imaging with DKG demonstrated an ability to assign a signal type to various laryngeal vibrations. Signal typing techniques utilizing direct observation of the vocal folds could be useful in determining valid methods for the analysis of vocal fold vibrations.
Affiliation(s)
- Yu Zhang
- Department of Surgery, Division of Otolaryngology-Head and Neck Surgery, University of Wisconsin-Madison School of Medicine and Public Health, Madison, Wisconsin, USA
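The abstract above distinguishes type 1 (nearly periodic) from type 2 (subharmonic) patterns by an F0/2 component in the frequency spectrum. The following is a minimal sketch of that idea on synthetic data, not the authors' procedure: the function names (`synth_edge_series`, `bin_magnitude`, `has_subharmonic`), the sinusoidal stand-in for a glottal edge series, the 100 Hz fundamental, the 4,000 samples-per-second rate, and the 0.1 magnitude-ratio threshold are all illustrative assumptions.

```python
import cmath
import math


def synth_edge_series(f0, fs, n_cycles, high=1.0, low=1.0):
    """Hypothetical sinusoidal stand-in for a glottal edge time series.

    Even-numbered cycles get amplitude `high`, odd-numbered cycles `low`.
    With high == low the signal is periodic (type 1-like); with
    high != low the cycles alternate in amplitude (type 2-like).
    """
    samples_per_cycle = fs // f0
    series = []
    for cycle in range(n_cycles):
        amp = high if cycle % 2 == 0 else low
        for s in range(samples_per_cycle):
            series.append(amp * math.sin(2 * math.pi * s / samples_per_cycle))
    return series


def bin_magnitude(series, k):
    """Normalized magnitude of DFT bin k, computed directly from the definition."""
    n = len(series)
    acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
    return abs(acc) / n


def has_subharmonic(series, f0, fs, ratio=0.1):
    """True if the F0/2 bin exceeds `ratio` times the F0 bin.

    The threshold `ratio` is an arbitrary assumption for illustration.
    """
    n = len(series)
    k_f0 = round(f0 * n / fs)          # bin index of the fundamental
    k_half = round(f0 * n / (2 * fs))  # bin index of the F0/2 subharmonic
    return bin_magnitude(series, k_half) > ratio * bin_magnitude(series, k_f0)


type1_like = synth_edge_series(f0=100, fs=4000, n_cycles=20)
type2_like = synth_edge_series(f0=100, fs=4000, n_cycles=20, high=1.0, low=0.5)
print(has_subharmonic(type1_like, 100, 4000))
print(has_subharmonic(type2_like, 100, 4000))
```

The sketch covers only the spectral criterion. Type 3 (aperiodic) patterns and the correlation dimension estimates mentioned in the abstract are outside its scope.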
|
27
|
Eysholdt U, Döllinger M. Die Physik der Stimme und ihre medizinischen Folgen. CHEM-ING-TECH 2008. [DOI: 10.1002/cite.200750806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|