1
Donhauser J, Tur B, Döllinger M. Neural network-based estimation of biomechanical vocal fold parameters. Front Physiol 2024; 15:1282574. [PMID: 38449783 PMCID: PMC10916882 DOI: 10.3389/fphys.2024.1282574] [Received: 08/24/2023] [Accepted: 01/09/2024] [Indexed: 03/08/2024]
Abstract
Vocal fold (VF) vibrations are the primary source of human phonation. High-speed video (HSV) endoscopy enables the computation of descriptive VF parameters for assessment of physiological properties of laryngeal dynamics, i.e., the vibration of the VFs. However, the underlying biomechanical factors responsible for physiological and disordered VF vibrations cannot be accessed this way. In contrast, physically based numerical VF models reveal insights into the organ's oscillations that remain inaccessible through endoscopy. To estimate biomechanical properties, previous research has fitted subglottal pressure-driven mass-spring-damper systems to the HSV-recorded VF trajectories, as an inverse problem, by global optimization of the numerical model. A neural network trained on the numerical model may be used as a substitute for computationally expensive optimization, yielding a fast-evaluating surrogate of the biomechanical inverse problem. This paper proposes a convolutional recurrent neural network (CRNN)-based architecture trained on regression of a physiologically based biomechanical six-mass model (6 MM). For comparison with previous research, the prediction of the underlying biomechanical factor "subglottal pressure" was tested against 288 HSV ex vivo porcine recordings. The contributions of this work are two-fold: first, the presented CRNN with the 6 MM handles multiple trajectories along the VFs, which allows for investigations of local changes in VF characteristics. Second, the network was trained to reproduce further important biomechanical model parameters, like VF mass and stiffness, on synthetic data. Unlike in previous work, the network in this study is therefore a complete surrogate of the inverse problem, which allowed for explicit computation of the fitted model using our approach.
The presented approach achieves a best-case mean absolute error (MAE) of 133 Pa (13.9%) in subglottal pressure prediction with 76.6% correlation on experimental data, and a re-estimated fundamental frequency MAE of 15.9 Hz (9.9%). Detailed training analysis revealed subglottal pressure as the most learnable parameter. With the physiologically based model design and advances in fast parameter prediction, this work is a next step in biomechanical VF model fitting and the estimation of laryngeal kinematics.
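The error figures above (an MAE in Pa, the same MAE as a percentage, and a correlation) can be computed for any set of predictions as sketched below. This is a minimal Python/NumPy sketch, not the authors' evaluation code: it assumes the percentage is the MAE divided by the mean absolute ground-truth value and that "correlation" is the Pearson coefficient, neither of which the abstract defines, and the pressure values are made-up placeholders.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MAE, relative MAE in %, Pearson r) for two 1-D sequences.

    Assumption: relative MAE = MAE / mean(|y_true|) * 100.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mae = float(np.mean(np.abs(y_pred - y_true)))
    rel_mae = 100.0 * mae / float(np.mean(np.abs(y_true)))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])  # Pearson correlation
    return mae, rel_mae, r


# Hypothetical subglottal pressures in Pa (placeholders, not study data).
true_p = [800.0, 950.0, 1100.0, 1250.0]
pred_p = [900.0, 900.0, 1000.0, 1300.0]
mae, rel_mae, r = regression_metrics(true_p, pred_p)
print(f"MAE = {mae:.1f} Pa ({rel_mae:.1f} %), r = {r:.3f}")
```

Reporting the relative MAE alongside the absolute one makes results comparable across recordings with different pressure ranges; the normalization chosen here is only one common option.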
Affiliation(s)
- Jonas Donhauser
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany

2
Zhang Z. Voice Feature Selection to Improve Performance of Machine Learning Models for Voice Production Inversion. J Voice 2023; 37:479-485. [PMID: 33849760 PMCID: PMC8502179 DOI: 10.1016/j.jvoice.2021.03.004] [Received: 11/03/2020] [Revised: 02/24/2021] [Accepted: 03/01/2021] [Indexed: 11/19/2022]
Abstract
OBJECTIVE Estimation of physiological control parameters of the vocal system from the produced voice outcome has important applications in clinical management of voice disorders. Previously, we developed a simulation-based neural network for estimation of vocal fold geometry, mechanical properties, and subglottal pressure from voice outcome features that characterize the acoustics of the produced voice. The goals of this study are to (1) explore the possibility of improving the estimation accuracy of physiological control parameters by including voice outcome features characterizing vocal fold vibration; and (2) identify voice feature sets that optimize both estimation accuracy and robustness to measurement noise. METHODS Feedforward neural networks are trained to solve the inversion problem of estimating the physiological control parameters of a three-dimensional body-cover vocal fold model from different sets of voice outcome features that characterize the simulated voice acoustics, glottal flow, and vocal fold vibration. A sensitivity analysis is then performed to evaluate the contribution of individual voice features to the overall performance of the neural networks in estimating the physiologic control parameters. RESULTS AND CONCLUSIONS While including voice outcome features characterizing vocal fold vibration increases estimation accuracy, it also reduces the network's robustness to measurement noise, due to high sensitivity of network performance to voice outcome features measuring the absolute amplitudes of the glottal flow and area waveforms, which are also difficult to measure accurately in practical applications. By excluding such glottal flow-based features and replacing glottal area-based features with their normalized counterparts, we are able to significantly improve both estimation accuracy and robustness to noise.
We further show that similar estimation accuracy and robustness can be achieved with an even smaller set of voice outcome features by excluding features of small sensitivity.
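The idea of replacing absolute glottal area features with normalized ones can be illustrated briefly: if a waveform is divided by its own peak, any unknown scale factor (for example, an uncalibrated pixel-to-area conversion) cancels out. The two features below (mean normalized opening and open fraction) and the 5% opening threshold are hypothetical choices for illustration, not the feature set used in the study.

```python
import numpy as np


def normalized_area_features(area, open_threshold=0.05):
    """Scale-invariant features of a glottal area waveform (illustrative).

    open_threshold is a hypothetical fraction of the peak above which the
    glottis counts as open.
    """
    area = np.asarray(area, dtype=float)
    peak = area.max()
    if peak <= 0.0:
        raise ValueError("waveform never opens")
    a_norm = area / peak                               # peak-normalized, max = 1
    mean_opening = a_norm.mean()                       # mean relative opening
    open_fraction = np.mean(a_norm > open_threshold)   # share of samples counted as open
    return np.array([mean_opening, open_fraction])


# Synthetic half-wave-rectified sine: 4 cycles at 200 Hz sampled at 20 kHz.
t = np.arange(400) / 20000.0
area = np.clip(np.sin(2 * np.pi * 200 * t), 0.0, None)

# Scaling the waveform (e.g. a different calibration) leaves the features unchanged.
print(normalized_area_features(area))
print(normalized_area_features(3.7 * area))
```

An absolute feature such as peak area would change by the factor 3.7 in this example, which is the kind of measurement sensitivity the abstract describes.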
Affiliation(s)
- Zhaoyan Zhang
- Department of Head and Neck Surgery, University of California, Los Angeles, 31-24 Rehabilitation Center, Los Angeles, California.

3
Long-term performance assessment of fully automatic biomedical glottis segmentation at the point of care. PLoS One 2022; 17:e0266989. [PMID: 36129922 PMCID: PMC9491538 DOI: 10.1371/journal.pone.0266989] [Received: 03/29/2022] [Accepted: 07/25/2022] [Indexed: 12/04/2022]
Abstract
Deep learning has had a large impact on medical image analysis and has lately been adopted for clinical use at the point of care. However, there are only a few reports of long-term studies that show the performance of deep neural networks (DNNs) in such an environment. In this study, we measured the long-term performance of a clinically optimized DNN for laryngeal glottis segmentation. We collected video footage over two years from an AI-powered laryngeal high-speed videoendoscopy imaging system and found that the footage image quality is stable across time. Next, we determined the DNN segmentation performance on lossy and lossless compressed data, revealing that only 9% of recordings contain segmentation artifacts. We found that lossy and lossless compression are on par for glottis segmentation; however, lossless compression provides significantly superior image quality. Lastly, we employed continual learning strategies to continuously incorporate new data into the DNN to remove the aforementioned segmentation artifacts. With modest manual intervention, we were able to reduce these segmentation artifacts by up to 81%. We believe that our suggested deep learning-enhanced laryngeal imaging platform consistently provides clinically sound results and, together with our proposed continual learning scheme, will have a long-lasting impact on the future of laryngeal imaging.

4
Hadwin PJ, Erath BD, Peterson SD. The influence of flow model selection on finite element model parameter estimation using Bayesian inference. JASA Express Lett 2021; 1:045204. [PMID: 34136884 PMCID: PMC8182970 DOI: 10.1121/10.0004260] [Received: 01/21/2021] [Accepted: 03/18/2021] [Indexed: 06/12/2023]
Abstract
Recently, Bayesian estimation coupled with finite element modeling has been demonstrated as a viable tool for estimating vocal fold material properties from kinematic information obtained via high-speed video recordings. In this article, the sensitivity of the parameter estimates to the employed fluid model is explored by considering Bernoulli and one-dimensional viscous fluid flow models. Simulation results indicate that prescribing an ad hoc separation location for the Bernoulli flow model can lead to large estimate biases, whereas including the separation location as an estimated parameter leads to results comparable to those of the viscous fluid flow model.
Affiliation(s)
- Paul J Hadwin
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Byron D Erath
- Department of Mechanical and Aeronautical Engineering, Clarkson University, Potsdam, New York 13699, USA
- Sean D Peterson
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada

5
Ghasemzadeh H, Deliyski DD, Hillman RE, Mehta DD. Method for Horizontal Calibration of Laser-Projection Transnasal Fiberoptic High-Speed Videoendoscopy. Appl Sci (Basel) 2021; 11:822. [PMID: 33628469 PMCID: PMC7899170 DOI: 10.3390/app11020822] [Indexed: 11/17/2022]
Abstract
OBJECTIVE Calibrated horizontal measurements (e.g., mm) from endoscopic procedures could be utilized for advancement of evidence-based practice and personalized medicine. However, the size of an object in endoscopic images is not readily calibrated and depends on multiple factors, including the distance between the endoscope and the target surface. Additionally, acquired images may have significant non-linear distortion that would further complicate calibrated measurements. This study uses a recently developed in vivo laser-projection fiberoptic laryngoscope and proposes a method for calibrated spatial measurements. METHOD A set of circular grids was recorded at multiple working distances. A statistical model was trained to map the pixel length of the object, the working distance, and the spatial location of the target object to its length in mm. RESULT A detailed analysis of the performance of the proposed method is presented. The analyses show that the accuracy of the proposed method does not depend on the working distance or the length of the target object. The estimated average magnitude of error was 0.27 mm, three times lower than that of the existing alternative. CONCLUSION The presented method can achieve sub-millimeter accuracy in horizontal measurement. SIGNIFICANCE Evidence-based practice and personalized medicine could significantly benefit from the proposed method. Implications of the findings for other endoscopic procedures are also discussed.
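The abstract does not specify the form of the statistical model, so the following is only a sketch of the general idea under an assumed model: the millimetre length is taken to be the pixel length times a scale factor that varies linearly with working distance and quadratically with radial position in the image (a crude stand-in for lens distortion). Because this assumed model is linear in its coefficients, it can be fitted by ordinary least squares. All names, coefficients, and data ranges are hypothetical.

```python
import numpy as np


def design_matrix(pixel_len, distance_mm, radius_norm):
    """Regressors for the assumed model
    mm_len = pixel_len * (b0 + b1 * distance_mm + b2 * radius_norm**2).

    radius_norm is a hypothetical radial position of the object in the image,
    scaled to [0, 1] (0 = image centre).
    """
    p = np.asarray(pixel_len, dtype=float)
    d = np.asarray(distance_mm, dtype=float)
    r = np.asarray(radius_norm, dtype=float)
    return np.column_stack([p, p * d, p * r**2])


def fit_calibration(pixel_len, distance_mm, radius_norm, mm_len):
    """Ordinary least-squares estimate of (b0, b1, b2)."""
    X = design_matrix(pixel_len, distance_mm, radius_norm)
    coef, *_ = np.linalg.lstsq(X, np.asarray(mm_len, dtype=float), rcond=None)
    return coef


def predict_mm(coef, pixel_len, distance_mm, radius_norm):
    """Apply fitted coefficients to new measurements."""
    return design_matrix(pixel_len, distance_mm, radius_norm) @ coef


# Synthetic calibration recordings generated from known coefficients.
rng = np.random.default_rng(0)
n = 200
px = rng.uniform(10.0, 80.0, n)        # object length in pixels
dist = rng.uniform(5.0, 30.0, n)       # working distance in mm
rad = rng.uniform(0.0, 1.0, n)         # normalized radial position
true_coef = np.array([0.002, 0.004, 0.001])
mm = design_matrix(px, dist, rad) @ true_coef

coef = fit_calibration(px, dist, rad, mm)
print(coef)                            # close to true_coef for noise-free data
```

With real grid recordings the targets would carry measurement noise, and the fitted model would be judged on held-out distances and positions rather than on exact coefficient recovery.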
Affiliation(s)
- Hamzeh Ghasemzadeh
- Department of Communicative Sciences and Disorders and Department of Computational Mathematics Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Dimitar D. Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan, USA
- Robert E. Hillman
- MGH Institute of Health Professions; Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital; Department of Surgery, Harvard Medical School; and Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
- Daryush D. Mehta
- MGH Institute of Health Professions; Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital; Department of Surgery, Harvard Medical School; and Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA

6
Fitting synthetic to clinical kymographic images for deriving kinematic vocal fold parameters: Application to left-right vibratory phase differences. Biomed Signal Process Control 2021. [DOI: 10.1016/j.bspc.2020.102253] [Indexed: 11/18/2022]

7
Abstract
A healthy voice is crucial for verbal communication and hence in daily as well as professional life. The sound-producing vocal folds in the larynx are the basis for a healthy voice. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal folds. Clinically, videoendoscopy is applied to assess the symmetry of the oscillation, which is then evaluated subjectively. High-speed videoendoscopy, an emerging method that allows quantification of the vocal fold oscillation, is more commonly employed in research due to the amount of data and the complex, semi-automatic analysis. In this study, we provide a comprehensive evaluation of methods that fully automatically detect the glottal midline. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset using manual annotations, utilized both simulations and annotated endoscopic images to train deep neural networks at different stages of the analysis workflow, and compared these to established computer vision algorithms. We found that classical computer vision methods perform well at detecting the glottal midline in glottis segmentation data but are outperformed by deep neural networks on this task. We further suggest GlottisNet, a multi-task neural architecture featuring the simultaneous prediction of both the opening between the vocal folds and the symmetry axis, a major step toward clinical applicability of quantitative, deep learning-assisted laryngeal endoscopy by fully automating segmentation and midline detection.
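As a point of reference for the classical computer vision baselines mentioned above, one simple way to approximate a glottal midline from a segmentation mask is to fit the principal axis of the foreground pixels: the line through the mask centroid along the direction of largest variance. This is a generic, hypothetical baseline written for illustration; it is neither GlottisNet nor necessarily one of the algorithms evaluated in the study, and it assumes a single, roughly elongated glottal opening.

```python
import numpy as np


def principal_axis_midline(mask):
    """Estimate a glottal midline as the principal axis of a binary mask.

    Returns (centroid, direction) in (row, col) coordinates, where direction
    is a unit vector along the axis of largest pixel-coordinate variance.
    """
    rows, cols = np.nonzero(mask)
    if rows.size < 2:
        raise ValueError("mask needs at least two foreground pixels")
    pts = np.column_stack([rows, cols]).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov(pts - centroid, rowvar=False)   # 2x2 covariance of coordinates
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    direction = eigvecs[:, np.argmax(eigvals)]   # axis of largest spread
    return centroid, direction


# Synthetic, vertically elongated 'glottis' in a 64x64 image.
mask = np.zeros((64, 64), dtype=bool)
mask[10:54, 30:34] = True
c, d = principal_axis_midline(mask)
print(c, d)
```

Such a fit is sensitive to asymmetric or partially closed openings, which is one reason learned approaches can outperform it on real recordings.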

8
Gómez P, Kist AM, Schlegel P, Berry DA, Chhetri DK, Dürr S, Echternach M, Johnson AM, Kniesburges S, Kunduk M, Maryn Y, Schützenberger A, Verguts M, Döllinger M. BAGLS, a multihospital Benchmark for Automatic Glottis Segmentation. Sci Data 2020; 7:186. [PMID: 32561845 PMCID: PMC7305104 DOI: 10.1038/s41597-020-0526-3] [Received: 09/09/2019] [Accepted: 05/15/2020] [Indexed: 02/06/2023]
Abstract
Laryngeal videoendoscopy is one of the main tools in clinical examinations for voice disorders and voice research. Using high-speed videoendoscopy, it is possible to fully capture the vocal fold oscillations, however, processing the recordings typically involves a time-consuming segmentation of the glottal area by trained experts. Even though automatic methods have been proposed and the task is particularly suited for deep learning methods, there are no public datasets and benchmarks available to compare methods and to allow training of generalizing deep learning models. In an international collaboration of researchers from seven institutions from the EU and USA, we have created BAGLS, a large, multihospital dataset of 59,250 high-speed videoendoscopy frames with individually annotated segmentation masks. The frames are based on 640 recordings of healthy and disordered subjects that were recorded with varying technical equipment by numerous clinicians. The BAGLS dataset will allow an objective comparison of glottis segmentation methods and will enable interested researchers to train their own models and compare their methods.
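Segmentation masks such as those in BAGLS are typically reduced to a glottal area waveform (GAW), the glottal area per frame, which several of the model-fitting works listed here use as input. The sketch below assumes boolean masks stacked as a (frames, height, width) array and measures area in pixels; the second helper, which takes the largest non-DC spectral peak as the oscillation frequency, is a simple illustrative estimator, not a method from the BAGLS paper. The frame rate and synthetic masks are hypothetical.

```python
import numpy as np


def glottal_area_waveform(masks):
    """Glottis pixel count per frame for a (T, H, W) boolean mask stack."""
    masks = np.asarray(masks, dtype=bool)
    return masks.reshape(masks.shape[0], -1).sum(axis=1)


def dominant_frequency(gaw, fps):
    """Frequency in Hz of the largest non-DC peak in the GAW spectrum."""
    x = np.asarray(gaw, dtype=float)
    x = x - x.mean()                              # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fps)
    return float(freqs[1 + np.argmax(spectrum[1:])])


# Synthetic recording: 400 frames at a hypothetical 4000 fps, with the
# glottis opening and closing at 200 Hz as a stripe of varying width.
fps = 4000
t = np.arange(400) / fps
width = np.round(6 * np.clip(np.sin(2 * np.pi * 200 * t), 0.0, None)).astype(int)
masks = np.zeros((400, 32, 32), dtype=bool)
for i, w in enumerate(width):
    masks[i, 8:24, 16:16 + w] = True              # 16 rows tall, w columns wide

gaw = glottal_area_waveform(masks)
print(dominant_frequency(gaw, fps))
```

A spectral peak picker like this is coarse (its resolution is fps divided by the number of frames); cycle-based period detection is a common alternative when higher precision is needed.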
Affiliation(s)
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Patrick Schlegel
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- David A Berry
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California, USA
- Dinesh K Chhetri
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California, USA
- Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
- Aaron M Johnson
- NYU Voice Center, Department of Otolaryngology - Head and Neck Surgery, New York University School of Medicine, New York, New York, USA
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, USA
- Youri Maryn
- European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium
- Department of Speech, Language and Hearing Sciences, University of Ghent, Ghent, Belgium
- Faculty of Education, Health and Social Work, University College Ghent, Ghent, Belgium
- Faculty of Psychology and Educational Sciences, School of Logopedics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
- Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Monique Verguts
- European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium
- Department of Otorhinolaryngology and Voice Disorders, Diest General Hospital, Diest, Belgium
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany

9
Zhang Z. Estimation of vocal fold physiology from voice acoustics using machine learning. J Acoust Soc Am 2020; 147:EL264. [PMID: 32237804 PMCID: PMC7075716 DOI: 10.1121/10.0000927] [Received: 12/15/2019] [Revised: 03/01/2020] [Accepted: 03/03/2020] [Indexed: 05/27/2023]
Abstract
The goal of this study is to estimate vocal fold geometry, stiffness, position, and subglottal pressure from voice acoustics, toward clinical and other voice technology applications. Unlike previous voice inversion research that often uses lumped-element models of phonation, this study explores the feasibility of voice inversion using data generated from a three-dimensional voice production model. Neural networks are trained to estimate vocal fold properties and subglottal pressure from voice features extracted from the simulation data. Results show reasonably good estimation accuracy, particularly for vocal fold properties with a consistent global effect on voice production, and reasonable agreement with excised human larynx experiments.
Affiliation(s)
- Zhaoyan Zhang
- Department of Head and Neck Surgery, University of California, Los Angeles, 31-24 Rehab Center, 1000 Veteran Avenue, Los Angeles, California 90095-1794

10
Abstract
This review provides a comprehensive compilation, from a digital image processing point of view, of the most important techniques currently developed to characterize and quantify the vibration behaviour of the vocal folds, along with a detailed description of the laryngeal image modalities currently used in the clinic. The review presents an overview of the most significant glottal-gap segmentation and facilitative playback techniques used in the literature for this purpose, and shows the drawbacks and challenges that remain unsolved in developing robust tools, based on digital image processing, for analysing vocal fold vibratory function.

11
Zhang Y, Zheng X, Xue Q. A Deep Neural Network Based Glottal Flow Model for Predicting Fluid-Structure Interactions during Voice Production. Appl Sci (Basel) 2020; 10. [PMID: 34306737 PMCID: PMC8299989 DOI: 10.3390/app10020705] [Indexed: 11/17/2022]
Abstract
This paper proposes a machine-learning based reduced-order model that can provide fast and accurate prediction of the glottal flow during voice production. The model is based on the Bernoulli equation with a viscous loss term predicted by a deep neural network (DNN) model. The training data for the DNN model come from Navier-Stokes (N-S) equation-based three-dimensional simulations of glottal flows in various glottal shapes generated by a synthetic shape function; these shapes can be obtained by superimposing the instantaneous modal displacements during vibration on the prephonatory geometry of the glottal shape. The input parameters of the DNN model are the geometric and flow parameters extracted from discretized cross sections of the glottal shapes, and the output target is the corresponding flow resistance coefficient. With this trained DNN-Bernoulli model, the flow resistance coefficient as well as the flow rate and pressure distribution in any given glottal shape generated by the synthetic shape function can be predicted. The model is further coupled with a finite-element method based solid dynamics solver for simulating fluid-structure interactions (FSI). The prediction performance of the model for both static shape and FSI simulations is evaluated by comparing the solutions to those obtained by the Bernoulli and N-S models. The model shows good prediction performance in both accuracy and efficiency, suggesting promise for future clinical use.
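For orientation, the ideal quasi-steady Bernoulli estimate of glottal flow through the minimum cross-section is Q = A_min * sqrt(2 * dP / rho). One generic way to add a viscous correction is a dimensionless loss coefficient in the denominator, as sketched below. The coefficient here is a plain hypothetical input standing in for what the paper's DNN predicts; the abstract does not state the exact form of the paper's loss term, and the pressure, area, and air-density values are illustrative only.

```python
import math

RHO_AIR = 1.2  # kg/m^3, approximate density of air (illustrative value)


def glottal_flow_rate(delta_p, area_min, loss_coeff=0.0, rho=RHO_AIR):
    """Quasi-steady volume flow (m^3/s) through the glottal constriction.

    Assumed form: delta_p = (1 + loss_coeff) * rho * u**2 / 2 with
    u = Q / area_min, so loss_coeff = 0 recovers the ideal Bernoulli estimate.
    loss_coeff is a hypothetical stand-in for a learned viscous correction.
    Reverse flow (delta_p <= 0) and a closed glottis are not modelled: Q = 0.
    """
    if delta_p <= 0.0 or area_min <= 0.0:
        return 0.0
    u = math.sqrt(2.0 * delta_p / (rho * (1.0 + loss_coeff)))  # jet velocity
    return area_min * u


# 800 Pa across a 5 mm^2 opening (hypothetical values).
q_ideal = glottal_flow_rate(800.0, 5e-6)
q_lossy = glottal_flow_rate(800.0, 5e-6, loss_coeff=0.5)
print(f"ideal: {q_ideal * 1e6:.0f} cm^3/s, with losses: {q_lossy * 1e6:.0f} cm^3/s")
```

Because the correction enters only through one scalar per glottal shape, a surrogate that predicts that scalar can keep the cost of a Bernoulli evaluation while approximating viscous effects.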

12
Drioli C, Foresti GL. Fitting a biomechanical model of the folds to high-speed video data through bayesian estimation. Inform Med Unlocked 2020. [DOI: 10.1016/j.imu.2020.100373] [Indexed: 11/28/2022]
14
Deng JJ, Hadwin PJ, Peterson SD. The effect of high-speed videoendoscopy configuration on reduced-order model parameter estimates by Bayesian inference. J Acoust Soc Am 2019; 146:1492. [PMID: 31472542 PMCID: PMC6715443 DOI: 10.1121/1.5124256] [Received: 02/13/2019] [Revised: 08/07/2019] [Accepted: 08/09/2019] [Indexed: 06/10/2023]
Abstract
Bayesian inference has been previously demonstrated as a viable inverse analysis tool for estimating subject-specific reduced-order model parameters and uncertainties. However, previous studies have relied upon simulated glottal area waveforms with superimposed random noise as the measurement. In practice, high-speed videoendoscopy is used to measure glottal area, which introduces practical imaging effects not captured in simulated data, such as viewing angle, frame rate, and camera resolution. Herein, high-speed videos of the vocal folds were approximated by recording the trajectories of physical vocal fold models controlled by a symmetric body-cover model. Twenty videos were recorded, varying subglottal pressure, cricothyroid activation, and viewing angle, with frame rate and video resolution varied by digital video manipulation. Bayesian inference was used to estimate subglottal pressure and cricothyroid activation from glottal area waveforms extracted from the videos. The resulting estimates show off-axis viewing of 10° can lead to a 10% bias in the estimated subglottal pressure. A viewing model is introduced such that viewing angle can be included as an estimated parameter, which alleviates estimate bias. Frame rate and pixel resolution were found to primarily affect uncertainty of parameter estimates up to a limit where spatial and temporal resolutions were too poor to resolve the glottal area. Since many high-speed cameras have the ability to sacrifice spatial for temporal resolution, the findings herein suggest that Bayesian inference studies employing high-speed video should increase temporal resolutions at the expense of spatial resolution for reduced estimate uncertainties.
Affiliation(s)
- Jonathan J Deng
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
- Paul J Hadwin
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
- Sean D Peterson
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada

15
Bayesian Inference of Vocal Fold Material Properties from Glottal Area Waveforms Using a 2D Finite Element Model. Appl Sci (Basel) 2019; 9. [PMID: 34046213 PMCID: PMC8153513 DOI: 10.3390/app9132735] [Indexed: 02/06/2023]
Abstract
Bayesian estimation has been previously demonstrated as a viable method for developing subject-specific vocal fold models from observations of the glottal area waveform. These prior efforts, however, have been restricted to lumped-element fitting models and synthetic observation data. The indirect relationship between the lumped-element parameters and physical tissue properties makes extracting the latter from the former difficult. Herein we propose a finite element fitting model, which treats the vocal folds as a viscoelastic deformable body composed of three layers. Using the glottal area waveforms generated by self-oscillating silicone vocal folds, we directly estimate the elastic moduli, density, and other material properties of the silicone folds using a Bayesian importance sampling approach. Estimated material properties agree with the “ground truth” experimental values to within 3% for most parameters. By considering cases with varying subglottal pressure and medial compression, we demonstrate that the finite element model coupled with Bayesian estimation is sufficiently sensitive to distinguish between experimental configurations. Additional information not available experimentally, namely contact pressures, is extracted from the developed finite element models. The contact pressures are found to increase with medial compression and subglottal pressure, in agreement with expectation.
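The Bayesian importance sampling idea can be shown with a deliberately tiny stand-in for the finite element model: a single mass-spring oscillator whose natural frequency f = sqrt(k/m) / (2*pi) depends on one unknown stiffness k. Samples are drawn from a uniform prior (which also serves as the proposal), weighted by a Gaussian likelihood of the observed frequency, and self-normalized to give a posterior mean and standard deviation. The forward model, prior range, noise level, and "observed" value are all hypothetical; the study itself fits a three-layer viscoelastic finite element model to glottal area waveforms.

```python
import numpy as np


def natural_frequency(k, m=1e-4):
    """Toy forward model: frequency (Hz) of a mass-spring oscillator."""
    return np.sqrt(k / m) / (2.0 * np.pi)


def importance_sampling_posterior(f_obs, sigma, n=20000, k_range=(20.0, 200.0), seed=0):
    """Self-normalized importance sampling with the prior as proposal.

    Returns (posterior mean, posterior std, effective sample size) for k.
    """
    rng = np.random.default_rng(seed)
    k = rng.uniform(*k_range, size=n)                                # draws from uniform prior
    log_w = -0.5 * ((natural_frequency(k) - f_obs) / sigma) ** 2     # Gaussian log-likelihood
    w = np.exp(log_w - log_w.max())                                  # stabilize before normalizing
    w /= w.sum()
    mean = np.sum(w * k)
    std = np.sqrt(np.sum(w * (k - mean) ** 2))
    ess = 1.0 / np.sum(w ** 2)                                       # effective sample size
    return mean, std, ess


k_true = 80.0                          # N/m, hypothetical ground truth
f_obs = natural_frequency(k_true)      # noise-free synthetic observation, about 142 Hz
mean, std, ess = importance_sampling_posterior(f_obs, sigma=2.0)
print(f"k = {mean:.1f} +/- {std:.1f} N/m (ESS = {ess:.0f})")
```

With an expensive forward model, each sample costs one simulation, so the effective sample size is a useful check on whether the prior-as-proposal choice wastes too many evaluations.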

16
The relationship between biomechanics of pharyngoesophageal segment and tracheoesophageal phonation. Sci Rep 2019; 9:9722. [PMID: 31278355 PMCID: PMC6611845 DOI: 10.1038/s41598-019-46223-7] [Received: 11/15/2018] [Accepted: 06/13/2019] [Indexed: 12/19/2022]
Abstract
This study examined the relationship between biomechanical features of the pharyngoesophageal (PE) segment, acoustic characteristics of tracheoesophageal (TE) phonation, and patients’ satisfaction with TE phonation. Fifteen patients using TE phonation after total laryngectomy completed the Voice Symptom Scale (VoiSS) and underwent acoustic voice analysis for cepstral peak prominence (CPP) and relative intensity. High resolution manometry (HRM) combined with videofluoroscopy was used to evaluate PE segment pressure and calculate the pressure gradient (ΔP), which was the pressure difference between the upper oesophagus and a point two centimetres above the vibrating PE segment. The upper oesophageal sphincter (UOS) minimal diameters were measured by Endolumenal Functional Lumen Imaging Probe (EndoFLIP). HRM detected rapid pressure changes at the level of the 4th to 6th cervical vertebrae. CPP, relative intensity, and ΔP were significant predictors of satisfactory TE phonation. ΔP was a significant predictor of CPP and intensity. Minimal UOS diameter was a significant predictor of relative intensity of TE phonation. In two patients with unsuccessful TE phonation, endoscopic dilatation subsequently restored TE phonation. These findings suggest that sufficient ΔP and large UOS diameter are required for satisfactory TE phonation. Endoscopic dilatation increasing UOS diameter may provide a new approach to treat unsuccessful TE phonation.
17
Estimating Vocal Fold Contact Pressure from Raw Laryngeal High-Speed Videoendoscopy Using a Hertz Contact Model. Appl Sci (Basel) 2019; 9. [PMID: 34267956 PMCID: PMC8279006 DOI: 10.3390/app9112384] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/08/2023]
Abstract
The development of trauma-induced lesions of the vocal folds (VFs) has been linked to high collision pressure on the VF surface. However, there are no direct methods for the clinical assessment of VF collision, which limits the objective assessment of these disorders. In this study, we develop a video processing technique to directly quantify the mechanical impact of the VFs using solely laryngeal kinematic data. The technique combines two components. An edge tracking framework estimates the kinematic sequence of each VF edge with a Kalman filter approach. A Hertzian impact model then predicts the contact force during collision. The proposed formulation overcomes several limitations of prior efforts: it uses a more relevant VF contact geometry, it does not require calibrated physical dimensions, it is normalized by the tissue properties, and it applies a correction factor for using a superior view only. The proposed approach is validated against numerical models, silicone vocal fold models, and prior studies. A case study with high-speed videoendoscopy recordings provides initial insights into the relationship between sound pressure level and contact pressure. Thus, the proposed method has high potential in clinical practice and could also be adapted to operate with laryngeal stroboscopic systems. A method to directly estimate the contact pressure of the vocal folds using uncalibrated laryngeal kinematic data is presented. The approach is promising for enhancing the objective assessment of vocal function in clinical settings, especially for studying same-subject variations.
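For background, the classical Hertz solution for two elastic spheres relates the contact force to the indentation depth delta:

- F = (4/3) E* R^(1/2) delta^(3/2)
- 1/E* = (1 - nu1^2)/E1 + (1 - nu2^2)/E2
- 1/R = 1/R1 + 1/R2

The abstract does not state which contact geometry or correction factor the authors adopted. The sketch below therefore assumes sphere-on-sphere contact, and its function names are hypothetical.

```python
import math


def effective_modulus(E1, nu1, E2, nu2):
    """Combined contact modulus E* (Pa) of two elastic bodies."""
    return 1.0 / ((1.0 - nu1 ** 2) / E1 + (1.0 - nu2 ** 2) / E2)


def effective_radius(R1, R2):
    """Combined radius of curvature R (m) of two spheres."""
    return 1.0 / (1.0 / R1 + 1.0 / R2)


def hertz_sphere_contact(delta, E_star, R):
    """Hertz force (N), contact radius (m) and peak pressure (Pa) for indentation delta (m).

    Assumes sphere-on-sphere contact; returns zeros when the bodies do not touch.
    """
    if delta <= 0:
        return 0.0, 0.0, 0.0
    force = (4.0 / 3.0) * E_star * math.sqrt(R) * delta ** 1.5
    a = math.sqrt(R * delta)                      # radius of the circular contact patch
    p_max = 3.0 * force / (2.0 * math.pi * a ** 2)  # peak of the elliptical pressure profile
    return force, a, p_max
```

Doubling delta multiplies the force by 2^(3/2), about 2.83. This stiffening nonlinearity distinguishes Hertzian contact from a linear spring. The peak pressure is 1.5 times the mean pressure over the contact patch.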
18
Manríquez R, Peterson SD, Prado P, Orio P, Galindo GE, Zañartu M. Neurophysiological Muscle Activation Scheme for Controlling Vocal Fold Models. IEEE Trans Neural Syst Rehabil Eng 2019; 27:1043-1052. [PMID: 30908260 PMCID: PMC6557719 DOI: 10.1109/tnsre.2019.2906030] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022]
Abstract
A physiologically based scheme that incorporates inherent neurological fluctuations in the activation of intrinsic laryngeal muscles into a lumped-element vocal fold model is proposed. Herein, muscles are activated through a combination of neural firing rate and recruitment of additional motor units, both of which have stochastic components. The mathematical framework and underlying physiological assumptions are described, and the effects of the fluctuations are tested via a parametric analysis using a body-cover model of the vocal folds for steady-state sustained vowels. The inherent muscle activation fluctuations have a bandwidth that varies with the firing rate, yielding both low- and high-frequency components. When the proposed fluctuation scheme is applied to the voice production model, changes in the dynamics of the system can be observed, ranging from fluctuations in the fundamental frequency to unstable behavior near bifurcation regions. The resulting coefficient of variation of the model parameters is not uniform across levels of muscle activation. The stochastic components of muscle activation influence both the fine-structure variability and the ability to achieve a target value for pitch control. These components can have a significant impact on the vocal fold parameters, as well as on the outputs of the voice production model. Good agreement was found when contrasting the proposed scheme with prior experimental studies accounting for variability in vocal fold posturing and spectral characteristics of the muscle activation signal. The proposed scheme constitutes a novel, physiologically based approach for controlling lumped-element models for normal voice production and can be extended to explore neuropathological conditions.
Affiliation(s)
- Rodrigo Manríquez
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Sean D. Peterson
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario, Canada
- Pavel Prado
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Patricio Orio
- Instituto de Neurociencia and Centro Interdisciplinario de Neurociencia de Valparaíso, Universidad de Valparaíso, Valparaíso, Chile
- Gabriel E. Galindo
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
19
Gómez P, Schützenberger A, Semmler M, Döllinger M. Laryngeal Pressure Estimation With a Recurrent Neural Network. IEEE J Transl Eng Health Med 2018; 7:2000111. [PMID: 30680252 PMCID: PMC6331197 DOI: 10.1109/jtehm.2018.2886021] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/09/2018] [Revised: 10/24/2018] [Accepted: 11/30/2018] [Indexed: 11/24/2022]
Abstract
Quantifying the physical parameters of voice production is essential for understanding the process of phonation and can aid in voice research and diagnosis. As an alternative to invasive measurements, they can be estimated by formulating an inverse problem using a numerical forward model. However, high-fidelity numerical models are often computationally too expensive for this. This paper presents a novel approach to train a long short-term memory network to estimate the subglottal pressure in the larynx at massively reduced computational cost using solely synthetic training data. We train the network on synthetic data from a numerical two-mass model and validate it on experimental data from 288 high-speed ex vivo video recordings of porcine vocal folds from a previous study. The training requires significantly fewer model evaluations compared with the previous optimization approach. On the test set, we maintain a comparable performance of 21.2% versus previous 17.7% mean absolute percentage error in estimating the subglottal pressure. The evaluation of one sample requires a vanishingly small amount of computation time. The presented approach is able to maintain estimation accuracy of the subglottal pressure at significantly reduced computational cost. The methodology is likely transferable to estimate other parameters and training with other numerical models. This improvement should allow the adoption of more sophisticated, high-fidelity numerical models of the larynx. The vast speedup is a critical step to enable a future clinical application and knowledge of parameters such as the subglottal pressure will aid in diagnosis and treatment selection.
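The error figures above (21.2% versus 17.7%) are mean absolute percentage errors. Under the usual definition, MAPE = (100/n) * sum |y_i - yhat_i| / |y_i|. The abstract does not spell out the exact formula, so the sketch below assumes this standard form; the function name is hypothetical.

```python
def mean_absolute_percentage_error(y_true, y_pred):
    """MAPE in percent under the standard definition.

    Assumes no true value is zero (the relative error would be undefined).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    relative_errors = (abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * sum(relative_errors) / len(y_true)
```

For example, true pressures of 100 and 200 (arbitrary units) predicted as 110 and 180 each carry a 10% relative error, so the MAPE is 10%. Because each error is normalized by its own true value, MAPE weights errors at low subglottal pressures more heavily than the same absolute error at high pressures.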
Affiliation(s)
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
20
Pathological Voice Source Analysis System Using a Flow Waveform-Matched Biomechanical Model. Appl Bionics Biomech 2018; 2018:3158439. [PMID: 30057647 PMCID: PMC6051280 DOI: 10.1155/2018/3158439] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 03/30/2018] [Accepted: 05/24/2018] [Indexed: 11/24/2022]
Abstract
Voice production arises from vocal cord vibration coupled to glottal airflow. Vocal cord lesions affect the vocal system and lead to voice disorders. In this paper, a pathological voice source analysis system is designed. This study integrates nonlinear dynamics with an optimized asymmetric two-mass model to explore the nonlinear characteristics of vocal cord vibration. It also analyzes changes in acoustic parameters, such as fundamental frequency, caused by different subglottal pressures and varying degrees of vocal cord paralysis. Samples of the sustained vowel /a/ from normal and pathological voices were extracted from the MEEI (Massachusetts Eye and Ear Infirmary) database. A fitting procedure combining genetic particle swarm optimization and a quasi-Newton method was developed to optimize the biomechanical model parameters and match the targeted voice source. Experimental results validate the applicability of the proposed model to reproduce vocal cord vibration with high accuracy, and show that vocal cord paralysis increases the model's coupling stiffness.
21
Towards Fully Automated Determination of Laryngeal Adductor Reflex Latencies through High-Speed Laryngoscopy Image Processing. Bildverarbeitung für die Medizin 2018. 2018. [DOI: 10.1007/978-3-662-56537-7_41] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022]
22
Gómez P, Schützenberger A, Kniesburges S, Bohr C, Döllinger M. Physical parameter estimation from porcine ex vivo vocal fold dynamics in an inverse problem framework. Biomech Model Mechanobiol 2017; 17:777-792. [DOI: 10.1007/s10237-017-0992-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/30/2017] [Accepted: 11/30/2017] [Indexed: 11/28/2022]
23
Döllinger M, Gómez P, Patel RR, Alexiou C, Bohr C, Schützenberger A. Biomechanical simulation of vocal fold dynamics in adults based on laryngeal high-speed videoendoscopy. PLoS One 2017; 12:e0187486. [PMID: 29121085 PMCID: PMC5679561 DOI: 10.1371/journal.pone.0187486] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/15/2016] [Accepted: 10/18/2017] [Indexed: 12/18/2022]
Abstract
MOTIVATION Human voice is generated in the larynx by the two oscillating vocal folds. Owing to the limited space and accessibility of the larynx, detailed endoscopic investigation of the actual phonatory process is challenging. Hence, the biomechanics of the human phonatory process are not yet fully understood. Therefore, we adapt a mathematical model of the vocal folds to recorded vocal fold oscillations to quantify gender- and age-related differences expressed by computed biomechanical model parameters. METHODS The vocal fold dynamics are visualized by laryngeal high-speed videoendoscopy (4000 fps). A total of 33 healthy young subjects (16 females, 17 males) and 11 elderly subjects (5 females, 6 males) were recorded. A numerical two-mass model is adapted to the recorded vocal fold oscillations by varying model masses, stiffness and subglottal pressure. Three optimization algorithms (Nelder-Mead, Particle Swarm Optimization and Simulated Bee Colony), each combined with three cost functions, were evaluated for their suitability in adapting the model to the recorded vocal fold dynamics. Gender differences and age-related kinematic differences reflected by the model parameters were analyzed. RESULTS AND CONCLUSION The biomechanical model in combination with numerical optimization techniques allowed phonatory behavior to be simulated and the laryngeal parameters involved to be quantified. All three optimization algorithms showed promising results. However, only one cost function seems to be suitable for this optimization task. The obtained model parameters reflect the phonatory biomechanics of men and women well and show quantitative age- and gender-specific differences. The model parameters for younger females and males showed lower subglottal pressures, lower stiffness and higher masses than the corresponding elderly groups.
Females exhibited higher subglottal pressures, smaller oscillation masses and larger stiffness than the similarly aged male groups. Adapting numerical models to vocal fold oscillations is useful for identifying the underlying laryngeal components that control the phonatory process.
Affiliation(s)
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Rita R. Patel
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana, United States of America
- Christoph Alexiou
- Section of Experimental Oncology and Nanomedicine (SEON), Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Else Kröner-Fresenius-Stiftung-Professorship, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Christopher Bohr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
24
Galindo GE, Peterson SD, Erath BD, Castro C, Hillman RE, Zañartu M. Modeling the Pathophysiology of Phonotraumatic Vocal Hyperfunction With a Triangular Glottal Model of the Vocal Folds. J Speech Lang Hear Res 2017; 60:2452-2471. [PMID: 28837719 PMCID: PMC5831616 DOI: 10.1044/2017_jslhr-s-16-0412] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 10/29/2016] [Accepted: 04/19/2017] [Indexed: 05/08/2023]
Abstract
PURPOSE Our goal was to test prevailing assumptions about the underlying biomechanical and aeroacoustic mechanisms associated with phonotraumatic lesions of the vocal folds using a numerical lumped-element model of voice production. METHOD A numerical model with a triangular glottis, posterior glottal opening, and arytenoid posturing is proposed. Normal voice is altered by introducing various prephonatory configurations. Potential compensatory mechanisms (increased subglottal pressure, muscle activation, and supraglottal constriction) are adjusted to restore an acoustic target output through a control loop that mimics a simplified version of auditory feedback. RESULTS The degree of incomplete glottal closure in both the membranous and posterior portions of the folds consistently leads to a reduction in sound pressure level, fundamental frequency, harmonic richness, and harmonics-to-noise ratio. The compensatory mechanisms lead to significantly increased vocal-fold collision forces, maximum flow-declination rate, and amplitude of unsteady flow, without significantly altering the acoustic output. CONCLUSION Modeling provided potentially important insights into the pathophysiology of phonotraumatic vocal hyperfunction by demonstrating that compensatory mechanisms can counteract deterioration in the voice acoustic signal due to incomplete glottal closure, but this also leads to high vocal-fold collision forces (reflected in aerodynamic measures), which significantly increases the risk of developing phonotrauma.
Affiliation(s)
- Gabriel E. Galindo
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Sean D. Peterson
- Mechanical and Mechatronics Engineering, University of Waterloo, Ontario, Canada
- Byron D. Erath
- Department of Mechanical & Aeronautical Engineering, Clarkson University, Potsdam, NY
- Christian Castro
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- School of Speech and Hearing Sciences, Universidad de Valparaíso, Chile
- Robert E. Hillman
- Center for Laryngeal Surgery & Voice Rehabilitation, Massachusetts General Hospital, Boston
- Harvard Medical School, Boston, MA
- MGH Institute of Health Professions, Boston, MA
- Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
25
Granados A, Brunskog J. An optical flow-based state-space model of the vocal folds. J Acoust Soc Am 2017; 141:EL543. [PMID: 28618804 DOI: 10.1121/1.4983628] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
High-speed movies of vocal fold vibration are valuable data for revealing vocal fold features relevant to voice pathology diagnosis. This work presents a suitable Bayesian model and a purely theoretical discussion toward the further development of a framework for estimating continuum biomechanical features. A linear, Gaussian, nonstationary state-space model is proposed and thoroughly discussed. The evolution model is based on a self-sustained three-dimensional finite element model of the vocal folds, and the observation model involves a dense optical flow algorithm. The results show that the method is able to capture different deformation patterns between the computed optical flow and the finite element deformation, controlled by the choice of the model tissue parameters.
Affiliation(s)
- Alba Granados
- Acoustic Technology, Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby DK-2800, Denmark
- Jonas Brunskog
- Acoustic Technology, Department of Electrical Engineering, Technical University of Denmark, Kongens Lyngby DK-2800, Denmark
26
Hadwin PJ, Peterson SD. An extended Kalman filter approach to non-stationary Bayesian estimation of reduced-order vocal fold model parameters. J Acoust Soc Am 2017; 141:2909. [PMID: 28464670 DOI: 10.1121/1.4981240] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 05/09/2023]
Abstract
The Bayesian framework for parameter inference provides a basis from which subject-specific reduced-order vocal fold models can be generated. Previously, it has been shown that a particle filter technique is capable of producing estimates and associated credibility intervals of time-varying reduced-order vocal fold model parameters. However, the particle filter approach is difficult to implement and has a high computational cost, both of which can be barriers to clinical adoption. This work presents an alternative estimation strategy based upon Kalman filtering, aimed at reducing the computational cost of subject-specific model development. The robustness of this approach to Gaussian and non-Gaussian noise is discussed. The extended Kalman filter (EKF) approach is found to perform very well in comparison with the particle filter technique at dramatically lower computational cost. Based upon the test cases explored, the EKF is comparable in accuracy to the particle filter technique when more than 6000 particles are employed; if fewer particles are employed, the EKF actually performs better. For comparable levels of accuracy, the solution time is reduced by two orders of magnitude when employing the EKF. Because of the approximations used in the EKF, however, the credibility intervals tend to be slightly underpredicted.
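The EKF idea summarized above can be illustrated on a deliberately small problem. The sketch below is a hypothetical scalar example, not the authors' formulation. It assumes:

- the unknown parameter follows a random walk with variance `Q`;
- the parameter is observed through the nonlinear map sqrt(theta) plus Gaussian noise with variance `R`, loosely mimicking how an oscillation frequency scales with the square root of a stiffness;
- the function name and all numerical values are chosen purely for illustration.

```python
import math


def ekf_parameter_tracking(observations, x0, P0, Q, R):
    """Scalar extended Kalman filter for a slowly varying parameter theta.

    Hypothetical toy model, not the vocal fold model from the abstract:
        theta_k = theta_{k-1} + w_k,     w_k ~ N(0, Q)   (random walk)
        y_k     = sqrt(theta_k) + v_k,   v_k ~ N(0, R)   (nonlinear observation)
    Returns a list of (estimate, variance) pairs, one per observation.
    """
    x, P = x0, P0
    history = []
    for y in observations:
        # Predict: the random walk keeps the mean and inflates the variance by Q.
        P = P + Q
        # Linearize h(theta) = sqrt(theta) at the predicted mean (the "extended" step).
        h = math.sqrt(x)
        H = 0.5 / h
        # Update: Kalman gain from the linearized observation model.
        S = H * P * H + R
        K = P * H / S
        x = x + K * (y - h)
        P = (1.0 - K * H) * P
        # Crude safeguard that keeps the estimate inside the domain of sqrt.
        x = max(x, 1e-9)
        history.append((x, P))
    return history
```

As an example, consider 50 noiseless observations `y = 2.0`, starting from `x0 = 1.0` with `Q = 1e-3` and `R = 1e-4`. These drive the estimate to theta = 4. The process noise `Q` keeps the variance from collapsing to zero, which is what lets the filter keep following a parameter that drifts over time. The cost per step is a handful of arithmetic operations, in contrast to one model evaluation per particle in a particle filter.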
Affiliation(s)
- Paul J Hadwin
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1 Canada
- Sean D Peterson
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1 Canada
27
Volgger V, Felicio A, Lohscheller J, Englhard AS, Al-Muzaini H, Betz CS, Schuster ME. Evaluation of the combined use of narrow band imaging and high-speed imaging to discriminate laryngeal lesions. Lasers Surg Med 2017; 49:609-618. [PMID: 28231400 DOI: 10.1002/lsm.22652] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Accepted: 02/04/2017] [Indexed: 02/05/2023]
Abstract
BACKGROUND AND OBJECTIVE Laryngeal lesions are usually investigated by microlaryngoscopy, biopsy, and histopathology. This study aimed to evaluate the combined use of Narrow Band Imaging (NBI) and High-Speed Imaging (HSI) in the differentiation of glottic lesions in awake patients. STUDY DESIGN Prospective diagnostic study. MATERIALS AND METHODS Thirty-six awake patients with 41 glottic lesions were investigated with both NBI and HSI, and the suspected diagnoses were compared to the histopathological results of tissue biopsies taken during subsequent microlaryngoscopies. Of the 41 lesions, 28 were primary lesions and 13 were recurrent lesions after previous laryngeal pathologies. RESULTS For differentiating benign/premalignant from malignant lesions with NBI and HSI combined, sensitivity was 100.0%, specificity 79.4%, positive predictive value 50.0%, and negative predictive value 100.0%. Sensitivities and specificities were 100.0% and 85.7% for HSI alone, and 100.0% and 79.4% for NBI alone. For primary lesions only, the results were generally better. Sensitivities and specificities were 100% and 81% for NBI, 100% and 84.2% for HSI, and 100% and 85.7% for the combination of both methods, respectively. CONCLUSION NBI and HSI both seem to be promising adjunct tools for differentiating various laryngeal lesions in awake patients, with high sensitivities. Specificities, however, were moderate. They could be increased by using NBI and HSI in combination in a subgroup of patients with only primary lesions. Although both methods still have limitations, they might improve the evaluation of suspicious laryngeal lesions in the future and could possibly spare patients repeated invasive tissue biopsies. Lasers Surg. Med. 49:609-618, 2017. © 2017 Wiley Periodicals, Inc.
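The four figures of merit reported above follow from a 2x2 confusion matrix in which malignant lesions are the positive class. The counts in the usage note below are hypothetical, since the abstract does not report the matrix. They were chosen to total 41 lesions and to reproduce the combined NBI and HSI percentages, but they are not taken from the study. The function name is also hypothetical.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from 2x2 confusion-matrix counts.

    Positive class = malignant. Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),  # malignant lesions correctly flagged
        "specificity": tn / (tn + fp),  # benign/premalignant lesions correctly cleared
        "ppv": tp / (tp + fp),          # flagged lesions that are truly malignant
        "npv": tn / (tn + fn),          # cleared lesions that are truly benign/premalignant
    }
```

Calling `diagnostic_metrics(tp=7, fp=7, tn=27, fn=0)` gives:

- sensitivity 1.0;
- specificity 27/34, about 79.4%;
- PPV 0.5;
- NPV 1.0.

These counts show how a specificity near 80% can coexist with a PPV of only 50% when malignant lesions are a small fraction of the sample.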
Affiliation(s)
- Veronika Volgger
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Axelle Felicio
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Jörg Lohscheller
- Department of Informatics, Trier University of Applied Sciences, Schneidershof, 54208, Trier, Germany
- Anna S Englhard
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Hanan Al-Muzaini
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Christian S Betz
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
- Maria E Schuster
- Department of Otorhinolaryngology, Head and Neck Surgery, Klinikum der Universität München, 81377, Munich, Germany
28
Zhang Z. Mechanics of human voice production and control. J Acoust Soc Am 2016; 140:2614. [PMID: 27794319 PMCID: PMC5412481 DOI: 10.1121/1.4964509] [Citation(s) in RCA: 166] [Impact Index Per Article: 20.8] [Indexed: 05/07/2023]
Abstract
As the primary means of communication, voice plays an important role in daily life. Voice also conveys personal information such as social status, personal traits, and the emotional state of the speaker. Mechanically, voice production involves complex fluid-structure interaction within the glottis and its control by laryngeal muscle activation. An important goal of voice research is to establish a causal theory linking voice physiology and biomechanics to how speakers use and control voice to communicate meaning and personal information. Establishing such a causal theory has important implications for clinical voice management, voice training, and many speech technology applications. This paper provides a review of voice physiology and biomechanics, the physics of vocal fold vibration and sound production, and laryngeal muscular control of the fundamental frequency of voice, vocal intensity, and voice quality. Current efforts to develop mechanical and computational models of voice production are also critically reviewed. Finally, issues and future challenges in developing a causal theory of voice production and perception are discussed.
Affiliation(s)
- Zhaoyan Zhang
- Department of Head and Neck Surgery, University of California, Los Angeles, 31-24 Rehabilitation Center, 1000 Veteran Avenue, Los Angeles, California 90095-1794, USA
29
Abstract
Objectives: Kymographic imaging through videokymography has been recognized as a convenient, novel way to display laryngeal behavior, yet little systematic research has been done to map the relevant features displayed in such images. Here we aimed to specify these features to enable systematic visual characterization and categorization of vocal fold vibratory patterns in voice disorders. Methods: A cross-sectional, descriptive design was used. We selected 45 subjects and extracted 100 videokymographic images from an archive of more than 7,000 videokymographic examinations of subjects with a wide range of voice disorders. The images showed a large variety of vocal fold vibratory behaviors during sustained phonations. We visually identified the prominent features that distinguished the vibration patterns across the images. Results: We divided the findings into 10 feature categories. They included refined traditional features (e.g., mucosal waves), as well as additional features that are obscured in strobolaryngoscopy (e.g., different types of irregularities, left-right frequency differences, shapes of lateral and medial peaks, cycle aberrations). Conclusions: The variations in the identified features reveal different behavioral origins of voice disorders. The findings open new possibilities for objective documentation and for monitoring vocal fold behavior in clinical practice through kymographic imaging.
Affiliation(s)
- Jan G Svec
- Groningen Voice Research Laboratory, Dept of Biomedical Engineering, University Medical Center Groningen, University of Groningen, Antonius Deusinglaan 1, NL 9713 AV Groningen, the Netherlands
30
Hadwin PJ, Galindo GE, Daun KJ, Zañartu M, Erath BD, Cataldo E, Peterson SD. Non-stationary Bayesian estimation of parameters from a body cover model of the vocal folds. J Acoust Soc Am 2016; 139:2683. [PMID: 27250162 PMCID: PMC10423076 DOI: 10.1121/1.4948755] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/10/2015] [Revised: 04/15/2016] [Accepted: 04/22/2016] [Indexed: 05/09/2023]
Abstract
The evolution of reduced-order vocal fold models into clinically useful tools for subject-specific diagnosis and treatment hinges upon successfully and accurately representing an individual patient in the modeling framework. This, in turn, requires inference of model parameters from clinical measurements in order to tune a model to the given individual. Bayesian analysis is a powerful tool for estimating model parameter probabilities based upon a set of observed data. In this work, a Bayesian particle filter sampling technique capable of estimating time-varying model parameters, as occur in complex vocal gestures, is introduced. The technique is compared with time-invariant Bayesian estimation and least squares methods for determining both stationary and non-stationary parameters. The current technique accurately estimates the time-varying unknown model parameter and maintains tight credibility bounds. The credibility bounds are particularly relevant from a clinical perspective, as they provide insight into the confidence a clinician should have in the model predictions.
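A bootstrap (sequential importance resampling) particle filter is the textbook form of the sampling technique named above. The following is a hypothetical sketch on a scalar toy problem, not the body-cover model or the authors' code. Everything model-specific is an assumption:

- the random-walk step size;
- the observation map sqrt(theta);
- the noise levels;
- the prior range.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=2000, prior=(0.5, 8.0),
                              step_sd=0.05, obs_sd=0.05, seed=0):
    """Track a time-varying scalar parameter theta with a bootstrap particle filter.

    Hypothetical toy model (not the body-cover vocal fold model):
        theta_k = theta_{k-1} + N(0, step_sd^2)   (random walk)
        y_k     = sqrt(theta_k) + N(0, obs_sd^2)
    Returns the weighted-mean estimate of theta after each observation.
    """
    rng = random.Random(seed)
    particles = [rng.uniform(*prior) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate through the random walk; abs() reflects at zero so sqrt stays defined.
        particles = [abs(p + rng.gauss(0.0, step_sd)) for p in particles]
        # Weight by the Gaussian likelihood, computed in log space for stability.
        log_w = [-0.5 * ((y - math.sqrt(p)) / obs_sd) ** 2 for p in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        estimates.append(sum(wi * p for wi, p in zip(w, particles)) / total)
        # Multinomial resampling to counter weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return estimates
```

Running `bootstrap_particle_filter([2.0] * 30)` should settle near theta = 4. The function could be extended to also return the resampled particles. Rough credibility bounds could then be read off as quantiles, the quantity the abstract highlights as clinically relevant.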
Affiliation(s)
- Paul J Hadwin
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Gabriel E Galindo
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Kyle J Daun
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Byron D Erath
- Department of Mechanical and Aeronautical Engineering, Clarkson University, Potsdam, New York 13699, USA
- Edson Cataldo
- Applied Mathematics Department, Graduate Program in Electrical and Telecommunications Engineering (PPGEET), Universidade Federal Fluminense, Niteroi, Rio de Janeiro, CEP 24020-140, Brazil
- Sean D Peterson
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
31
Panek D, Skalski A, Zielinski T, Deliyski DD. Voice pathology classification based on High-Speed Videoendoscopy. Annu Int Conf IEEE Eng Med Biol Soc 2016; 2015:735-8. [PMID: 26736367 DOI: 10.1109/embc.2015.7318467] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
This work presents a method for automatic and objective classification of patients with healthy versus pathologically impaired vocal fold vibration using high-speed videoendoscopy of the larynx. We used image segmentation to extract a novel set of numerical parameters describing the spatio-temporal dynamics of the vocal folds. These parameters were used to classify cases as normal or pathological, achieving 73.3% cross-validation classification accuracy. This approach is promising for developing an automatic diagnostic tool for voice disorders.
32
Unger J, Schuster M, Hecker DJ, Schick B, Lohscheller J. A generalized procedure for analyzing sustained and dynamic vocal fold vibrations from laryngeal high-speed videos using phonovibrograms. Artif Intell Med 2015; 66:15-28. [PMID: 26597002 DOI: 10.1016/j.artmed.2015.10.002] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 01/16/2015] [Revised: 09/28/2015] [Accepted: 10/20/2015] [Indexed: 12/01/2022]
Abstract
OBJECTIVE This work presents a computer-based approach to analyze the two-dimensional vocal fold dynamics of endoscopic high-speed videos, and constitutes an extension and generalization of a previously proposed wavelet-based procedure. While most approaches aim at analyzing sustained phonation conditions, the proposed method allows for a clinically adequate analysis of both dynamic and sustained phonation paradigms. MATERIALS AND METHODS The analysis procedure is based on a spatio-temporal visualization technique, the phonovibrogram, that facilitates the documentation of the visible laryngeal dynamics. From the phonovibrogram, a low-dimensional set of features is computed using a principal component analysis strategy that quantifies the type of vibration patterns, irregularity, lateral symmetry and synchronicity, as a function of time. Two different test bench data sets are used to validate the approach: (I) 150 healthy and pathologic subjects examined during sustained phonation; (II) 20 healthy and pathologic subjects that were examined twice, during sustained phonation and during a glissando from a low to a higher fundamental frequency. In order to assess the discriminative power of the extracted features, a Support Vector Machine is trained to distinguish between physiologic and pathologic vibrations. The results for sustained phonation sequences are compared to the previous approach. Finally, the classification performance of the stationary analyzing procedure is compared to the transient analysis of the glissando maneuver. RESULTS For the first test bench the proposed procedure outperformed the previous approach (proposed feature set: accuracy: 91.3%, sensitivity: 80%, specificity: 97%; previous approach: accuracy: 89.3%, sensitivity: 76%, specificity: 96%).
Comparing the classification performance of the second test bench further corroborates that analyzing transient paradigms provides clear additional diagnostic value (glissando maneuver: accuracy: 90%, sensitivity: 100%, specificity: 80%; sustained phonation: accuracy: 75%, sensitivity: 80%, specificity: 70%). CONCLUSIONS The incorporation of parameters describing the temporal evolution of vocal fold vibration clearly improves the automatic identification of pathologic vibration patterns. Furthermore, incorporating a dynamic phonation paradigm provides additional valuable information about the underlying laryngeal dynamics that cannot be derived from sustained conditions. The proposed generalized approach provides a better overall classification performance than the previous approach, and hence constitutes a new advantageous tool for an improved clinical diagnosis of voice disorders.
Affiliation(s)
- Jakob Unger
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany.
- Maria Schuster
- Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, Marchioninistr. 13, 81366 München, Germany
- Dietmar J Hecker
- Department of Otorhinolaryngology, Saarland University Hospital, Kirrbergerstr., 66424 Homburg/Saar, Germany
- Bernhard Schick
- Department of Otorhinolaryngology, Saarland University Hospital, Kirrbergerstr., 66424 Homburg/Saar, Germany
- Jörg Lohscheller
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, 54293 Trier, Germany
33
Hüttner B, Luegmair G, Patel RR, Ziethe A, Eysholdt U, Bohr C, Sebova I, Semmler M, Döllinger M. Development of a time-dependent numerical model for the assessment of non-stationary pharyngoesophageal tissue vibrations after total laryngectomy. Biomech Model Mechanobiol 2014; 14:169-84. [PMID: 24861998 DOI: 10.1007/s10237-014-0597-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/01/2013] [Accepted: 05/14/2014] [Indexed: 11/29/2022]
Abstract
Laryngeal cancer due to, e.g., extensive smoking and/or alcohol consumption can necessitate the excision of the entire larynx. After such a total laryngectomy, the voice-generating structures are lost, and with them the quality of life of the affected patients is drastically reduced. However, the vibrations of the remaining tissue in the so-called pharyngoesophageal (PE) segment can serve as an alternative sound generator. Tissue, scar, and geometric aspects of the PE segment determine the characteristics of the postoperative substitute voice, which are highly important for the patient's future life. So far, PE dynamics have been simulated by a biomechanical model that is restricted to stationary vibrations, i.e., variations in pitch and amplitude cannot be handled. In order to investigate the dynamic range of PE vibrations, knowledge about the temporal processes during substitute voice production is of crucial interest. Thus, time-dependent model parameters are suggested in order to quantify non-stationary PE vibrations and to draw conclusions on the temporal characteristics of tissue stiffness, oscillating mass, pressure, and geometric distributions within the PE segment. To adapt the numerical model to the PE vibrations, an automatic, block-based optimization procedure is applied, comprising a combined global and local optimization approach. The suggested optimization procedure is validated with 75 synthetic data sets, simulating non-stationary oscillations of differently shaped PE segments. The application to four high-speed recordings is shown and discussed. The correlation between model and PE dynamics is ≥ 97%.
Affiliation(s)
- Björn Hüttner
- Department of Phoniatrics and Pediatric Audiology, Medical School, University Hospital Erlangen, Bohlenplatz 21, 91054 Erlangen, Germany.
34
Unger J, Hecker DJ, Kunduk M, Schuster M, Schick B, Lohscheller J. Quantifying spatiotemporal properties of vocal fold dynamics based on a multiscale analysis of phonovibrograms. IEEE Trans Biomed Eng 2014; 61:2422-33. [PMID: 24771562 DOI: 10.1109/tbme.2014.2318774] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
In order to objectively assess the laryngeal vibratory behavior, endoscopic high-speed cameras capture several thousand frames per second of the vocal folds during phonation. However, judging all inherent clinically relevant features is a challenging task and requires well-founded expert knowledge. In this study, an automated wavelet-based analysis of laryngeal high-speed videos based on phonovibrograms is presented. The phonovibrogram is an image representation of the spatiotemporal pattern of vocal fold vibration and constitutes the basis for a computer-based analysis of laryngeal dynamics. The features extracted from the wavelet transform are shown to be closely related to a basic set of video-based measurements categorized by the European Laryngological Society for a subjective assessment of pathologic voices. The wavelet-based analysis further offers information about irregularity and lateral asymmetry and asynchrony. It is demonstrated in healthy and pathologic subjects as well as for a surgical group that was examined before and after the removal of a vocal fold polyp. The features were found to not only classify glottal closure characteristics but also quantify the impact of pathologies on the vibratory behavior. The interpretability and the discriminative power of the proposed feature set show promising relevance for a computer-assisted diagnosis and classification of voice disorders.
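The abstract does not state which wavelet family is used or how the features are derived from the transform. As a rough, hedged illustration of what a multiscale decomposition of one phonovibrogram row (the deflection at a single glottal position over time) looks like, the sketch below applies an orthonormal Haar transform and reports the signal energy captured at each scale. The Haar basis, the function names, and the energy-per-scale feature are assumptions chosen for illustration, not the authors' method.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform (signal length must be even)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / math.sqrt(2) for a, b in pairs]  # local averages (coarser scale)
    detail = [(a - b) / math.sqrt(2) for a, b in pairs]  # local differences (this scale)
    return approx, detail


def haar_scale_energies(signal, levels):
    """Return detail-band energies (finest scale first) and the final approximation energy.

    Assumes len(signal) is divisible by 2**levels. Because the transform is
    orthonormal, the band energies sum to the energy of the input signal.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    energies = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        energies.append(sum(d * d for d in detail))
    return energies, sum(a * a for a in approx)


# Hypothetical PVG row: deflection at one glottal position over 16 frames.
row = [math.sin(2 * math.pi * k / 8) for k in range(16)]
details, residual = haar_scale_energies(row, levels=3)
total = sum(x * x for x in row)
print([round(e, 3) for e in details], round(residual, 3), round(total, 3))
```

Energy conservation across scales is what makes per-scale energies comparable between recordings; a real analysis would more likely use a smoother wavelet and operate on the whole 2-D image.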
35
Patel R, Dubrovskiy D, Döllinger M. Characterizing vibratory kinematics in children and adults with high-speed digital imaging. J Speech Lang Hear Res 2014; 57:S674-86. [PMID: 24686982 PMCID: PMC7315516 DOI: 10.1044/2014_jslhr-s-12-0278] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 05/11/2023]
Abstract
PURPOSE The aim of this study is to quantify and identify characteristic vibratory motion in typically developing prepubertal children and young adults using high-speed digital imaging. METHOD The vibrations of the vocal folds were recorded from 27 children (ages 5-9 years) and 35 adults (ages 21-45 years) at 4,000 frames per second during sustained phonation. Kinematic features of amplitude periodicity, time periodicity, phase asymmetry, spatial symmetry, and glottal gap index were analyzed from the glottal area waveform across mean and standard deviation (i.e., intercycle variability) for each measure. RESULTS Children exhibited lower mean amplitude periodicity compared to men and women and lower time periodicity compared to men. Children and women exhibited greater variability in amplitude periodicity, time periodicity, phase asymmetry, and glottal gap index compared to men. Women had lower mean values of amplitude periodicity and time periodicity compared to men. CONCLUSION Children differed from adults both spatially and, more markedly, temporally in vocal fold motion, suggesting the need to develop child-specific kinematic norms. Results suggest more uncontrolled vibratory motion in children, reflecting changes in the layered structure of the vocal folds and in aero-acoustic source mechanisms.
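The abstract names amplitude periodicity, time periodicity, and glottal gap index computed from the glottal area waveform (GAW), each summarized by mean and standard deviation, but does not give their formulas. The sketch below uses one plausible set of definitions, assumed for illustration only: a cycle-to-cycle periodicity index (1 minus the mean normalized difference between consecutive cycles) applied to per-cycle peak areas or durations, and a per-cycle gap index taken as minimum area over maximum area. Cycle segmentation of the GAW is assumed to have been done already, and all function names and values are hypothetical.

```python
from statistics import mean, stdev


def periodicity_index(values):
    """Assumed cycle-to-cycle periodicity: 1.0 for identical consecutive cycles, lower otherwise.

    values: one positive number per glottal cycle (e.g. peak area or cycle duration).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) / max(a, b) for a, b in zip(values, values[1:])]
    return 1.0 - mean(diffs)


def glottal_gap_indices(cycle_min_areas, cycle_max_areas):
    """Assumed per-cycle gap index: minimum over maximum glottal area (0 = complete closure)."""
    return [lo / hi for lo, hi in zip(cycle_min_areas, cycle_max_areas)]


# Hypothetical per-cycle measurements extracted from a segmented GAW.
peak_areas = [100.0, 96.0, 104.0, 99.0]  # arbitrary pixel units
periods_ms = [4.0, 4.2, 3.9, 4.1]
min_areas = [0.0, 2.0, 0.0, 1.0]

ap = periodicity_index(peak_areas)  # amplitude periodicity
tp = periodicity_index(periods_ms)  # time periodicity
ggi = glottal_gap_indices(min_areas, peak_areas)
print(f"AP={ap:.3f} TP={tp:.3f} GGI mean={mean(ggi):.3f} sd={stdev(ggi):.3f}")
```

Normalizing each difference by the larger of the two cycles keeps the index dimensionless, so amplitudes in pixels and periods in milliseconds can be compared on the same 0 to 1 scale.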
36
Tang S, Zhang Y, Qin X, Wang S, Wan M. Measuring body layer vibration of vocal folds by high-frame-rate ultrasound synchronized with a modified electroglottograph. J Acoust Soc Am 2013; 134:528-538. [PMID: 23862828 DOI: 10.1121/1.4807652] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 06/02/2023]
Abstract
The body-cover concept suggests that the vibration of the body layer is an indispensable component of vocal fold vibration. To quantify this vibration, a synchronized system composed of a high-frame-rate ultrasound and a modified electroglottograph (EGG) was employed in this paper to simultaneously image the body layer vibration and record the vocal fold vibration phase information during natural phonation. After data acquisition, the displacements of in vivo body layer vibrations were measured from the ultrasonic radio frequency data, and the temporal reconstruction method was used to enhance the measurement accuracy. Results showed that the modified EGG, whose waveform and characteristic points were identical to those of the conventional EGG, resolved the position conflict between the ultrasound transducer and the EGG electrodes. The location and range of the vibrating body layer were clearer and more discernible in the estimated displacement image than in the ultrasonic B-mode image. Quantitative analysis of the vibration features of the body layer demonstrated that the body layer moved as a unit in the superior-inferior direction during phonation in the normal chest register.
Affiliation(s)
- Shanshan Tang
- The Key Laboratory of Biomedical Information Engineering of the Ministry of Education, Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, People's Republic of China
37
Unger J, Meyer T, Doellinger M, Hecker DJ, Schick B, Lohscheller J. A wavelet-based approach for a continuous analysis of phonovibrograms. Annu Int Conf IEEE Eng Med Biol Soc 2013; 2012:4410-3. [PMID: 23366905 DOI: 10.1109/embc.2012.6346944] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Recently, endoscopic high-speed laryngoscopy has been established for commercial use and constitutes a state-of-the-art technique to examine vocal fold dynamics. Despite overcoming many limitations of commonly applied stroboscopy, it has not yet gained widespread clinical application. A major drawback is the lack of a methodology for extracting valuable features to support visual assessment or computer-aided diagnosis. In this paper a compact and descriptive feature set is presented. The feature extraction routines are based on two-dimensional color graphs called phonovibrograms (PVG). These graphs contain the full spatio-temporal pattern of vocal fold dynamics and are therefore suited to deriving features that comprehensively describe the vibration pattern of vocal folds. Within our approach, clinically relevant features such as glottal closure type, symmetry and periodicity are quantified in a set of 10 descriptive features. The suitability for classification tasks is shown using a clinical data set comprising 50 healthy and 50 paralytic subjects. A classification accuracy of 93.2% has been achieved.
Affiliation(s)
- Jakob Unger
- Department of Computer Science, University of Applied Science Trier, Trier, Germany
38
Manfredi C, Bocchi L, Cantarella G, Peretti G. Videokymographic image processing: Objective parameters and user-friendly interface. Biomed Signal Process Control 2012. [DOI: 10.1016/j.bspc.2011.02.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 10/18/2022]
39
Yang A, Berry DA, Kaltenbacher M, Döllinger M. Three-dimensional biomechanical properties of human vocal folds: parameter optimization of a numerical model to match in vitro dynamics. J Acoust Soc Am 2012; 131:1378-90. [PMID: 22352511 PMCID: PMC3292609 DOI: 10.1121/1.3676622] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/10/2011] [Revised: 12/14/2011] [Accepted: 12/21/2011] [Indexed: 05/24/2023]
Abstract
The human voice signal originates from the vibrations of the two vocal folds within the larynx. The interactions of several intrinsic laryngeal muscles adduct and shape the vocal folds to facilitate vibration in response to airflow. Three-dimensional vocal fold dynamics are extracted from in vitro hemilarynx experiments and fitted by a numerical three-dimensional multi-mass model (3DM) using an optimization procedure. In this work, the 3DM dynamics are optimized over 24 experimental data sets to estimate biomechanical vocal fold properties during phonation. Accuracy of the optimization is verified by low normalized error (0.13 ± 0.02), high correlation (83% ± 2%), and reproducible subglottal pressure values. The optimized 3DM parameters yielded biomechanical variations in tissue properties along the vocal fold surface, including variations in both the local mass and stiffness of vocal folds. That is, both mass and stiffness increased along the superior-to-inferior direction. These variations were statistically analyzed under different experimental conditions (e.g., an increase in tension as a function of vocal fold elongation and an increase in stiffness and a decrease in mass as a function of glottal airflow). The study showed that physiologically relevant vocal fold tissue properties, which cannot be directly measured during in vivo human phonation, can be captured using this 3D-modeling technique.
Affiliation(s)
- Anxiong Yang
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Bohlenplatz 21, 91054 Erlangen, Germany.
40
Yang A, Stingl M, Berry DA, Lohscheller J, Voigt D, Eysholdt U, Dollinger M. Computation of physiological human vocal fold parameters by mathematical optimization of a biomechanical model. J Acoust Soc Am 2011; 130:948-64. [PMID: 21877808 PMCID: PMC3195891 DOI: 10.1121/1.3605551] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/09/2023]
Abstract
With the use of an endoscopic, high-speed camera, vocal fold dynamics may be observed clinically during phonation. However, observation and subjective judgment alone may be insufficient for clinical diagnosis and documentation of improved vocal function, especially when the laryngeal disease lacks any clear morphological presentation. In this study, biomechanical parameters of the vocal folds are computed by adjusting the corresponding parameters of a three-dimensional model until the dynamics of both systems are similar. First, a mathematical optimization method is presented. Next, model parameters (such as pressure, tension and masses) are adjusted to reproduce vocal fold dynamics, and the deduced parameters are physiologically interpreted. Various combinations of global and local optimization techniques are attempted. Evaluation of the optimization procedure is performed using 50 synthetically generated data sets. The results show sufficient reliability, including 0.07 normalized error, 96% correlation, and 91% accuracy. The technique is also demonstrated on data from human hemilarynx experiments, in which a low normalized error (0.16) and high correlation (84%) values were achieved. In the future, this technique may be applied to clinical high-speed images, yielding objective measures with which to document improved vocal function of patients with voice disorders.
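The procedure above adjusts model parameters until model and observed dynamics agree, and reports normalized error and correlation. The abstract does not define either measure or the optimizer details, so the sketch below is a deliberately simplified stand-in: a hypothetical sinusoidal "model" replaces the three-dimensional biomechanical model, an exhaustive grid search replaces the combined global/local optimization, normalized error is assumed to be the Euclidean norm of the residual divided by the norm of the observed trajectory, and correlation is the Pearson coefficient. As in the synthetic evaluation described, the "recording" is generated by the model itself, so the true parameters are known.

```python
import math
from itertools import product
from statistics import mean


def normalized_error(model, observed):
    """Assumed definition: ||model - observed|| / ||observed|| (Euclidean norms)."""
    if len(model) != len(observed):
        raise ValueError("trajectories must have equal length")
    num = math.sqrt(sum((m - o) ** 2 for m, o in zip(model, observed)))
    den = math.sqrt(sum(o * o for o in observed))
    return num / den


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long trajectories."""
    if len(x) != len(y):
        raise ValueError("trajectories must have equal length")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def toy_model(freq_hz, amplitude, times):
    """Hypothetical stand-in for a biomechanical model: a pure sinusoidal deflection."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t) for t in times]


def grid_fit(observed, times, freqs, amps):
    """Exhaustive search for the (frequency, amplitude) pair with the lowest normalized error."""
    return min(product(freqs, amps),
               key=lambda p: normalized_error(toy_model(p[0], p[1], times), observed))


fs = 4000.0  # assumed frame rate in frames per second
times = [k / fs for k in range(200)]
observed = toy_model(150.0, 1.0, times)  # synthetic "recording" with known parameters
best = grid_fit(observed, times,
                freqs=[100.0, 125.0, 150.0, 175.0, 200.0],
                amps=[0.5, 0.75, 1.0, 1.25])
fitted = toy_model(*best, times)
print(best, normalized_error(fitted, observed), pearson_correlation(fitted, observed))
```

A grid search scales poorly with the number of parameters, which is one reason multi-mass model fitting typically combines a global search with a local refinement.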
Affiliation(s)
- Anxiong Yang
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Erlangen, Germany.
41
Qin X, Wu L, Jiang H, Tang S, Wang S, Wan M. Measuring body-cover vibration of vocal folds based on high frame rate ultrasonic imaging and high-speed video. IEEE Trans Biomed Eng 2011; 58. [PMID: 21606016 DOI: 10.1109/tbme.2011.2157156] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022]
Abstract
Vibration of the vocal folds follows a layered body-cover pattern owing to the two-layer tissue structure of the vocal folds. A method based on a synchronized imaging system is proposed in order to image and measure the body-cover vibration pattern of vocal folds. This imaging system contains two parts, a high-frame-rate ultrasonic imaging part and a high-speed video part, which can synchronously image the vibration of the body and cover layers at high speed. Image analysis methods are then applied to measure the body-cover vibration of the vocal folds from both recorded image sequences. We analyze characteristics of body-layer vibration based on the measurements from designed experiments. Moreover, these results agree with simulations of a body-cover model.
42
Schwarz R, Huttner B, Döllinger M, Luegmair G, Eysholdt U, Schuster M, Lohscheller J, Gurlek E. Substitute voice production: quantification of PE segment vibrations using a biomechanical model. IEEE Trans Biomed Eng 2011; 58:2767-76. [PMID: 21558056 DOI: 10.1109/tbme.2011.2151860] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
Abstract
After total larynx excision due to laryngeal cancer, the tracheoesophageal substitute tissue vibrations at the intersection between the pharynx and the esophagus [pharyngoesophageal segment (PE segment)] serve as the voice generator. The quality of the substitute voice significantly depends on the vibratory characteristics of the PE segment. For improving voice rehabilitation, the relationship between the PE dynamics and the resulting substitute voice quality is a matter of particular interest. A precondition for a comprehensive analysis of this relationship is an objective quantification of the PE vibrations. For quantification purposes, a method is proposed that is based on the reproduction of the tissue vibrations by means of a biomechanical model of the PE segment. An optimization procedure for the automatic determination of appropriate model parameters is suggested to adapt the model dynamics to tissue movements extracted from high-speed (HS) videos. The applicability of the optimization procedure is evaluated with ten synthetic data sets. A mean error of 8.2% in the determination of previously defined model parameters was achieved, as well as an overall stability of 7.1%. The application of the model to six HS recordings yielded a mean correlation of the vibration patterns of 82%.
Affiliation(s)
- Raphael Schwarz
- Healthcare Sector and the Imaging & Therapy Division Magnetic Resonance, Siemens AG, 91052 Erlangen, Germany.
43
Kelleher JE, Siegmund T, Chan RW, Henslee EA. Optical measurements of vocal fold tensile properties: implications for phonatory mechanics. J Biomech 2011; 44:1729-34. [PMID: 21497355 DOI: 10.1016/j.jbiomech.2011.03.037] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 12/14/2010] [Revised: 03/11/2011] [Accepted: 03/29/2011] [Indexed: 11/29/2022]
Abstract
In voice research, in vitro tensile stretch experiments of vocal fold tissues are commonly employed to determine the tissue biomechanical properties. In the standard stretch-release protocol, tissue deformation is computed from displacements applied to sutures inserted through the thyroid and arytenoid cartilages, with the cartilages assumed to be rigid. Here, a non-contact optical method was employed to determine the actual tissue deformation of vocal fold lamina propria specimens from three excised human larynges in uniaxial tensile tests. Specimen deformation was found to consist not only of deformation of the tissue itself, but also of deformation of the cartilages, as well as suture alignment and tightening. Stress-stretch curves of a representative load cycle were characterized by an incompressible Ogden model. The initial longitudinal elastic modulus was found to be considerably higher when determined from optical displacement measurements than typical values reported in the literature. The present findings could change the understanding of the mechanics underlying vocal fold vibration. Given the high longitudinal elastic modulus, the lamina propria appeared to demonstrate a substantial level of anisotropy. Consequently, transverse shear could play a significant role in vocal fold vibration, and fundamental frequencies of phonation should be predicted by beam theories accounting for such effects.
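The study characterizes stress-stretch curves with an incompressible Ogden model and reports an initial longitudinal elastic modulus. The paper's exact parameterization is not given in this abstract; the sketch below assumes the common one-term form W = (2μ/α²)(λ₁^α + λ₂^α + λ₃^α − 3). For uniaxial tension of an incompressible solid (λ₁ = λ, λ₂ = λ₃ = λ^(−1/2)), this gives the Cauchy stress σ = (2μ/α)(λ^α − λ^(−α/2)), whose slope at λ = 1 (the initial Young's modulus) is 3μ. The parameter values are hypothetical.

```python
def ogden_uniaxial_cauchy(stretch, mu, alpha):
    """Cauchy stress for uniaxial tension of an incompressible one-term Ogden solid.

    Assumes W = (2*mu/alpha**2) * (l1**alpha + l2**alpha + l3**alpha - 3),
    with l1 = stretch and l2 = l3 = stretch**-0.5 (incompressibility).
    """
    return (2.0 * mu / alpha) * (stretch ** alpha - stretch ** (-alpha / 2.0))


def ogden_uniaxial_nominal(stretch, mu, alpha):
    """Nominal (engineering) stress, i.e. force per undeformed area = Cauchy stress / stretch."""
    return ogden_uniaxial_cauchy(stretch, mu, alpha) / stretch


def initial_modulus(mu, alpha, h=1e-6):
    """Central-difference slope of the stress-stretch curve at stretch = 1 (analytically 3*mu)."""
    return (ogden_uniaxial_cauchy(1.0 + h, mu, alpha)
            - ogden_uniaxial_cauchy(1.0 - h, mu, alpha)) / (2.0 * h)


mu_kpa, alpha = 10.0, 8.0  # hypothetical shear modulus (kPa) and strain-stiffening exponent
for lam in (1.0, 1.1, 1.2, 1.3):
    print(lam, round(ogden_uniaxial_nominal(lam, mu_kpa, alpha), 3))
print("E0 ~", round(initial_modulus(mu_kpa, alpha), 3), "kPa; 3*mu =", 3 * mu_kpa)
```

Note that this isotropic form cannot by itself represent the anisotropy the authors infer; it only describes the measured longitudinal stress-stretch response.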
Affiliation(s)
- Jordan E Kelleher
- Mechanical Engineering, 585 Purdue Mall, Purdue University, West Lafayette, IN 47907, USA
44
Zhang Y, Regner MF, Jiang JJ. Theoretical modeling and experimental high-speed imaging of elongated vocal folds. IEEE Trans Biomed Eng 2010; 58:2725-31. [PMID: 21118763 DOI: 10.1109/tbme.2010.2095012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
In this paper, the role of vocal fold elongation in governing glottal movement dynamics was theoretically and experimentally investigated. A theoretical model was first proposed to incorporate vocal fold elongation into the two-mass model. This model predicted the direct and nondirect components of the glottal time series as a function of vocal fold elongation. Furthermore, high-speed digital imaging was applied in excised larynx experiments to visualize vocal fold vibrations with variable vocal fold elongation from -10% to 50% and subglottal pressures of 18 and 24 cm H₂O. Comparison between theoretical model simulations and experimental observations showed good agreement. A relative maximum was seen in the nondirect component of glottal area, suggesting that an optimal elongation could maximize the vocal fold vibratory power. However, sufficiently large vocal fold elongations caused the nondirect component to approach zero and the direct component to approach a constant. These results showed that vocal fold elongation plays an important role in governing the dynamics of glottal area movement and validated the applicability of the proposed theoretical model and high-speed imaging to investigate laryngeal activity.
Affiliation(s)
- Yu Zhang
- Laboratory of Underwater Acoustic Communication and Marine Information Technology of the Ministry of Education, College of Oceanography and Environmental Science, Xiamen University, Xiamen 361005, China.
45
Yang A, Lohscheller J, Berry DA, Becker S, Eysholdt U, Voigt D, Döllinger M. Biomechanical modeling of the three-dimensional aspects of human vocal fold dynamics. J Acoust Soc Am 2010; 127:1014-31. [PMID: 20136223 PMCID: PMC3137461 DOI: 10.1121/1.3277165] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 05/14/2009] [Revised: 10/15/2009] [Accepted: 11/24/2009] [Indexed: 05/23/2023]
Abstract
Human voice originates from the three-dimensional (3D) oscillations of the vocal folds. In previous studies, biomechanical properties of vocal fold tissues have been predicted by optimizing the parameters of simple two-mass models to fit their dynamics to high-speed imaging data from the clinic. However, only lateral and longitudinal displacements of the vocal folds were considered. To extend previous studies, a 3D mass-spring cover model is developed, which predicts the 3D vibrations of the entire medial surface of the vocal fold. The model consists of five mass planes arranged in the vertical direction. Each plane contains five longitudinally arranged, coupled mass-spring oscillators. Feasibility of the model is assessed using a large body of dynamical data previously obtained from excised human larynx experiments, in vivo canine larynx experiments, physical models, and numerical models. Typical model output was found to be similar to existing findings. The resulting model enables visualization of the 3D dynamics of the human vocal folds during phonation for both symmetric and asymmetric vibrations.
Affiliation(s)
- Anxiong Yang
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Bohlenplatz 21, 91054 Erlangen, Germany.
46
Advances in laryngeal imaging. Eur Arch Otorhinolaryngol 2009; 266:1509-20. [PMID: 19618198 DOI: 10.1007/s00405-009-1050-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 12/05/2008] [Accepted: 07/07/2009] [Indexed: 10/20/2022]
Abstract
Imaging and image analysis have become important in laryngeal diagnostics. Various techniques, such as videostroboscopy, videokymography, digital kymography, or ultrasonography, are available and are used in research and clinical practice. This paper reviews recent advances in imaging for laryngeal diagnostics.
47
Qin X, Wang S, Wan M. Improving Reliability and Accuracy of Vibration Parameters of Vocal Folds Based on High-Speed Video and Electroglottography. IEEE Trans Biomed Eng 2009; 56:1744-54. [PMID: 19272979 DOI: 10.1109/tbme.2009.2015772] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Indexed: 11/06/2022]
Affiliation(s)
- Xulei Qin
- Key Laboratory of Biomedical Information Engineering of Ministry of Education and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.
48
Döllinger M, Rosanowski F, Eysholdt U, Lohscheller J. [Basic research on vocal fold dynamics: three-dimensional vibration analysis of human and canine larynges]. HNO 2009; 56:1213-20. [PMID: 17431569 DOI: 10.1007/s00106-007-1549-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/27/2022]
Abstract
BACKGROUND The understanding of normal and pathological vocal fold dynamics is the basis for pathophysiologically motivated voice therapy. Crucial vocal fold dynamics concerning voice production occur at the medial part of the vocal fold, which is seen as the most critical region of mucosal wave propagation. Because of the small size of the larynx, the possibilities for endoscopic laryngeal imaging are limited. MATERIAL AND METHODS This work describes an experimental set-up that enables quantification of the entire medial and superior vocal fold surface using excised human and in vivo canine larynges. RESULTS The data obtained enable analysis of vocal fold deflections, velocities, and mucosal wave propagation. The reciprocal dependencies can be examined and different areas of vocal fold dynamics located. The vertical components, which are obscured in clinical endoscopy, can be visualized, and they are not negligible. CONCLUSIONS In particular, it is shown that the vertical deflection, which cannot be observed by clinical examination, plays an important part in the dynamics and therefore cannot be omitted in therapeutic procedures. The theoretically assumed entrainment and influence of the two main vibration modes enabling normal phonation is confirmed.
Affiliation(s)
- M Döllinger
- Abteilung für Phoniatrie und Pädaudiologie, Universitätsklinikum Erlangen, Erlangen, Germany.
49
Abstract
BACKGROUND Stroboscopy is widely used and is quite adequate for the examination of normal voices, but with increasing hoarseness its suitability declines, even when it is supplemented by video recordings and image evaluation. Real-time procedures such as videokymography or high-speed (HS) video imaging are more suitable methods of observing the movements of the vocal folds in such cases. A drawback of any video recording is the subsequent time-consuming offline replay in slow motion, together with the limited human ability to recognize patterns in motion and other time-dependent processes. METHODS The phonovibrogram (PVG) is an image-processing algorithm that extracts the vocal fold motions of a whole laryngoscopic HS video film and automatically compresses them into a single image. RESULTS Simple patterns that vary from person to person are revealed by PVG; these can be categorized by means of simple geometric forms, which a human observer can more easily recognize and interpret than dynamic motion patterns. The PVG computation is described in detail and an extensive guide to interpretation is given, illustrated by reference to theoretical and real examples. CONCLUSION In clinical conditions, HS laryngoscopic video recording is useful only in association with automatic image processing. The PVG procedure is a promising approach and tests should be performed with a view to further clinical validation.
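The central idea of the phonovibrogram, compressing the fold motion of a whole video into one image, can be sketched as follows, assuming the fold edges have already been segmented and converted into per-frame deflections (the distance of each fold edge from the glottal main axis, sampled at the same positions along that axis). Each frame becomes one image column: the left fold's deflections in reversed order stacked above the right fold's, so time runs left to right. The exact orientation, sampling, and color coding of the published PVG are not stated in this abstract; the function name and layout here are assumptions for illustration.

```python
def build_phonovibrogram(left_deflections, right_deflections):
    """Stack per-frame fold deflections into a 2-D image (rows: axis positions, columns: frames).

    left_deflections[t][i] and right_deflections[t][i] are hypothetical deflections of the
    left and right fold edges at axis position i (posterior to anterior) in frame t.
    Each column is the reversed left fold followed by the right fold, so the two folds
    meet at the posterior position in the middle rows (assumed layout).
    """
    if len(left_deflections) != len(right_deflections):
        raise ValueError("both folds need the same number of frames")
    columns = []
    for left, right in zip(left_deflections, right_deflections):
        if len(left) != len(right):
            raise ValueError("both folds need the same number of axis positions")
        columns.append(list(reversed(left)) + list(right))
    # Transpose the list of columns into rows so that image[row][frame] indexes the PVG.
    return [list(row) for row in zip(*columns)]


# Three hypothetical frames, four axis positions per fold (arbitrary pixel units).
left = [[0, 1, 2, 1], [0, 2, 3, 2], [0, 1, 1, 0]]
right = [[0, 1, 1, 1], [0, 2, 2, 1], [0, 0, 1, 0]]
pvg = build_phonovibrogram(left, right)
for row in pvg:
    print(row)
```

Mapping the deflection values to colors would turn this matrix into the kind of single image described above, in which periodic opening and closing appears as repeating geometric shapes.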
Affiliation(s)
- U Eysholdt
- Abteilung für Phoniatrie und Pädaudiologie, Universitätsklinikum Erlangen, Erlangen, Germany.
50
Variability of Normal Vocal Fold Dynamics for Different Vocal Loading in One Healthy Subject Investigated by Phonovibrograms. J Voice 2009; 23:175-81. [DOI: 10.1016/j.jvoice.2007.09.008] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 06/26/2007] [Accepted: 09/25/2007] [Indexed: 10/22/2022]