1. Kniesburges S. [Larynx Models in Voice Research and their Applications]. Laryngorhinootologie 2024;103:775-778. PMID: 38471542. DOI: 10.1055/a-2249-2785.
Abstract
In this work, different types of larynx models are introduced and their applications to voice production are illustrated with two examples: the influence of the ventricular folds and the endoscopic evaluation of vocal fold tissue characteristics.
Affiliation(s)
- Stefan Kniesburges
- Phoniatrie und Pädaudiologie, Univ. Klinikum Erlangen, Hals-Nasen-Ohren-Klinik, Erlangen, Germany
2. Perrine BL, Scherer RC. Using a vertical three-mass computational model of the vocal folds to match human phonation of three adult males. The Journal of the Acoustical Society of America 2023;154:1505-1525. PMID: 37695295. PMCID: PMC10497319. DOI: 10.1121/10.0020847.
Abstract
Computer models of phonation are used to study various parameters that are difficult to control, measure, and observe in human subjects. Imitating human phonation by varying the prephonatory conditions of computer models offers insight into the variations that occur across human phonatory production. In the present study, a vertical three-mass computer model of phonation [Perrine, Scherer, Fulcher, and Zhai (2020). J. Acoust. Soc. Am. 147, 1727-1737], driven by empirical pressures from a physical model of the vocal folds (model M5), with a vocal tract following the design of Ishizaka and Flanagan [(1972). Bell Sys. Tech. J. 51, 1233-1268] was used to match prolonged vowels produced by three male subjects using various pitch and loudness levels. The prephonatory conditions of tissue mass and tension, subglottal pressure, glottal diameter and angle, posterior glottal gap, false vocal fold gap, and vocal tract cross-sectional areas were varied in the model to match the model output with the fundamental frequency, alternating current airflow, direct current airflow, skewing quotient, open quotient, maximum flow negative derivative, and the first three formant frequencies from the human production. Parameters were matched between the model and human subjects with an average overall percent mismatch of 4.40% (standard deviation = 6.75%), suggesting a reasonable ability of the simple low dimensional model to mimic these variables.
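As an illustration of the matching criterion summarized above, the following Python sketch computes an average percent mismatch between model outputs and human measurements across a set of matched quantities; the listed quantities and numbers are hypothetical, and the study's exact scoring may differ.

```python
import numpy as np

def percent_mismatch(model_vals, human_vals):
    """Average absolute percent difference across matched output measures."""
    model_vals = np.asarray(model_vals, dtype=float)
    human_vals = np.asarray(human_vals, dtype=float)
    per_measure = 100.0 * np.abs(model_vals - human_vals) / np.abs(human_vals)
    return per_measure.mean(), per_measure.std()

# Hypothetical measures: [F0 (Hz), AC flow, DC flow, skewing quotient,
# open quotient, MFDR, F1, F2, F3] -- values are illustrative only.
human = [110.0, 0.25, 0.05, 1.8, 0.55, 280.0, 730.0, 1090.0, 2440.0]
model = [112.0, 0.24, 0.05, 1.9, 0.57, 265.0, 745.0, 1120.0, 2400.0]
mean_err, sd_err = percent_mismatch(model, human)
print(f"mean mismatch {mean_err:.2f}% (SD {sd_err:.2f}%)")
```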
Affiliation(s)
- Brittany L Perrine
- Department of Communication Sciences and Disorders, Baylor University, One Bear Place #97332, Waco, Texas 76798, USA
- Ronald C Scherer
- Department of Communication Sciences and Disorders, Bowling Green State University, Ridge Street, Bowling Green, Ohio 43403, USA
3. Zhao W, Singh R. Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation. Entropy (Basel, Switzerland) 2023;25:1039. PMID: 37509986. PMCID: PMC10378572. DOI: 10.3390/e25071039.
Abstract
During phonation, the vocal folds exhibit a self-sustained oscillatory motion, which is influenced by the physical properties of the speaker's vocal folds and driven by the balance of bio-mechanical and aerodynamic forces across the glottis. Subtle changes in the speaker's physical state can affect voice production and alter these oscillatory patterns. Measuring these can be valuable in developing computational tools that analyze voice to infer the speaker's state. Traditionally, vocal fold oscillations (VFOs) are measured directly using physical devices in clinical settings. In this paper, we propose a novel analysis-by-synthesis approach that allows us to infer the VFOs directly from recorded speech signals on an individualized, speaker-by-speaker basis. The approach, called the ADLES-VFT algorithm, is proposed in the context of a joint model that combines a phonation model (with a glottal flow waveform as the output) and a vocal tract acoustic wave propagation model such that the output of the joint model is an estimated waveform. The ADLES-VFT algorithm is a forward-backward algorithm which minimizes the error between the recorded waveform and the output of this joint model to estimate its parameters. Once estimated, these parameter values are used in conjunction with a phonation model to obtain its solutions. Since the parameters correlate with the physical properties of the vocal folds of the speaker, model solutions obtained using them represent the individualized VFOs for each speaker. The approach is flexible and can be applied to various phonation models. In addition to presenting the methodology, we show how the VFOs can be quantified from a dynamical systems perspective for classification purposes. Mathematical derivations are provided in an appendix for better readability.
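The following Python sketch illustrates the general analysis-by-synthesis idea described above, fitting parameters of a stand-in "joint model" so that its output matches a recorded waveform. The actual ADLES-VFT algorithm is a forward-backward scheme, and its phonation and vocal tract models are not reproduced here; the `synthesize` function and all parameter values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def synthesize(params, t):
    # Placeholder joint model: a sinusoid whose frequency and amplitude stand
    # in for the physical parameters being estimated.
    f0, amp = params
    return amp * np.sin(2 * np.pi * f0 * t)

def objective(params, t, recorded):
    # Error between the recorded waveform and the model output.
    return np.sum((synthesize(params, t) - recorded) ** 2)

t = np.linspace(0, 0.05, 2000)                   # 50 ms of signal
recorded = 0.8 * np.sin(2 * np.pi * 105 * t)     # stand-in for a recorded waveform
result = minimize(objective, x0=[100.0, 1.0], args=(t, recorded), method="Nelder-Mead")
print("estimated [f0, amplitude]:", result.x)
```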
Affiliation(s)
- Wayne Zhao
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Rita Singh
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
4. Kleiner C, Kainz MA, Echternach M, Birkholz P. Velocity differences in laryngeal adduction and abduction gestures. The Journal of the Acoustical Society of America 2022;151:45. PMID: 35105025. DOI: 10.1121/10.0009141.
Abstract
Sixteen subjects produced periodic repetitions of laryngeal adduction and abduction gestures. The movement of the cuneiform tubercles was tracked over time in the laryngoscopic recordings of these utterances. The adduction and abduction velocities were determined objectively by means of a piecewise linear model fitted to the cuneiform tubercle trajectories. Abduction was found to be significantly faster than adduction. This was interpreted in terms of biomechanics and active control by the nervous system: the biomechanical properties could account for an abduction velocity up to 51% higher than the adduction velocity, and the adduction velocity may additionally be actively limited to prevent an overshoot of the intended adduction degree when the vocal folds are approximated to initiate phonation.
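A minimal sketch of the velocity-extraction step described above: straight lines are fitted to adduction and abduction segments of a tracked trajectory and their slopes compared. The trajectory and segment boundaries below are illustrative only, not data from the study.

```python
import numpy as np

t = np.linspace(0, 1.0, 101)                     # time (s)
# Hypothetical distance between the cuneiform tubercles: adduction (closing),
# a closed plateau, then a faster abduction (opening).
dist = np.piecewise(
    t, [t < 0.4, (t >= 0.4) & (t < 0.6), t >= 0.6],
    [lambda t: 4.0 - 10.0 * t,                   # adduction phase
     0.0,                                        # closed plateau
     lambda t: 15.0 * (t - 0.6)])                # abduction phase

adduction = slice(0, 41)                         # samples with t < 0.4
abduction = slice(61, 101)                       # samples with t > 0.6
v_add = np.polyfit(t[adduction], dist[adduction], 1)[0]
v_abd = np.polyfit(t[abduction], dist[abduction], 1)[0]
print(f"adduction speed {abs(v_add):.1f}, abduction speed {abs(v_abd):.1f} (arb. units/s)")
```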
Affiliation(s)
- Christian Kleiner
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Dresden, Germany
- Marie-Anne Kainz
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
- Peter Birkholz
- Institute of Acoustics and Speech Communication, Technische Universität Dresden, Dresden, Germany
5. Hadwin PJ, Erath BD, Peterson SD. The influence of flow model selection on finite element model parameter estimation using Bayesian inference. JASA Express Letters 2021;1:045204. PMID: 34136884. PMCID: PMC8182970. DOI: 10.1121/10.0004260.
Abstract
Recently, Bayesian estimation coupled with finite element modeling has been demonstrated as a viable tool for estimating vocal fold material properties from kinematic information obtained via high-speed video recordings. In this article, the sensitivity of the parameter estimations to the employed fluid model is explored by considering Bernoulli and one-dimensional viscous fluid flow models. Simulation results indicate that prescribing an ad hoc separation location for the Bernoulli flow model can lead to large estimate biases, whereas including the separation location as an estimated parameter leads to results comparable to that of the viscous fluid flow model.
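The following sketch shows a Bernoulli flow description with an ad hoc separation location, the kind of reduced-order flow model discussed above; the glottal geometry, pressures, and the 1.2-times-minimum-area separation criterion are hypothetical and not taken from the study.

```python
import numpy as np

rho = 1.2                                    # air density (kg/m^3)
p_sub, p_sup = 800.0, 0.0                    # sub-/supraglottal pressure (Pa)
x = np.linspace(0.0, 3e-3, 200)              # position along the glottis (m)
area = 1e-5 * (1.0 + 200.0 * x)              # divergent glottal area (m^2)
a_sep = 1.2 * area.min()                     # ad hoc separation: 1.2 x minimum area

# Flow rate set by the separation area, where pressure recovers to p_sup.
q = a_sep * np.sqrt(2.0 * (p_sub - p_sup) / rho)
# Bernoulli pressure upstream of separation, supraglottal pressure downstream.
p = p_sub - 0.5 * rho * (q / area) ** 2
p[area >= a_sep] = p_sup
print(f"flow rate {q * 1e3:.2f} L/s, minimum intraglottal pressure {p.min():.0f} Pa")
```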
Affiliation(s)
- Paul J Hadwin
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Byron D Erath
- Department of Mechanical and Aeronautical Engineering, Clarkson University, Potsdam, New York 13699, USA
- Sean D Peterson
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
6. Li Z, Chen Y, Chang S, Rousseau B, Luo H. A one-dimensional flow model enhanced by machine learning for simulation of vocal fold vibration. The Journal of the Acoustical Society of America 2021;149:1712. PMID: 33765799. PMCID: PMC7954577. DOI: 10.1121/10.0003561.
Abstract
A one-dimensional (1D) unsteady and viscous flow model derived from the momentum and mass conservation equations is described, and a machine learning approach is used to determine the unknown modeling parameters of this physics-based model. Specifically, an idealized larynx model is constructed and ten cases of three-dimensional (3D) fluid-structure interaction (FSI) simulations are performed. The flow data are then extracted to train the 1D flow model using a sparse identification approach for nonlinear dynamical systems. As a result of training, we obtain analytical expressions for the entrance effect and the pressure loss in the glottis, which are then incorporated into the flow model to conveniently handle different glottal shapes due to vocal fold vibration. We apply the enhanced 1D flow model in the FSI simulation of both idealized vocal fold geometries and subject-specific anatomical geometries reconstructed from magnetic resonance images of rabbit larynges. The 1D flow model is evaluated in both of these setups and shown to have robust performance. It therefore provides a fast simulation tool that is superior to previous 1D models.
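The sparse identification step mentioned above can be illustrated with a small sequential-thresholded least-squares regression; the candidate term library and the nearly noise-free synthetic data below are assumptions for illustration, not the training data or library used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=200)                       # e.g., glottal area samples
# "Measured" time derivative generated by a known sparse model plus tiny noise.
dxdt = 0.8 * x - 0.3 * x**2 + 0.001 * rng.normal(size=x.size)

# Library of candidate terms and an initial least-squares fit.
library = np.column_stack([np.ones_like(x), x, x**2, x**3])
coeffs, *_ = np.linalg.lstsq(library, dxdt, rcond=None)

# Sequential thresholding: zero out small coefficients and refit on the rest.
for _ in range(5):
    small = np.abs(coeffs) < 0.05
    coeffs[small] = 0.0
    active = ~small
    coeffs[active], *_ = np.linalg.lstsq(library[:, active], dxdt, rcond=None)

print("identified coefficients for [1, x, x^2, x^3]:", np.round(coeffs, 3))
```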
Affiliation(s)
- Zheng Li
- Department of Mechanical Engineering, Vanderbilt University, 2301 Vanderbilt Place, Nashville, Tennessee 37235-1592, USA
- Ye Chen
- Department of Mechanical Engineering, Vanderbilt University, 2301 Vanderbilt Place, Nashville, Tennessee 37235-1592, USA
- Siyuan Chang
- Department of Mechanical Engineering, Vanderbilt University, 2301 Vanderbilt Place, Nashville, Tennessee 37235-1592, USA
- Bernard Rousseau
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA
- Haoxiang Luo
- Department of Mechanical Engineering, Vanderbilt University, 2301 Vanderbilt Place, Nashville, Tennessee 37235-1592, USA
7. Drioli C, Aichinger P. Modelling sagittal and vertical phase differences in a lumped and distributed elements vocal fold model. Biomed Signal Process Control 2021. DOI: 10.1016/j.bspc.2020.102309.
8. Ghasemzadeh H, Deliyski DD, Hillman RE, Mehta DD. Method for Horizontal Calibration of Laser-Projection Transnasal Fiberoptic High-Speed Videoendoscopy. Applied Sciences (Basel, Switzerland) 2021;11:822. PMID: 33628469. PMCID: PMC7899170. DOI: 10.3390/app11020822.
Abstract
OBJECTIVE: Calibrated horizontal measurements (e.g., in mm) from endoscopic procedures could be utilized to advance evidence-based practice and personalized medicine. However, the size of an object in endoscopic images is not readily calibrated and depends on multiple factors, including the distance between the endoscope and the target surface. Additionally, acquired images may have significant non-linear distortion that further complicates calibrated measurements. This study used a recently developed in-vivo laser-projection fiberoptic laryngoscope and proposes a method for calibrated spatial measurements. METHOD: A set of circular grids was recorded at multiple working distances. A statistical model was trained to map the pixel length of an object, the working distance, and the spatial location of the object to its length in mm. RESULTS: A detailed analysis of the performance of the proposed method is presented. The analyses showed that the accuracy of the proposed method does not depend on the working distance or the length of the target object. The estimated average magnitude of error was 0.27 mm, which is three times lower than the existing alternative. CONCLUSION: The presented method can achieve sub-millimeter accuracy in horizontal measurement. SIGNIFICANCE: Evidence-based practice and personalized medicine could significantly benefit from the proposed method. Implications of the findings for other endoscopic procedures are also discussed.
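A hedged sketch of the calibration idea: a regression model maps pixel length, working distance, and image position to physical length in mm. The quadratic feature set and the synthetic grid data below are assumptions for illustration, not the statistical model trained in the study.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
pix = rng.uniform(20, 200, n)            # measured object length in pixels
dist = rng.uniform(5, 40, n)             # working distance (mm)
r = rng.uniform(0, 1, n)                 # normalized radial position in the image
mm_true = pix * dist / 900.0 * (1 + 0.1 * r)   # synthetic ground-truth lengths (mm)

# Quadratic feature expansion and least-squares fit of the calibration model.
X = np.column_stack([np.ones(n), pix, dist, r,
                     pix * dist, pix * r, dist * r, pix**2, dist**2])
w, *_ = np.linalg.lstsq(X, mm_true, rcond=None)

pred = X @ w
print(f"mean absolute calibration error: {np.mean(np.abs(pred - mm_true)):.3f} mm")
```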
Affiliation(s)
- Hamzeh Ghasemzadeh
- Department of Communicative Sciences and Disorders and Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, USA
- Dimitar D. Deliyski
- Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, Michigan, USA
- Robert E. Hillman
- MGH Institute of Health Professions; Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital; Department of Surgery, Harvard Medical School; and Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
- Daryush D. Mehta
- MGH Institute of Health Professions; Center for Laryngeal Surgery and Voice Rehabilitation, Massachusetts General Hospital; Department of Surgery, Harvard Medical School; and Speech and Hearing Bioscience and Technology, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA
9. Gómez P, Kist AM, Schlegel P, Berry DA, Chhetri DK, Dürr S, Echternach M, Johnson AM, Kniesburges S, Kunduk M, Maryn Y, Schützenberger A, Verguts M, Döllinger M. BAGLS, a multihospital Benchmark for Automatic Glottis Segmentation. Sci Data 2020;7:186. PMID: 32561845. PMCID: PMC7305104. DOI: 10.1038/s41597-020-0526-3.
Abstract
Laryngeal videoendoscopy is one of the main tools in clinical examinations for voice disorders and voice research. Using high-speed videoendoscopy, it is possible to fully capture the vocal fold oscillations; however, processing the recordings typically involves a time-consuming segmentation of the glottal area by trained experts. Even though automatic methods have been proposed and the task is particularly suited for deep learning methods, there are no public datasets and benchmarks available to compare methods and to allow the training of generalizing deep learning models. In an international collaboration of researchers from seven institutions in the EU and USA, we have created BAGLS, a large, multihospital dataset of 59,250 high-speed videoendoscopy frames with individually annotated segmentation masks. The frames are based on 640 recordings of healthy and disordered subjects that were recorded with varying technical equipment by numerous clinicians. The BAGLS dataset will allow an objective comparison of glottis segmentation methods and will enable interested researchers to train their own models and compare their methods.
Affiliation(s)
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany.
- Andreas M Kist
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Patrick Schlegel
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- David A Berry
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California, USA
- Dinesh K Chhetri
- Department of Head and Neck Surgery, David Geffen School of Medicine at the University of California, Los Angeles, Los Angeles, California, USA
- Stephan Dürr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany
- Aaron M Johnson
- NYU Voice Center, Department of Otolaryngology - Head and Neck Surgery, New York University School of Medicine, New York, New York, USA
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Melda Kunduk
- Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge, Louisiana, USA
- Youri Maryn
- European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium
- Department of Speech, Language and Hearing Sciences, University of Ghent, Ghent, Belgium
- Faculty of Education, Health and Social Work, University College Ghent, Ghent, Belgium
- Faculty of Psychology and Educational Sciences, School of Logopedics, Université Catholique de Louvain, Louvain-la-Neuve, Belgium
- Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
- Monique Verguts
- European Institute for ORL-HNS, Department of Otorhinolaryngology and Head & Neck Surgery, Sint-Augustinus GZA, Wilrijk, Belgium
- Department of Otorhinolaryngology and Voice Disorders, Diest General Hospital, Diest, Belgium
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Waldstraße 1, 91054, Erlangen, Germany
10.
Abstract
This review provides a comprehensive compilation, from a digital image processing point of view, of the most important techniques currently developed to characterize and quantify the vibration behaviour of the vocal folds, along with a detailed description of the laryngeal image modalities currently used in the clinic. The review presents an overview of the most significant glottal-gap segmentation and facilitative playback techniques used in the literature for this purpose, and discusses the drawbacks and challenges that remain unsolved in developing robust vocal fold vibration analysis tools based on digital image processing.
11. Fehling MK, Grosch F, Schuster ME, Schick B, Lohscheller J. Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network. PLoS One 2020;15:e0227791. PMID: 32040514. PMCID: PMC7010264. DOI: 10.1371/journal.pone.0227791.
Abstract
The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and quantitative analysis of laryngeal high-speed video (HSV). Quantification of the vocal fold vibration patterns requires as a first step the segmentation of the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation process. In this work we propose, for the first time, a procedure to fully automatically segment not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences comprising 100 consecutive images each. As performance measures, the Dice Coefficient (DC) as well as the precision of four anatomical landmark positions were used. Over all test data, a mean DC of 0.85 was obtained for the glottis and 0.91 and 0.90 for the right and left vocal fold, respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The method proposed here requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. Thus, it also allows the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be provided freely to all scientific groups to allow a quantitative benchmarking of segmentation approaches in the future.
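The Dice Coefficient used as a performance measure above can be computed as in the following minimal sketch; the toy masks are hypothetical.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

# Hypothetical 10x10 glottis masks: ground truth and a slightly wider prediction.
truth = np.zeros((10, 10), dtype=int); truth[3:7, 4:6] = 1
pred = np.zeros((10, 10), dtype=int);  pred[3:7, 4:7] = 1
print(f"Dice = {dice_coefficient(pred, truth):.2f}")
```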
Affiliation(s)
- Mona Kirstin Fehling
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
- Fabian Grosch
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
- Maria Elke Schuster
- Department of Otorhinolaryngology and Head and Neck Surgery, University of Munich, Campus Grosshadern, München, Germany
- Bernhard Schick
- Department of Otorhinolaryngology, Saarland University Hospital, Homburg/Saar, Germany
- Jörg Lohscheller
- Department of Computer Science, Trier University of Applied Sciences, Schneidershof, Trier, Germany
12. Drioli C, Foresti GL. Fitting a biomechanical model of the folds to high-speed video data through Bayesian estimation. Informatics in Medicine Unlocked 2020. DOI: 10.1016/j.imu.2020.100373.
13. Deng JJ, Hadwin PJ, Peterson SD. The effect of high-speed videoendoscopy configuration on reduced-order model parameter estimates by Bayesian inference. The Journal of the Acoustical Society of America 2019;146:1492. PMID: 31472542. PMCID: PMC6715443. DOI: 10.1121/1.5124256.
Abstract
Bayesian inference has been previously demonstrated as a viable inverse analysis tool for estimating subject-specific reduced-order model parameters and uncertainties. However, previous studies have relied upon simulated glottal area waveforms with superimposed random noise as the measurement. In practice, high-speed videoendoscopy is used to measure glottal area, which introduces practical imaging effects not captured in simulated data, such as viewing angle, frame rate, and camera resolution. Herein, high-speed videos of the vocal folds were approximated by recording the trajectories of physical vocal fold models controlled by a symmetric body-cover model. Twenty videos were recorded, varying subglottal pressure, cricothyroid activation, and viewing angle, with frame rate and video resolution varied by digital video manipulation. Bayesian inference was used to estimate subglottal pressure and cricothyroid activation from glottal area waveforms extracted from the videos. The resulting estimates show off-axis viewing of 10° can lead to a 10% bias in the estimated subglottal pressure. A viewing model is introduced such that viewing angle can be included as an estimated parameter, which alleviates estimate bias. Frame rate and pixel resolution were found to primarily affect uncertainty of parameter estimates up to a limit where spatial and temporal resolutions were too poor to resolve the glottal area. Since many high-speed cameras have the ability to sacrifice spatial for temporal resolution, the findings herein suggest that Bayesian inference studies employing high-speed video should increase temporal resolutions at the expense of spatial resolution for reduced estimate uncertainties.
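The viewing-angle effect discussed above can be illustrated by foreshortening a planar glottal area waveform by roughly cos(theta); the waveform below is synthetic, and the paper's full viewing model and Bayesian estimation machinery are not reproduced here.

```python
import numpy as np

theta = np.deg2rad(10.0)                                   # off-axis viewing angle
t = np.linspace(0, 0.02, 400)                              # 20 ms of vibration
# Hypothetical glottal area waveform (cm^2) at 150 Hz.
area_true = 0.1 * np.maximum(np.sin(2 * np.pi * 150 * t), 0.0)
# A planar area viewed off-axis is foreshortened by approximately cos(theta).
area_observed = area_true * np.cos(theta)

shrinkage = 100.0 * (1.0 - np.cos(theta))
print(f"observed glottal area is reduced by {shrinkage:.1f}% at 10 degrees off-axis")
```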
Affiliation(s)
- Jonathan J Deng
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
- Paul J Hadwin
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
- Sean D Peterson
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Ontario N2L 3G1, Canada
14. Erath BD, Peterson SD, Weiland KS, Plesniak MW, Zañartu M. An acoustic source model for asymmetric intraglottal flow with application to reduced-order models of the vocal folds. PLoS One 2019;14:e0219914. PMID: 31344084. PMCID: PMC6657872. DOI: 10.1371/journal.pone.0219914.
Abstract
The complex three-way interaction between airflow, tissue, and sound, for asymmetric vocal fold vibration, is not well understood. Current modeling efforts are not able to explain clinical observations where drastic differences in sound production are often observed, with no noticeable differences in the vocal fold kinematics. To advance this understanding, an acoustical model for voiced sound generation in the presence of asymmetric intraglottal flows is developed. The source model operates in conjunction with a wave reflection analog propagation scheme and an asymmetric flow description within the glottis. To enable comparison with prior work, the source model is evaluated using a well-studied two-mass vocal fold model. The proposed source model is evaluated through acoustic measures of interest, including radiated sound pressure level, maximum flow declination rate, and spectral tilt, and also via its effects on the vocal fold dynamics. The influence of the model, in comparison to the standard symmetric Bernoulli flow description, results in an increased transfer of energy from the fluid to the vocal folds, increased radiated sound pressure level and maximum flow declination rate, and decreased spectral tilt. These differences are most pronounced for asymmetric vocal fold configurations that mimic unilateral paresis and paralysis, where minor kinematic changes can result in significant acoustic and aerodynamic differences. The results illustrate that fluid effects arising from asymmetric glottal flow can play an important role in the acoustics of pathological voiced speech.
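As a small illustration of one of the measures named above, the following sketch computes the maximum flow declination rate from a glottal flow waveform; the waveform is a crude synthetic pulse train, not output from the source model of the paper.

```python
import numpy as np

fs = 44100                                              # sample rate (Hz)
t = np.arange(0, 0.05, 1 / fs)
# Crude synthetic glottal flow pulse train at 120 Hz (arbitrary flow units).
flow = np.maximum(np.sin(2 * np.pi * 120 * t), 0.0) ** 2

dflow = np.gradient(flow, 1 / fs)                       # time derivative of the flow
mfdr = -dflow.min()                                     # maximum flow declination rate
print(f"MFDR = {mfdr:.1f} (flow units per second)")
```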
Affiliation(s)
- Byron D. Erath
- Department of Mechanical and Aeronautical Engineering, Clarkson University, Potsdam, NY, United States of America
- Sean D. Peterson
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario, Canada
- Kelley S. Weiland
- Naval Surface Warfare Center, Dahlgren Division, Dahlgren, VA, United States of America
- Michael W. Plesniak
- Department of Mechanical and Aerospace Engineering, The George Washington University, Washington, D.C., United States of America
- Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
15. Li S, Scherer RC, Wan M, Wang S, Song B. Intraglottal Pressure: A Comparison Between Male and Female Larynxes. J Voice 2019;34:813-822. PMID: 31311664. DOI: 10.1016/j.jvoice.2019.06.005.
Abstract
Acoustic differences in the phonated sounds made by men and women are related to laryngeal and vocal tract structural differences. This model-based study explored how typical vocal fold differences between males and females affect intraglottal pressure distributions under conditions of different glottal angles and transglottal pressures, and thus how they may affect phonation. The computational code ANSYS Fluent 6.3 was used to obtain the pressure distributions and other aerodynamic parameters for laminar, incompressible flow. Typical values of the vocal fold length, the vertical glottal duct length, and the lateral vocal fold tissue depth were selected both for males and females under conditions of nine typical convergent/divergent glottal angles and three transglottal pressures. There was no coupling of the upstream or downstream vocal tracts, and also no vocal fold contact in these two-dimensional static glottal geometries. Results suggest that males tend to have greater intraglottal pressures for the convergent glottal shape that occurs during glottal opening, and the male glottis offers less flow resistance than the female glottis. These results suggest that the male vocal folds may vibrate more easily (i.e., with lower transglottal pressure), although the tissue differences may nullify such a hypothesis. Also, the peak velocities in the glottis depended on the transglottal pressure driving the flow and the minimal glottal diameter, which were the same for both the male and female larynxes, rather than on the inferior-superior length of the glottis or the anterior-posterior glottal length. In addition, the tangential forces for larger convergent glottal angles were significantly greater in the female larynx. The entrance loss coefficients, however, were similar between the male and female larynxes, except for the uniform glottis, for which the values were larger for the male larynx. The results suggest that the structural differences between male and female vocal folds should be well specified when building computational and physical models of the larynx.
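The entrance loss coefficient mentioned above can be illustrated with one common definition, the pressure drop from the subglottal region to the glottal entrance normalized by the glottal dynamic pressure; the values below are illustrative, and the exact definition used in the study may differ.

```python
# One common definition of the entrance loss coefficient (hedged sketch; all
# numbers are hypothetical, not results from the study).
rho = 1.2            # air density (kg/m^3)
p_sub = 784.0        # subglottal pressure (Pa), roughly 8 cm H2O
p_entry = 300.0      # pressure at the glottal entrance (Pa), hypothetical
v_glottis = 28.0     # particle velocity in the glottis (m/s), hypothetical

k_ent = (p_sub - p_entry) / (0.5 * rho * v_glottis ** 2)
print(f"entrance loss coefficient k_ent = {k_ent:.2f}")
```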
Affiliation(s)
- Sheng Li
- College of Science, Xijing University, Xi'an, People's Republic of China
- Ronald C Scherer
- Department of Communication Sciences and Disorders, Bowling Green State University, Bowling Green, Ohio
- MingXi Wan
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China
- SuPin Wang
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, and Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, People's Republic of China
- Bo Song
- College of Aeronautical Engineering, Air Force Engineering University, Xi'an, People's Republic of China
16. Bayesian Inference of Vocal Fold Material Properties from Glottal Area Waveforms Using a 2D Finite Element Model. Applied Sciences (Basel, Switzerland) 2019;9. PMID: 34046213. PMCID: PMC8153513. DOI: 10.3390/app9132735.
Abstract
Bayesian estimation has been previously demonstrated as a viable method for developing subject-specific vocal fold models from observations of the glottal area waveform. These prior efforts, however, have been restricted to lumped-element fitting models and synthetic observation data. The indirect relationship between the lumped-element parameters and physical tissue properties renders extracting the latter from the former difficult. Herein we propose a finite element fitting model, which treats the vocal folds as a viscoelastic deformable body comprised of three layers. Using the glottal area waveforms generated by self-oscillating silicone vocal folds we directly estimate the elastic moduli, density, and other material properties of the silicone folds using a Bayesian importance sampling approach. Estimated material properties agree with the “ground truth” experimental values to within 3% for most parameters. By considering cases with varying subglottal pressure and medial compression we demonstrate that the finite element model coupled with Bayesian estimation is sufficiently sensitive to distinguish between experimental configurations. Additional information not available experimentally, namely, contact pressures, are extracted from the developed finite element models. The contact pressures are found to increase with medial compression and subglottal pressure, in agreement with expectation.
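A minimal sketch of the Bayesian importance sampling idea described above: prior samples of a material parameter are weighted by how well a forward model reproduces an observed glottal area waveform. The one-parameter surrogate model and all values below are stand-ins, not the paper's 2D finite element model or silicone data.

```python
import numpy as np

rng = np.random.default_rng(2)

def forward_model(stiffness, t):
    # Toy surrogate: oscillation frequency grows with the square root of stiffness.
    return np.maximum(np.sin(2 * np.pi * 40.0 * np.sqrt(stiffness) * t), 0.0)

t = np.linspace(0, 0.05, 200)
observed = forward_model(9.0, t) + 0.05 * rng.normal(size=t.size)   # "measurement"

samples = rng.uniform(1.0, 20.0, size=2000)          # draws from a uniform prior
sigma = 0.05                                          # assumed observation noise level
log_w = np.array([-0.5 * np.sum((forward_model(s, t) - observed) ** 2) / sigma**2
                  for s in samples])
w = np.exp(log_w - log_w.max())                       # unnormalized importance weights
w /= w.sum()

posterior_mean = np.sum(w * samples)
print(f"posterior mean stiffness: {posterior_mean:.2f} (synthetic truth: 9.0)")
```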
17. Influence of spatial camera resolution in high-speed videoendoscopy on laryngeal parameters. PLoS One 2019;14:e0215168. PMID: 31009488. PMCID: PMC6476512. DOI: 10.1371/journal.pone.0215168.
Abstract
In laryngeal high-speed videoendoscopy (HSV), the area between the vibrating vocal folds during phonation is of interest and is referred to as the glottal area waveform (GAW). Varying camera resolution may influence parameters computed on the GAW and hence hinder the comparability between examinations. This study investigates the influence of spatial camera resolution on quantitative vocal fold vibratory function parameters obtained from the GAW. In total, 40 HSV recordings during sustained phonation (20 healthy males and 20 healthy females) were investigated. A clinically used Photron Fastcam MC2 camera with a frame rate of 4000 fps and a spatial resolution of 512×256 pixels was applied. This initial resolution was reduced by pixel averaging (1) to a resolution of 256×128 pixels and (2) to a resolution of 128×64 pixels, yielding three sets of recordings. The GAW was extracted, and in total 50 vocal fold vibratory parameters representing different features of the GAW were computed. Statistical analyses were performed using SPSS Statistics, version 21. Fifteen parameters showing strong mathematical dependencies on other parameters were excluded from the main analysis but are given in the Supporting Information. Data analysis revealed a clear influence of spatial resolution on GAW parameters. Fundamental period measures and period perturbation measures were the least affected. Amplitude perturbation measures and mechanical measures were most strongly influenced. Most glottal dynamic characteristics and symmetry measures deviated significantly. Most energy perturbation measures changed significantly in males but were mostly unaffected in females. In females, 18 of the 35 remaining parameters (51%) and, in males, 22 parameters (63%) changed significantly between spatial resolutions. This work represents the first step in studying the impact of video resolution on quantitative HSV parameters. Clear influences of spatial camera resolution on the computed parameters were found. The study results suggest avoiding the use of the most strongly affected parameters. Furthermore, the use of cameras with high resolution is recommended for analyzing GAW measures in HSV data.
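The pixel averaging used above to reduce spatial resolution can be sketched as a block-mean downsampling of each frame; the random frame below simply stands in for an HSV image.

```python
import numpy as np

def downsample_by_averaging(frame, factor=2):
    """Replace each factor x factor block of pixels by its mean value."""
    h, w = frame.shape
    h, w = h - h % factor, w - w % factor                 # crop to a multiple of the factor
    blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

frame = np.random.default_rng(3).integers(0, 256, size=(512, 256)).astype(float)
print(frame.shape, "->", downsample_by_averaging(frame).shape)   # (512, 256) -> (256, 128)
```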
18. Gómez P, Schützenberger A, Semmler M, Döllinger M. Laryngeal Pressure Estimation With a Recurrent Neural Network. IEEE Journal of Translational Engineering in Health and Medicine 2018;7:2000111. PMID: 30680252. PMCID: PMC6331197. DOI: 10.1109/jtehm.2018.2886021.
Abstract
Quantifying the physical parameters of voice production is essential for understanding the process of phonation and can aid in voice research and diagnosis. As an alternative to invasive measurements, they can be estimated by formulating an inverse problem using a numerical forward model. However, high-fidelity numerical models are often computationally too expensive for this. This paper presents a novel approach to train a long short-term memory network to estimate the subglottal pressure in the larynx at massively reduced computational cost using solely synthetic training data. We train the network on synthetic data from a numerical two-mass model and validate it on experimental data from 288 high-speed ex vivo video recordings of porcine vocal folds from a previous study. The training requires significantly fewer model evaluations compared with the previous optimization approach. On the test set, we maintain a comparable performance of 21.2% versus previous 17.7% mean absolute percentage error in estimating the subglottal pressure. The evaluation of one sample requires a vanishingly small amount of computation time. The presented approach is able to maintain estimation accuracy of the subglottal pressure at significantly reduced computational cost. The methodology is likely transferable to estimate other parameters and training with other numerical models. This improvement should allow the adoption of more sophisticated, high-fidelity numerical models of the larynx. The vast speedup is a critical step to enable a future clinical application and knowledge of parameters such as the subglottal pressure will aid in diagnosis and treatment selection.
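A hedged sketch of the kind of recurrent regressor described above, mapping a time series (e.g., a glottal area or vocal fold trajectory signal) to a single subglottal pressure estimate. The layer sizes and the framework choice (PyTorch) are illustrative assumptions, not the paper's implementation or training setup.

```python
import torch
import torch.nn as nn

class PressureEstimator(nn.Module):
    """LSTM followed by a linear head that outputs one pressure value per sequence."""

    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, features)
        _, (h_n, _) = self.lstm(x)        # final hidden state summarizes the sequence
        return self.head(h_n[-1])         # one subglottal pressure estimate per sequence

model = PressureEstimator()
dummy = torch.randn(8, 200, 1)            # 8 synthetic sequences of 200 samples
print(model(dummy).shape)                  # torch.Size([8, 1])
```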
Affiliation(s)
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Marion Semmler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany