1
|
Zhao W, Singh R. Deriving Vocal Fold Oscillation Information from Recorded Voice Signals Using Models of Phonation. ENTROPY (BASEL, SWITZERLAND) 2023; 25:1039. [PMID: 37509986 PMCID: PMC10378572 DOI: 10.3390/e25071039] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/17/2023] [Revised: 07/03/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023]
Abstract
During phonation, the vocal folds exhibit a self-sustained oscillatory motion, which is influenced by the physical properties of the speaker's vocal folds and driven by the balance of bio-mechanical and aerodynamic forces across the glottis. Subtle changes in the speaker's physical state can affect voice production and alter these oscillatory patterns. Measuring these can be valuable in developing computational tools that analyze voice to infer the speaker's state. Traditionally, vocal fold oscillations (VFOs) are measured directly using physical devices in clinical settings. In this paper, we propose a novel analysis-by-synthesis approach that allows us to infer the VFOs directly from recorded speech signals on an individualized, speaker-by-speaker basis. The approach, called the ADLES-VFT algorithm, is proposed in the context of a joint model that combines a phonation model (with a glottal flow waveform as the output) and a vocal tract acoustic wave propagation model such that the output of the joint model is an estimated waveform. The ADLES-VFT algorithm is a forward-backward algorithm which minimizes the error between the recorded waveform and the output of this joint model to estimate its parameters. Once estimated, these parameter values are used in conjunction with a phonation model to obtain its solutions. Since the parameters correlate with the physical properties of the vocal folds of the speaker, model solutions obtained using them represent the individualized VFOs for each speaker. The approach is flexible and can be applied to various phonation models. In addition to presenting the methodology, we show how the VFOs can be quantified from a dynamical systems perspective for classification purposes. Mathematical derivations are provided in an appendix for better readability.
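The analysis-by-synthesis idea in this abstract (fit model parameters by minimizing the error between a recorded waveform and a model-generated one) can be sketched in a few lines. This is only a toy illustration, not the ADLES-VFT algorithm: the forward model here is assumed to be a plain sinusoid with one unknown parameter, and a grid search stands in for the forward-backward estimation the abstract describes. All function names are hypothetical.

```python
import math

def synthesize(freq_hz, n_samples, sample_rate):
    """Toy forward model (assumption): a unit-amplitude sinusoid."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def squared_error(a, b):
    """Sum of squared differences between two equal-length waveforms."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def estimate_frequency(recorded, sample_rate, candidates):
    """Return the candidate parameter whose synthesized output is
    closest to the recording (grid search, not forward-backward)."""
    return min(
        candidates,
        key=lambda f: squared_error(
            recorded, synthesize(f, len(recorded), sample_rate)),
    )

sample_rate = 8000
# Stand-in for a recorded signal: the toy model run at 120 Hz.
recorded = synthesize(120.0, 400, sample_rate)
candidates = [100.0 + 5.0 * k for k in range(21)]  # 100 Hz to 200 Hz
best = estimate_frequency(recorded, sample_rate, candidates)
print(best)
```

In a real setting the forward model would be a phonation model coupled to a vocal tract model, and the search would run over its physical parameters rather than a single frequency.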
Affiliation(s)
- Wayne Zhao
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | - Rita Singh
- School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| |
|
2
|
Ikuma T, McWhorter AJ, Adkins L, Kunduk M. Investigation of Vocal Bifurcations and Voice Patterns Induced by Asymmetry of Pathological Vocal Folds. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023; 66:48-60. [PMID: 36472934 DOI: 10.1044/2022_jslhr-21-00499] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/17/2023]
Abstract
PURPOSE Vocal fold asymmetry creates irregular entrainments and modulations in voice, which may lead to rough perceptual quality. The presence of asymmetry can also cause mid-phonation bifurcations where a small change in the phonatory system causes a drastic change in vibration pattern, resulting in transitions in and out of rough voice. This study surveys sustained phonation recordings of speakers diagnosed with vocal fold polyp or unilateral vocal fold paralysis to investigate the resulting voice patterns. METHOD This retrospective study observed 71 sustained phonation recordings from 48 patients. Segments with distinctive signal patterns were identified within each recording using narrowband spectrograms and computer-assisted analysis of spectral peaks. RESULTS Phonation segmentation yielded 240 segments across all the recordings. Five voice patterns were recognized: (regularly or irregularly) entrained, modulated, uncoupled, unstable, and pulsed. Thirty-six patients (75%) exhibited irregular patterns. No single irregular pattern lasted for the entire phonation; each was accompanied by at least one mid-phonation bifurcation. Durations of the irregular segments (M = 0.4 s) were significantly shorter than those of the segments with the regular pattern (M = 1.4 s). CONCLUSIONS The results suggest that vocal fold pathology frequently introduces dynamic vibratory patterns that affect both the acoustic signals and perceptions. Due to these abnormalities, it is important for clinical voice assessment protocols, both perceptual and acoustic, to account for these possible bifurcations, irregular signal patterns, and their tendencies.
Affiliation(s)
- Takeshi Ikuma
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans
- Voice Center, The Our Lady of the Lake Regional Medical Center, Baton Rouge, LA
| | - Andrew J McWhorter
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans
- Voice Center, The Our Lady of the Lake Regional Medical Center, Baton Rouge, LA
| | - Lacey Adkins
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans
- Voice Center, The Our Lady of the Lake Regional Medical Center, Baton Rouge, LA
| | - Melda Kunduk
- Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans
- Voice Center, The Our Lady of the Lake Regional Medical Center, Baton Rouge, LA
- Department of Communication Sciences & Disorders, Louisiana State University, Baton Rouge
| |
|
3
|
Yoshinaga T, Zhang Z, Iida A. Comparison of one-dimensional and three-dimensional glottal flow models in left-right asymmetric vocal fold conditions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:2557. [PMID: 36456298 PMCID: PMC9629867 DOI: 10.1121/10.0014949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2022] [Revised: 10/04/2022] [Accepted: 10/06/2022] [Indexed: 06/17/2023]
Abstract
While the glottal flow is often simplified as one-dimensional (1D) in computational models of phonation to reduce computational costs, the 1D flow model has not been validated in left-right asymmetric vocal fold conditions, as often occur in both normal and pathological voice production. In this study, we performed three-dimensional (3D) and 1D flow simulations coupled to a two-mass model of adult male vocal folds and compared voice production at different degrees of left-right stiffness asymmetry. The flow and acoustic fields in 3D were obtained by solving the compressible Navier-Stokes equations using the volume penalization method with the moving vocal fold wall as an immersed boundary. Despite differences in the predicted flow pressure on vocal fold surface between the 1D and 3D flow models, the results showed reasonable agreement in vocal fold vibration patterns and selected voice outcome measures between the 1D and 3D models for the range of left-right asymmetric conditions investigated. This indicates that vocal fold properties play a larger role than the glottal flow in determining the overall pattern of vocal fold vibration and the produced voice, and the 1D flow simplification is sufficient in modeling phonation, at least for the simplified glottal geometry of this study.
Affiliation(s)
- Tsukasa Yoshinaga
- Department of Mechanical Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tenpaku, Toyohashi 441-8580, Japan
| | - Zhaoyan Zhang
- Department of Head and Neck Surgery, University of California, Los Angeles, 31-24 Rehabilitation Center, 1000 Veteran Avenue, Los Angeles, California 90095-1794, USA
| | - Akiyoshi Iida
- Department of Mechanical Engineering, Toyohashi University of Technology, 1-1 Hibarigaoka, Tenpaku, Toyohashi 441-8580, Japan
| |
|
4
|
Stewart ME, Erath BD. Investigating blunt force trauma to the larynx: The role of inferior-superior vocal fold displacement on phonation. J Biomech 2021; 121:110377. [PMID: 33819698 DOI: 10.1016/j.jbiomech.2021.110377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2020] [Revised: 02/24/2021] [Accepted: 03/01/2021] [Indexed: 11/26/2022]
Abstract
Blunt force trauma to the larynx, which may result from motor vehicle collisions, sports activities, etc., can cause significant damage, often leading to displaced fractures of the laryngeal cartilages, thereby disrupting vocal function. Current surgical interventions primarily focus on airway restoration to stabilize the patient, with restoration of vocal function usually being a secondary consideration. Due to laryngeal fracture, asymmetric vertical misalignment of the left or right vocal fold (VF) in the inferior-superior direction often occurs. This affects VF closure and can lead to a weak, breathy voice requiring increased vocal effort. It is unclear, however, how much vertical VF misalignment can be tolerated before voice quality degrades significantly. To address this need, the influence of inferior-superior VF displacement on phonation is investigated in 1.0 mm increments using synthetic, self-oscillating VF models in a physiologically representative facility. Acoustic (SPL, frequency, H1-H2, jitter, and shimmer), kinematic (amplitude and phase differences), and aerodynamic parameters (flow rate and subglottal pressure) are investigated as a function of inferior-superior vertical displacement. Significant findings include that once the inferior-superior medial length of the VF is surpassed, sustained phonation degrades precipitously, becoming severely pathological. If laryngeal reconstruction approaches can ensure VF contact is maintained during phonation (i.e., vertical displacement doesn't surpass VF medial length), improved vocal outcomes are expected.
Affiliation(s)
- Molly E Stewart
- Department of Mechanical and Aeronautical Engineering, Clarkson University, 8 Clarkson Ave, Potsdam, NY 13699, United States
| | - Byron D Erath
- Department of Mechanical and Aeronautical Engineering, Clarkson University, 8 Clarkson Ave, Potsdam, NY 13699, United States
| |
|
5
|
Gandhi S, Bhatta S, Ganesuni D, Ghanpur AD, Saindani SJ. Pre- and post-operative high-speed videolaryngoscopy in unilateral vocal cord paralysis following autologous fat augmentation. Am J Otolaryngol 2021; 42:102878. [PMID: 33418176 DOI: 10.1016/j.amjoto.2020.102878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2020] [Revised: 12/13/2020] [Accepted: 12/28/2020] [Indexed: 10/22/2022]
Abstract
PURPOSE To compare high-speed videolaryngoscopy (HSV) parameters, namely open quotient (OQ), amplitude symmetry index (ASI), phase symmetry index (PSI), and frequency symmetry index (FSI), in patients with unilateral vocal cord paralysis (UVCP) before and 6 months after autologous fat augmentation. MATERIALS AND METHODS This retrospective study evaluated patients of all ages and genders with UVCP who underwent autologous fat augmentation from July 2016 to July 2019. The OQ, ASI, PSI, and FSI were calculated from the HSV recordings by using the montage and fast Fourier transform point analysis. The pre- and post-operative means were compared using a paired Student's t-test, with a p-value less than 0.05 considered significant. RESULTS A total of 37 patients, age 41.2 ± 11.3 years (21 to 67 years), 59.4% females and 40.6% males, were included in the study. The average duration of symptom onset was 2.3 ± 0.87 months. The post-operative mean values of OQ, ASI, PSI, and FSI following the fat augmentation were significantly improved compared to the pre-operative mean values with p-values <0.0001, 0.0018, 0.0011, and 0.0006, respectively. CONCLUSION There was a significant improvement in the OQ, ASI, PSI, and FSI in UVCP patients 6 months after autologous fat augmentation, signifying an enhanced vibratory function. The ability of HSV to measure the minute details of vocal cord vibration by providing quantitative measurements has also been highlighted. Future prospective research with an increased sample size and longer duration of follow-up is recommended.
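The paired comparison described in the methods can be illustrated with a short worked example. The sketch below computes the paired t statistic and its degrees of freedom with the standard library only; it does not compute a p-value, and the measurements are made-up placeholders, not data from the study.

```python
import math
import statistics

def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for paired samples.

    t = mean(d) / (stdev(d) / sqrt(n)), where d = post - pre.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - a for a, b in zip(pre, post)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Hypothetical open-quotient values for five patients (illustration only).
pre_op = [0.82, 0.79, 0.91, 0.85, 0.88]
post_op = [0.61, 0.66, 0.70, 0.63, 0.72]
t_stat, dof = paired_t_statistic(pre_op, post_op)
print(t_stat, dof)
```

The p-value would then be read from a t distribution with `dof` degrees of freedom, for example with a statistics package.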
|
6
|
Abstract
A healthy voice is crucial for verbal communication and hence in daily as well as professional life. The basis for a healthy voice is the sound-producing vocal folds in the larynx. A hallmark of healthy vocal fold oscillation is the symmetric motion of the left and right vocal fold. Clinically, videoendoscopy is applied to assess the symmetry of the oscillation, which is evaluated subjectively. High-speed videoendoscopy, an emerging method that allows quantification of the vocal fold oscillation, is more commonly employed in research due to the amount of data and the complex, semi-automatic analysis. In this study, we provide a comprehensive evaluation of methods that detect the glottal midline fully automatically. We used a biophysical model to simulate different vocal fold oscillations, extended the openly available BAGLS dataset using manual annotations, utilized both simulations and annotated endoscopic images to train deep neural networks at different stages of the analysis workflow, and compared these to established computer vision algorithms. We found that classical computer vision algorithms perform well on detecting the glottal midline in glottis segmentation data, but are outperformed by deep neural networks on this task. We further suggest GlottisNet, a multi-task neural architecture featuring the simultaneous prediction of both the opening between the vocal folds and the symmetry axis, leading to a huge step forward towards clinical applicability of quantitative, deep learning-assisted laryngeal endoscopy, by fully automating segmentation and midline detection.
|
7
|
|
8
|
Van Hirtum A, Bouvet A, Pelorson X. Quantifying the auto-oscillation complexity following water spraying with interest for phonation. Phys Rev E 2019; 100:043111. [PMID: 31770960 DOI: 10.1103/physreve.100.043111] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/07/2019] [Indexed: 11/07/2022]
Abstract
Human voiced sound production, or phonation, is the result of a fluid-structure instability in the larynx leading to vocal fold auto-oscillation. In this paper, the effect of surface hydration following water spraying (0 up to 5 ml) on an ongoing auto-oscillation is studied experimentally using different mechanical deformable vocal fold replicas. The complexity of the oscillation is quantified on the upstream pressure by a phase space recurrence and complexity analysis. It is shown that (1) the ratio γ of the degree of determinism to the recurrence rate of the phase space states and (2) the estimated correlation dimension D_{2} are suitable parameters to grasp the effect of hydration on the oscillation pattern. The oscillation regime after hydration can either remain deterministic or approach a chaotic regime depending on initial conditions prior to water spraying, such as elasticity, glottal aperture, as well as oscillation complexity.
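To make the recurrence quantities named in this abstract concrete, the sketch below computes a recurrence rate, a degree of determinism, and their ratio from a scalar time series. It is a minimal illustration under stated assumptions, not the authors' procedure: the embedding dimension, delay, threshold `eps`, and minimum line length are arbitrary choices, the main diagonal is included in the counts (conventions differ on this), and the correlation dimension is not computed. The test signal is a synthetic sinusoid, not pressure data.

```python
import math

def embed(series, dim, delay):
    """Time-delay embedding: state i is (x[i], x[i+delay], ...)."""
    count = len(series) - (dim - 1) * delay
    return [[series[i + k * delay] for k in range(dim)]
            for i in range(count)]

def recurrence_matrix(states, eps):
    """R[i][j] = 1 when states i and j lie within distance eps."""
    return [[1 if math.dist(a, b) <= eps else 0 for b in states]
            for a in states]

def recurrence_rate(matrix):
    """Fraction of all (i, j) pairs that are recurrent."""
    n = len(matrix)
    return sum(map(sum, matrix)) / (n * n)

def determinism(matrix, min_length=2):
    """Fraction of recurrent points lying on diagonal lines of at
    least min_length consecutive points."""
    n = len(matrix)
    total = sum(map(sum, matrix))
    in_lines = 0
    for offset in range(-(n - 1), n):
        i, j = max(0, -offset), max(0, offset)
        run = 0
        while i < n and j < n:  # walk along one diagonal
            if matrix[i][j]:
                run += 1
            else:
                if run >= min_length:
                    in_lines += run
                run = 0
            i += 1
            j += 1
        if run >= min_length:
            in_lines += run
    return in_lines / total if total else 0.0

# Synthetic test signal (assumption): sinusoid with a 20-sample period.
signal = [math.sin(2 * math.pi * n / 20) for n in range(200)]
states = embed(signal, dim=2, delay=5)
matrix = recurrence_matrix(states, eps=0.1)
rr = recurrence_rate(matrix)
det = determinism(matrix)
gamma = det / rr  # ratio of determinism to recurrence rate
print(rr, det, gamma)
```

For a strictly periodic signal such as this one, every recurrent point sits on a long diagonal line, so the determinism is at its maximum; noisier or chaotic signals would be expected to break those lines up.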
Affiliation(s)
- A Van Hirtum
- LEGI, UMR CNRS 5519, Grenoble Alpes University, Grenoble, France
| | - A Bouvet
- LEGI, UMR CNRS 5519, Grenoble Alpes University, Grenoble, France
| | - X Pelorson
- LEGI, UMR CNRS 5519, Grenoble Alpes University, Grenoble, France
| |
|
9
|
Gómez P, Schützenberger A, Semmler M, Döllinger M. Laryngeal Pressure Estimation With a Recurrent Neural Network. IEEE JOURNAL OF TRANSLATIONAL ENGINEERING IN HEALTH AND MEDICINE 2018; 7:2000111. [PMID: 30680252 PMCID: PMC6331197 DOI: 10.1109/jtehm.2018.2886021] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 07/09/2018] [Revised: 10/24/2018] [Accepted: 11/30/2018] [Indexed: 11/24/2022]
Abstract
Quantifying the physical parameters of voice production is essential for understanding the process of phonation and can aid in voice research and diagnosis. As an alternative to invasive measurements, they can be estimated by formulating an inverse problem using a numerical forward model. However, high-fidelity numerical models are often computationally too expensive for this. This paper presents a novel approach to train a long short-term memory network to estimate the subglottal pressure in the larynx at massively reduced computational cost using solely synthetic training data. We train the network on synthetic data from a numerical two-mass model and validate it on experimental data from 288 high-speed ex vivo video recordings of porcine vocal folds from a previous study. The training requires significantly fewer model evaluations compared with the previous optimization approach. On the test set, we maintain a comparable performance of 21.2% versus previous 17.7% mean absolute percentage error in estimating the subglottal pressure. The evaluation of one sample requires a vanishingly small amount of computation time. The presented approach is able to maintain estimation accuracy of the subglottal pressure at significantly reduced computational cost. The methodology is likely transferable to estimate other parameters and training with other numerical models. This improvement should allow the adoption of more sophisticated, high-fidelity numerical models of the larynx. The vast speedup is a critical step to enable a future clinical application and knowledge of parameters such as the subglottal pressure will aid in diagnosis and treatment selection.
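The accuracy figures quoted in this abstract are mean absolute percentage errors. As a reminder of how that metric is computed, here is a small sketch; the function name and the pressure values are illustrative assumptions, not taken from the paper.

```python
def mean_absolute_percentage_error(true_values, predictions):
    """MAPE in percent: 100 * mean(|true - predicted| / |true|)."""
    if len(true_values) != len(predictions) or not true_values:
        raise ValueError("need two non-empty, equal-length sequences")
    if any(t == 0 for t in true_values):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((t - p) / t)
                for t, p in zip(true_values, predictions))
    return 100.0 * total / len(true_values)

# Hypothetical subglottal pressures and estimates (arbitrary units).
measured = [100.0, 200.0]
estimated = [110.0, 180.0]
mape = mean_absolute_percentage_error(measured, estimated)
print(mape)
```

Because each error is scaled by the true value, the metric weights a given absolute error more heavily when the true pressure is small.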
Affiliation(s)
- Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Marion Semmler
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
| | - Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, 91054 Erlangen, Germany
| |
|
10
|
Döllinger M, Gómez P, Patel RR, Alexiou C, Bohr C, Schützenberger A. Biomechanical simulation of vocal fold dynamics in adults based on laryngeal high-speed videoendoscopy. PLoS One 2017; 12:e0187486. [PMID: 29121085 PMCID: PMC5679561 DOI: 10.1371/journal.pone.0187486] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/15/2016] [Accepted: 10/18/2017] [Indexed: 12/18/2022] Open
Abstract
MOTIVATION Human voice is generated in the larynx by the two oscillating vocal folds. Owing to the limited space and accessibility of the larynx, endoscopic investigation of the actual phonatory process in detail is challenging. Hence the biomechanics of the human phonatory process are not yet fully understood. Therefore, we adapt a mathematical model of the vocal folds towards vocal fold oscillations to quantify gender- and age-related differences expressed by computed biomechanical model parameters. METHODS The vocal fold dynamics are visualized by laryngeal high-speed videoendoscopy (4000 fps). A total of 33 healthy young subjects (16 females, 17 males) and 11 elderly subjects (5 females, 6 males) were recorded. A numerical two-mass model is adapted to the recorded vocal fold oscillations by varying model masses, stiffness and subglottal pressure. For adapting the model towards the recorded vocal fold dynamics, three different optimization algorithms (Nelder-Mead, Particle Swarm Optimization and Simulated Bee Colony) in combination with three cost functions were considered for applicability. Gender differences and age-related kinematic differences reflected by the model parameters were analyzed. RESULTS AND CONCLUSION The biomechanical model in combination with numerical optimization techniques allowed phonatory behavior to be simulated and laryngeal parameters involved to be quantified. All three optimization algorithms showed promising results. However, only one cost function seems to be suitable for this optimization task. The gained model parameters reflect the phonatory biomechanics for men and women well and show quantitative age- and gender-specific differences. The model parameters for younger females and males showed lower subglottal pressures, lower stiffness and higher masses than the corresponding elderly groups. Females exhibited higher subglottal pressures, smaller oscillation masses and larger stiffness than the corresponding similarly aged male groups. Optimizing numerical models towards vocal fold oscillations is useful to identify underlying laryngeal components controlling the phonatory process.
Affiliation(s)
- Michael Döllinger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Pablo Gómez
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Rita R. Patel
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana, United States of America
| | - Christoph Alexiou
- Section of Experimental Oncology and Nanomedicine (SEON), Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Else Kröner-Fresenius-Stiftung-Professorship, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Christopher Bohr
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| | - Anne Schützenberger
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
| |
|
11
|
Aichinger P, Hagmüller M, Roesner I, Schneider-Stickler B, Schoentgen J, Pernkopf F. Fundamental frequency tracking in diplophonic voices. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2016.10.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/25/2022]
|
12
|
Tokuda IT, Shimamura R. Effect of level difference between left and right vocal folds on phonation: Physical experiment and theoretical study. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:482. [PMID: 28863607 DOI: 10.1121/1.4996105] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
As an alternative factor to produce asymmetry between left and right vocal folds, the present study focuses on level difference, which is defined as the distance between the upper surfaces of the bilateral vocal folds in the inferior-superior direction. Physical models of the vocal folds were utilized to study the effect of the level difference on the phonation threshold pressure. A vocal tract model was also attached to the vocal fold model. For two types of different models, experiments revealed that the phonation threshold pressure tended to increase as the level difference was extended. Based upon a small amplitude approximation of the vocal fold oscillations, a theoretical formula was derived for the phonation threshold pressure. This theory agrees with the experiments, especially when the phase difference between the left and right vocal folds is not extensive. Furthermore, an asymmetric two-mass model was simulated with a level difference to validate the experiments as well as the theory. The primary conclusion is that the level difference has a potential effect on voice production especially for patients with an extended level of vertical difference in the vocal folds, which might be taken into account for the diagnosis of voice disorders.
Affiliation(s)
- Isao T Tokuda
- Graduate School of Science and Engineering, Ritsumeikan University, Noji-higashi, Kusatsu, Shiga 525-8577, Japan
| | - Ryo Shimamura
- Graduate School of Science and Engineering, Ritsumeikan University, Noji-higashi, Kusatsu, Shiga 525-8577, Japan
| |
|
13
|
Teramoto Y, Takahashi DY, Holmes P, Ghazanfar AA. Vocal development in a Waddington landscape. eLife 2017; 6. [PMID: 28092262 PMCID: PMC5310845 DOI: 10.7554/elife.20782] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 08/18/2016] [Accepted: 01/15/2017] [Indexed: 01/28/2023] Open
Abstract
Vocal development is the adaptive coordination of the vocal apparatus, muscles, the nervous system, and social interaction. Here, we use a quantitative framework based on optimal control theory and Waddington’s landscape metaphor to provide an integrated view of this process. With a biomechanical model of the marmoset monkey vocal apparatus and behavioral developmental data, we show that only the combination of the developing vocal tract, vocal apparatus muscles and nervous system can fully account for the patterns of vocal development. Together, these elements influence the shape of the monkeys’ vocal developmental landscape, tilting, rotating or shifting it in different ways. We can thus use this framework to make quantitative predictions regarding how interfering factors or experimental perturbations can change the landscape within a species, or to explain comparative differences in vocal development across species. DOI: http://dx.doi.org/10.7554/eLife.20782.001
As infants develop they learn new behaviors and refine existing ones. For example, human infants progress from crying to babbling to producing speech-like sounds. A complex sequence of changes in muscles, the nervous system and in patterns of interactions with other individuals all contribute to these emerging behaviors. Despite this complexity, most studies of vocal development have only considered single factors in isolation. A study of speech development, for example, might examine how changes in the brain enable infants to imitate sounds. However, that same study will probably ignore how changes in the structure of the vocal cords, or in the behavior of the parents, also promote imitation. Young marmoset monkeys, like human infants, gradually develop from producing immature cries to adult-like calls. Teramoto, Takahashi et al. built a computational model of this process and compared the model to data from real animals. The first version of the model focused solely on how the marmosets’ vocal cords grow, and did not fully reproduce how adult-like calls emerge in real marmosets. Teramoto, Takahashi et al. therefore added factors to the model that simulate improvements in muscle control, learning in the nervous system and in the behavior of other animals. These findings show that, to reflect how adult-like calls emerge in real marmosets, the model needs to include all of these factors. The model developed by Teramoto, Takahashi et al. may also provide insights into why vocal learning and some other behaviors emerge in some species and not others. It may also be used to predict the consequences of disrupting individual processes in young animals at particular points in time and how such disruptions shape the way an animal develops on its way to adulthood. DOI: http://dx.doi.org/10.7554/eLife.20782.002
Affiliation(s)
- Yayoi Teramoto
- Princeton Neuroscience Institute, Princeton University, Princeton, United States
| | - Daniel Y Takahashi
- Princeton Neuroscience Institute, Princeton University, Princeton, United States; Department of Psychology, Princeton University, Princeton, United States
| | - Philip Holmes
- Princeton Neuroscience Institute, Princeton University, Princeton, United States; Department of Mechanical and Aerospace Engineering and Program in Applied and Computational Mathematics, Princeton University, Princeton, United States
| | - Asif A Ghazanfar
- Princeton Neuroscience Institute, Princeton University, Princeton, United States; Department of Psychology, Princeton University, Princeton, United States; Department of Ecology and Evolutionary Biology, Princeton University, Princeton, United States
| |
|
14
|
Jing B, Tang S, Wu L, Wang S, Wan M. Visualizing the Vibration of Laryngeal Tissue during Phonation Using Ultrafast Plane Wave Ultrasonography. ULTRASOUND IN MEDICINE & BIOLOGY 2016; 42:2812-2825. [PMID: 27633284 DOI: 10.1016/j.ultrasmedbio.2016.07.023] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/07/2016] [Revised: 07/12/2016] [Accepted: 07/22/2016] [Indexed: 06/06/2023]
Abstract
Ultrafast plane wave ultrasonography is employed in this study to visualize the vibration of the larynx and quantify the vibration phase as well as the vibration amplitude of the laryngeal tissue. Ultrasonic images were obtained at 5000 to 10,000 frames/s in the coronal plane at the level of the glottis. Although the image quality degraded when the imaging mode was switched from conventional ultrasonography to ultrafast plane wave ultrasonography, certain anatomic structures such as the vocal folds, as well as the sub- and supraglottic structures, including the false vocal folds, can be identified in the ultrafast plane wave ultrasonic image. The periodic vibration of the vocal fold edge could be visualized in the recorded image sequence during phonation. Furthermore, a motion estimation method was used to quantify the displacement of laryngeal tissue from hundreds of frames of ultrasonic data acquired. Vibratory displacement waveforms of the sub- and supraglottic structures were successfully obtained at a high level of ultrasonic signal correlation. Moreover, statistically significant differences in vibration pattern between the sub- and supraglottic structures were found. Variation of vibration amplitude along the subglottic mucosal surface is significantly smaller than that along the supraglottic mucosal surface. Phase delay of vibration along the subglottic mucosal surface is significantly smaller than that along the supraglottic mucosal surface.
Affiliation(s)
- Bowen Jing
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Shanshan Tang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Liang Wu
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Supin Wang
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China
| | - Mingxi Wan
- Key Laboratory of Biomedical Information Engineering of Ministry of Education, Department of Biomedical Engineering, School of Life Science and Technology, Xi'an Jiaotong University, Xi'an, China.
| |
Collapse
|
15
|
Quantitative Analysis of Vocal Fold Vibration in Vocal Fold Paralysis With the Use of High-speed Digital Imaging. J Voice 2016; 30:766.e13-766.e22. [DOI: 10.1016/j.jvoice.2015.10.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 07/07/2015] [Accepted: 10/22/2015] [Indexed: 11/21/2022]
|
16
|
Herbst CT, Unger J, Herzel H, Švec JG, Lohscheller J. Phasegram Analysis of Vocal Fold Vibration Documented With Laryngeal High-speed Video Endoscopy. J Voice 2016; 30:771.e1-771.e15. [DOI: 10.1016/j.jvoice.2015.11.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/21/2015] [Accepted: 11/12/2015] [Indexed: 11/16/2022]
|
17
|
Ikuma T, Kunduk M, Fink D, McWhorter AJ. Synthetic multi-line kymographic analysis: A spatiotemporal data reduction technique for high-speed videoendoscopy. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:2703. [PMID: 27794340 DOI: 10.1121/1.4964400] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
High-speed videoendoscopy (HSV) enables observation of the true vibratory behavior of the vocal folds. To quantify the vocal fold vibration captured by the HSV, lateral movement features (e.g., glottal width and vocal fold edge displacements) have been extracted as functions of time. The most common analysis method is to extract the features on a lateral strip used to form a digital kymogram. The weakness of this method is that it can only capture the vibrational behavior local to the strip location. While the multi-line kymographic approach has been utilized to capture the spatial diversity, the observation points are either fixed or manually positioned. Behaviors of pathological vocal folds, especially those with lesions, are expected to be spatially diverse and also diverse among speakers, making fixed observation points ineffective. This paper proposes a technique to synthesize kymographic waveforms from full spatiotemporal HSV feature data to extract distinctive behaviors automatically. Each synthesized waveform represents a non-overlapping section of the glottis, where vocal folds are locally behaving homogeneously. The efficacy of the algorithm is demonstrated with four HSV recordings (three pathological) and discussed, including mitigation of the known drawbacks.
Affiliation(s)
- Takeshi Ikuma: Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Melda Kunduk: Department of Communication Disorders, Louisiana State University, Baton Rouge, Louisiana 70803, USA
- Daniel Fink: Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
- Andrew J McWhorter: Department of Otolaryngology-Head and Neck Surgery, Louisiana State University Health Sciences Center, New Orleans, Louisiana 70112, USA
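As a rough illustration of the single-strip kymogram that the Ikuma et al. abstract describes as the most common analysis method, the sketch below stacks one lateral strip (a fixed pixel row) from each frame over time. The frame layout (`frames[t][row][col]`), the toy pixel values, and the function name `kymogram` are assumptions made for this example and are not taken from the cited work.

```python
def kymogram(frames, row):
    """Build a digital kymogram: one lateral strip per frame, stacked over time.

    frames: list of 2-D frames, each a list of pixel rows (hypothetical layout).
    row:    index of the lateral strip sampled in every frame.
    Returns a list of strips, so result[t][x] is the pixel at time t, position x.
    """
    return [list(frame[row]) for frame in frames]


# Toy 3-frame "video" (3 rows x 4 columns). On row 1 the dark glottal gap
# (value 0) widens from one pixel to two and then narrows again.
frames = [
    [[9, 9, 9, 9], [9, 0, 9, 9], [9, 9, 9, 9]],
    [[9, 9, 9, 9], [9, 0, 0, 9], [9, 9, 9, 9]],
    [[9, 9, 9, 9], [9, 0, 9, 9], [9, 9, 9, 9]],
]

kymo = kymogram(frames, row=1)

# Glottal width per frame, estimated as the number of dark pixels on the strip.
widths = [strip.count(0) for strip in kymo]
```

Because only `row` is sampled, any vibration elsewhere along the glottis is invisible to this kymogram, which is the locality limitation the abstract attributes to the single-strip method.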
|
19
|
Panek D, Skalski A, Zielinski T, Deliyski DD. Voice pathology classification based on High-Speed Videoendoscopy. ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL INTERNATIONAL CONFERENCE 2016; 2015:735-8. [PMID: 26736367 DOI: 10.1109/embc.2015.7318467] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
This work presents a method for automatic and objective classification of patients with healthy and pathological vocal fold vibration using High-Speed Videoendoscopy of the larynx. We used image segmentation and extraction of a novel set of numerical parameters describing the spatio-temporal dynamics of the vocal folds to classify normal and pathological cases, achieving 73.3% cross-validation classification accuracy. This approach is promising for developing an automatic diagnosis tool for voice disorders.
|
20
|
Aerodynamic measures of glottal function: what extra can they tell us and how do they guide management? Curr Opin Otolaryngol Head Neck Surg 2015; 22:450-4. [PMID: 25254405 DOI: 10.1097/moo.0000000000000107] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 11/26/2022]
Abstract
PURPOSE OF REVIEW This article will define the major advances in laryngeal aerodynamics research from recent evidence-based literature. RECENT FINDINGS Recently published research focuses on new applications of aerodynamic parameters to improve patient diagnosis and outcomes, as well as further elucidating the mechanisms of phonation using computational modeling and excised larynges. SUMMARY Although there is an extensive amount of research on improving the diagnosis and treatment of voice disorders using aerodynamics, the majority of recent literature lacks any conclusive evidence on new methods for use in the clinic; further research in these is needed. The best practices for resonance tube phonation in water and semi-occluded voice therapy are being investigated, as is the exact mechanism by which glottal airflow interacts with vocal folds to produce phonation. It is recommended that clinicians evaluate patients with Parkinson's disease on the basis of airflow declination and lung volume expended per syllable to avoid dependence on an acoustic signal. In addition, advances in modeling laryngeal disorders and structures will contribute to future research into treatments and diagnosis. Now that the groundwork has been laid, it is crucial to begin evaluating such techniques in patient populations.
|
21
|
Lucero JC, Schoentgen J, Haas J, Luizard P, Pelorson X. Self-entrainment of the right and left vocal fold oscillators. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2015; 137:2036-46. [PMID: 25920854 DOI: 10.1121/1.4916601] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 05/09/2023]
Abstract
This article presents an analysis of entrained oscillations of the right and left vocal folds in the presence of asymmetries. A simple one-mass model is proposed for each vocal fold. A stiffness asymmetry and open glottis oscillations are considered first, and regions of oscillation are determined by a stability analysis and an averaging technique. The results show that the subglottal threshold pressure for 1:1 entrainment increases with the asymmetry. Within that region, both folds oscillate with the same amplitude and with the lax fold delayed in time with regard to the tense fold. At large asymmetries, a region involving several different phase entrainments or toroidal regimes at constant threshold pressure appears. The effect of vocal fold collisions and asymmetry in the damping coefficients of the oscillators are explored next by means of numerical analyses. It is shown that the damping asymmetry expands the 1:1 entrainment region at low subglottal pressures across the whole asymmetry range. In the expanded region, the oscillator with the lowest natural frequency is dominant and the other oscillator has a large phase advance and small amplitude. The theoretical results are finally compared with data collected from a mechanical replica of the vocal folds.
Affiliation(s)
- Jorge C Lucero: Department of Computer Science, University of Brasilia, Brasilia, Federal District, 70910-900, Brazil
- Jean Schoentgen: Laboratories of Image, Signal Processing and Acoustics, Université Libre de Bruxelles, Faculty of Applied Sciences 50, Avenue Franklin D. Roosevelt, B-1050, Brussels, Belgium
- Jessy Haas: Grenoble Images Parole Signal Automatique, Unité Mixte de Recherche 5216, Centre National de la Recherche Scientifique, Grenoble Universities, 961 rue de la Houille Blanche, BP 46, 38402 Saint-Martin d'Heres, France
- Paul Luizard: Grenoble Images Parole Signal Automatique, Unité Mixte de Recherche 5216, Centre National de la Recherche Scientifique, Grenoble Universities, 961 rue de la Houille Blanche, BP 46, 38402 Saint-Martin d'Heres, France
- Xavier Pelorson: Grenoble Images Parole Signal Automatique, Unité Mixte de Recherche 5216, Centre National de la Recherche Scientifique, Grenoble Universities, 961 rue de la Houille Blanche, BP 46, 38402 Saint-Martin d'Heres, France
|
22
|
Xue Q, Zheng X, Mittal R, Bielamowicz S. Computational study of effects of tension imbalance on phonation in a three-dimensional tubular larynx model. J Voice 2014; 28:411-9. [PMID: 24725589 DOI: 10.1016/j.jvoice.2013.12.016] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 09/04/2013] [Accepted: 12/23/2013] [Indexed: 10/25/2022]
Abstract
OBJECTIVES The present study explores the use of a continuum-based computational model to investigate the effect of left-right tension imbalance on vocal fold (VF) vibrations and glottal aerodynamics, as well as its implications for phonation. The study allows us to gain new insights into the underlying physical mechanism of irregularities induced by VF tension imbalance associated with unilateral cricothyroid muscle paralysis. METHODS A three-dimensional simulation of glottal flow and VF dynamics in a tubular laryngeal model with tension imbalance was conducted by using a coupled flow-structure interaction computational model. Tension imbalance was modeled by reducing by 20% the Young's modulus of one of the VFs, while holding VF length constant. Effects of tension imbalance on the vibratory characteristics of the VFs and on the time-varying properties of glottal airflow as well as the aerodynamic energy transfer are comprehensively analyzed. RESULTS AND CONCLUSIONS The analysis demonstrates that the continuum-based biomechanical model can provide a good description of phonatory dynamics in tension imbalance conditions. It is found that although 20% tension imbalance does not have noticeable effects on the fundamental frequency, it does lead to a larger glottal flow leakage and asymmetric vibrations of the two VFs. A detailed analysis of the energy transfer suggests that the majority of the energy is consumed by the lateral motion of the VFs and the net energy transferred to the softer fold is less than the one transferred to the normal fold.
Affiliation(s)
- Qian Xue: Department of Mechanical Engineering, University of Maine, Orono, Maine
- Xudong Zheng: Department of Mechanical Engineering, University of Maine, Orono, Maine
- Rajat Mittal: Department of Mechanical Engineering, Johns Hopkins University, Baltimore, Maryland
- Steven Bielamowicz: Division of Otolaryngology, The George Washington University, Washington, District of Columbia
|
23
|
Kwon SK, Kim HB, Song JJ, Cho CG, Park SW, Choi JS, Ryu J, Oh SH, Lee JH. Vocal fold augmentation with injectable polycaprolactone microspheres/pluronic F127 hydrogel: long-term in vivo study for the treatment of glottal insufficiency. PLoS One 2014; 9:e85512. [PMID: 24465582 PMCID: PMC3899012 DOI: 10.1371/journal.pone.0085512] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 09/27/2013] [Accepted: 11/27/2013] [Indexed: 11/18/2022]
Abstract
There is increasing demand for reconstruction of glottal insufficiency. Several injection materials have been examined for this purpose, but all had limitations, such as poor long-term durability, migration from the injection site, inflammation, granuloma formation, and interference with vocal fold vibration due to viscoelastic mismatch. Here, we developed a novel injection material, consisting of polycaprolactone (PCL) microspheres, which exhibits better viscoelasticity than conventional materials, and Pluronic F127 carrier, which decreases the migration of the injection materials. The material was injected into rabbits with glottal insufficiency and compared with the FDA-approved injection material, calcium hydroxylapatite (CaHA). Endoscopic and histological examinations indicated that PCL/Pluronic F127 remained at the injection site with no inflammatory response or granuloma formation, whereas CaHA leaked out and migrated from the injection site. Therefore, vocal fold augmentation was almost completely retained during the 12-month follow-up period in this study. Moreover, induced phonation and high-speed recording of vocal fold vibration showed decreased vocal fold gap area in the PCL/Pluronic F127 group. Our newly developed injection material, PCL/Pluronic F127, permits efficient augmentation of paralyzed vocal fold without complications, a concept that can be applied clinically, as demonstrated by the successful long-term follow-up.
Affiliation(s)
- Seong Keun Kwon: Department of Otorhinolaryngology, Head and Neck Surgery, Seoul National University Hospital, Seoul, Republic of Korea; Cancer Research Institute, Seoul, Republic of Korea; Seoul National University Medical Research Center, Seoul, Republic of Korea
- Hee-Bok Kim: Department of Otorhinolaryngology, Head and Neck Surgery, Dongguk University Ilsan Hospital, Goyang, Republic of Korea
- Jae-Jun Song: Department of Otorhinolaryngology, Head and Neck Surgery, Dongguk University Ilsan Hospital, Goyang, Republic of Korea
- Chang Gun Cho: Department of Otorhinolaryngology, Head and Neck Surgery, Dongguk University Ilsan Hospital, Goyang, Republic of Korea
- Seok-Won Park: Department of Otorhinolaryngology, Head and Neck Surgery, Dongguk University Ilsan Hospital, Goyang, Republic of Korea
- Jong-Sun Choi: Department of Pathology, Dongguk University Ilsan Hospital, Goyang, Republic of Korea
- Junsun Ryu: Head and Neck Oncology Clinic, National Cancer Center, Goyang, Republic of Korea
- Se Heang Oh: Department of Nanobiomedical Science & WCU Research Center, Dankook University, Cheonan, Republic of Korea
- Jin Ho Lee: Department of Advanced Materials, Hannam University, Yuseong Gu, Daejeon, Republic of Korea
|
24
|
Pinheiro AP, Kerschen G. Vibrational dynamics of vocal folds using nonlinear normal modes. Med Eng Phys 2013; 35:1079-88. [DOI: 10.1016/j.medengphy.2012.11.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/27/2012] [Revised: 10/08/2012] [Accepted: 11/04/2012] [Indexed: 10/27/2022]
|
25
|
Chan A, Mongeau L, Kost K. Vocal fold vibration measurements using laser Doppler vibrometry. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 133:1667-1676. [PMID: 23464036 PMCID: PMC3606305 DOI: 10.1121/1.4789937] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/23/2012] [Revised: 11/13/2012] [Accepted: 01/16/2013] [Indexed: 05/30/2023]
Abstract
The objective of this study was to measure the velocity of the superior surface of human vocal folds during phonation using laser Doppler vibrometry (LDV). A custom-made endoscopic laser beam deflection unit was designed and fabricated. An in vivo clinical experimental procedure was developed to simultaneously collect LDV velocity and video from videolaryngoscopy. The velocity along the direction of the laser beam, i.e., the inferior-superior direction, was captured. The velocity was synchronous with electroglottograph and sound level meter data. The vibration energy of the vocal folds was determined to be significant up to a frequency of 3 kHz. Three characteristic vibrational waveforms were identified which may indicate bifurcations between vibrational modes of the mucosal wave. No relationship was found between the velocity amplitude and phonation frequency or sound pressure level. A correlation was found between the peak-to-peak displacement amplitude and phonation frequency. A sparse map of the velocity amplitudes on the vocal fold surface was obtained.
Affiliation(s)
- Alfred Chan: Department of Mechanical Engineering, McGill University, Montreal, Quebec H3A 2K6, Canada
|
26
|
Xue Q, Mittal R, Zheng X, Bielamowicz S. Computational modeling of phonatory dynamics in a tubular three-dimensional model of the human larynx. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 132:1602-13. [PMID: 22978889 PMCID: PMC3460983 DOI: 10.1121/1.4740485] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Indexed: 05/09/2023]
Abstract
Simulation of the phonatory flow-structure interaction has been conducted in a three-dimensional, tubular shaped laryngeal model that has been designed with a high level of realism with respect to the human laryngeal anatomy. A non-linear spring-based contact force model is also implemented for the purpose of representing contact in more general conditions, especially those associated with three-dimensional modeling of phonation in the presence of vocal fold pathologies. The model is used to study the effects of a moderate (20%) vocal-fold tension imbalance on the phonatory dynamics. The characteristic features of phonation for normal as well as tension-imbalanced vocal folds, such as glottal waveform, glottal jet evolution, mucosal wave-type vocal-fold motion, modal entrainment, and asymmetric glottal jet deflection have been discussed in detail and compared to established data. It is found that while a moderate level of tension asymmetry does not change the vibratory dynamics significantly, it can potentially lead to measurable deterioration in voice quality.
Affiliation(s)
- Q Xue: Department of Mechanical Engineering, Johns Hopkins University, 126 Latrobe Hall, 3400 North Charles Street, Baltimore, Maryland 21218, USA
|
27
|
Choi SH, Zhang Y, Jiang JJ, Bless DM, Welham NV. Nonlinear dynamic-based analysis of severe dysphonia in patients with vocal fold scar and sulcus vocalis. J Voice 2012; 26:566-76. [PMID: 22516315 DOI: 10.1016/j.jvoice.2011.09.006] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 04/11/2011] [Accepted: 09/15/2011] [Indexed: 11/24/2022]
Abstract
OBJECTIVE The primary goal of this study was to evaluate a nonlinear dynamic approach to the acoustic analysis of dysphonia associated with vocal fold scar and sulcus vocalis. STUDY DESIGN Case-control study. METHODS Acoustic voice samples from scar/sulcus patients and age-/sex-matched controls were analyzed using correlation dimension (D2) and phase plots, time-domain based perturbation indices (jitter, shimmer, signal-to-noise ratio [SNR]), and an auditory-perceptual rating scheme. Signal typing was performed to identify samples with bifurcations and aperiodicity. RESULTS Type 2 and 3 acoustic signals were highly represented in the scar/sulcus patient group. When data were analyzed irrespective of signal type, all perceptual and acoustic indices successfully distinguished scar/sulcus patients from controls. Removal of type 2 and 3 signals eliminated the previously identified differences between experimental groups for all acoustic indices except D2. The strongest perceptual-acoustic correlation in our data set was observed for SNR and the weakest correlation was observed for D2. CONCLUSIONS These findings suggest that D2 is inferior to time-domain based perturbation measures for the analysis of dysphonia associated with scar/sulcus; however, time-domain based algorithms are inherently susceptible to inflation under highly aperiodic (ie, type 2 and 3) signal conditions. Auditory-perceptual analysis, unhindered by signal aperiodicity, is therefore a robust strategy for distinguishing scar/sulcus patient voices from normal voices. Future acoustic analysis research in this area should consider alternative (e.g., frequency- and quefrency-domain based) measures alongside additional nonlinear approaches.
Affiliation(s)
- Seong Hee Choi: Department of Audiology and Speech-Language Pathology, Catholic University of Daegu, Gyeongsan, Korea
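The time-domain perturbation indices named in the Choi et al. abstract (jitter and shimmer) are cycle-to-cycle measures. The sketch below uses the common "local" definition, the mean absolute difference between consecutive cycles expressed as a percentage of the mean; the abstract does not say which variants the authors computed, so this definition, the function name `local_perturbation`, and the sample values are all assumptions for illustration.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, as a percentage of the mean.

    Applied to cycle lengths this gives a local jitter estimate; applied to
    per-cycle peak amplitudes it gives a local shimmer estimate.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical measurements from four consecutive glottal cycles.
periods_ms = [10.0, 10.2, 9.8, 10.0]   # cycle lengths in milliseconds
peak_amps = [1.0, 0.9, 1.1, 1.0]       # per-cycle peak amplitudes (arbitrary units)

jitter_pct = local_perturbation(periods_ms)    # about 2.67 %
shimmer_pct = local_perturbation(peak_amps)    # about 13.33 %
```

Both measures presuppose that individual cycles can be identified. For the highly aperiodic (type 2 and 3) signals discussed in the abstract, cycle boundaries become unreliable, which is consistent with the abstract's caution about inflated time-domain values.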
|
28
|
Inwald EC, Döllinger M, Schuster M, Eysholdt U, Bohr C. Multiparametric Analysis of Vocal Fold Vibrations in Healthy and Disordered Voices in High-Speed Imaging. J Voice 2011; 25:576-90. [DOI: 10.1016/j.jvoice.2010.04.004] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 01/14/2010] [Accepted: 04/07/2010] [Indexed: 10/19/2022]
|
29
|
Tao C, Liu X, Jiang JJ. Global modeling of complex data series using the term-ranking approach and its application to voice synthesis. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2011; 84:026205. [PMID: 21929079 DOI: 10.1103/physreve.84.026205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2010] [Revised: 05/12/2011] [Indexed: 05/31/2023]
Abstract
A term-ranking approach is proposed to globally model the underlying dynamics of a chaotic series. The basic idea of this approach is to rank candidate bases before they are used to construct the global model. The ranked bases are involved in the global model one by one in a sequence from high to low until the best model is found. Simulations show that the model obtained by the term-ranking approach has a much longer prediction time, but fewer coefficients, than the widely used standard model. The proposed approach is also successfully applied to coding and synthesis of chaoslike voice data, showing promise for its use with truly noisy experimental data.
Affiliation(s)
- Chao Tao: Key Lab of Modern Acoustics, Ministry of Education, Nanjing University, Nanjing 210093, People's Republic of China
|
30
|
Veeraraghavan A, Reddy D, Raskar R. Coded strobing photography: compressive sensing of high speed periodic videos. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2011; 33:671-686. [PMID: 20421670 DOI: 10.1109/tpami.2010.87] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 05/29/2023]
Abstract
We show that, via temporal modulation, one can observe and capture a high-speed periodic video well beyond the abilities of a low-frame-rate camera. By strobing the exposure with unique sequences within the integration time of each frame, we take coded projections of dynamic events. From a sequence of such frames, we reconstruct a high-speed video of the high-frequency periodic process. Strobing is used in entertainment, medical imaging, and industrial inspection to generate lower beat frequencies. But this is limited to scenes with a detectable single dominant frequency and requires high-intensity lighting. In this paper, we address the problem of sub-Nyquist sampling of periodic signals and show designs to capture and reconstruct such signals. The key result is that for such signals, the Nyquist rate constraint can be imposed on the strobe rate rather than the sensor rate. The technique is based on intentional aliasing of the frequency components of the periodic signal while the reconstruction algorithm exploits recent advances in sparse representations and compressive sensing. We exploit the sparsity of periodic signals in the Fourier domain to develop reconstruction algorithms that are inspired by compressive sensing.
Affiliation(s)
- Ashok Veeraraghavan: Mitsubishi Electric Research Labs, 201 Broadway, 8th Floor, Cambridge, MA 02139, USA
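The Veeraraghavan et al. abstract notes that conventional strobing produces lower beat frequencies through aliasing. A minimal sketch of that folding arithmetic follows; it covers only the textbook aliasing relation for a single dominant frequency and is not the coded-strobing reconstruction proposed in the paper. The function name `apparent_frequency` and the example rates are assumptions.

```python
def apparent_frequency(signal_hz, sample_hz):
    """Frequency a periodic signal appears to have when sampled at sample_hz.

    Folds signal_hz into the baseband [0, sample_hz / 2], which is the
    standard aliasing relation behind stroboscopic slow motion.
    """
    if sample_hz <= 0:
        raise ValueError("sample_hz must be positive")
    folded = signal_hz % sample_hz
    return min(folded, sample_hz - folded)


# A 200 Hz vibration strobed at 198 flashes per second appears to move at 2 Hz.
beat_hz = apparent_frequency(200.0, 198.0)

# Strobing at an exact sub-multiple of the vibration rate freezes the motion.
frozen_hz = apparent_frequency(100.0, 25.0)

# A signal below half the sampling rate is not aliased at all.
unaliased_hz = apparent_frequency(30.0, 100.0)
```

This single-frequency picture is what the abstract describes as the limitation of ordinary strobing: it works only when the scene has one detectable dominant frequency.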
|
31
|
Voigt D, Döllinger M, Yang A, Eysholdt U, Lohscheller J. Automatic diagnosis of vocal fold paresis by employing phonovibrogram features and machine learning methods. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2010; 99:275-288. [PMID: 20138386 DOI: 10.1016/j.cmpb.2010.01.004] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Received: 07/13/2009] [Revised: 11/02/2009] [Accepted: 01/09/2010] [Indexed: 05/28/2023]
Abstract
The clinical diagnosis of voice disorders is based on examination of the rapidly moving vocal folds during phonation (f0: 80-300 Hz) with state-of-the-art endoscopic high-speed cameras. Commonly, analysis is performed in a subjective and time-consuming manner via slow-motion video playback and exhibits low inter- and intra-rater reliability. In this study, an objective method to overcome this drawback is presented, based on Phonovibrography, a novel image analysis technique. For a group of 45 normophonic and paralytic voices, the laryngeal dynamics were captured by specialized Phonovibrogram features and analyzed with different machine learning algorithms. Classification accuracies reached 93% for 2-class and 73% for 3-class discrimination. The results were validated by subjective expert ratings given the same diagnostic criteria. The automatic Phonovibrogram analysis approach exceeded the experienced raters' classifications by 9%. The presented method holds a lot of potential for providing reliable vocal fold diagnosis support in the future.
Affiliation(s)
- Daniel Voigt: Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Bohlenplatz 21, D-91054 Erlangen, Germany
|
32
|
Kimura M, Nito T, Imagawa H, Sakakibara KI, Chan RW, Tayama N. Collagen injection for correcting vocal fold asymmetry: high-speed imaging. Ann Otol Rhinol Laryngol 2010; 119:359-68. [PMID: 20583733 DOI: 10.1177/000348941011900601] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022]
Abstract
OBJECTIVES We hypothesized that high-speed digital imaging with videokymographic and laryngotopographic analysis would provide a quantitative method to evaluate the effect of collagen injection for the correction of asymmetric and irregular vocal fold vibration in unilateral vocal fold paralysis. METHODS Videokymographic and laryngotopographic analysis was performed for high-speed digital recordings of vocal fold vibration for visualizing the glottal vibratory patterns, and for quantifying the frequency of vibration of each vocal fold, respectively, including comparisons between the paralyzed and normal vocal folds before and after surgery. This included prospective observations of 11 subjects with unilateral vocal fold paralysis (4 male, 7 female; mean +/- SD age, 67.1 +/- 12.0 years) using high-speed digital image analysis before and after collagen injection. RESULTS Analysis of the laryngotopographs revealed 2 distinct frequencies of vibration for the paralyzed and contralateral vocal folds for 8 of the 11 subjects before surgery. After collagen injection, the vibration frequencies became identical, despite asymmetric vibration amplitudes. Asymmetric vibration amplitudes were also observed in the other 3 subjects before surgery, but the amplitudes became symmetric after collagen injection, despite a persistent phase shift. CONCLUSIONS Asymmetric vibration in vocal fold paralysis was exemplified by differences in vibration frequency and amplitude between the vocal folds. The present study showed that after collagen injection, these aspects of vibratory patterns improved toward symmetry. This surgical procedure could improve the functional symmetry of the larynx for phonation.
Affiliation(s)
- Miwako Kimura: Dept of Otolaryngology-Head and Neck Surgery, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9035, USA
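The Kimura et al. abstract reports distinct vibration frequencies for the paralyzed and contralateral vocal folds. The abstract does not describe how laryngotopography computes them, so the following is only a generic sketch of estimating a dominant frequency from a per-fold waveform with a discrete Fourier transform. The waveforms, the frame rate, and the function name `dominant_frequency` are hypothetical.

```python
import cmath
import math


def dominant_frequency(samples, frame_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC component of samples.

    Uses a direct O(n^2) discrete Fourier transform, which is adequate for the
    few hundred frames of a short high-speed recording.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    best_bin, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin * frame_rate_hz / n


# Hypothetical edge-displacement waveforms sampled at 2000 frames/s for 0.1 s,
# so the frequency resolution is 2000 / 200 = 10 Hz.
fps = 2000
n_frames = 200
normal_fold = [math.sin(2 * math.pi * 120 * t / fps) for t in range(n_frames)]
paralyzed_fold = [math.sin(2 * math.pi * 150 * t / fps) for t in range(n_frames)]

f_normal = dominant_frequency(normal_fold, fps)
f_paralyzed = dominant_frequency(paralyzed_fold, fps)
```

With real recordings the frequency resolution is `frame_rate_hz / n`, so short recordings can only separate the two folds if their frequencies differ by more than that spacing.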
|
33
|
Kimura M, Imagawa H, Nito T, Sakakibara KI, Chan RW, Tayama N. Arytenoid Adduction for Correcting Vocal Fold Asymmetry: High-Speed Imaging. Ann Otol Rhinol Laryngol 2010; 119:439-46. [DOI: 10.1177/000348941011900703] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/15/2022]
Abstract
Objectives We hypothesized that high-speed digital imaging provides a quantitative method to evaluate the effect of arytenoid adduction for the correction of asymmetric and irregular vocal fold vibration in unilateral vocal fold paralysis. Methods Six subjects with unilateral vocal fold paralysis participated in the study (4 male, 2 female; mean [±SD] age, 52.5 ± 21.3 years). Videokymographic and laryngotopographic methods for image analysis were performed for high-speed recordings of vocal fold vibration for visualizing the glottal vibratory patterns, and for quantifying the frequency of vibration of each vocal fold, respectively. Comparisons of the paralyzed and the normal vocal folds were made before and after arytenoid adduction. Results Analysis of the laryngotopographs revealed 2 distinct frequencies of vibration for the paralyzed and the contralateral vocal folds for all subjects before surgery. After arytenoid adduction, the vibration frequencies became identical or nearly identical in all subjects. Conclusions Asymmetric vibration in vocal fold paralysis was exemplified by differences in vibration frequency between the vocal folds. The present data showed that after arytenoid adduction the vibration frequencies and the vibratory patterns of the contralateral vocal folds approached symmetry. This surgical procedure could improve the functional symmetry of the larynx for phonation.
Collapse
Affiliation(s)
- Miwako Kimura
- Departments of Otolaryngology–Head and Neck Surgery, Tokyo, Japan
- University of Texas Southwestern Medical Center, Dallas, Texas, Department of Otolaryngology, International Medical Center of Japan, Tokyo, Japan
- Hiroshi Imagawa
- Department of Otorhinolaryngology–Head and Neck Surgery, University of Tokyo, Tokyo, Japan
- Takaharu Nito
- Department of Otorhinolaryngology–Head and Neck Surgery, University of Tokyo, Tokyo, Japan
- Ken-Ichi Sakakibara
- Department of Communication Disorders, Health Sciences University of Hokkaido, Hokkaido, Japan
- Roger W. Chan
- Departments of Otolaryngology–Head and Neck Surgery, Tokyo, Japan
- Biomedical Engineering, Tokyo, Japan
- Niro Tayama
- University of Texas Southwestern Medical Center, Dallas, Texas, Department of Otolaryngology, International Medical Center of Japan, Tokyo, Japan
- Department of Otorhinolaryngology–Head and Neck Surgery, University of Tokyo, Tokyo, Japan
|
34
|
Classification of functional voice disorders based on phonovibrograms. Artif Intell Med 2010; 49:51-9. [DOI: 10.1016/j.artmed.2010.01.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 11/12/2008] [Revised: 08/20/2009] [Accepted: 01/10/2010] [Indexed: 11/17/2022]
|
35
|
Tokuda IT, Zemke M, Kob M, Herzel H. Biomechanical modeling of register transitions and the role of vocal tract resonators. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 127:1528-36. [PMID: 20329853 DOI: 10.1121/1.3299201] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 05/14/2023]
Abstract
Biomechanical modeling and bifurcation theory are applied to study phonation onset and register transition. A four-mass body-cover model with a smooth geometry is introduced to reproduce characteristic features of chest and falsetto registers. Sub- and supraglottal resonances are modeled using a wave-reflection model. Simulations for increasing and decreasing subglottal pressure reveal that the phonation onset exhibits amplitude jumps and hysteresis referring to a subcritical Hopf bifurcation. The onset pressure is reduced due to vocal tract resonances. Hysteresis is observed also for the voice breaks at the chest-falsetto transition. Varying the length of the subglottal resonator has only minor effects on this register transition. Contrarily, supraglottal resonances have a strong effect on the pitch, at which the chest-falsetto transition is found. Experiment of glissando singing shows that the supraglottis has indeed an influence on the register transition.
Affiliation(s)
- Isao T Tokuda
- School of Information Science, Japan Advanced Institute of Science and Technology, Nomi-city, Ishikawa 923-1292, Japan.
|
36
|
Yang A, Lohscheller J, Berry DA, Becker S, Eysholdt U, Voigt D, Döllinger M. Biomechanical modeling of the three-dimensional aspects of human vocal fold dynamics. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 127:1014-31. [PMID: 20136223 PMCID: PMC3137461 DOI: 10.1121/1.3277165] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 05/14/2009] [Revised: 10/15/2009] [Accepted: 11/24/2009] [Indexed: 05/23/2023]
Abstract
Human voice originates from the three-dimensional (3D) oscillations of the vocal folds. In previous studies, biomechanical properties of vocal fold tissues have been predicted by optimizing the parameters of simple two-mass-models to fit its dynamics to the high-speed imaging data from the clinic. However, only lateral and longitudinal displacements of the vocal folds were considered. To extend previous studies, a 3D mass-spring, cover-model is developed, which predicts the 3D vibrations of the entire medial surface of the vocal fold. The model consists of five mass planes arranged in vertical direction. Each plane contains five longitudinal, mass-spring, coupled oscillators. Feasibility of the model is assessed using a large body of dynamical data previously obtained from excised human larynx experiments, in vivo canine larynx experiments, physical models, and numerical models. Typical model output was found to be similar to existing findings. The resulting model enables visualization of the 3D dynamics of the human vocal folds during phonation for both symmetric and asymmetric vibrations.
Affiliation(s)
- Anxiong Yang
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Bohlenplatz 21, 91054 Erlangen, Germany.
|
37
|
Advances in laryngeal imaging. Eur Arch Otorhinolaryngol 2009; 266:1509-20. [PMID: 19618198 DOI: 10.1007/s00405-009-1050-4] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 12/05/2008] [Accepted: 07/07/2009] [Indexed: 10/20/2022]
Abstract
Imaging and image analysis became an important issue in laryngeal diagnostics. Various techniques, such as videostroboscopy, videokymography, digital kymography, or ultrasonography are available and are used in research and clinical practice. This paper reviews recent advances in imaging for laryngeal diagnostics.
|
38
|
Elemans CPH, Muller M, Larsen ON, van Leeuwen JL. Amplitude and frequency modulation control of sound production in a mechanical model of the avian syrinx. J Exp Biol 2009; 212:1212-24. [PMID: 19329754 DOI: 10.1242/jeb.026872] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/20/2022]
Abstract
Birdsong has developed into one of the important models for motor control of learned behaviour and shows many parallels with speech acquisition in humans. However, there are several experimental limitations to studying the vocal organ - the syrinx - in vivo. The multidisciplinary approach of combining experimental data and mathematical modelling has greatly improved the understanding of neural control and peripheral motor dynamics of sound generation in birds. Here, we present a simple mechanical model of the syrinx that facilitates detailed study of vibrations and sound production. Our model resembles the 'starling resistor', a collapsible tube model, and consists of a tube with a single membrane in its casing, suspended in an external pressure chamber and driven by various pressure patterns. With this design, we can separately control 'bronchial' pressure and tension in the oscillating membrane and generate a wide variety of 'syllables' with simple sweeps of the control parameters. We show that the membrane exhibits high frequency, self-sustained oscillations in the audio range (>600 Hz fundamental frequency) using laser Doppler vibrometry, and systematically explore the conditions for sound production of the model in its control space. The fundamental frequency of the sound increases with tension in three membranes with different stiffness and mass. The lower-bound fundamental frequency increases with membrane mass. The membrane vibrations are strongly coupled to the resonance properties of the distal tube, most likely because of its reflective properties to sound waves. Our model is a gross simplification of the complex morphology found in birds, and more closely resembles mathematical models of the syrinx. Our results confirm several assumptions underlying existing mathematical models in a complex geometry.
Affiliation(s)
- Coen P H Elemans
- Experimental Zoology Group, Wageningen University, Marijkeweg 40, NL-6709 PG Wageningen, The Netherlands.
|
39
|
Tao C, Jiang JJ. Effects of mucosal loading on vocal fold vibration. CHAOS (WOODBURY, N.Y.) 2009; 19:023113. [PMID: 19566248 PMCID: PMC2832046 DOI: 10.1063/1.3120293] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/15/2008] [Accepted: 03/24/2009] [Indexed: 05/28/2023]
Abstract
A chain model was proposed in this study to examine the effects of mucosal loading on vocal fold vibration. Mucosal loading was defined as the loading caused by the interaction between the vocal folds and the surrounding tissue. In the proposed model, the vocal folds and the surrounding tissue were represented by a series of oscillators connected by a coupling spring. The lumped masses, springs, and dampers of the oscillators modeled the tissue properties of mass, stiffness, and viscosity, respectively. The coupling spring exemplified the tissue interactions. By numerically solving this chain model, the effects of mucosal loading on the phonation threshold pressure, phonation instability pressure, and energy distribution in a voice production system were studied. It was found that when mucosal loading is small, phonation threshold pressure increases with the damping constant R(r), the mass constant R(m), and the coupling constant R(mu) of mucosal loading but decreases with the stiffness constant R(k). Phonation instability pressure is also related to mucosal loading. It was found that phonation instability pressure increases with the coupling constant R(mu) but decreases with the stiffness constant R(k) of mucosal loading. Therefore, it was concluded that mucosal loading directly affects voice production.
Affiliation(s)
- Chao Tao
- Department of Surgery, Division of Otolaryngology Head and Neck Surgery, University of Wisconsin Medical School, Madison, Wisconsin 53792-7375, USA.
|
40
|
Granada A, Hennig RM, Ronacher B, Kramer A, Herzel H. Phase response curves elucidating the dynamics of coupled oscillators. Methods Enzymol 2009; 454:1-27. [PMID: 19216921 DOI: 10.1016/s0076-6879(08)03801-9] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Indexed: 02/04/2023]
Abstract
Phase response curves (PRCs) are widely used in circadian clocks, neuroscience, and heart physiology. They quantify the response of an oscillator to pulse-like perturbations. Phase response curves provide valuable information on the properties of oscillators and their synchronization. This chapter discusses biological self-sustained oscillators (circadian clock, physiological rhythms, etc.) in the context of nonlinear dynamics theory. Coupled oscillators can synchronize with different frequency ratios, can generate toroidal dynamics (superposition of independent frequencies), and may lead to deterministic chaos. These nonlinear phenomena can be analyzed with the aid of a phase transition curve, which is intimately related to the phase response curve. For illustration purposes, this chapter discusses a model of circadian oscillations based on a delayed negative feedback. In a second part, the chapter provides a step-by-step recipe to measure phase response curves. It discusses specifications of this recipe for circadian rhythms, heart rhythms, neuronal spikes, central pattern generators, and insect communication. Finally, it stresses the predictive power of measured phase response curves. PRCs can be used to quantify the coupling strength of oscillations, to classify oscillator types, and to predict the complex dynamics of periodically driven oscillations.
Affiliation(s)
- A Granada
- Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Berlin, Germany
|
41
|
Voice Pathology Classification by Using Features from High-Speed Videos. Artif Intell Med 2009. [DOI: 10.1007/978-3-642-02976-9_44] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
42
|
Zollinger SA, Riede T, Suthers RA. Two-voice complexity from a single side of the syrinx in northern mockingbird Mimus polyglottos vocalizations. J Exp Biol 2008; 211:1978-91. [PMID: 18515729 DOI: 10.1242/jeb.014092] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.3] [Indexed: 11/20/2022]
Abstract
The diverse vocal signals of songbirds are produced by highly coordinated motor patterns of syringeal and respiratory muscles. These muscles control separate sound generators on the right and left side of the duplex vocal organ, the syrinx. Whereas most song is under active neural control, there has been a growing interest in a different class of nonlinear vocalizations consisting of frequency jumps, subharmonics, biphonation and deterministic chaos that are also present in the vocal repertoires of many vertebrates, including many birds. These nonlinear phenomena may not require active neural control, depending instead on the intrinsic nonlinear dynamics of the oscillators housed within each side of the syrinx. This study investigates the occurrence of these phenomena in the vocalizations of intact northern mockingbirds Mimus polyglottos. By monitoring respiratory pressure and airflow on each side of the syrinx, we provide the first analysis of the contribution made by each side of the syrinx to the production of nonlinear phenomena and are able to reliably discriminate two-voice vocalizations from potentially similar appearing, unilaterally produced, nonlinear events. We present the first evidence of syringeal lateralization of nonlinear dynamics during bilaterally produced chaotic calls. The occurrence of unilateral nonlinear events was not consistently correlated with fluctuations in air sac pressure or the rate of syringeal airflow. Our data support previous hypotheses for mechanical and acoustic coupling between the two sides of the syrinx. These results help lay a foundation upon which to understand the communicative functions of nonlinear phenomena.
Affiliation(s)
- Sue Anne Zollinger
- Department of Biology, Jordan Hall, Indiana University, Bloomington, IN 47405, USA.
|
43
|
Cisonni J, Van Hirtum A, Pelorson X, Willems J. Theoretical simulation and experimental validation of inverse quasi-one-dimensional steady and unsteady glottal flow models. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 124:535-545. [PMID: 18646996 DOI: 10.1121/1.2931959] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 05/26/2023]
Abstract
In physical modeling of phonation, the pressure drop along the glottal constriction is classically assessed with the glottal geometry and the subglottal pressure as known input parameters. Application of physical modeling to study phonation abnormalities and pathologies requires input parameters related to in vivo measurable quantities commonly corresponding to the physical model output parameters. Therefore, the current research presents the inversion of some popular simplified flow models in order to estimate the subglottal pressure, the glottal constriction area, or the separation coefficient inherent to the simplified flow modeling for steady and unsteady flow conditions. The inverse models are firstly validated against direct simulations and secondly against in vitro measurements performed for different configurations of rigid vocal fold replicas mounted in a suitable experimental setup. The influence of the pressure corrections related to viscosity and flow unsteadiness on the flow modeling is quantified. The inversion of one-dimensional glottal flow models including the major viscous effects can predict the main flow quantities with respect to the in vitro measurements. However, the inverse model accuracy is strongly dependent on the pertinence of the direct flow modeling. The choice of the separation coefficient is preponderant to obtain pressure predictions relevant to the experimental data.
Affiliation(s)
- Julien Cisonni
- Department of Speech and Cognition, GIPSA-Laboratory, UMR CNRS 5216, Grenoble Universities, 961 rue de la Houille Blanche, BP 46, 38402 Saint Martin d'Heres, France.
|
44
|
Schwarz R, Döllinger M, Wurzbacher T, Eysholdt U, Lohscheller J. Spatio-temporal quantification of vocal fold vibrations using high-speed videoendoscopy and a biomechanical model. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 123:2717-32. [PMID: 18529190 DOI: 10.1121/1.2902167] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 05/09/2023]
Abstract
Pathologic changes within the organic constitution of vocal folds or a functional impairment of the larynx may result in disturbed or even irregular vocal fold vibrations. The consequences are perturbations of the acoustic speech signal which are perceived as a hoarse voice. By means of appropriate image processing techniques, the vocal fold dynamics are extracted from digital high-speed videos. This study addresses the approach to obtain a parametric description of the spatio-temporal characteristics of the vocal fold oscillations for the aim of classification. For this purpose a biomechanical vocal fold model is introduced. An automatic optimization procedure is developed for fitting the model dynamics to the observed vocal fold oscillations. Thus, the resulting parameter values represent a specific vibration pattern and serve as an objective quantification measure. Performance and reliability of the optimization procedure are validated with synthetically generated data sets. The high-speed videos of two normal voice subjects and six patients suffering from different voice disorders are processed. The resulting model parameters represent a rough approximation of physiological parameters along the entire vocal folds.
Affiliation(s)
- Raphael Schwarz
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, South Carolina 29208, USA.
|
45
|
Wurzbacher T, Döllinger M, Schwarz R, Hoppe U, Eysholdt U, Lohscheller J. Spatiotemporal classification of vocal fold dynamics by a multimass model comprising time-dependent parameters. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 123:2324-34. [PMID: 18397036 DOI: 10.1121/1.2835435] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 05/09/2023]
Abstract
A model-based approach is proposed to objectively measure and classify vocal fold vibrations by left-right asymmetries along the anterior-posterior direction, especially in the case of nonstationary phonation. For this purpose, vocal fold dynamics are recorded in real time with a digital high-speed camera during phonation of sustained vowels as well as pitch raises. The dynamics of a multimass model with time-dependent parameters are matched to vocal fold vibrations extracted at dorsal, medial, and ventral positions by an automatic optimization procedure. The block-based optimization accounts for nonstationary vibrations and compares the vocal fold and model dynamics by wavelet coefficients. The optimization is verified with synthetically generated data sets and is applied to 40 clinical high-speed recordings comprising normal and pathological voice subjects. The resulting model parameters allow an intuitive visual assessment of vocal fold instabilities within an asymmetry diagram and are applicable to an objective quantification of asymmetries.
Affiliation(s)
- Tobias Wurzbacher
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen, Medical School, Erlangen, Germany.
|
46
|
Lohscheller J, Eysholdt U, Toy H, Dollinger M. Phonovibrography: mapping high-speed movies of vocal fold vibrations into 2-D diagrams for visualizing and analyzing the underlying laryngeal dynamics. IEEE TRANSACTIONS ON MEDICAL IMAGING 2008; 27:300-9. [PMID: 18334426 DOI: 10.1109/tmi.2007.903690] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Indexed: 05/04/2023]
Abstract
Endoscopic high-speed laryngoscopy in combination with image analysis strategies is the most promising approach to investigate the interrelation between vocal fold vibrations and voice disorders. So far, due to the lack of an objective and standardized analysis procedure a unique characterization of vocal fold vibrations has not been achieved yet. We present a visualization and analysis strategy which transforms the segmented edges of vibrating vocal folds into a single 2-D image, denoted Phonovibrogram (PVG). Within a PVG the individual type of vocal fold vibration becomes uniquely characterized by specific geometric patterns. The PVG geometries give an intuitive access on the type and degree of the laryngeal asymmetry and can be quantified using an image segmentation approach. The PVG analysis was applied to 14 representative recordings derived from a high-speed database comprising normal and pathological voices. We demonstrate that PVGs are capable to differentiate and quantify different types of normal and pathological vocal fold vibrations. The objective and precise quantification of the PVG geometry may have the potential to realize a novel classification of vocal fold vibrations.
Affiliation(s)
- Jörg Lohscheller
- Department of Phoniatrics and Pediatric Audiology, University Hospital Erlangen Medical School, 91054 Erlangen, Germany.
|
47
|
Calibration of laryngeal endoscopic high-speed image sequences by an automated detection of parallel laser line projections. Med Image Anal 2008; 12:300-17. [PMID: 18373942 DOI: 10.1016/j.media.2007.12.007] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 02/27/2007] [Revised: 10/16/2007] [Accepted: 12/15/2007] [Indexed: 11/22/2022]
Abstract
High-speed laryngeal endoscopic systems record vocal fold vibrations during phonation in real-time. For a quantitative analysis of vocal fold dynamics a metrical scale is required to get absolute laryngeal dimensions of the recorded image sequence. For the clinical use there is no automated and stable calibration procedure up to now. A calibration method is presented that consists of a laser projection device and the corresponding image processing for the automated detection of the laser calibration marks. The laser projection device is clipped to the endoscope and projects two parallel laser lines with a known distance to each other as calibration information onto the vocal folds. Image processing methods automatically identify the pixels belonging to the projected laser lines in the image data. The line detection bases on a Radon transform approach and is a two-stage process, which successively uses temporal and spatial characteristics of the projected laser lines in the high-speed image sequence. The robustness and the applicability are demonstrated with clinical endoscopic image sequences. The combination of the laser projection device and the image processing enables the calibration of laryngeal endoscopic images within the vocal fold plane and thus provides quantitative metrical data of vocal fold dynamics.
|
48
|
Tokuda IT, Horácek J, Svec JG, Herzel H. Comparison of biomechanical modeling of register transitions and voice instabilities with excised larynx experiments. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2007; 122:519-31. [PMID: 17614509 DOI: 10.1121/1.2741210] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 05/09/2023]
Abstract
Voice instabilities were studied using excised human larynx experiments and biomechanical modeling. With a controlled elongation of the vocal folds, the experiments showed registers with chest-like and falsetto-like vibrations. Observed instabilities included abrupt jumps between the two registers exhibiting hysteresis, aphonic episodes, subharmonics, and chaos near the register transitions. In order to model these phenomena, a three-mass model was constructed by adding a third mass on top of the simplified two-mass model. Simulation studies showed that the three-mass model can vibrate in both chest-like and falsetto-like patterns. Variation of tension parameters which mimic activities of laryngeal muscles could induce transitions between both registers. For reduced prephonatory areas and damping constants, extended coexistence of chest and falsetto registers was found, in agreement with experimental data. Subharmonics and deterministic chaos were observed close to transitions between the registers. It is concluded that the abrupt changes between chest and falsetto registers can be understood as shifts in dominance of eigenmodes of the vocal folds.
Affiliation(s)
- Isao T Tokuda
- School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan.
|
49
|
Tao C, Zhang Y, Jiang JJ. Extracting Physiologically Relevant Parameters of Vocal Folds From High-Speed Video Image Series. IEEE Trans Biomed Eng 2007; 54:794-801. [PMID: 17518275 DOI: 10.1109/tbme.2006.889182] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Indexed: 11/09/2022]
Abstract
In this paper, a new method is proposed to extract the physiologically relevant parameters of the vocal fold mathematic model including masses, spring constants and damper constants from high-speed video (HSV) image series. This method uses a genetic algorithm to optimize the model parameters until the model and the realistic vocal folds have similar dynamic behavior. Numerical experiments theoretically test the validity of the proposed parameter estimation method. Then the validated method is applied to extract the physiologically relevant parameters from the glottal area series measured by HSV in an excised larynx model. With the estimated parameters, the vocal fold model accurately describes the vibration of the observed vocal folds. Further studies show that the proposed parameter estimation method can successfully detect the increase of longitudinal tension due to the vocal fold elongation from the glottal area signal. These results imply the potential clinical application of this method in inspecting the tissue properties of vocal fold.
Affiliation(s)
- Chao Tao
- Department of Surgery, Division of Otolaryngology Head and Neck Surgery, University of Wisconsin Medical School, Madison, WI 53792-7375, USA.
|
50
|
Lohscheller J, Toy H, Rosanowski F, Eysholdt U, Döllinger M. Clinically evaluated procedure for the reconstruction of vocal fold vibrations from endoscopic digital high-speed videos. Med Image Anal 2007; 11:400-13. [PMID: 17544839 DOI: 10.1016/j.media.2007.04.005] [Citation(s) in RCA: 120] [Impact Index Per Article: 7.1] [Received: 04/07/2006] [Revised: 02/27/2007] [Accepted: 04/24/2007] [Indexed: 11/29/2022]
Abstract
Investigation of voice disorders requires the examination of vocal fold vibrations. State of the art is the recording of endoscopic high-speed movies which capture vocal fold vibrations in real-time. It enables investigating the interrelation between disturbances of vocal fold vibrations and voice disorders. However, the lack of clinical studies and of a standardized procedure to reconstruct vocal fold vibrations from high-speed videos constrain the clinical acceptance of the high-speed technique. An image processing approach is presented that extracts the vibrating vocal fold edges from digital high-speed movies. The initial segmentation is principally based on a seeded region-growing algorithm. Even in movies with low image quality the algorithm segments successfully the glottal area by an introduced two-dimensional threshold matrix. Following segmentation, the vocal fold edges are reconstructed from the computed time-varying glottal area. The performance of the procedure was objectively evaluated within a study comprising 372 high-speed recordings. The accuracy of vocal fold reconstruction exceeds manual segmentation results obtained by clinical experts. The algorithm reaches an information flow-rate of up to 98 images per second. The robustness and high accuracy of the procedure makes it suitable for the application in clinical routine. It enables an objective and highly accurate description of vocal fold vibrations which is essential to realize extensive clinical studies which focus on the classification of voice disorders.
Affiliation(s)
- Jörg Lohscheller
- Department of Phoniatrics and Pediatric Audiology, Bohlenplatz 21, 91054 Erlangen, Germany.
|