1
Baker CP, Sundberg J, Purdy SC, Rakena TO, Leão SHDS. CPPS and Voice-Source Parameters: Objective Analysis of the Singing Voice. J Voice 2024; 38:549-560. [PMID: 35000836 DOI: 10.1016/j.jvoice.2021.12.010] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/14/2021] [Revised: 12/08/2021] [Accepted: 12/13/2021] [Indexed: 11/19/2022]
Abstract
INTRODUCTION In recent years, cepstral analysis and specific cepstrum-based measures such as smoothed cepstral peak prominence (CPPS) have become increasingly researched and utilized in attempts to determine the extent of overall dysphonia in voice signals. Yet few studies have extensively examined how specific voice-source parameters affect CPPS values. OBJECTIVE Using a range of synthesized tones, this exploratory study sought to systematically analyze the effect of fundamental frequency (fo), vibrato extent, source-spectrum tilt, and the amplitude of the voice-source fundamental on CPPS values. MATERIALS AND METHODS A series of scales was synthesized using the freeware Madde. Fundamental frequency, vibrato extent, source-spectrum tilt, and the amplitude of the voice-source fundamental were systematically and independently varied. The tones were analyzed in PRAAT, and statistical analyses were conducted in SPSS. RESULTS CPPS was significantly affected by both fo and source-spectrum tilt, independently. A nonlinear association was seen between vibrato extent and CPPS, where CPPS values increased from 0 to 0.6 semitones (ST), then rapidly decreased approaching 1.0 ST. No relationship was seen between the amplitude of the voice-source fundamental and CPPS. CONCLUSION The large effect of fo should be taken into account when analyzing the voice, particularly in singing-voice research, when comparing pre- and posttreatment data, and when comparing inter-subject CPPS data.
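To make the vibrato extents quoted in the abstract concrete, the following minimal sketch converts an extent in semitones into a frequency contour in Hz. It is not the synthesis procedure used in Madde. It assumes that "extent" means the peak deviation on either side of the mean fo and that the modulation is sinusoidal. The function names, the 5.5 Hz vibrato rate, and the 220 Hz example tone are illustrative assumptions, not values from the study.

```python
import math


def semitones_to_ratio(semitones):
    """Convert a musical interval in semitones to a frequency ratio."""
    return 2.0 ** (semitones / 12.0)


def vibrato_contour(fo_hz, extent_st, rate_hz=5.5, duration_s=1.0, step_s=0.001):
    """Return a sampled fo contour (Hz) with sinusoidal vibrato.

    Assumption: extent_st is the peak deviation (plus/minus) around fo_hz.
    rate_hz is a hypothetical vibrato rate chosen for illustration only.
    """
    n_samples = int(round(duration_s / step_s))
    contour = []
    for i in range(n_samples):
        # Instantaneous deviation in semitones at time i * step_s.
        deviation_st = extent_st * math.sin(2.0 * math.pi * rate_hz * i * step_s)
        contour.append(fo_hz * semitones_to_ratio(deviation_st))
    return contour


if __name__ == "__main__":
    for extent in (0.0, 0.6, 1.0):
        contour = vibrato_contour(220.0, extent)
        print(f"extent {extent:.1f} ST: fo spans "
              f"{min(contour):.2f} to {max(contour):.2f} Hz")
```

Under these assumptions, an extent of 0.6 ST around 220 Hz corresponds to a swing of only a few hertz in each direction, which gives a sense of how small the modulation is at the point where the abstract reports CPPS starting to fall.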
Affiliation(s)
- Calvin P Baker: Department of Voice, School of Music, University of Auckland, Auckland Central, Auckland, New Zealand
- Johan Sundberg: Division of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH (Royal Institute of Technology), Stockholm, Sweden; Department of Linguistics, Stockholm University, Stockholm, Sweden; University College of Music Education Stockholm, Sweden
- Suzanne C Purdy: School of Psychology, University of Auckland, Auckland Central, Auckland, New Zealand
- Te Oti Rakena: Department of Voice, School of Music, University of Auckland, Auckland Central, Auckland, New Zealand
- Sylvia H de S Leão: Speech Science, School of Psychology, University of Auckland, Grafton, Auckland, New Zealand
2
Kist AM, Gómez P, Dubrovskiy D, Schlegel P, Kunduk M, Echternach M, Patel R, Semmler M, Bohr C, Dürr S, Schützenberger A, Döllinger M. A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis. Journal of Speech, Language, and Hearing Research: JSLHR 2021; 64:1889-1903. [PMID: 34000199 DOI: 10.1044/2021_jslhr-20-00498] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Indexed: 06/12/2023]
Abstract
Purpose High-speed videoendoscopy (HSV) is an emerging endoscopy technique for assessing and diagnosing voice disorders, but it is barely used in the clinic because dedicated software to analyze the data is lacking. HSV allows vocal fold oscillations to be quantified by segmenting the glottal area. This challenging task has been tackled by various studies; however, the proposed approaches are mostly limited and not suitable for daily clinical routine. Method We developed user-friendly software in C# that allows the editing, motion correction, segmentation, and quantitative analysis of HSV data. We further provide pretrained deep neural networks for fully automatic glottis segmentation. Results We freely provide our software Glottis Analysis Tools (GAT). Using GAT, we provide a general threshold-based region growing platform that enables the user to analyze data from various sources, such as in vivo recordings, ex vivo recordings, and high-speed footage of artificial vocal folds. Additionally, especially for in vivo recordings, we provide three robust neural networks at various speed and quality settings to allow the fully automatic glottis segmentation needed for application by untrained personnel. GAT further evaluates video and audio data in parallel and is able to extract various features from the video data, among others the glottal area waveform, that is, the changing glottal area over time. In total, GAT provides 79 unique quantitative analysis parameters for video- and audio-based signals. Many of these parameters have already been shown to reflect voice disorders, highlighting the clinical importance and usefulness of the GAT software. Conclusion GAT is a unique tool to process HSV and audio data to determine quantitative, clinically relevant parameters for research, diagnosis, and treatment of laryngeal disorders. Supplemental Material https://doi.org/10.23641/asha.14575533.
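The idea of threshold-based region growing and the resulting glottal area waveform can be illustrated with a small sketch. This is not GAT's implementation (GAT is written in C# and its internals are not described here). It assumes grayscale frames stored as nested lists, a dark glottis on a bright background, 4-connectivity, and a user-supplied seed pixel. All function names and the toy frames are hypothetical.

```python
from collections import deque


def region_grow(frame, seed, threshold):
    """Return the set of pixels 4-connected to `seed` with intensity <= threshold.

    Assumption: the glottis appears dark, so "inside" means low intensity.
    """
    rows, cols = len(frame), len(frame[0])
    seed_row, seed_col = seed
    if frame[seed_row][seed_col] > threshold:
        return set()  # Seed is not inside a dark region (e.g. closed glottis).
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and frame[nr][nc] <= threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def glottal_area_waveform(frames, seed, threshold):
    """Segmented area in pixels for each frame, i.e. area as a function of time."""
    return [len(region_grow(frame, seed, threshold)) for frame in frames]


# Toy 5x5 frames: open, partly closed, and fully closed glottis.
OPEN_FRAME = [
    [200, 200, 200, 200, 200],
    [200, 200, 20, 200, 200],
    [200, 30, 10, 25, 200],
    [200, 200, 15, 200, 200],
    [200, 200, 200, 200, 10],  # Dark pixel not connected to the seed.
]
PARTIAL_FRAME = [
    [200, 200, 200, 200, 200],
    [200, 200, 200, 200, 200],
    [200, 200, 10, 200, 200],
    [200, 200, 15, 200, 200],
    [200, 200, 200, 200, 200],
]
CLOSED_FRAME = [[200] * 5 for _ in range(5)]

if __name__ == "__main__":
    frames = [OPEN_FRAME, PARTIAL_FRAME, CLOSED_FRAME]
    print(glottal_area_waveform(frames, seed=(2, 2), threshold=50))
```

The isolated dark pixel in the corner of the first frame is ignored because it is not connected to the seed, which is the property that separates region growing from plain global thresholding.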
Affiliation(s)
- Andreas M Kist: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Pablo Gómez: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Denis Dubrovskiy: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Patrick Schlegel: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Melda Kunduk: Department of Communication Sciences and Disorders, Louisiana State University, Baton Rouge
- Matthias Echternach: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Germany
- Rita Patel: Department of Speech, Language and Hearing Sciences, College of Arts and Sciences, Indiana University, Bloomington
- Marion Semmler: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Christopher Bohr: Klinik und Poliklinik für Hals-Nasen-Ohren-Heilkunde Universitätsklinikum Regensburg, Germany
- Stephan Dürr: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Anne Schützenberger: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
- Michael Döllinger: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology-Head & Neck Surgery, University Hospital Erlangen, Germany
3
Impact of Subharmonic and Aperiodic Laryngeal Dynamics on the Phonatory Process Analyzed in Ex Vivo Rabbit Models. Applied Sciences-Basel 2019; 9. [PMID: 33815832 PMCID: PMC8018220 DOI: 10.3390/app9091963] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023]
Abstract
Normal voice is characterized by periodic oscillations of the vocal folds. On the other hand, disordered voice dynamics (e.g., subharmonic and aperiodic oscillations) are often associated with voice pathologies and dysphonia. Unfortunately, not all investigations may be conducted on human subjects; hence animal laryngeal studies have been performed for many years to better understand human phonation. The rabbit larynx has been shown to be a potential model of the human larynx. Despite this fact, only a few studies regarding the phonatory parameters of rabbit larynges have been performed. Further, to the best of our knowledge, no ex vivo study has systematically investigated phonatory parameters from high-speed, audio and subglottal pressure data with irregular oscillations. To remedy this, the present study analyzes experiments with sustained phonation in 11 ex vivo rabbit larynges for 51 conditions of disordered vocal fold dynamics. (1) The results of this study support previous findings on non-disordered data, that the stronger the glottal closure insufficiency is during phonation, the worse the phonatory characteristics are; (2) aperiodic oscillations showed worse phonatory results than subharmonic oscillations; (3) in the presence of both types of irregular vibrations, the voice quality (i.e., cepstral peak prominence) of the audio and subglottal signal greatly deteriorated compared to normal/periodic vibrations. In summary, our results suggest that the presence of both types of irregular vibration has a major impact on voice quality and should be considered along with glottal closure measures in medical diagnosis and treatment.
4
Döllinger M, Kniesburges S, Berry DA, Birk V, Wendler O, Dürr S, Alexiou C, Schützenberger A. Investigation of phonatory characteristics using ex vivo rabbit larynges. The Journal of the Acoustical Society of America 2018; 144:142. [PMID: 30075689 PMCID: PMC6037535 DOI: 10.1121/1.5043384] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 05/11/2023]
Abstract
Quantitative analysis of phonatory characteristics of rabbits has been widely neglected. However, preliminary studies established the rabbit larynx as a potential model of human phonation. This study reports quantitative data on phonation using ex vivo rabbit larynx models to gain more insight into the dependencies among three main components of the phonation process: airflow, vocal fold dynamics, and the acoustic output. Sustained phonation was induced in 11 ex vivo rabbit larynges. For 414 phonatory conditions, vocal fold vibrations, acoustic, and aerodynamic parameters were analyzed as functions of longitudinal vocal fold pre-stress, applied airflow, and glottal closure insufficiency. Dimensions of the vocal folds were measured and histological data were analyzed. Glottal closure characteristics improved for increasing longitudinal pre-stress and applied airflow. For the subglottal pressure signal only the cepstral peak prominence showed dependency on glottal closure. In contrast, vibrational, acoustic, and aerodynamic parameters were found to be highly dependent on the degree of glottal closure: The more complete the glottal closure during phonation, the better the aerodynamic and acoustic characteristics. Hence, complete or at least partial glottal closure appears to enhance acoustic signal quality. Finally, results validate the ex vivo rabbit larynx as an effective model for analyzing the phonatory process.
Affiliation(s)
- Michael Döllinger: Division for Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, FAU Erlangen-Nürnberg, Waldstrasse 1, Erlangen, 91054, Germany
- Stefan Kniesburges: Division for Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, FAU Erlangen-Nürnberg, Waldstrasse 1, Erlangen, 91054, Germany
- David A Berry: Laryngeal Dynamics Laboratory, Division of Head and Neck Surgery, David Geffen School of Medicine at UCLA, 1000 Veteran Avenue, 31-24 Rehab Center, Los Angeles, California 90095-1794, USA
- Veronika Birk: Division for Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, FAU Erlangen-Nürnberg, Waldstrasse 1, Erlangen, 91054, Germany
- Olaf Wendler: Laboratory for Molecular Biology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, FAU Erlangen-Nürnberg, Waldstrasse 1, Erlangen, 91054, Germany
- Stephan Dürr: Division for Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, FAU Erlangen-Nürnberg, Waldstrasse 1, Erlangen, 91054, Germany
- Christoph Alexiou: Section of Experimental Oncology and Nanomedicine (SEON), Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Else Kröner-Fresenius-Stiftung-Professorship, FAU Erlangen-Nürnberg, Glückstrasse 10a, Erlangen, 91054, Germany
- Anne Schützenberger: Division for Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, FAU Erlangen-Nürnberg, Waldstrasse 1, Erlangen, 91054, Germany
5
Döllinger M, Gómez P, Patel RR, Alexiou C, Bohr C, Schützenberger A. Biomechanical simulation of vocal fold dynamics in adults based on laryngeal high-speed videoendoscopy. PLoS One 2017; 12:e0187486. [PMID: 29121085 PMCID: PMC5679561 DOI: 10.1371/journal.pone.0187486] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 08/15/2016] [Accepted: 10/18/2017] [Indexed: 12/18/2022]
Abstract
MOTIVATION Human voice is generated in the larynx by the two oscillating vocal folds. Owing to the limited space and accessibility of the larynx, endoscopic investigation of the actual phonatory process in detail is challenging. Hence the biomechanics of the human phonatory process are not yet fully understood. Therefore, we adapt a mathematical model of the vocal folds to recorded vocal fold oscillations to quantify gender- and age-related differences expressed by computed biomechanical model parameters. METHODS The vocal fold dynamics are visualized by laryngeal high-speed videoendoscopy (4000 fps). A total of 33 healthy young subjects (16 females, 17 males) and 11 elderly subjects (5 females, 6 males) were recorded. A numerical two-mass model is adapted to the recorded vocal fold oscillations by varying model masses, stiffness and subglottal pressure. For adapting the model to the recorded vocal fold dynamics, three different optimization algorithms (Nelder-Mead, Particle Swarm Optimization and Simulated Bee Colony) in combination with three cost functions were considered for applicability. Gender differences and age-related kinematic differences reflected by the model parameters were analyzed. RESULTS AND CONCLUSION The biomechanical model in combination with numerical optimization techniques allowed phonatory behavior to be simulated and the laryngeal parameters involved to be quantified. All three optimization algorithms showed promising results. However, only one cost function seems to be suitable for this optimization task. The model parameters obtained reflect the phonatory biomechanics for men and women well and show quantitative age- and gender-specific differences. The model parameters for younger females and males showed lower subglottal pressures, lower stiffness and higher masses than the corresponding elderly groups. Females exhibited higher subglottal pressures, smaller oscillation masses and larger stiffness than the corresponding similarly aged male groups. Optimizing numerical models towards vocal fold oscillations is useful to identify underlying laryngeal components controlling the phonatory process.
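The general idea of adapting a model to a recorded trajectory by minimizing a cost function can be sketched as follows. This is a deliberately simplified stand-in, not the study's method: a single undamped mass-spring oscillator replaces the two-mass model, the "recording" is synthetic, the cost is a plain sum of squared differences, and the optimizer is a compact textbook Nelder-Mead variant written for illustration. All names and parameter values are hypothetical.

```python
import math


def simulate(stiffness, amplitude, times, mass=1.0):
    """Displacement of an undamped mass-spring oscillator released at its peak."""
    omega = math.sqrt(max(stiffness, 0.0) / mass)
    return [amplitude * math.cos(omega * t) for t in times]


def cost(params, times, target):
    """Sum of squared differences between model and target trajectories."""
    stiffness, amplitude = params
    model = simulate(stiffness, amplitude, times)
    return sum((m - y) ** 2 for m, y in zip(model, target))


def nelder_mead(f, start, step=0.5, iterations=300):
    """Minimal Nelder-Mead simplex search (reflect, expand, contract, shrink)."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(iterations):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
    return min(simplex, key=f)


# Synthetic "recording": stiffness 4.0 and amplitude 1.5 (arbitrary units).
TIMES = [i * 0.05 for i in range(61)]
TARGET = simulate(4.0, 1.5, TIMES)


def fit():
    """Fit stiffness and amplitude starting from a deliberately wrong guess."""
    return nelder_mead(lambda p: cost(p, TIMES, TARGET), start=[3.0, 1.0])


if __name__ == "__main__":
    stiffness_fit, amplitude_fit = fit()
    print(f"fitted stiffness {stiffness_fit:.4f}, amplitude {amplitude_fit:.4f}")
```

In this toy setup only the ratio of stiffness to mass affects the trajectory, so the mass is held fixed. A realistic model fit has to deal with similar identifiability questions, and a periodic cost landscape can contain local minima, which may be one reason the choice of cost function matters.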
Affiliation(s)
- Michael Döllinger: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Pablo Gómez: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Rita R. Patel: Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana, United States of America
- Christoph Alexiou: Section of Experimental Oncology and Nanomedicine (SEON), Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Else Kröner-Fresenius-Stiftung-Professorship, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Christopher Bohr: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Anne Schützenberger: Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head and Neck Surgery, Medical School, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany