1
Iob NA, He L, Ternström S, Cai H, Brockmann-Bauser M. Effects of Speech Characteristics on Electroglottographic and Instrumental Acoustic Voice Analysis Metrics in Women With Structural Dysphonia Before and After Treatment. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2024; 67:1660-1681. [PMID: 38758676 DOI: 10.1044/2024_jslhr-23-00253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/19/2024]
Abstract
PURPOSE Literature suggests a dependency of the acoustic metrics, smoothed cepstral peak prominence (CPPS) and harmonics-to-noise ratio (HNR), on human voice loudness and fundamental frequency (F0). Even though this has been explained with different oscillatory patterns of the vocal folds, so far, it has not been specifically investigated. In the present work, the influence of three elicitation levels, calibrated sound pressure level (SPL), F0 and vowel on the electroglottographic (EGG) and time-differentiated EGG (dEGG) metrics hybrid open quotient (OQ), dEGG OQ and peak dEGG, as well as on the acoustic metrics CPPS and HNR, was examined, and their suitability for voice assessment was evaluated. METHOD In a retrospective study, 29 women with a mean age of 25 years (± 8.9, range: 18-53) diagnosed with structural vocal fold pathologies were examined before and after voice therapy or phonosurgery. Both acoustic and EGG signals were recorded simultaneously during the phonation of the sustained vowels /ɑ/, /i/, and /u/ at three elicited levels of loudness (soft/comfortable/loud) and unconstrained F0 conditions. RESULTS A linear mixed-model analysis showed a significant effect of elicitation effort levels on peak dEGG, HNR, and CPPS (all p < .01). Calibrated SPL significantly influenced HNR and CPPS (both p < .01). Furthermore, F0 had a significant effect on peak dEGG and CPPS (p < .0001). All metrics showed significant changes with regard to vowel (all p < .05). However, the treatment had no effect on the examined metrics, regardless of the treatment type (surgery vs. voice therapy). CONCLUSIONS The value of the investigated metrics for voice assessment purposes when sampled without sufficient control of SPL and F0 is limited, in that they are significantly influenced by the phonatory context, be it speech or elicited sustained vowels. 
Future studies should explore the diagnostic value of new data collation approaches such as voice mapping, which take SPL and F0 effects into account.
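The voice-mapping idea mentioned in the conclusions can be sketched as follows: each analysis frame is assigned to a cell by its F0 (in semitones) and SPL (in dB), and a metric is averaged per cell. This is a minimal sketch and not the authors' software; the function name, the 1-semitone by 1-dB cell size, and the sample values are assumptions made for illustration.

```python
import math
from collections import defaultdict

def voice_map(frames, ref_hz=440.0, ref_midi=69):
    """Average a metric per (semitone, dB) cell.

    frames: iterable of (f0_hz, spl_db, metric_value) tuples.
    The 1 semitone x 1 dB cell size is an assumption for illustration.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for f0_hz, spl_db, value in frames:
        # Convert F0 to the nearest MIDI note number (A4 = 440 Hz = 69).
        semitone = round(ref_midi + 12 * math.log2(f0_hz / ref_hz))
        cell = (semitone, round(spl_db))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

# Hypothetical frames: (F0 in Hz, SPL in dB, CPPS in dB)
frames = [(220.0, 70.2, 12.0), (221.0, 69.8, 14.0), (440.0, 80.0, 18.0)]
cells = voice_map(frames)
print(cells)
```

Comparing two such maps cell by cell (for example before and after treatment) keeps F0 and SPL fixed within each comparison, which is the point the abstract makes about controlling for phonatory context.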
Affiliation(s)
- Naomi Anna Iob
- Division of Phoniatrics and Speech Pathology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, University of Zurich, Switzerland
- Lei He
- Division of Phoniatrics and Speech Pathology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, University of Zurich, Switzerland
- Department of Computational Linguistics, University of Zurich, Switzerland
- Sten Ternström
- Division of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Huanchen Cai
- Division of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Meike Brockmann-Bauser
- Division of Phoniatrics and Speech Pathology, Department of Otorhinolaryngology, Head and Neck Surgery, University Hospital Zurich, University of Zurich, Switzerland
2
Cai H, Ternström S, Chaffanjon P, Henrich Bernardoni N. Effects on Voice Quality of Thyroidectomy: A Qualitative and Quantitative Study Using Voice Maps. J Voice 2024:S0892-1997(24)00082-1. [PMID: 38714436 DOI: 10.1016/j.jvoice.2024.03.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Revised: 03/11/2024] [Accepted: 03/12/2024] [Indexed: 05/09/2024]
Abstract
OBJECTIVES This study aims to explore the effects of thyroidectomy, a surgical intervention involving the removal of the thyroid gland, on voice quality, as represented by acoustic and electroglottographic measures. Given the thyroid gland's proximity to the inferior and superior laryngeal nerves, thyroidectomy carries a potential risk of affecting vocal function. While earlier studies have documented effects on the voice range, few studies have looked at voice quality after thyroidectomy. Since voice quality effects could manifest in many ways that are unknown a priori, we apply an exploratory approach that collects many data points from several metrics. METHODS A voice-mapping analysis paradigm was applied retrospectively on a corpus of spoken and sung sentences produced by patients who had thyroid surgery. Voice quality changes were assessed objectively for 57 patients prior to surgery and 2 months after surgery, by making comparative voice maps, pre- and post-intervention, of six acoustic and electroglottographic (EGG) metrics. RESULTS After thyroidectomy, statistically significant changes consistent with a worsening of voice quality were observed in most metrics. For all individual metrics, however, the effect sizes were too small to be clinically relevant. Statistical clustering of the metrics helped to clarify the nature of these changes. While partial thyroidectomy demonstrated greater uniformity than did total thyroidectomy, the type of perioperative damage had no discernible impact on voice quality. CONCLUSIONS Changes in voice quality after thyroidectomy were related mostly to increased phonatory instability in both the acoustic and EGG metrics. Clustered voice metrics exhibited a higher correlation to voice complaints than did individual voice metrics.
Affiliation(s)
- Huanchen Cai
- Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden.
- Sten Ternström
- Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm, Sweden
- Philippe Chaffanjon
- University of Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France; Medical School, Université Grenoble Alpes, Grenoble, France
3
Luo J, Wu Y, Liu M, Li Z, Wang Z, Zheng Y, Feng L, Lu J, He F. Differentiation between depression and bipolar disorder in child and adolescents by voice features. Child Adolesc Psychiatry Ment Health 2024; 18:19. [PMID: 38287442 PMCID: PMC10826007 DOI: 10.1186/s13034-024-00708-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Accepted: 01/11/2024] [Indexed: 01/31/2024] Open
Abstract
OBJECTIVE Major depressive disorder (MDD) and bipolar disorder (BD) are serious chronic disabling mental and emotional disorders, with symptoms that often manifest atypically in children and adolescents, making diagnosis difficult without objective physiological indicators. Therefore, we aimed to objectively identify MDD and BD in children and adolescents by exploring their voiceprint features. METHODS This study included a total of 150 participants, with 50 MDD patients, 50 BD patients, and 50 healthy controls aged between 6 and 16 years. After collecting voiceprint data, chi-square test was used to screen and extract voiceprint features specific to emotional disorders in children and adolescents. Then, selected characteristic voiceprint features were used to establish training and testing datasets with the ratio of 7:3. The performances of various machine learning and deep learning algorithms were compared using the training dataset, and the optimal algorithm was selected to classify the testing dataset and calculate the sensitivity, specificity, accuracy, and ROC curve. RESULTS The three groups showed differences in clustering centers for various voice features such as root mean square energy, power spectral slope, low-frequency percentile energy level, high-frequency spectral slope, spectral harmonic gain, and audio signal energy level. The model of linear SVM showed the best performance in the training dataset, achieving a total accuracy of 95.6% in classifying the three groups in the testing dataset, with sensitivity of 93.3% for MDD, 100% for BD, specificity of 93.3%, AUC of 1 for BD, and AUC of 0.967 for MDD. CONCLUSION By exploring the characteristics of voice features in children and adolescents, machine learning can effectively differentiate between MDD and BD in a population, and voice features hold promise as an objective physiological indicator for the auxiliary diagnosis of mood disorder in clinical practice.
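The reported figures are arithmetically consistent with a 7:3 split of the 150 participants. The short check below assumes, which the abstract does not state, that the 45 test cases were balanced at 15 per group; the counts of 43 correct classifications and 14 of 15 MDD cases are inferred from the percentages and are not reported values.

```python
total = 150
test_n = total * 3 // 10          # 7:3 split -> 45 test cases
train_n = total - test_n          # 105 training cases
per_group = test_n // 3           # assumed balanced: 15 per group

# Counts below are inferred from the reported percentages (assumption).
accuracy = 43 / test_n            # about 0.9556, reported as 95.6%
sens_mdd = 14 / per_group         # about 0.9333, reported as 93.3%

print(train_n, test_n, round(accuracy * 100, 1), round(sens_mdd * 100, 1))
```

With only 15 test cases per group, each misclassification moves a per-group rate by almost 7 percentage points, which is worth keeping in mind when reading the reported sensitivities.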
Affiliation(s)
- Jie Luo
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Yuanzhen Wu
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Mengqi Liu
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Zhaojun Li
- Beijing Institute of Technology, School of Integrated Circuits and Electronics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Zhuo Wang
- Beijing Institute of Technology, School of Integrated Circuits and Electronics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Yi Zheng
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China
- Lihui Feng
- Beijing Institute of Technology, School of Optics and Photonics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China
- Jihua Lu
- Beijing Institute of Technology, School of Integrated Circuits and Electronics, Zhongguancun South Street 5 Hao, Hai Dian Qu, Beijing, 100081, China.
- Fan He
- National Clinical Research Center for Mental Disorders, Beijing Key Laboratory of Mental Disorders, Beijing Anding Hospital, Beijing Institute for Brain Disorders Capital Medical University, De Sheng Men Wai An Kang Hu Tong 5 Hao, Xi Cheng Qu, Beijing, 100088, People's Republic of China.
4
Herbst CT, Story BH, Meyer D. Acoustical Theory of Vowel Modification Strategies in Belting. J Voice 2023:S0892-1997(23)00004-8. [PMID: 37080890 DOI: 10.1016/j.jvoice.2023.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 04/22/2023]
Abstract
Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1≈2fo). It is not clear how both these concepts - (a) phonating with "speech-like," unmodified vowels; and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1 - can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from -4 to -30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo≈0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation - considering the relative prevalence of vowels in American English - suggests that, historically, belting with fR1≈2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that - on acoustical grounds - the pedagogical commandment for belting with unmodified, "speech-like" vowels cannot always be fulfilled.
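The simulation count follows from the parameter grid: 37 pitches times 27 slope settings times 13 vowels. The sketch below reproduces that count, assuming semitone steps from C3 to C6 and 1 dB/octave steps for the slope (both step sizes are inferred from the counts, not stated in the abstract) and equal temperament with A4 = 440 Hz. It also shows the fo at which the second harmonic meets a given fR1; the 700 Hz value is hypothetical.

```python
# Pitches C3..C6 in semitone steps (MIDI 48..84), equal temperament, A4 = 440 Hz.
midi_notes = range(48, 85)
fo_hz = [440.0 * 2 ** ((m - 69) / 12) for m in midi_notes]

slopes_db_per_octave = range(-4, -31, -1)   # -4 ... -30, assumed 1 dB steps
n_vowels = 13

n_simulations = len(fo_hz) * len(slopes_db_per_octave) * n_vowels

def fo_for_second_harmonic_tuning(fr1_hz):
    """fo at which the second harmonic coincides with fR1 (fR1 = 2 fo)."""
    return fr1_hz / 2.0

print(n_simulations)                          # 37 * 27 * 13
print(fo_for_second_harmonic_tuning(700.0))   # fR1 = 700 Hz is a hypothetical value
```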
Affiliation(s)
- Christian T Herbst
- Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia; Department of Vocal Studies, Mozarteum University, Salzburg, Austria.
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona
- David Meyer
- Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia
5
Kankare E, Rantala L, Laukkanen AM. Vocal Fatigue Index in Finnish-Speaking Population. J Voice 2023:S0892-1997(23)00092-9. [PMID: 37003862 DOI: 10.1016/j.jvoice.2023.02.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Revised: 02/24/2023] [Accepted: 02/24/2023] [Indexed: 04/03/2023]
Abstract
BACKGROUND AND OBJECTIVE Vocal fatigue is an important complaint that may indicate a voice disorder or a risk thereof. There is a need for a reliable tool to detect and quantify vocal fatigue and distinguish dysphonic and vocally healthy speakers. The Vocal Fatigue Index (VFI) questionnaire has been found valid and reliable among speakers of different languages. This study aims to validate it for speakers of Finnish. STUDY DESIGN Experimental comparative study. METHODS The VFI questionnaire was translated from English to Finnish according to the WHO recommendations. Next, it was subjected to the validation procedure. In total, 160 Finnish speakers volunteered to participate in the study. One hundred and eight were voice patients (83 males, 25 females) and 52 were vocally healthy controls (37 females, 15 males). As a comparison, the Voice Handicap Index (VHI) questionnaire was completed and voice samples were recorded to enable Acoustic Voice Quality Index (AVQI03.01FIN) analysis. RESULTS Results from the first and second completions of the VFI(F) questionnaire correlated strongly (Spearman's rho 0.901, P = 0.01). Answers to the individual questions of the VFI(F) also correlated strongly, showing high internal consistency. Factor 1 (Tiredness of voice and avoidance of voice use) of the VFI correlated strongly with the VHI, and the two other factors (Physical discomfort associated with voicing and Improvement of symptoms) correlated moderately with the VHI. Factor 1 of the VFI(F) correlated moderately with AVQI03.01FIN and its sub-parameters, CPPS, HNR, and shimmer. The VFI(F) showed good construct validity, differentiating voice patients and controls at cut-off 13.5, with sensitivity of 0.963 and specificity of 0.885. Discriminatory power was strong for all factors: F1 AROC = 0.985, F2 AROC = 0.864, and F3 AROC = 0.821. CONCLUSION The VFI(F) correlates with the VHI and with AVQI03.01FIN and it is a valid and reliable tool for detecting vocal fatigue in Finnish speakers.
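The reported sensitivity and specificity correspond to whole-number counts in the two groups. In the worked check below, the group sizes (108 patients, 52 controls) come from the abstract, while the splits of 104/4 and 46/6 are inferred from the reported values and are not taken from the paper.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of patients correctly flagged by the cut-off."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of controls correctly left unflagged by the cut-off."""
    return true_neg / (true_neg + false_pos)

# 108 patients and 52 controls are from the abstract; the splits are inferred.
sens = sensitivity(true_pos=104, false_neg=4)
spec = specificity(true_neg=46, false_pos=6)

print(round(sens, 3), round(spec, 3))
```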
Affiliation(s)
- Eliina Kankare
- Department of Rehabilitation and Psychosocial Support, Logopedics, Phoniatrics, Tampere University Hospital, Tampere, Finland; Speech and Voice Research Laboratory, Tampere University, Tampere, Finland.
- Leena Rantala
- Speech and Voice Research Laboratory, Tampere University, Tampere, Finland
6
Echternach M, Nusseck M, Strasding M, Richter B. Differences of Electroglottographical Contact Quotients between Connected Speech and Sustained Phonation in Clinical Measurement of Voice. J Voice 2023:S0892-1997(23)00077-2. [PMID: 36941166 DOI: 10.1016/j.jvoice.2023.02.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 02/15/2023] [Accepted: 02/15/2023] [Indexed: 03/23/2023]
Abstract
INTRODUCTION In clinical practice, sustained phonation is mostly used for acoustic voice measurements, while perceptual evaluation is based on connected speech. Since sustained phonation could be associated with the use of the singing voice, and since vocal registers are more relevant for singing rather than speech, it is unclear if vocal registers contribute to observable vocal fold contact differences between sustained phonation and speech. MATERIAL AND METHODS Sustained phonation (vowel [a] on comfortable pitch and loudness) and connected speech (German text: Der Nordwind und die Sonne) were analyzed for 1216 subjects (426 with and 790 without dysphonia) using the Laryngograph system (combining electroglottography and audio recordings). From these samples, fundamental frequency (ƒo), contact quotient (CQ), sound pressure level (SPL) and frequency perturbation (jitter first for sustained and cFx for connected speech) were evaluated. RESULTS Compared to connected speech, the values of ƒo and SPL were higher for sustained phonation. For female voices, ƒo difference was greater than for male voices. At the same time, and only for the females, CQ was lower for the sustained phonation, indicating a register difference. CONCLUSION In order to achieve a better comparability, sustained phonation should be standardized regarding the ƒo and SPL values in correspondence to the ƒo and SPL range of reading a text. This should also reduce the risk of using a different register for different types of phonation.
Affiliation(s)
- Matthias Echternach
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Munich University Hospital (LMU), Munich, Germany.
- Manfred Nusseck
- Institute of Musicians' Medicine, University of Freiburg Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
- Malin Strasding
- Division of Fixed Prosthodontics and Biomaterials, Université de Genève, Geneve, Switzerland
- Bernhard Richter
- Institute of Musicians' Medicine, University of Freiburg Medical Center, Faculty of Medicine, University of Freiburg, Freiburg, Germany
7
Barsties V Latoszek B, Mathmann P, Neumann K. The cepstral spectral index of dysphonia, the acoustic voice quality index and the acoustic breathiness index as novel multiparametric indices for acoustic assessment of voice quality. Curr Opin Otolaryngol Head Neck Surg 2021; 29:451-457. [PMID: 34334615 DOI: 10.1097/moo.0000000000000743] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 11/25/2022]
Abstract
PURPOSE OF REVIEW The objective assessment of voice quality using acoustic measures is an important pillar of voice diagnostics. This article reviews three recent acoustic measures and their clinical use in phoniatrics and laryngology. RECENT FINDINGS Two acoustic parameters, the cepstral spectral index of dysphonia (CSID) and the acoustic voice quality index (AVQI), have gained importance as validated multiparametric indices in the objective assessment of hoarseness because they include both continuous speech and sustained vowels. The acoustic breathiness index (ABI), another multiparametric index, assesses breathiness admixture during phonation and identifies it robustly, unaffected by other characteristics of dysphonia such as roughness. SUMMARY Acoustic measurements are useful diagnostic tools when used correctly with an appropriate recording system, consideration of environment and use of software programs. CSID, AVQI and ABI objectively improve the detection of voice quality abnormalities. In addition to their proven validity, their application is simple and their usability for clinicians is high.
Affiliation(s)
- Ben Barsties V Latoszek
- Department of Phoniatrics and Pediatric Audiology, University Hospital Münster, University of Münster, Münster
- Speech-Language Pathology, SRH University of Applied Health Sciences, Düsseldorf, Germany
- Philipp Mathmann
- Department of Phoniatrics and Pediatric Audiology, University Hospital Münster, University of Münster, Münster
- Katrin Neumann
- Department of Phoniatrics and Pediatric Audiology, University Hospital Münster, University of Münster, Münster
8
Patel RR, Ternström S. Quantitative and Qualitative Electroglottographic Wave Shape Differences in Children and Adults Using Voice Map-Based Analysis. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2021; 64:2977-2995. [PMID: 34319772 DOI: 10.1044/2021_jslhr-20-00717] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/27/2023]
Abstract
Purpose The purpose of this study is to identify the extent to which various measurements of contacting parameters differ between children and adults during habitual range and overlap vocal frequency/intensity, using voice map-based assessment of noninvasive electroglottography (EGG). Method EGG voice maps were analyzed from 26 adults (22-45 years) and 22 children (4-8 years) during connected speech and vowel /a/ over the habitual range and the overlap vocal frequency/intensity from the voice range profile task on the vowel /a/. Mean and standard deviations of contact quotient by integration, normalized contacting speed, quotient of speed by integration, and cycle-rate sample entropy were obtained. Group differences were evaluated using the linear mixed model analysis for the habitual range connected speech and the vowel, whereas analysis of covariance was conducted for the overlap vocal frequency/intensity from the voice range profile task. Presence of a "knee" on the EGG wave shape was determined by visual inspection of the presence of convexity along the decontacting slope of the EGG pulse and the presence of the second derivative zero-crossing. Results The contact quotient by integration, normalized contacting speed, quotient of speed by integration, and cycle-rate sample entropy were significantly different in children compared to (a) adult males for habitual range and (b) adult males and adult females for the overlap vocal frequency/intensity. None of the children had a "knee" on the decontacting slope of the EGG slope. 
Conclusion EGG parameters of contact quotient by integration, normalized contacting speed, quotient of speed by integration, cycle-rate sample entropy, and absence of a "knee" on the decontacting slope characterize the wave shape differences between children and adults, whereas the normalized contacting speed, quotient of speed by integration, cycle-rate sample entropy, and presence of a "knee" on the downward pulse slope characterize the wave shape differences between adult males and adult females. Supplemental Material https://doi.org/10.23641/asha.15057345.
Affiliation(s)
- Rita R Patel
- Department of Speech, Language and Hearing Sciences, Indiana University Bloomington
- Sten Ternström
- Division of Speech, Music, and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
9
Titze IR, Palaparthi A, Cox K, Stark A, Maxfield L, Manternach B. Vocalization with semi-occluded airways is favorable for optimizing sound production. PLoS Comput Biol 2021; 17:e1008744. [PMID: 33780433 PMCID: PMC8031921 DOI: 10.1371/journal.pcbi.1008744] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/02/2020] [Revised: 04/08/2021] [Accepted: 01/26/2021] [Indexed: 01/25/2023] Open
Abstract
Vocalization in mammals, birds, reptiles, and amphibians occurs with airways that have wide openings to free-space for efficient sound radiation, but sound is also produced with occluded or semi-occluded airways that have small openings to free-space. It is hypothesized that pressures produced inside the airway with semi-occluded vocalizations have an overall widening effect on the airway. This overall widening then provides more opportunity to produce wide-narrow contrasts along the airway for variation in sound quality and loudness. For human vocalization described here, special emphasis is placed on the epilaryngeal airway, which can be adjusted for optimal aerodynamic power transfer and for optimal acoustic source-airway interaction. The methodology is three-fold, (1) geometric measurement of airway dimensions from CT scans, (2) aerodynamic and acoustic impedance calculation of the airways, and (3) simulation of acoustic signals with a self-oscillating computational model of the sound source and wave propagation.
Affiliation(s)
- Ingo R. Titze
- National Center for Voice and Speech University of Utah, Salt Lake City, Utah, United States of America
- Department of Biomedical Engineering, University of Utah, Salt Lake City, Utah, United States of America
- National Center for Voice and Speech.Org, Salt Lake City, Utah, United States of America
- Anil Palaparthi
- National Center for Voice and Speech University of Utah, Salt Lake City, Utah, United States of America
- Department of Biomedical Engineering, University of Utah, Salt Lake City, Utah, United States of America
- National Center for Voice and Speech.Org, Salt Lake City, Utah, United States of America
- Karin Cox
- National Center for Voice and Speech.Org, Salt Lake City, Utah, United States of America
- Amanda Stark
- National Center for Voice and Speech University of Utah, Salt Lake City, Utah, United States of America
- Lynn Maxfield
- National Center for Voice and Speech University of Utah, Salt Lake City, Utah, United States of America
- Brian Manternach
- National Center for Voice and Speech University of Utah, Salt Lake City, Utah, United States of America
10
Titze IR, Palaparthi A. Vocal Loudness Variation With Spectral Slope. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2020; 63:74-82. [PMID: 31940253 PMCID: PMC7213475 DOI: 10.1044/2019_jslhr-19-00018] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/19/2019] [Revised: 09/18/2019] [Accepted: 10/08/2019] [Indexed: 06/10/2023]
Abstract
Objective This investigation addresses the loudness variations in sones achievable with spectral slope variations (higher harmonic energy) in human vocalization and compares it to the sound pressure level (SPL) variations typically reported in the voice range profile (VRP). Method The primary methodology was computational. The ISO standard 226 was used to convert SPL values to sones for a 125- to 1000-Hz range of fundamental frequency and a -3 dB/octave to -12 dB/octave range of spectral slope. In addition, a retrospective analysis of human subjects' VRPs was conducted, and the experimental results were compared to the theoretical results. Results A very small range of SPL variation (less than 5 dB) in the VRP can produce a large range of loudness. The sensitivity can be on the order of 4 sones per dB SPL change. Conclusion For vocalization in the modal register, loudness variation is not well described by SPL change in dB, especially at high fundamental frequencies where the SPL range in the VRP becomes very small but sizeable loudness variations are still possible.
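For orientation, the textbook relation between loudness level (phons) and loudness (sones) is a doubling of sones for every 10 phons above 40 phons. The sketch below uses only that relation; the SPL-to-phon step via equal-loudness contours, which the abstract attributes to ISO 226, is not reproduced here, and the example levels are hypothetical.

```python
import math

def phon_to_sone(phon):
    """Loudness in sones from loudness level in phons (valid above about 40 phons)."""
    return 2.0 ** ((phon - 40.0) / 10.0)

def sones_per_phon(phon):
    """Local slope of the phon-to-sone curve at a given loudness level."""
    return phon_to_sone(phon) * math.log(2.0) / 10.0

for level in (40, 60, 80, 100):   # hypothetical loudness levels
    print(level, phon_to_sone(level), round(sones_per_phon(level), 2))
```

Because sones grow exponentially with phons, the same change in level corresponds to more sones at higher levels, which fits the abstract's point that SPL change in dB alone describes loudness variation poorly.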
Affiliation(s)
- Ingo R. Titze
- National Center for Voice and Speech, University of Utah, Salt Lake City
- Department of Biomedical Engineering, University of Utah, Salt Lake City
- Anil Palaparthi
- National Center for Voice and Speech, University of Utah, Salt Lake City
- Department of Biomedical Engineering, University of Utah, Salt Lake City
11
Glottal Source Contribution to Higher Order Modes in the Finite Element Synthesis of Vowels. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9214535] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/17/2022]
Abstract
Articulatory speech synthesis has long been based on one-dimensional (1D) approaches. They assume plane wave propagation within the vocal tract and disregard higher order modes that typically appear above 5 kHz. However, such modes may be relevant in obtaining a more natural voice, especially for phonation types with significant high frequency energy (HFE) content. This work studies the contribution of the glottal source at high frequencies in the 3D numerical synthesis of vowels. The spoken vocal range is explored using an LF (Liljencrants–Fant) model enhanced with aspiration noise and controlled by the Rd glottal shape parameter. The vowels [ɑ], [i], and [u] are generated with a finite element method (FEM) using realistic 3D vocal tract geometries obtained from magnetic resonance imaging (MRI), as well as simplified straight vocal tracts of a circular cross-sectional area. The symmetry of the latter prevents the onset of higher order modes. Thus, the comparison between realistic and simplified geometries enables us to analyse the influence of such modes. The simulations indicate that higher order modes may be perceptually relevant, particularly for tense phonations (lower Rd values) and/or high fundamental frequency values, F0s. Conversely, vowels with a lax phonation and/or low F0s may result in inaudible HFE levels, especially if aspiration noise is not considered in the glottal source model.
12
Ternström S, D'Amario S, Selamtzis A. Effects of the Lung Volume on the Electroglottographic Waveform in Trained Female Singers. J Voice 2018; 34:485.e1-485.e21. [PMID: 30337119 DOI: 10.1016/j.jvoice.2018.09.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/26/2018] [Revised: 09/04/2018] [Accepted: 09/06/2018] [Indexed: 11/25/2022]
Abstract
OBJECTIVES To determine if in singing there is an effect of lung volume on the electroglottographic waveform, and if so, how it varies over the voice range. STUDY DESIGN Eight trained female singers sang the tune "Frère Jacques" in 18 conditions: three phonetic contexts, three dynamic levels, and high or low lung volume. Conditions were randomized and replicated. METHODS The audio and EGG signals were recorded in synchrony with signals tracking respiration and vertical larynx position. The first 10 Fourier descriptors of every EGG cycle were computed. These spectral data were clustered statistically, and the clusters were mapped by color into a voice range profile display, thus visualizing the EGG waveform changes under the influence of fo and SPL. The rank correlations and effect sizes of the relationships between relative lung volume and several adduction-related EGG wave shape metrics were similarly rendered on a color scale, in voice range profile-style 'voice maps.' RESULTS In most subjects, EGG waveforms varied considerably over the voice range. Within subjects, reproducibility was high, not only across the replications, but also across the phonetic contexts. The EGG waveforms were quite individual, as was the nature of the EGG shape variation across the range. EGG metrics were significantly correlated to changes in lung volume, in parts of the range of the song, and in most subjects. However, the effect sizes of the relative lung volume were generally much smaller than the effects of fo and SPL, and the relationships always varied, even changing polarity from one part of the range to another. CONCLUSIONS Most subjects exhibited small, reproducible effects of the relative lung volume on the EGG waveform. Some hypothesized influences of tracheal pull were seen, mostly at the lowest SPLs. The effects were however highly variable, both across the moderately wide fo-SPL range and across subjects. 
Different singers may be applying different techniques and compensatory behaviors with changing lung volume. The outcomes emphasize the importance of making observations over a substantial part of the voice range, and not only of phonations sustained at a few fundamental frequencies and sound levels.
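Computing the first 10 Fourier descriptors of an EGG cycle amounts to a discrete Fourier transform taken over exactly one period. The following standard-library sketch assumes the cycle has already been segmented into one period of samples; the function name and the synthetic test cycle are illustrative and do not represent the authors' implementation.

```python
import cmath
import math

def fourier_descriptors(cycle, n_harmonics=10):
    """Return (magnitude, phase) of harmonics 1..n_harmonics for one cycle."""
    n = len(cycle)
    descriptors = []
    for k in range(1, n_harmonics + 1):
        # DFT coefficient of harmonic k, normalized by the cycle length.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(cycle)) / n
        descriptors.append((abs(coeff), cmath.phase(coeff)))
    return descriptors

# Synthetic cycle: fundamental plus a weaker second harmonic (not real EGG data).
n = 64
cycle = [math.sin(2 * math.pi * i / n) + 0.5 * math.sin(4 * math.pi * i / n)
         for i in range(n)]
fd = fourier_descriptors(cycle)
print([round(mag, 3) for mag, _ in fd])
```

Each cycle then becomes a short vector of magnitudes and phases, which is the kind of per-cycle representation that can be clustered and colored into a voice map as the abstract describes.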
Affiliation(s)
- Sten Ternström
- Department of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden.
- Sara D'Amario
- Department of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden; Audio Lab, Department of Electronic Engineering, University of York, Heslington, United Kingdom
- Andreas Selamtzis
- Department of Speech, Music and Hearing, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden