1
Stone TC, Erickson ML. Experienced and Inexperienced Listeners' Perception of Vocal Strain. J Voice 2024:S0892-1997(24)00024-9. [PMID: 38443265 DOI: 10.1016/j.jvoice.2024.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 02/01/2024] [Accepted: 02/02/2024] [Indexed: 03/07/2024]
Abstract
OBJECTIVE The ability to perceive strain or tension in a voice is critical for both speech-language pathologists and singing teachers. Research on voice quality has focused primarily on the perception of breathiness or roughness. The perception of vocal strain has not been extensively researched and is poorly understood. METHODS/DESIGN This study employs a group and a within-subject design. Synthetic female sung stimuli were created that varied in source slope and vocal tract transfer function. Two groups of listeners, inexperienced listeners and experienced vocal pedagogues, listened to the stimuli and rated the perceived strain using a visual analog scale. Synthetic female stimuli were constructed on the vowel /ɑ/ at 2 pitches, A3 and F5, using glottal source slopes that drop in amplitude at constant rates varying from -6 dB/octave to -18 dB/octave. All stimuli were filtered using three vocal tract transfer functions: one derived from a lyric/coloratura soprano, one derived from a mezzo-soprano, and a third with resonance frequencies midway between the two. Listeners heard the stimuli over headphones and rated them on a scale from "no strain" to "very strained" using a visual analog scale. RESULTS Spectral source slope was strongly related to the perception of strain in both groups of listeners. Experienced listeners' perception of strain was also related to formant pattern, while inexperienced listeners' perception of strain was also related to pitch. CONCLUSION This study has shown that spectral source slope can be a powerful cue to the perception of strain. However, inexperienced and experienced listeners also differ from each other in how strain is perceived across speaking and singing pitches. These differences may be based on both experience and the goals of the listener.
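As a rough illustration of the stimulus description in the abstract above, the sketch below computes harmonic levels for a glottal source whose spectrum falls at a constant rate in dB/octave. It is not the authors' synthesis procedure. The function names, the 5 kHz upper limit, and the fundamental frequencies (equal temperament with A4 = 440 Hz) are assumptions made for this example only.

```python
import math

# Assumed fundamentals for the two pitches (equal temperament, A4 = 440 Hz).
# F5 lies 8 semitones above A4; A3 lies one octave below A4.
PITCH_HZ = {"A3": 220.0, "F5": 440.0 * 2 ** (8 / 12)}


def harmonic_level_db(harmonic: int, slope_db_per_octave: float) -> float:
    """Level of a harmonic relative to the fundamental for a constant dB/octave slope.

    Harmonic k lies log2(k) octaves above the fundamental.
    """
    return slope_db_per_octave * math.log2(harmonic)


def source_spectrum(f0_hz: float, slope_db_per_octave: float, max_hz: float = 5000.0):
    """List (frequency in Hz, relative level in dB) for all harmonics up to max_hz."""
    n_harmonics = int(max_hz // f0_hz)
    return [
        (k * f0_hz, harmonic_level_db(k, slope_db_per_octave))
        for k in range(1, n_harmonics + 1)
    ]


if __name__ == "__main__":
    # Compare the shallowest and steepest slopes mentioned in the abstract.
    for slope in (-6.0, -18.0):
        first_four = source_spectrum(PITCH_HZ["A3"], slope)[:4]
        print(slope, [(round(f), round(db, 1)) for f, db in first_four])
```

With a steeper slope the upper harmonics are weaker relative to the fundamental, which is the dimension along which the stimuli in the study are said to vary.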
Affiliation(s)
- Taylor Colton Stone
- Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, Tennessee
- Molly L Erickson
- Department of Audiology and Speech Pathology, University of Tennessee Health Science Center, Knoxville, Tennessee
2
Wang B, Kügler F, Genzel S. The interaction of focus and phrasing with downstep and post-low-bouncing in Mandarin Chinese. Front Psychol 2022; 13:884102. [PMID: 36248550 PMCID: PMC9561885 DOI: 10.3389/fpsyg.2022.884102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2022] [Accepted: 08/01/2022] [Indexed: 11/13/2022]
Abstract
L(ow) tone in Mandarin Chinese causes both downstep and post-low-bouncing. Downstep refers to the lowering of a H(igh) tone after a L tone, which is usually measured by comparing the H tones in a “H…HLH…H” sentence with a “H…HHH…H” sentence (cross-comparison), investigating whether downstep sets a new pitch register for the scaling of subsequent tones. Post-low-bouncing refers to the raising of a H tone after a focused L tone. The current study investigates how downstep and post-low-bouncing interact with focus and phrasing in Mandarin Chinese. In the experiment, we systematically manipulated (a) the tonal environment by embedding two syllables with either LH or HH tone (syllables X and Y) sentence-medially in the same carrier sentences containing only H tones; (b) boundary strength between X and Y by introducing either a syllable boundary or a phonological phrase boundary; and (c) information structure by placing a contrastive focus on the HL/HH word (XF), syllable Y (YF), or the sentence-final word (ZF). A wide-focus condition served as the baseline. With systematic control of focus and boundary strength around the L tone, the current study shows that the downstep effect in Mandarin is quite robust, lasting for 3–5 H tones after the L tone, but eventually levelling back to the register reference line of a H tone. The way focus and phrasing interact with the downstep effect is unexpected. Firstly, sentence-final focus has no anticipatory effect on shortening the downstep effect; instead, it makes the downstep effect last longer than in the wide-focus condition. Secondly, the downstep effect still shows when the H tone after the L tone is on-focus (YF), in a weaker manner than in the wide-focus condition, and is overridden by post-focus compression. Thirdly, the downstep effect is greater when the boundary after the L tone is stronger, because the L tone is longer and more likely to be creaky.
We further analyzed downstep by measuring the F0 drop between the two H tones surrounding the L tone (sequential-comparison). A comparison with the F0 drop in all-H sentences (i.e., declination) showed that the downstep effect was much greater and more robust than declination. However, creaky voice in the L tone was not the direct cause of downstep. Finally, when the L tone was under focus (XF), it caused a post-low-bouncing effect, which was weakened by a phonological phrase boundary. Altogether, the results showed that although intonation is largely controlled by informative functions, the physical-articulatory controls are relatively persistent, varying within a pitch range of 2.5 semitones. Downstep and post-low-bouncing in Mandarin Chinese thus seem to be mainly due to physical-articulatory movement on varying pitch, with the gradual tonal F0 change meeting the requirement of smooth transition across syllables, and avoiding confusion in informative F0 control.
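The abstract above reports pitch differences in semitones. The sketch below shows the standard conversion between an F0 ratio and a semitone interval, so that an F0 drop between two H tones can be expressed on that scale. The function names and the 250 Hz example value are hypothetical and are not taken from the study.

```python
import math


def semitones(f_ref_hz: float, f_hz: float) -> float:
    """Interval from f_ref_hz to f_hz in semitones (negative means a drop)."""
    return 12.0 * math.log2(f_hz / f_ref_hz)


def shift_hz(f_ref_hz: float, interval_st: float) -> float:
    """Frequency obtained by shifting f_ref_hz by interval_st semitones."""
    return f_ref_hz * 2.0 ** (interval_st / 12.0)


if __name__ == "__main__":
    # Hypothetical H tone at 250 Hz lowered by 2.5 semitones.
    lowered = shift_hz(250.0, -2.5)
    print(round(lowered, 1), round(semitones(250.0, lowered), 2))
```

Working in semitones rather than hertz makes F0 drops comparable across speakers with different pitch ranges, which is presumably why the abstract states its range that way.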
Affiliation(s)
- Bei Wang
- Key Laboratory of Language, Cognition and Computation, School of Foreign Languages, Beijing Institute of Technology, Beijing, China
- Frank Kügler
- Department of Linguistics, Goethe University Frankfurt, Frankfurt, Germany
- *Correspondence: Frank Kügler,
3
Zhang Z. Interaction between epilaryngeal and laryngeal adjustments in regulating vocal fold contact pressure. JASA Express Lett 2021; 1:025201. [PMID: 33615313 PMCID: PMC7869442 DOI: 10.1121/10.0003393] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/27/2020] [Accepted: 01/05/2021] [Indexed: 06/12/2023]
Abstract
This study investigates the peak vocal fold contact pressure under different conditions of epilaryngeal narrowing and laryngeal adjustments. The results show that for a given subglottal pressure, the peak vocal fold contact pressure may increase or decrease with epilaryngeal narrowing, depending on a complex interaction between vocal fold vertical thickness, initial glottal angle, and subglottal pressure. However, epilaryngeal narrowing also significantly increases vocal efficiency so that for a target sound pressure level, the peak vocal fold contact pressure decreases with epilaryngeal narrowing. Overall, the peak vocal fold contact pressure and respiratory effort can be minimized by combining epilaryngeal narrowing, a small initial glottal angle, and an intermediate vocal fold thickness.
Affiliation(s)
- Zhaoyan Zhang
- Department of Head and Neck Surgery, University of California, Los Angeles, 31-24 Rehabilitation Center, 1000 Veteran Avenue, Los Angeles, California 90095-1794, USA
4
Alexander R, Sorensen T, Toutios A, Narayanan S. A modular architecture for articulatory synthesis from gestural specification. J Acoust Soc Am 2019; 146:4458. [PMID: 31893678 PMCID: PMC7043897 DOI: 10.1121/1.5139413] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/08/2019] [Revised: 09/19/2019] [Accepted: 11/11/2019] [Indexed: 06/10/2023]
Abstract
This paper proposes a modular architecture for articulatory synthesis from a gestural specification comprising relatively simple models for the vocal tract, the glottis, aero-acoustics, and articulatory control. The vocal tract module combines a midsagittal statistical analysis articulatory model, derived by factor analysis of air-tissue boundaries in real-time magnetic resonance imaging data, with an αβ model for converting midsagittal section to area function specifications. The aero-acoustics and glottis models were based on a software implementation of classic work by Maeda. The articulatory control module, inspired by the task dynamics model, uses dynamical systems that implement articulatory gestures to animate the statistical articulatory model. Results are presented on synthesizing vowel-consonant-vowel sequences with plosive consonants, using models that were built on data from, and simulate the behavior of, two different speakers.
Affiliation(s)
- Rachel Alexander
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Tanner Sorensen
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Asterios Toutios
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Shrikanth Narayanan
- Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
5
Anand S, Kopf LM, Shrivastav R, Eddins DA. Objective Indices of Perceived Vocal Strain. J Voice 2019; 33:838-845. [DOI: 10.1016/j.jvoice.2018.06.005] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/22/2018] [Revised: 06/06/2018] [Accepted: 06/07/2018] [Indexed: 10/28/2022]
6
Beguš G. Effects of ejective stops on preceding vowel duration. J Acoust Soc Am 2017; 142:2168. [PMID: 29092567 DOI: 10.1121/1.5007728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
One of the most widely studied observations in linguistic phonetics is that, all else being equal, vowels are longer before voiced than before voiceless obstruents. The causes of this phonetic generalization are, however, poorly understood and several competing explanations have been proposed. No studies have so far measured vowel duration before stops with yet another laryngeal feature: ejectives. This study fills this gap and presents results from an experiment that measures vowel duration before stops with all three laryngeal features in Georgian and models effects of both closure and voice onset time (VOT) on preceding vowel duration at the same time. The results show that vowels have significantly different durations before all three series of stops, voiced, ejective, and voiceless aspirated, even when closure and VOT durations are controlled for. The results also suggest that closure and VOT durations are inversely correlated with preceding vowel duration. These results combined bear several implications for the discussion of causes of vowel duration differences: the data support the hypotheses that claim that laryngeal gestures, temporal compensation, and closure velocity affect vowel duration. Some explanations, especially perceptual and airflow expenditure explanations, are considerably weakened by the results.
Affiliation(s)
- Gašper Beguš
- Department of Linguistics, Harvard University, Cambridge, Massachusetts 02138, USA
7
Galindo GE, Peterson SD, Erath BD, Castro C, Hillman RE, Zañartu M. Modeling the Pathophysiology of Phonotraumatic Vocal Hyperfunction With a Triangular Glottal Model of the Vocal Folds. J Speech Lang Hear Res 2017; 60:2452-2471. [PMID: 28837719 PMCID: PMC5831616 DOI: 10.1044/2017_jslhr-s-16-0412] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/29/2016] [Accepted: 04/19/2017] [Indexed: 05/08/2023]
Abstract
PURPOSE Our goal was to test prevailing assumptions about the underlying biomechanical and aeroacoustic mechanisms associated with phonotraumatic lesions of the vocal folds using a numerical lumped-element model of voice production. METHOD A numerical model with a triangular glottis, posterior glottal opening, and arytenoid posturing is proposed. Normal voice is altered by introducing various prephonatory configurations. Potential compensatory mechanisms (increased subglottal pressure, muscle activation, and supraglottal constriction) are adjusted to restore an acoustic target output through a control loop that mimics a simplified version of auditory feedback. RESULTS The degree of incomplete glottal closure in both the membranous and posterior portions of the folds consistently leads to a reduction in sound pressure level, fundamental frequency, harmonic richness, and harmonics-to-noise ratio. The compensatory mechanisms lead to significantly increased vocal-fold collision forces, maximum flow-declination rate, and amplitude of unsteady flow, without significantly altering the acoustic output. CONCLUSION Modeling provided potentially important insights into the pathophysiology of phonotraumatic vocal hyperfunction by demonstrating that compensatory mechanisms can counteract deterioration in the voice acoustic signal due to incomplete glottal closure, but this also leads to high vocal-fold collision forces (reflected in aerodynamic measures), which significantly increases the risk of developing phonotrauma.
Affiliation(s)
- Gabriel E. Galindo
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- Sean D. Peterson
- Mechanical and Mechatronics Engineering, University of Waterloo, Ontario, Canada
- Byron D. Erath
- Department of Mechanical & Aeronautical Engineering, Clarkson University, Potsdam, NY
- Christian Castro
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
- School of Speech and Hearing Sciences, Universidad de Valparaíso, Chile
- Robert E. Hillman
- Center for Laryngeal Surgery & Voice Rehabilitation, Massachusetts General Hospital, Boston
- Harvard Medical School, Boston, MA
- MGH Institute of Health Professions, Boston, MA
- Matías Zañartu
- Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile
8
Hadwin PJ, Peterson SD. An extended Kalman filter approach to non-stationary Bayesian estimation of reduced-order vocal fold model parameters. J Acoust Soc Am 2017; 141:2909. [PMID: 28464670 DOI: 10.1121/1.4981240] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 05/09/2023]
Abstract
The Bayesian framework for parameter inference provides a basis from which subject-specific reduced-order vocal fold models can be generated. Previously, it has been shown that a particle filter technique is capable of producing estimates and associated credibility intervals of time-varying reduced-order vocal fold model parameters. However, the particle filter approach is difficult to implement and has a high computational cost, which can be barriers to clinical adoption. This work presents an alternative estimation strategy based upon Kalman filtering aimed at reducing the computational cost of subject-specific model development. The robustness of this approach to Gaussian and non-Gaussian noise is discussed. The extended Kalman filter (EKF) approach is found to perform very well in comparison with the particle filter technique at dramatically lower computational cost. Based upon the test cases explored, the EKF is comparable in accuracy to the particle filter technique when more than 6000 particles are employed; if fewer particles are employed, the EKF actually performs better. For comparable levels of accuracy, the solution time is reduced by 2 orders of magnitude when employing the EKF. By virtue of the approximations used in the EKF, however, the credibility intervals tend to be slightly underpredicted.
Affiliation(s)
- Paul J Hadwin
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1 Canada
- Sean D Peterson
- Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, Ontario N2L 3G1 Canada
9
Moisik SR, Gick B. The Quantal Larynx: The Stable Regions of Laryngeal Biomechanics and Implications for Speech Production. J Speech Lang Hear Res 2017; 60:540-560. [PMID: 28241199 DOI: 10.1044/2016_jslhr-s-16-0019] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/14/2016] [Accepted: 08/28/2016] [Indexed: 06/06/2023]
Abstract
PURPOSE Recent proposals suggest that (a) the high dimensionality of speech motor control may be reduced via modular neuromuscular organization that takes advantage of intrinsic biomechanical regions of stability and (b) computational modeling provides a means to study whether and how such modularization works. In this study, the focus is on the larynx, a structure that is fundamental to speech production because of its role in phonation and numerous articulatory functions. METHOD A 3-dimensional model of the larynx was created using the ArtiSynth platform (http://www.artisynth.org). This model was used to simulate laryngeal articulatory states, including inspiration, glottal fricative, modal prephonation, plain glottal stop, vocal-ventricular stop, and aryepiglotto-epiglottal stop and fricative. RESULTS Speech-relevant laryngeal biomechanics is rich with "quantal" or highly stable regions within muscle activation space. CONCLUSIONS Quantal laryngeal biomechanics complement a modular view of speech control and have implications for the articulatory-biomechanical grounding of numerous phonetic and phonological phenomena.
Affiliation(s)
- Scott Reid Moisik
- Division of Linguistics and Multilingual Studies, Nanyang Technological University, Singapore
- The Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Bryan Gick
- Department of Linguistics, University of British Columbia, Vancouver, Canada
- Haskins Laboratories, New Haven, CT