1
Lester-Smith RA, Jebaily CG, Story BH. The Effects of Remote Signal Transmission and Recording on Acoustical Measures of Simulated Essential Vocal Tremor: Considerations for Remote Treatment Research and Telepractice. J Voice 2024; 38:325-336. [PMID: 34702610 PMCID: PMC9033886 DOI: 10.1016/j.jvoice.2021.09.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 10/20/2022]
Abstract
PURPOSE Studies on medical and behavioral interventions for essential vocal tremor (EVT) have shown inconsistent effects on acoustical and perceptual outcome measures across studies and across participants. Remote acoustical and perceptual assessments might facilitate studies with larger samples of participants and repeated measures that could clarify treatment effects and identify optimal treatment candidates. Furthermore, remote acoustical and perceptual assessment might allow clinicians to monitor clients' treatment responses and optimize treatment approaches during telepractice. Thus, the purpose of this study was to evaluate the accuracy of remote signal transmission and recording for acoustical and perceptual assessment of EVT. METHOD Simulations of EVT were produced using a computational model and were recorded using local and remote procedures to represent client- and clinician-end recordings, respectively. Acoustical analyses measured the extent and rate of fundamental frequency (fo) and intensity modulation to represent vocal tremor severity and the cepstral peak prominence (CPPS) to represent voice quality. The data were analyzed using repeated measures analysis of variance (ANOVA) with recording as the within-subjects factor and sex of the computational model as the between-subjects factor. RESULTS There was a significant main effect of recording on the rate of fo modulation and significant interactions of recording and sex for the extent of intensity modulation, rate of intensity modulation, and CPPS. Post hoc pairwise comparisons and analysis of effect size indicated that recording procedures had the largest effect on the extent of intensity modulation for male simulations, the rate of intensity modulation for male and female simulations, and the CPPS for male and female simulations. Despite having disabled all known software and computer audio enhancing options and having stable ethernet connections, there was inconsistent attenuation of signal amplitude in remote recordings that was most problematic for samples with a breathy voice quality but also affected samples with typical and pressed voice qualities. CONCLUSIONS Acoustical measures that correlate with perception of vocal tremor and voice quality were altered by remote signal transmission and recording. In particular, signal transmission and recording in Zoom altered time-based estimates of intensity modulation and CPPS with male and female simulations of EVT and magnitude-based estimates of intensity modulation with male simulations of EVT. In contrast, signal transmission and recording in Zoom minimally altered time- and magnitude-based estimates of fo modulation with male and female simulations of EVT. Therefore, acoustical and perceptual assessments of EVT should be performed using audio recordings that are collected locally on the participant- or client-end, particularly when measuring modulation of intensity and CPPS or estimating vocal tremor severity and voice quality. Development of procedures for collecting local audio recordings in remote settings may expand data collection for treatment research and enhance telepractice.
Affiliation(s)
- Rosemary A Lester-Smith
  - Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, Texas
- Charles G Jebaily
  - Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, Texas; Texas NeuroRehab Center, Austin, Texas
- Brad H Story
  - Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson, Arizona
2
Story BH, Bunton K. The relation of velopharyngeal coupling area and vocal tract scaling to identification of stop-nasal cognates. J Acoust Soc Am 2023; 154:3741-3759. [PMID: 38099832 DOI: 10.1121/10.0023958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 11/22/2023] [Indexed: 12/18/2023]
Abstract
The purpose of this study was to determine whether the threshold of velopharyngeal (VP) coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English was different for speech produced by a model based on an adult male, an adult female, and a 4-year-old child. V1CV2 stimuli were generated with a speech production model that encodes phonetic segments as relative acoustic targets imposed on an underlying vocal tract and laryngeal structure that can be scaled according to sex and age. Each V1CV2 was synthesized with a set of VP coupling functions whose maximum area ranged from 0 to 0.1 cm². Results showed that scaling the vocal tract and vocal folds had essentially no effect on the VP coupling area at which listener identification shifted from stop to nasal. The range of coupling areas at which the crossover occurred was 0.037-0.049 cm² for the male model, 0.040-0.055 cm² for the female model, and 0.039-0.052 cm² for the 4-year-old child model, and the overall mean was 0.044 cm². Calculations of band-limited peak nasalance indicated that 85% peak nasalance during the consonant was well aligned with listener responses.
Affiliation(s)
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Kate Bunton
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
3
Herbst CT, Story BH, Meyer D. Acoustical Theory of Vowel Modification Strategies in Belting. J Voice 2023:S0892-1997(23)00004-8. [PMID: 37080890 DOI: 10.1016/j.jvoice.2023.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 04/22/2023]
Abstract
Various authors have argued that belting is to be produced by "speech-like" sounds, with the first and second supraglottic vocal tract resonances (fR1 and fR2) at frequencies of the vowels determined by the lyrics to be sung. Acoustically, the hallmark of belting has been identified as a dominant second harmonic, possibly enhanced by first resonance tuning (fR1 ≈ 2fo). It is not clear how both these concepts, (a) phonating with "speech-like," unmodified vowels and (b) producing a belting sound with a dominant second harmonic, typically enhanced by fR1, can be upheld when singing across a singer's entire musical pitch range. For instance, anecdotal reports from pedagogues suggest that vowels with a low fR1, such as [i] or [u], might have to be modified considerably (by raising fR1) in order to phonate at higher pitches. These issues were systematically addressed in silico with respect to treble singing, using a linear source-filter voice production model. The dominant harmonic of the radiated spectrum was assessed in 12,987 simulations, covering a parameter space of 37 fundamental frequencies (fo) across the musical pitch range from C3 to C6; 27 voice source spectral slope settings from -4 to -30 dB/octave; computed for 13 different IPA vowels. The results suggest that, for most unmodified vowels, the stereotypical belting sound characteristics with a dominant second harmonic can only be produced over a pitch range of about a musical fifth, centered at fo ≈ 0.5fR1. In the [ɔ] and [ɑ] vowels, that range is extended to an octave, supported by a low second resonance. Data aggregation, considering the relative prevalence of vowels in American English, suggests that, historically, belting with fR1 ≈ 2fo was derived from speech, and that songs with an extended musical pitch range likely demand considerable vowel modification. We thus argue that, on acoustical grounds, the pedagogical commandment for belting with unmodified, "speech-like" vowels cannot always be fulfilled.
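The simulation count quoted in this abstract is the product of the three factors it lists (37 × 27 × 13 = 12,987). The sketch below enumerates such a grid to show how the numbers fit together. It is only an illustration: the equal-tempered semitone spacing with A4 = 440 Hz, the 1 dB/octave slope step, and the placeholder vowel labels are assumptions, since the abstract states only the counts and ranges.

```python
from itertools import product

A4_HZ = 440.0               # assumed tuning reference (not stated in the abstract)
MIDI_C3, MIDI_C6 = 48, 84   # C3..C6 inclusive, using the C4 = 60 MIDI convention


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - 69) / 12.0)


# 37 fundamental frequencies: one per semitone from C3 to C6 (assumed spacing).
fo_values = [midi_to_hz(n) for n in range(MIDI_C3, MIDI_C6 + 1)]

# 27 spectral slopes from -4 to -30 dB/octave (1 dB/octave step is assumed).
slopes_db_per_octave = list(range(-4, -31, -1))

# 13 vowels; the labels are hypothetical placeholders.
vowels = [f"vowel_{i}" for i in range(13)]

grid = list(product(fo_values, slopes_db_per_octave, vowels))
print(len(fo_values), len(slopes_db_per_octave), len(vowels), len(grid))
```

Under these assumptions the lowest and highest fundamental frequencies come out near 130.8 Hz and 1046.5 Hz.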
Affiliation(s)
- Christian T Herbst
  - Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia; Department of Vocal Studies, Mozarteum University, Salzburg, Austria
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona
- David Meyer
  - Janette Ogg Voice Research Center, Shenandoah Conservatory, Winchester, Virginia
4
Herbst CT, Story BH. Computer simulation of vocal tract resonance tuning strategies with respect to fundamental frequency and voice source spectral slope in singing. J Acoust Soc Am 2022; 152:3548. [PMID: 36586864 DOI: 10.1121/10.0014421] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/06/2022] [Accepted: 09/13/2022] [Indexed: 06/17/2023]
Abstract
A well-known concept of singing voice pedagogy is "formant tuning," where the lowest two vocal tract resonances (fR1, fR2) are systematically tuned to harmonics of the laryngeal voice source to maximize the level of radiated sound. A comprehensive evaluation of this resonance tuning concept is still needed. Here, the effect of fR1, fR2 variation was systematically evaluated in silico across the entire fundamental frequency range of classical singing for three voice source characteristics with spectral slopes of -6, -12, and -18 dB/octave. Respective vocal tract transfer functions were generated with a previously introduced low-dimensional computational model, and resultant radiated sound levels were expressed in dB(A). Two distinct strategies for optimized sound output emerged for low vs high voices. At low pitches, spectral slope was the predominant factor for sound level increase, and resonance tuning only had a marginal effect. In contrast, resonance tuning strategies became more prevalent and voice source strength played an increasingly marginal role as fundamental frequency increased to the upper limits of the soprano range. This suggests that different voice classes (e.g., low male vs high female) likely have fundamentally different strategies for optimizing sound output, which has fundamental implications for pedagogical practice.
Affiliation(s)
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
5
Chuang YJ, Hwang SJ, Buhr KA, Miller CA, Avey GD, Story BH, Vorperian HK. Anatomic development of the upper airway during the first five years of life: A three-dimensional imaging study. PLoS One 2022; 17:e0264981. [PMID: 35275939 PMCID: PMC8916633 DOI: 10.1371/journal.pone.0264981] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/04/2021] [Accepted: 02/21/2022] [Indexed: 12/05/2022] Open
Abstract
Purpose Normative data on the growth and development of the upper airway across the sexes are needed for the diagnosis and treatment of congenital and acquired respiratory anomalies and to gain insight into developmental changes in speech acoustics and disorders with craniofacial anomalies. Methods The growth of the upper airway in children ages birth to 5 years, as compared to adults, was quantified using an imaging database with computed tomography studies from typically developing individuals. Methodological criteria for scan inclusion and airway measurements included: head position, histogram-based airway segmentation, anatomic landmark placement, and development of a semi-automatic centerline for data extraction. A comprehensive set of 2D and 3D supra- and sub-glottal measurements from the choanae to tracheal opening were obtained including: naso-oro-laryngo-pharynx subregion volume and length, each subregion's superior and inferior cross-sectional area, and antero-posterior and transverse/width distances. Results Growth of the upper airway during the first 5 years of life was more pronounced in the vertical and transverse/lateral dimensions than in the antero-posterior dimension. By age 5 years, females have larger pharyngeal measurements than males. Prepubertal sex differences were identified in the subglottal region. Conclusions Our findings demonstrate the importance of studying the growth of the upper airway in 3D. As the lumen length increases, its shape changes, becoming increasingly elliptical during the first 5 years of life. This study also emphasizes the importance of methodological considerations for both image acquisition and data extraction, as well as the use of consistent anatomic structures in defining pharyngeal regions.
Affiliation(s)
- Ying Ji Chuang
  - Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Seong Jae Hwang
  - Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Kevin A. Buhr
  - Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Courtney A. Miller
  - Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
- Gregory D. Avey
  - Department of Radiology, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin, United States of America
- Brad H. Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona, United States of America
- Houri K. Vorperian
  - Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, Madison, Wisconsin, United States of America
6
Story BH, Bunton K. The relation of velopharyngeal coupling area to the identification of stop versus nasal consonants in North American English based on speech generated by acoustically driven vocal tract modulations. J Acoust Soc Am 2021; 150:3618. [PMID: 34852618 DOI: 10.1121/10.0007223] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2021] [Accepted: 10/23/2021] [Indexed: 06/13/2023]
Abstract
The purpose of this study was to determine the threshold of velopharyngeal coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English, based on V1CV2 stimuli generated with a speech production model that encodes phonetic segments as relative acoustic targets. Each V1CV2 was synthesized with a set of velopharyngeal coupling functions whose area ranged from 0 to 0.1 cm². Results show that consonants were identified by listeners as a stop when the coupling area was less than 0.035-0.057 cm², depending on place of articulation and final vowel. The smallest coupling area (0.035 cm²) at which the stop-to-nasal switch occurred was found for an alveolar consonant in the /ɑCi/ context, whereas the largest (0.057 cm²) was for a bilabial in /ɑCɑ/. For each stimulus, the balance of oral versus nasal acoustic energy was characterized by the peak nasalance during the consonant. Stimuli with peak nasalance below 40% were mostly identified by listeners as stops, whereas those above 40% were identified as nasals. This study was intended to be a precursor to further investigations using the same model but scaled to represent the developing speech production system of male and female talkers.
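The peak-nasalance boundary described above can be illustrated with a short sketch. It assumes the conventional definition of nasalance as nasal energy divided by the sum of nasal and oral energy, expressed as a percentage. The abstract does not give the exact computation used in the study, so the function names, the frame-wise energy inputs, and the use of the 40% value as a hard cut-off are all illustrative assumptions (listeners "mostly" followed this boundary; it is not a deterministic rule).

```python
def nasalance_percent(nasal_energy, oral_energy):
    """Nasalance as nasal / (nasal + oral), in percent (conventional definition)."""
    total = nasal_energy + oral_energy
    if total <= 0:
        raise ValueError("total acoustic energy must be positive")
    return 100.0 * nasal_energy / total


def peak_nasalance(nasal_frames, oral_frames):
    """Largest frame-wise nasalance over the consonant interval."""
    return max(nasalance_percent(n, o) for n, o in zip(nasal_frames, oral_frames))


def likely_percept(peak, threshold=40.0):
    """Hypothetical helper: label a stimulus by the 40% boundary from the abstract."""
    return "nasal" if peak > threshold else "stop"


# Made-up frame energies for two hypothetical stimuli.
weak_coupling = peak_nasalance([1.0, 2.0, 3.0], [9.0, 8.0, 7.0])
strong_coupling = peak_nasalance([1.0, 6.0], [9.0, 4.0])
print(weak_coupling, likely_percept(weak_coupling))
print(strong_coupling, likely_percept(strong_coupling))
```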
Affiliation(s)
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Kate Bunton
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
7
Abstract
A recently developed speech production model, in which speech segments are specified by relative acoustic events called resonance deflection patterns, was used to generate speech signals that were presented to listeners in a perceptual test. The purpose was to determine the effect of variations of the magnitude and polarity of the third resonance deflection on identification of the consonant in a V1CV2 disyllable while the deflections of the first and second resonances were held constant. Results showed that listeners' identification changed from /d/ to /ɡ/ when the polarity of the third resonance deflection switched from positive to negative.
Affiliation(s)
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Kate Bunton
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
8
Mailend ML, Maas E, Story BH. Apraxia of speech and the study of speech production impairments: Can we avoid further confusion? Reply to Romani (2021). Cogn Neuropsychol 2021; 38:309-317. [PMID: 34881683 PMCID: PMC10011684 DOI: 10.1080/02643294.2021.2009790] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
Abstract
We agree with Cristina Romani (CR) about reducing confusion and agree that the issues raised in her commentary are central to the study of apraxia of speech (AOS). However, CR critiques our approach from the perspective of basic cognitive neuropsychology. This is confusing and misleading because, contrary to CR's claim, we did not attempt to inform models of typical speech production. Instead, we relied on such models to study the impairment in the clinical category of AOS (translational cognitive neuropsychology). Thus, the approach along with the underlying assumptions is different. This response aims to clarify these assumptions, broaden the discussion regarding the methodological approach, and address CR's concerns. We argue that our approach is well-suited to meet the goals of our recent studies and is commensurate with the current state of the science of AOS. Ultimately, a plurality of approaches is needed to understand a phenomenon as complex as AOS.
Affiliation(s)
- Marja-Liisa Mailend
  - Moss Rehabilitation Research Institute, Einstein Healthcare Network, Elkins Park, PA, USA; Department of Special Education, University of Tartu, Tartu, Estonia
- Edwin Maas
  - Department of Communication Sciences and Disorders, Temple University, Philadelphia, PA, USA
- Brad H Story
  - Speech, Language, and Hearing Sciences, The University of Arizona, Tucson, AZ, USA
9
Mailend ML, Maas E, Beeson PM, Story BH, Forster KI. Examining speech motor planning difficulties in apraxia of speech and aphasia via the sequential production of phonetically similar words. Cogn Neuropsychol 2021; 38:72-87. [PMID: 33249997 PMCID: PMC7895325 DOI: 10.1080/02643294.2020.1847059] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/25/2019] [Revised: 08/11/2020] [Accepted: 10/20/2020] [Indexed: 10/22/2022]
Abstract
This study investigated the underlying nature of apraxia of speech (AOS) by testing two competing hypotheses. The Reduced Buffer Capacity Hypothesis argues that people with AOS can plan speech only one syllable at a time (Rogers and Storkel, 1999. Planning speech one syllable at a time: The reduced buffer capacity hypothesis in apraxia of speech. Aphasiology, 13(9-11), 793-805. https://doi.org/10.1080/026870399401885). The Program Retrieval Deficit Hypothesis states that selecting a motor programme is difficult in the face of competition from other simultaneously activated programmes (Mailend and Maas, 2013. Speech motor programming in apraxia of speech: Evidence from a delayed picture-word interference task. American Journal of Speech-Language Pathology, 22(2), S380-S396. https://doi.org/10.1044/1058-0360(2013/12-0101)). Speakers with AOS and aphasia, speakers with aphasia without AOS, and unimpaired controls were asked to prepare and hold a two-word utterance until a go-signal prompted a spoken response. Phonetic similarity between target words was manipulated. Speakers with AOS had longer reaction times in conditions with two similar words compared to two identical words. The Control and Aphasia groups did not show this effect. These results suggest that speakers with AOS need additional processing time to retrieve target words when multiple motor programmes are simultaneously activated.
Affiliation(s)
- Marja-Liisa Mailend
  - Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson, AZ, USA
  - Marja-Liisa Mailend is now at Moss Rehabilitation Research Institute, Einstein Healthcare Network, Elkins Park, PA, USA
- Edwin Maas
  - Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson, AZ, USA
  - Edwin Maas is now at the Department of Communication Sciences and Disorders, Temple University, Philadelphia, PA, USA
- Pélagie M. Beeson
  - Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson, AZ, USA
- Brad H. Story
  - Department of Speech, Language, and Hearing Sciences, The University of Arizona, Tucson, AZ, USA
10
Milenkovic PH, Wagner M, Kent RD, Story BH, Vorperian HK. Effects of sampling rate and type of anti-aliasing filter on linear-predictive estimates of formant frequencies in men, women, and children. J Acoust Soc Am 2020; 147:EL221. [PMID: 32237805 PMCID: PMC7056453 DOI: 10.1121/10.0000824] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/08/2019] [Revised: 01/31/2020] [Accepted: 02/10/2020] [Indexed: 06/11/2023]
Abstract
The purpose of this study was to assess the effect of downsampling the acoustic signal on the accuracy of linear-predictive (LPC) formant estimation. Based on speech produced by men, women, and children, the first four formant frequencies were estimated at sampling rates of 48, 16, and 10 kHz using different anti-alias filtering. With proper selection of the number of LPC coefficients, the anti-alias filter, and between-frame averaging, results suggest that accuracy is not improved by rates substantially below 48 kHz. Any downsampling should not go below 16 kHz with a filter cut-off centered at 8 kHz.
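To make the recommendation concrete, the sketch below downsamples a 48 kHz signal to 16 kHz after low-pass filtering with a cut-off at 8 kHz. It is a generic, standard-library illustration and not the filters evaluated in the study: the Hamming-windowed sinc design, the 101-tap length, and the function names are assumptions chosen for brevity.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc low-pass filter, normalized to unity gain at DC."""
    fc = cutoff_hz / fs_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def decimate(signal, fs_in, fs_out, cutoff_hz):
    """Anti-alias filter, then keep every (fs_in // fs_out)-th filtered sample."""
    factor, remainder = divmod(fs_in, fs_out)
    if remainder:
        raise ValueError("this sketch only handles integer rate ratios")
    taps = lowpass_fir(cutoff_hz, fs_in)
    n_taps = len(taps)
    # Only fully overlapped positions are computed, so there are no edge transients.
    return [
        sum(t * s for t, s in zip(taps, signal[start:start + n_taps]))
        for start in range(0, len(signal) - n_taps + 1, factor)
    ]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, fs_hz=48000, n=4800):
    return [math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(n)]


# A 1 kHz tone should pass almost unchanged; a 10 kHz tone lies above the new
# 8 kHz Nyquist frequency and should be strongly attenuated rather than aliased.
kept = decimate(tone(1000), 48000, 16000, cutoff_hz=8000)
removed = decimate(tone(10000), 48000, 16000, cutoff_hz=8000)
print(round(rms(kept), 3), rms(removed) < 0.01)
```

Filtering before discarding samples is the key design choice: without it, energy above 8 kHz would fold back into the band where the formants are estimated.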
Affiliation(s)
- Paul H Milenkovic
  - Department of Electrical and Computer Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, Wisconsin 53706, USA
- Madison Wagner
  - Vocal Tract Development Laboratory, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53706, USA
- Raymond D Kent
  - Vocal Tract Development Laboratory, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53706, USA
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718
- Houri K Vorperian
  - Vocal Tract Development Laboratory, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53706, USA
11
Story BH, Bunton K. A model of speech production based on the acoustic relativity of the vocal tract. J Acoust Soc Am 2019; 146:2522. [PMID: 31671993 PMCID: PMC7064311 DOI: 10.1121/1.5127756] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/20/2019] [Revised: 09/10/2019] [Accepted: 09/12/2019] [Indexed: 06/10/2023]
Abstract
A model is described in which the effects of articulatory movements to produce speech are generated by specifying relative acoustic events along a time axis. These events consist of directional changes of the vocal tract resonance frequencies that, when associated with a temporal event function, are transformed, via acoustic sensitivity functions, into time-varying modulations of the vocal tract shape. Because the time courses of the events may overlap considerably, coarticulatory effects are automatically generated. Production of sentence-level speech with the model is demonstrated with audio samples and vocal tract animations.
Affiliation(s)
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Kate Bunton
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
12
Mailend ML, Maas E, Beeson PM, Story BH, Forster KI. Speech motor planning in the context of phonetically similar words: Evidence from apraxia of speech and aphasia. Neuropsychologia 2019; 127:171-184. [PMID: 30817912 PMCID: PMC6459184 DOI: 10.1016/j.neuropsychologia.2019.02.018] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/31/2018] [Revised: 02/22/2019] [Accepted: 02/24/2019] [Indexed: 11/17/2022]
Abstract
The purpose of this study was to test two competing hypotheses about the nature of the impairment in apraxia of speech (AOS). The Reduced Buffer Capacity Hypothesis argues that people with AOS can hold only one syllable at a time in the speech motor planning buffer. The Program Retrieval Deficit Hypothesis states that people with AOS have difficulty accessing the intended motor program in contexts where several motor programs are activated simultaneously. The participants included eight speakers with AOS, most of whom also had aphasia, nine speakers with aphasia without AOS, and 25 age-matched control speakers. The experimental paradigm prompted single word production following three types of primes. In most trials, prime and target were the same (e.g., bill-bill). On some trials, the initial consonant differed in one phonetic feature (e.g., bill-dill; Similar) or in all phonetic features (e.g., fill-bill; Different). The dependent measures were accuracy and reaction time. The results revealed a switch cost (longer reaction times in trials where the prime and target differed compared to trials where they were the same words) in all groups; however, the switch cost was significantly larger in the AOS group compared to the other two groups. These findings are in line with the prediction of the Program Retrieval Deficit Hypothesis and suggest that speakers with AOS have difficulty with selecting one program over another when several programs compete for selection.
Affiliation(s)
- Edwin Maas
  - The University of Arizona, United States
13
Story BH, Vorperian HK, Bunton K, Durtschi RB. An age-dependent vocal tract model for males and females based on anatomic measurements. J Acoust Soc Am 2018; 143:3079. [PMID: 29857736 PMCID: PMC5966313 DOI: 10.1121/1.5038264] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/30/2017] [Revised: 04/29/2018] [Accepted: 05/01/2018] [Indexed: 05/29/2023]
Abstract
The purpose of this study was to take a first step toward constructing a developmental and sex-specific version of a parametric vocal tract area function model representative of male and female vocal tracts ranging in age from infancy to 12 years, as well as adults. Anatomic measurements collected from a large imaging database of male and female children and adults provided the dataset from which length warping and cross-dimension scaling functions were derived and applied to the adult-based vocal tract model to project it backward along an age continuum. The resulting model was assessed qualitatively by projecting hypothetical vocal tract shapes onto midsagittal images from the cohort of children, and quantitatively by comparison of formant frequencies produced by the model to those reported in the literature. An additional validation of modeled vocal tract shapes was made possible by comparison to cross-sectional area measurements obtained for children and adults using acoustic pharyngometry. This initial attempt to generate a sex-specific developmental vocal tract model paves a path to study the relation of vocal tract dimensions to documented prepubertal acoustic differences.
Affiliation(s)
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Houri K Vorperian
  - Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue # 429, Madison, Wisconsin 53705, USA
- Kate Bunton
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Reid B Durtschi
  - Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue # 429, Madison, Wisconsin 53705, USA
14
Abstract
The purpose of this study was to develop a method for visualizing and assessing the characteristics of vowel production by measuring the local density of normalized F1 and F2 formant frequencies. The result is a three-dimensional plot, called the vowel space density (VSD), that indicates the regions in the vowel space most heavily used by a talker during speech production. The area of a convex hull enclosing the vowel space at specific threshold density values was proposed as a means of quantifying the VSD.
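The quantification step described above (density over normalized F1/F2, then the area of a convex hull at a density threshold) can be sketched as follows. This is one plausible reading rather than the published procedure: the grid-count density estimate, the cell size, the peak normalization, and all function names are assumptions, and the study's own density measure may differ.

```python
from collections import Counter


def density_grid(points, cell=0.05):
    """Count (F1, F2) points per grid cell and scale so the densest cell is 1.0."""
    counts = Counter((round(f1 / cell), round(f2 / cell)) for f1, f2 in points)
    peak = max(counts.values())
    return {(i * cell, j * cell): c / peak for (i, j), c in counts.items()}


def convex_hull(pts):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(hull):
    """Shoelace formula; returns 0 for fewer than three vertices."""
    n = len(hull)
    twice_area = sum(
        hull[i][0] * hull[(i + 1) % n][1] - hull[(i + 1) % n][0] * hull[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2


def vsd_hull_area(points, threshold, cell=0.05):
    """Area of the convex hull around grid cells whose density meets the threshold."""
    grid = density_grid(points, cell)
    kept = [xy for xy, d in grid.items() if d >= threshold]
    return polygon_area(convex_hull(kept))


# Made-up data: four heavily used corners plus one rarely visited outlier.
samples = [(0, 0), (2, 0), (2, 2), (0, 2)] * 2 + [(5, 5)]
print(vsd_hull_area(samples, threshold=0.75, cell=1.0))  # outlier excluded
print(vsd_hull_area(samples, threshold=0.25, cell=1.0))  # outlier included
```

Raising the threshold shrinks the hull toward the most heavily used part of the vowel space, which is what makes the area at a given threshold usable as a summary measure.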
Affiliation(s)
- Brad H Story
  - Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Kate Bunton
  - Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
15
Abstract
The purpose of this study was to further develop a multi-tier model of the vocal tract area function in which the modulations of shape to produce speech are generated by the product of a vowel substrate and a consonant superposition function. The new approach consists of specifying input parameters for a target consonant as a set of directional changes in the resonance frequencies of the vowel substrate. Using calculations of acoustic sensitivity functions, these "resonance deflection patterns" are transformed into time-varying deformations of the vocal tract shape without any direct specification of location or extent of the consonant constriction along the vocal tract. The configurations of constrictions and expansions generated by this process were shown to be physiologically realistic and to produce speech sounds that are easily identifiable as the target consonants. This model is a useful enhancement for area function-based synthesis and can serve as a tool for understanding how the vocal tract is shaped by a talker during speech production.
16
Samlan RA, Story BH. Influence of Left-Right Asymmetries on Voice Quality in Simulated Paramedian Vocal Fold Paralysis. J Speech Lang Hear Res 2017; 60:306-321. [PMID: 28199505 DOI: 10.1044/2016_jslhr-s-16-0076] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 02/24/2016] [Accepted: 05/31/2016] [Indexed: 05/25/2023]
Abstract
PURPOSE The purpose of this study was to determine the vocal fold structural and vibratory symmetries that are important to vocal function and voice quality in a simulated paramedian vocal fold paralysis. METHOD A computational kinematic speech production model was used to simulate an exemplar "voice" on the basis of asymmetric settings of parameters controlling glottal configuration. These parameters were then altered individually to determine their effect on maximum flow declination rate, spectral slope, cepstral peak prominence, harmonics-to-noise ratio, and perceived voice quality. RESULTS Asymmetry of each of the 5 vocal fold parameters influenced vocal function and voice quality; measured change was greatest for adduction and bulging. Increasing the symmetry of all parameters improved voice, and the best voice occurred with overcorrection of adduction, followed by bulging, nodal point ratio, starting phase, and amplitude of vibration. CONCLUSIONS Although vocal process adduction and edge bulging asymmetries are most influential in voice quality for simulated vocal fold motion impairment, amplitude of vibration and starting phase asymmetries are also perceptually important. These findings are consistent with the current surgical approach to vocal fold motion impairment, where goals include medializing the vocal process and straightening concave edges. The results also explain many of the residual postoperative voice limitations.
Affiliation(s)
- Robin A Samlan
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson
- Brad H Story
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson
17
Neely KD, Bunton K, Story BH. A Modeling Study of the Effects of Vocal Tract Movement Duration and Magnitude on the F2 Trajectory in CV Words. J Speech Lang Hear Res 2016; 59:1327-1334. [PMID: 27768174 PMCID: PMC5399760 DOI: 10.1044/2016_jslhr-s-14-0331] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/25/2014] [Revised: 09/11/2015] [Accepted: 02/19/2016] [Indexed: 06/06/2023]
Abstract
PURPOSE This study used a computational vocal tract model to investigate the relationship of diphthong duration and vocal tract movement magnitude to measures of the F2 trajectory in CV words. METHOD Three words (bough, boy, and buy) were simulated on the basis of an adult female vocal tract model, in which the model parameters were estimated from audio recordings of a female talker. Model parameters were then modified to generate 35 simulations of each word corresponding to 7 different durations and 5 movement magnitude settings. In addition, these simulations were repeated with vocal tract lengths representative of an adult male and an approximately 6-year-old child. RESULTS On the basis of univariate analysis, measures of frequency predicted changes in magnitude, and temporal measures predicted changes in speaking rate consistent with the hypothesis. The combined effects of duration and magnitude showed that F2 was more sensitive to changes in magnitude at shorter word durations compared with longer word durations. This finding held across words and vocal tract length. CONCLUSIONS Results suggest that there is an interaction between duration and magnitude that affects the slope of the F2 trajectory. The next step is to relate kinematics to F2 trajectory output using real speakers.
18
Lester-Smith RA, Story BH. The effects of physiological adjustments on the perceptual and acoustical characteristics of vibrato as a model of vocal tremor. J Acoust Soc Am 2016; 140:3827. [PMID: 27908094 PMCID: PMC5392085 DOI: 10.1121/1.4967454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/26/2016] [Revised: 10/17/2016] [Accepted: 10/28/2016] [Indexed: 06/06/2023]
Abstract
The purpose of this study was to investigate the effects of physiological adjustments on listeners' perception of the magnitude of modulation of voice and to determine the characteristics of the acoustical modulations that explained listeners' judgments. This research was carried out using singers producing vibrato as a model of vocal tremor. Twenty healthy adults participated in a perceptual study involving pair-comparisons of the magnitude of "shakiness" with singers' samples, which differed by fundamental frequency, vocal quality, and vowel. Results revealed that listeners perceived a higher magnitude of voice modulation when female samples had a pressed vocal quality. Acoustical analyses were performed with voice samples to determine the features that predicted listeners' judgments. Based on regression analyses, listeners' judgments were predicted to some extent by modulation information in frequency bands across the spectrum.
Affiliation(s)
- Rosemary A Lester-Smith
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
19
Abstract
OBJECTIVE The goal of the Arizona Child Acoustic Database project was to obtain a large set of acoustic recordings, primarily vowels, collected from a cohort of children over a critical period of growth and development. METHOD Data were recorded longitudinally from 63 children between the ages of 2;0 and 7;0 at 3-month intervals. The protocol included individual American English vowels and diphthongs, nonsense multi-vowel transitions, word-level multi-vowel sequences (e.g., Hawaii), single-syllable words targeting each American English vowel, short sentences, and conversation. RESULTS Acoustic files are available for download through the University of Arizona Library Repository for use in future research projects. CONCLUSION Longitudinal recordings may be of interest because they allow tracking of acoustic characteristics produced by an individual child during a period of rapid growth and speech development.
Affiliation(s)
- Kate Bunton
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, USA
20
Abstract
Children's speech presents a challenging problem for formant frequency measurement. In part, this is because high fundamental frequencies, typical of children's speech production, generate widely spaced harmonic components that may undersample the spectral shape of the vocal tract transfer function. In addition, there is often a weakening of upper harmonic energy and a noise component due to glottal turbulence. The purpose of this study was to develop a formant measurement technique based on cepstral analysis that does not require modification of the cepstrum itself or transformation back to the spectral domain. Instead, a narrow-band spectrum is low-pass filtered with a cutoff point (i.e., cutoff "quefrency" in the terminology of cepstral analysis) to preserve only the spectral envelope. To test the method, speech representative of a 2- to 3-year-old child was simulated with an airway modulation model of speech production. The model, which includes physiologically scaled vocal folds and vocal tract, generates sound output analogous to a microphone signal. The vocal tract resonance frequencies can be calculated independently of the output signal and thus provide test cases that allow for assessing the accuracy of the formant tracking algorithm. When applied to the simulated child-like speech, the spectral filtering approach was shown to provide a clear spectrographic representation of formant change over the time course of the signal and to facilitate tracking of formant frequencies for further analysis.
Affiliation(s)
- Brad H. Story
  - Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, P.O. Box 210071, Tucson, AZ 85721
- Kate Bunton
  - Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, P.O. Box 210071, Tucson, AZ 85721
21
Lester RA, Story BH. The effects of physiological adjustments on the perceptual and acoustical characteristics of simulated laryngeal vocal tremor. J Acoust Soc Am 2015; 138:953-63. [PMID: 26328711 PMCID: PMC4545074 DOI: 10.1121/1.4927561] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 05/09/2023]
Abstract
The purpose of this study was to determine if adjustments to the voice source [i.e., fundamental frequency (F0), degree of vocal fold adduction] or vocal tract filter (i.e., vocal tract shape for vowels) reduce the perception of simulated laryngeal vocal tremor and to determine if listener perception could be explained by characteristics of the acoustical modulations. This research was carried out using a computational model of speech production that allowed for precise control and manipulation of the glottal and vocal tract configurations. Forty-two healthy adults participated in a perceptual study involving pair-comparisons of the magnitude of "shakiness" with simulated samples of laryngeal vocal tremor. Results revealed that listeners perceived a higher magnitude of voice modulation when simulated samples had a higher mean F0, greater degree of vocal fold adduction, and vocal tract shape for /i/ vs /ɑ/. However, the effect of F0 was significant only when glottal noise was not present in the acoustic signal. Acoustical analyses were performed with the simulated samples to determine the features that affected listeners' judgments. Based on regression analyses, listeners' judgments were predicted to some extent by modulation information present in both low and high frequency bands.
Affiliation(s)
- Rosemary A Lester
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Brad H Story
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
22
Titze IR, Baken RJ, Bozeman KW, Granqvist S, Henrich N, Herbst CT, Howard DM, Hunter EJ, Kaelin D, Kent RD, Kreiman J, Kob M, Löfqvist A, McCoy S, Miller DG, Noé H, Scherer RC, Smith JR, Story BH, Švec JG, Ternström S, Wolfe J. Toward a consensus on symbolic notation of harmonics, resonances, and formants in vocalization. J Acoust Soc Am 2015; 137:3005-7. [PMID: 25994732 PMCID: PMC5392060 DOI: 10.1121/1.4919349] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 05/05/2023]
Affiliation(s)
- Ingo R Titze
  - National Center for Voice and Speech, 136 South Main Street, Suite 320, Salt Lake City, Utah 84101, USA
- Ronald J Baken
  - Department of Otolaryngology, New York Eye and Ear Infirmary of Mount Sinai, 310 East 14th Street, New York, New York 10003, USA
- Kenneth W Bozeman
  - Lawrence University Conservatory of Music, 711 East Boldt Way, Appleton, Wisconsin 54911, USA
- Svante Granqvist
  - Royal Institute of Technology (KTH), School of Technology and Health (STH), Basic Science and Biomedicine, Campus Haninge, SE-136 40 Handen, Sweden
- Nathalie Henrich
  - GIPSA-lab, Département Parole et Cognition, Domaine Universitaire, 11 rue des Mathématiques, BP 46 38402 Saint Martin d'Hères cedex, France
- Christian T Herbst
  - Laboratory of Biophysics, Department of Experimental Physics, Palacký University Olomouc, Czech Republic
- David M Howard
  - Department of Electronics, University of York, Heslington, York, United Kingdom
- Eric J Hunter
  - Communicative Sciences and Disorders, Michigan State University, 1026 Red Cedar Road, East Lansing, Michigan 48824, USA
- Dean Kaelin
  - Dean Kaelin Vocal Studio, 2539 East 4430 Street, Holladay, Utah 84124, USA
- Raymond D Kent
  - The Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, Wisconsin 53705, USA
- Jody Kreiman
  - Head and Neck Surgery, University of California-Los Angeles School of Medicine, 1000 Veteran Avenue, Los Angeles, California 90095, USA
- Malte Kob
  - Musikalische Akustik and Theorie der Musikübertragung Erich-Thienhaus-Institut Hochschule für Musik Detmold, Neustadt 22, 32756 Detmold, Germany
- Anders Löfqvist
  - Haskins Laboratories, Yale University, 300 George Street, Suite 900, New Haven, Connecticut 06511, USA
- Scott McCoy
  - School of Music, Ohio State University, 304 Mershon Auditorium, 1871 North High Street, Columbus, Ohio 43201, USA
- Donald G Miller
  - Voce Vista, Achterste Kamp 9, 9301 RB Roden, the Netherlands
- Hubert Noé
  - Klostergasse 30, 8280 Fürstenfeld, Austria
- Ronald C Scherer
  - Communication Sciences and Disorders, Bowling Green State University, 200 Health and Human Services Building, Bowling Green, Ohio 43403, USA
- John R Smith
  - School of Physics, University of New South Wales, Sydney, New South Wales 2052, Australia
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, P.O. Box 210071, Tucson, Arizona 85721, USA
- Jan G Švec
  - Department of Biophysics, Palacký University, Olomouc, the Czech Republic
- Sten Ternström
  - Department of Speech, Music and Hearing (KTH), Lindstedtsvägen 24, S-100 44 Stockholm, Sweden
- Joe Wolfe
  - School of Physics, University of New South Wales, Sydney 2052, Australia
23
Carbonell KM, Lester RA, Story BH, Lotto AJ. Discriminating simulated vocal tremor source using amplitude modulation spectra. J Voice 2014; 29:140-7. [PMID: 25532813 DOI: 10.1016/j.jvoice.2014.07.020] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 01/03/2014] [Accepted: 07/31/2014] [Indexed: 11/28/2022]
Abstract
OBJECTIVES/HYPOTHESIS Sources of vocal tremor are difficult to categorize perceptually and acoustically. This article describes a preliminary attempt to discriminate vocal tremor sources through the use of spectral measures of the amplitude envelope. The hypothesis is that different vocal tremor sources are associated with distinct patterns of acoustic amplitude modulations. STUDY DESIGN Statistical categorization methods (discriminant function analysis) were used to discriminate signals from simulated vocal tremor with different sources using only acoustic measures derived from the amplitude envelopes. METHODS Simulations of vocal tremor were created by modulating parameters of a vocal fold model corresponding to oscillations of respiratory driving pressure (respiratory tremor), degree of vocal fold adduction (adductory tremor), and fundamental frequency of vocal fold vibration (F0 tremor). The acoustic measures were based on spectral analyses of the amplitude envelope computed across the entire signal and within select frequency bands. RESULTS The signals could be categorized (with accuracy well above chance) in terms of the simulated tremor source using only measures of the amplitude envelope spectrum even when multiple sources of tremor were included. CONCLUSIONS These results supply initial support for an amplitude-envelope-based approach to identify the source of vocal tremor and provide further evidence for the rich information about talker characteristics present in the temporal structure of the amplitude envelope.
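As a rough illustration of what a spectral analysis of the amplitude envelope involves, the sketch below extracts the envelope of a signal and computes its magnitude spectrum. It makes several assumptions that are not taken from the article: the envelope is obtained as the magnitude of an FFT-based analytic signal, no band-splitting or discriminant analysis is performed, and the function names are hypothetical.

```python
import numpy as np

def amplitude_envelope(x):
    """Magnitude of the analytic signal, built by zeroing negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep the DC term as is
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist term as is
        gain[1:n // 2] = 2.0           # double the positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * gain))

def envelope_spectrum(x, fs):
    """Magnitude spectrum of the mean-removed amplitude envelope."""
    env = amplitude_envelope(x)
    env = env - env.mean()
    mag = np.abs(np.fft.rfft(env)) / env.size
    freqs = np.fft.rfftfreq(env.size, d=1.0 / fs)
    return freqs, mag

# Toy example: a 200 Hz carrier whose amplitude is modulated at 5 Hz,
# a rate in the range often associated with tremor.
fs = 2000
t = np.arange(4000) / fs
signal = (1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)) * np.sin(2 * np.pi * 200 * t)
freqs, mag = envelope_spectrum(signal, fs)
dominant_modulation_hz = freqs[np.argmax(mag)]
```

Features such as the location and height of peaks in `mag`, computed for the whole signal or after band-pass filtering, are the kind of measures that could then be passed to a statistical classifier.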
Affiliation(s)
- Kathy M Carbonell
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona
- Rosemary A Lester
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona
- Brad H Story
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona
- Andrew J Lotto
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona
24
Monson BB, Lotto AJ, Story BH. Gender and vocal production mode discrimination using the high frequencies for speech and singing. Front Psychol 2014; 5:1239. [PMID: 25400613 PMCID: PMC4214223 DOI: 10.3389/fpsyg.2014.01239] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 07/24/2014] [Accepted: 10/11/2014] [Indexed: 11/13/2022] [Open Access]
Abstract
Humans routinely produce acoustical energy at frequencies above 6 kHz during vocalization, but this frequency range is often not represented in communication devices and speech perception research. Recent advancements toward high-definition (HD) voice and extended bandwidth hearing aids have increased the interest in the high frequencies. The potential perceptual information provided by high-frequency energy (HFE) is not well characterized. We found that humans can accomplish tasks of gender discrimination and vocal production mode discrimination (speech vs. singing) when presented with acoustic stimuli containing only HFE at both amplified and normal levels. Performance in these tasks was robust in the presence of low-frequency masking noise. No substantial learning effect was observed. Listeners also were able to identify the sung and spoken text (excerpts from "The Star-Spangled Banner") with very few exposures. These results add to the increasing evidence that the high frequencies provide at least redundant information about the vocal signal, suggesting that its representation in communication devices (e.g., cell phones, hearing aids, and cochlear implants) and speech/voice synthesizers could improve these devices and benefit normal-hearing and hearing-impaired listeners.
Affiliation(s)
- Brian B Monson
  - Department of Pediatric Newborn Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Andrew J Lotto
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, USA
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, USA
25
Samlan RA, Story BH, Lotto AJ, Bunton K. Acoustic and perceptual effects of left-right laryngeal asymmetries based on computational modeling. J Speech Lang Hear Res 2014; 57:1619-37. [PMID: 24845730 PMCID: PMC4495963 DOI: 10.1044/2014_jslhr-s-12-0405] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/18/2012] [Accepted: 04/11/2014] [Indexed: 05/11/2023]
Abstract
PURPOSE Computational modeling was used to examine the consequences of 5 different laryngeal asymmetries on acoustic and perceptual measures of vocal function. METHOD A kinematic vocal fold model was used to impose 5 laryngeal asymmetries: adduction, edge bulging, nodal point ratio, amplitude of vibration, and starting phase. Thirty /a/ and /ɪ/ vowels were generated for each asymmetry and analyzed acoustically using cepstral peak prominence (CPP), harmonics-to-noise ratio (HNR), and 3 measures of spectral slope (H1*-H2*, B0-B1, and B0-B2). Twenty listeners rated voice quality for a subset of the productions. RESULTS Increasingly asymmetric adduction, bulging, and nodal point ratio explained significant variance in perceptual rating (R² = .05, p < .001). The same factors resulted in generally decreasing CPP, HNR, and B0-B2 and in increasing B0-B1. Of the acoustic measures, only CPP explained significant variance in perceived quality (R² = .14, p < .001). Increasingly asymmetric amplitude of vibration or starting phase minimally altered vocal function or voice quality. CONCLUSION Asymmetries of adduction, bulging, and nodal point ratio drove acoustic measures and perception in the current study, whereas asymmetric amplitude of vibration and starting phase demonstrated minimal influence on the acoustic signal or voice quality.
26
Auvinen H, Raitio T, Airaksinen M, Siltanen S, Story BH, Alku P. Automatic glottal inverse filtering with the Markov chain Monte Carlo method. Comput Speech Lang 2014. [DOI: 10.1016/j.csl.2013.09.004] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/26/2022]
27
Story BH. Structure, Movement, Sound, and Perception. Perspect Speech Sci Orofac Disord 2014; 24:7-20. [PMID: 25383138 PMCID: PMC4222052 DOI: 10.1044/ssod24.1.7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Models that take the form of artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The article begins with a brief history of two artificial speaking devices that exemplify the representation of speech production as a system of modulations. The development of a recent airway modulation model is then described that simulates the time-varying changes of the vocal tract and acoustic wave propagation. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener.
Affiliation(s)
- Brad H Story
  - Speech, Language, and Hearing Sciences, University of Arizona
28
Monson BB, Hunter EJ, Lotto AJ, Story BH. The perceptual significance of high-frequency energy in the human voice. Front Psychol 2014; 5:587. [PMID: 24982643 PMCID: PMC4059169 DOI: 10.3389/fpsyg.2014.00587] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Received: 03/16/2014] [Accepted: 05/26/2014] [Indexed: 11/25/2022] [Open Access]
Abstract
While human vocalizations generate acoustical energy at frequencies up to (and beyond) 20 kHz, the energy at frequencies above about 5 kHz has traditionally been neglected in speech perception research. The intent of this paper is to review (1) the historical reasons for this research trend and (2) the work that continues to elucidate the perceptual significance of high-frequency energy (HFE) in speech and singing. The historical and physical factors reveal that, while HFE was believed to be unnecessary and/or impractical for applications of interest, it was never shown to be perceptually insignificant. Rather, the main reasons for the focus on low-frequency energy appear to be that the low-frequency portion of the speech spectrum was seen to be sufficient (from a perceptual standpoint) or that the difficulty of HFE research was too great to be justifiable (from a technological standpoint). The advancement of technology continues to overcome concerns stemming from the latter reason. Likewise, advances in our understanding of the perceptual effects of HFE now cast doubt on the first reason. Emerging evidence indicates that HFE plays a more significant role than previously believed, and should thus be considered in speech and voice perception research, especially in research involving children and the hearing impaired.
Affiliation(s)
- Brian B. Monson
  - Department of Pediatric Newborn Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
  - National Center for Voice and Speech, University of Utah, Salt Lake City, UT, USA
- Eric J. Hunter
  - National Center for Voice and Speech, University of Utah, Salt Lake City, UT, USA
  - Department of Communicative Sciences and Disorders, Michigan State University, East Lansing, MI, USA
- Andrew J. Lotto
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, USA
- Brad H. Story
  - Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ, USA
29
Abstract
Previous work has shown that human listeners are sensitive to level differences in high-frequency energy (HFE) in isolated vowel sounds produced by male singers. Results indicated that sensitivity to HFE level changes increased with overall HFE level, suggesting that listeners would be more "tuned" to HFE in vocal production exhibiting higher levels of HFE. It follows that sensitivity to HFE level changes should be higher (1) for female vocal production than for male vocal production and (2) for singing than for speech. To test this hypothesis, difference limens for HFE level changes in male and female speech and singing were obtained. Listeners showed significantly greater ability to detect level changes in singing vs speech but not in female vs male speech. Mean difference limen scores for speech and singing were about 5 dB in the 8-kHz octave (5.6-11.3 kHz) but 8-10 dB in the 16-kHz octave (11.3-22 kHz). These scores are lower (better) than those previously reported for isolated vowels and some musical instruments.
Affiliation(s)
- Brian B Monson
  - National Center for Voice and Speech, University of Utah, 136 South Main Street, Suite 320, Salt Lake City, Utah 84101
- Andrew J Lotto
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, P.O. Box 210071, Tucson, Arizona 85721
- Brad H Story
  - Department of Speech, Language, and Hearing Sciences, University of Arizona, P.O. Box 210071, Tucson, Arizona 85721
30
Alku P, Pohjalainen J, Vainio M, Laukkanen AM, Story BH. Formant frequency estimation of high-pitched vowels using weighted linear prediction. J Acoust Soc Am 2013; 134:1295-313. [PMID: 23927127 DOI: 10.1121/1.4812756] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Indexed: 05/21/2023]
Abstract
All-pole modeling is a widely used formant estimation method, but its performance is known to deteriorate for high-pitched voices. In order to address this problem, several all-pole modeling methods robust to fundamental frequency have been proposed. This study compares five such previously known methods and introduces a technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME utilizes temporally weighted linear prediction (LP) in which the square of the prediction error is multiplied by a given parametric weighting function. The weighting downgrades the contribution of the main excitation of the vocal tract in optimizing the filter coefficients. Consequently, the resulting all-pole model is affected more by the characteristics of the vocal tract leading to less biased formant estimates. By using synthetic vowels created with a physical modeling approach, the results showed that WLP-AME yields improved formant frequencies for high-pitched sounds in comparison to the previously known methods (e.g., relative error in the first formant of the vowel [a] decreased from 11% to 3% when conventional LP was replaced with WLP-AME). Experiments conducted on natural vowels indicate that the formants detected by WLP-AME changed in a more regular manner between repetitions of different pitch than those computed by conventional LP.
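The core of temporally weighted linear prediction, as described in this abstract, is a least-squares fit in which each squared prediction error is multiplied by a weight. The sketch below shows that general idea only. The rectangular weighting in `attenuated_excitation_weights` is a hypothetical stand-in and is not the parametric AME function from the article; the excitation instants are assumed to be known, and all names are illustrative.

```python
import numpy as np

def weighted_linear_prediction(x, order, weights):
    """Minimize sum_n w[n] * (x[n] - sum_k a[k] * x[n-k])**2 over n >= order."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = np.arange(order, x.size)
    # The row for sample n holds its past samples x[n-1], ..., x[n-order].
    past = np.column_stack([x[n - k] for k in range(1, order + 1)])
    root_w = np.sqrt(w[n])
    coeffs, *_ = np.linalg.lstsq(past * root_w[:, None], x[n] * root_w, rcond=None)
    return coeffs

def attenuated_excitation_weights(length, excitation_indices, width, floor=0.01):
    """Hypothetical weighting: 'floor' for 'width' samples from each excitation, 1 elsewhere."""
    w = np.ones(length)
    for i in excitation_indices:
        w[i:i + width] = floor
    return w

# Toy signal: a two-pole resonator excited by impulses at samples 0 and 50.
true_coeffs = np.array([1.5, -0.9])
excitation = np.zeros(100)
excitation[[0, 50]] = 1.0
x = np.zeros(100)
for n in range(100):
    x[n] = excitation[n]
    if n >= 1:
        x[n] += true_coeffs[0] * x[n - 1]
    if n >= 2:
        x[n] += true_coeffs[1] * x[n - 2]

# Conventional LP (uniform weights) versus LP with the excitation samples down-weighted.
a_uniform = weighted_linear_prediction(x, 2, np.ones(100))
a_weighted = weighted_linear_prediction(
    x, 2, attenuated_excitation_weights(100, [0, 50], width=1, floor=0.0)
)
```

In this toy case the excitation at sample 50 is the only sample the resonator cannot predict from its own past, so removing its influence lets the fit recover the filter coefficients; with uniform weights that sample biases the estimate. With short pitch periods, a larger share of samples is affected by excitation, which is consistent with the abstract's motivation for high-pitched voices.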
Affiliation(s)
- Paavo Alku
  - Department of Signal Processing and Acoustics, Aalto University, P.O. Box 13000, FI-00076 Aalto, Finland
31
Samlan RA, Story BH, Bunton K. Relation of perceived breathiness to laryngeal kinematics and acoustic measures based on computational modeling. J Speech Lang Hear Res 2013; 56:1209-23. [PMID: 23785184 PMCID: PMC3984008 DOI: 10.1044/1092-4388(2012/12-0194)] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Indexed: 05/09/2023]
Abstract
PURPOSE In this study, the authors sought to determine (a) how specific vocal fold structural and vibratory features relate to breathy voice quality and (b) the relation of perceived breathiness to 4 acoustic correlates of breathiness. METHOD A computational, kinematic model of the vocal fold medial surfaces was used to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: vocal process separation, surface bulging, vibratory nodal point, and epilaryngeal constriction. Twelve naïve listeners rated breathiness of 364 samples relative to a reference. The degree of breathiness was then compared to (a) the underlying kinematic profile and (b) 4 acoustic measures: cepstral peak prominence (CPP), harmonics-to-noise ratio, and two measures of spectral slope. RESULTS Vocal process separation alone accounted for 61.4% of the variance in perceptual rating. Adding nodal point ratio and bulging to the equation increased the explained variance to 88.7%. The acoustic measure CPP accounted for 86.7% of the variance in perceived breathiness, and explained variance increased to 92.6% with the addition of one spectral slope measure. CONCLUSION Breathiness ratings were best explained kinematically by the degree of vocal process separation and acoustically by CPP.
32
Lester RA, Barkmeier-Kraemer J, Story BH. Physiologic and Acoustic Patterns of Essential Vocal Tremor. J Voice 2013; 27:422-32. [DOI: 10.1016/j.jvoice.2013.01.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 08/07/2012] [Accepted: 01/07/2013] [Indexed: 10/27/2022]
33
Abstract
Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced, and simulations of words and phrases are demonstrated.
Collapse
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Dept. of Speech, Language, and Hearing Sciences, University of Arizona, 1131 E. 2nd St., P.O. Box 210071, Tucson, AZ, 85721, United States
|
34
|
Monson BB, Lotto AJ, Story BH. Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives. J Acoust Soc Am 2012; 132:1754-64. [PMID: 22978902 PMCID: PMC3460988 DOI: 10.1121/1.4742724] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 09/30/2011] [Revised: 07/04/2012] [Accepted: 07/16/2012] [Indexed: 05/04/2023]
Abstract
The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
Affiliation(s)
- Brian B Monson
- National Center for Voice and Speech, University of Utah, 136 S. Main Street, Suite 320, Salt Lake City, Utah 84101, USA.
|
35
|
Abstract
Speech and singing directivity in the horizontal plane was examined using simultaneous multi-channel full-bandwidth recordings to investigate directivity of high-frequency energy, in particular. This method allowed not only for accurate analysis of running speech using the long-term average spectrum, but also for examination of directivity of separate transient phonemes. Several vocal production factors that could affect directivity were examined. Directivity differences were not found between modes of production (speech vs singing) and only slight differences were found between genders and production levels (soft vs normal vs loud), more pronounced in the higher frequencies. Large directivity differences were found between specific voiceless fricatives, with /s, ʃ/ more directional than /f, θ/ in the 4, 8, and 16 kHz octave bands.
Affiliation(s)
- Brian B Monson
- National Center for Voice and Speech, University of Utah, 136 S. Main Street, Suite 320, Salt Lake City, Utah 84101, USA.
|
36
|
Abstract
OBJECTIVE The purpose of this study was to examine the relation of perceptual ratings of nasality by experienced listeners, measures of nasalance, and the size of the nasal port opening for three simulated English corner vowels, /i/, /u/, and /a/. DESIGN Samples were generated using a computational model that allowed for exact control of nasal port size and a direct measure of nasalance. Perceptual ratings were obtained using a paired-stimulus presentation. PARTICIPANTS Five experienced listeners. MAIN OUTCOME MEASURES Measures of nasalance and perceptual nasality ratings. RESULTS Differences in nasalance and perceptual ratings of nasality were noted among the three vowels, with values being greater for the high vowels /i/ and /u/ compared to the low vowel /a/. Listeners detected nasality for the high and low vowels simulated with nasal port areas of 0.01 and 0.15 cm², respectively. Correlations between ratings of nasality and nasalance were high for all three vowels. CONCLUSIONS Results of the present study show a high correlation between ratings of nasality and measures of nasalance for nasal port areas ranging from 0 to 0.5 cm². The correlations were based on sustained vowel samples. The restricted speech sample limits generalization of the findings to clinical data; however, the results are a demonstration of the usefulness of modeling to understand the perceptual phenomena of nasality.
Affiliation(s)
- Kate Bunton
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ 85721-0071, USA.
|
37
|
Samlan RA, Story BH. Relation of structural and vibratory kinematics of the vocal folds to two acoustic measures of breathy voice based on computational modeling. J Speech Lang Hear Res 2011; 54:1267-83. [PMID: 21498582 PMCID: PMC3184371 DOI: 10.1044/1092-4388(2011/10-0195)] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 05/05/2023]
Abstract
PURPOSE To relate vocal fold structure and kinematics to 2 acoustic measures: cepstral peak prominence (CPP) and the amplitude of the first harmonic relative to the second (H1-H2). METHOD The authors used a computational, kinematic model of the medial surfaces of the vocal folds to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: degree of vocal fold adduction, surface bulging, vibratory nodal point, and supraglottal constriction. CPP and H1-H2 were measured from simulated glottal area, glottal flow, and acoustic waveforms and were related to the underlying vocal fold kinematics. RESULTS CPP decreased with increased separation of the vocal processes, whereas the nodal point location had little effect. H1-H2 increased as a function of separation of the vocal processes in the range of 1.0 mm to 1.5 mm and decreased with separation > 1.5 mm. CONCLUSIONS CPP is generally a function of vocal process separation. H1*-H2* (see paragraph 6 of article text for an explanation of the asterisks) will increase or decrease with vocal process separation on the basis of vocal fold shape, pivot point for the rotational mode, and supraglottal vocal tract shape, limiting its utility as an indicator of breathy voice. Future work will relate the perception of breathiness to vocal fold kinematics and acoustic measures.
Affiliation(s)
- Robin A Samlan
- Speech Acoustics Laboratory, University of Arizona, Tucson, USA.
|
38
|
Abstract
PURPOSE The present study was designed to investigate the relation of formant transitions to place-of-articulation for stop consonants. A speech production model was used to generate simulated utterances containing voiced stop consonants, and a perceptual experiment was performed to test their identification by listeners. METHOD Based on a model of the vocal tract shape, a theoretical basis for reducing highly variable formant transitions to more invariant formant deflection patterns as a function of constriction location was proposed. A speech production model was used to simulate vowel-consonant-vowel (VCV) utterances for 3 underlying vowel-vowel contexts and for which the constriction location was incrementally moved from the lips toward the velar part of the vocal tract. These simulated VCVs were presented to listeners who were asked to identify the consonant. RESULTS Listener responses indicated that phonetic boundaries were well aligned with points along the vocal tract length where there was a shift in the deflection polarity of either the 2nd or 3rd formant. CONCLUSIONS This study demonstrated that regions of the vocal tract exist that, when constricted, shift the formant frequencies in a predictable direction. Based on a perceptual experiment, the boundaries of these acoustically defined regions were shown to coincide with phonetic categories for stop consonants.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, AZ 85721, USA.
|
39
|
Abstract
The purpose of this study was to conduct an identification experiment with synthetic vowels based on the same sets of speaker-dependent area functions as in Bunton and Story [(2009) J. Acoust. Soc. Am. 125, 19-22], but with additional time-varying characteristics that are more representative of natural speech. The results indicated that vowels synthesized using an area function model that allows for time variation of the vocal tract shape and includes natural vowel durations were more accurately identified for 7 of 11 English vowels than those based on static area functions.
Affiliation(s)
- Kate Bunton
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
|
40
|
Abstract
The purpose of this study was to develop a method by which a vowel-consonant-vowel (VCV) utterance based on x-ray microbeam articulatory data could be separated into a vowel-to-vowel transition and a consonant superposition function. The result is a model that represents a vowel sequence as a time-dependent perturbation of the neutral vocal tract shape governed by coefficients of canonical deformation patterns. Consonants were modeled as superposition functions that can force specific portions of the vocal tract shape to be constricted or expanded, over a specific time course. The three VCVs [pa], [ta], and [ka], produced by one female speaker, were analyzed and reconstructed with the developed model. They were shown to be reasonable approximations of the original VCVs, as assessed qualitatively by visual inspection and quantitatively by calculating rms error and correlation coefficients. This establishes a method for future modeling of other speech material.
Affiliation(s)
- Brad H Story
- Department of Speech, Language, and Hearing Sciences, Speech Acoustics Laboratory, University of Arizona, Tucson, AZ 85721, USA.
|
41
|
Story BH. Vocal tract modes based on multiple area function sets from one speaker. J Acoust Soc Am 2009; 125:EL141-EL147. [PMID: 19354352 PMCID: PMC2677261 DOI: 10.1121/1.3082263] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/07/2008] [Revised: 01/02/2009] [Accepted: 01/08/2009] [Indexed: 05/26/2023]
Abstract
The purpose of this study was to derive vocal tract modes from a wider range of vowel area functions for a specific speaker than has been previously reported. Area functions from Story et al. [(1996). J. Acoust. Soc. Am. 100, 537-554] and Story [(2008). J. Acoust. Soc. Am. 123, 327-335] were combined in a composite set from which modes were derived with principal component analysis. Along with scaling coefficients, these modes were used to generate a [F1, F2] formant space. In comparison to formant spaces similarly generated based on the two area function sets alone, the combined version provides a wider range of both F1 and F2 values. This new set of modes may be useful for inverse mapping of formant frequencies to area functions or for modeling of vocal tract shape changes.
Affiliation(s)
- Brad H Story
- Department of Speech, Language, and Hearing Sciences, Speech Acoustics Laboratory, University of Arizona, Tucson, Arizona 85721, USA.
|
42
|
Bunton K, Story BH. Identification of synthetic vowels based on selected vocal tract area functions. J Acoust Soc Am 2009; 125:19-22. [PMID: 19173389 PMCID: PMC2677276 DOI: 10.1121/1.3033740] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/03/2008] [Revised: 10/14/2008] [Accepted: 10/16/2008] [Indexed: 05/26/2023]
Abstract
The purpose of this study was to determine the degree to which synthetic vowel samples based on previously reported vocal tract area functions of eight speakers could be accurately identified by listeners. Vowels were synthesized with a wave-reflection type of vocal tract model coupled to a voice source. A particular vowel was generated by specifying an area function that had been derived from previous magnetic resonance imaging based measurements. The vowel samples were presented to ten listeners in a forced choice paradigm in which they were asked to identify the vowel. Results indicated that the vowels [i], [ae], and [u] were identified most accurately for all of the speakers. The identification errors of the other vowels were typically due to confusions with adjacent vowels.
Affiliation(s)
- Kate Bunton
- Department of Speech, Speech Acoustics Laboratory, University of Arizona, Tucson, Arizona 85721, USA.
|
43
|
Lowell SY, Barkmeier-Kraemer JM, Hoit JD, Story BH. Respiratory and laryngeal function during spontaneous speaking in teachers with voice disorders. J Speech Lang Hear Res 2008; 51:333-49. [PMID: 18367681 DOI: 10.1044/1092-4388(2008/025)] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 05/21/2023]
Abstract
PURPOSE To determine if respiratory and laryngeal function during spontaneous speaking were different for teachers with voice disorders compared with teachers without voice problems. METHOD Eighteen teachers, 9 with and 9 without voice disorders, were included in this study. Respiratory function was measured with magnetometry, and laryngeal function was measured with electroglottography during 3 spontaneous speaking tasks: a simulated teaching task at a typical loudness level, a simulated teaching task at an increased loudness level, and a conversational speaking task. Electroglottography measures were also obtained for 3 structured speaking tasks: a paragraph reading task, a sustained vowel, and a maximum phonation time vowel. RESULTS Teachers with voice disorders started and ended their breath groups at significantly smaller lung volumes than teachers without voice problems during teaching-related speaking tasks; however, there were no between-group differences in laryngeal measures. Task-related differences were found on several respiratory measures and on one laryngeal measure. CONCLUSIONS These findings suggest that teachers with voice disorders used different speech breathing strategies than teachers without voice problems. Implications for clinical management of teachers with voice disorders are discussed.
Affiliation(s)
- Soren Y Lowell
- National Institute of Neurological Disorders and Stroke, Laryngeal and Speech Section, Bethesda, MD 20892, USA.
|
44
|
Story BH. Comparison of magnetic resonance imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. J Acoust Soc Am 2008; 123:327-35. [PMID: 18177162 PMCID: PMC2377017 DOI: 10.1121/1.2805683] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 05/21/2023]
Abstract
A new set of area functions for vowels has been obtained with magnetic resonance imaging from the same speaker as that previously reported in 1996 [Story et al., J. Acoust. Soc. Am. 100, 537-554 (1996)]. The new area functions were derived from image data collected in 2002, whereas the previously reported area functions were based on magnetic resonance images obtained in 1994. When compared, the new area function sets indicated a tendency toward a constricted pharyngeal region and expanded oral cavity relative to the previous set. Based on calculated formant frequencies and sensitivity functions, these morphological differences were shown to have the primary acoustic effect of systematically shifting the second formant (F2) downward in frequency. Multiple instances of target vocal tract shapes from a specific speaker provide additional sampling of the possible area functions that may be produced during speech production. This may be of benefit for understanding intraspeaker variability in vowel production and for further development of speech synthesizers and speech models that utilize area function information.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
|
45
|
Abstract
The purpose of this study was to investigate the relation between vocal tract deformation patterns obtained from statistical analyses of a set of area functions representative of a vowel repertoire, and the acoustic properties of a neutral vocal tract shape. Acoustic sensitivity functions were calculated for a mean area function based on seven different speakers. Specific linear combinations of the sensitivity functions corresponding to the first two formant frequencies were shown to possess essentially the same amplitude variation along the vocal tract length as the statistically derived deformation patterns reported in previous studies.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
|
46
|
Sapir S, Spielman JL, Ramig LO, Story BH, Fox C. Effects of intensive voice treatment (the Lee Silverman Voice Treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic Parkinson disease: acoustic and perceptual findings. J Speech Lang Hear Res 2007; 50:899-912. [PMID: 17675595 DOI: 10.1044/1092-4388(2007/064)] [Citation(s) in RCA: 206] [Impact Index Per Article: 12.1] [Indexed: 05/16/2023]
Abstract
PURPOSE To evaluate the effects of intensive voice treatment targeting vocal loudness (the Lee Silverman Voice Treatment [LSVT]) on vowel articulation in dysarthric individuals with idiopathic Parkinson's disease (PD). METHOD A group of individuals with PD receiving LSVT (n = 14) was compared to a group of individuals with PD not receiving LSVT (n = 15) and a group of age-matched healthy individuals (n = 14) on the variables vocal sound pressure level (VocSPL); various measures of the first (F1) and second (F2) formants of the vowels /i/, /u/, and /a/; vowel triangle area; and perceptual vowel ratings. The vowels were extracted from the words key, stew, and Bobby embedded in phrases. Perceptual vowel rating was performed by trained raters using a visual analog scale. RESULTS Only VocSPL, F2 of the vowel /u/ (F2u), and the ratio F2i/F2u significantly differed between patients and healthy individuals pretreatment. These variables, along with perceptual vowel ratings, significantly changed (improved) in the group receiving LSVT only. CONCLUSION These results, along with previous findings, add further support to the generalized therapeutic impact of intensive voice treatment on orofacial functions (speech, swallowing, facial expression) and respiratory and laryngeal functions in individuals with PD.
Affiliation(s)
- Shimon Sapir
- Department of Communication Sciences and Disorders, Faculty of Social Welfare and Health Sciences, University of Haifa, Mount Carmel, Haifa 39105, Israel.
|
47
|
Pruthi T, Espy-Wilson CY, Story BH. Simulation and analysis of nasalized vowels based on magnetic resonance imaging data. J Acoust Soc Am 2007; 121:3858-73. [PMID: 17552733 DOI: 10.1121/1.2722220] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 05/15/2023]
Abstract
In this study, vocal tract area functions for one American English speaker, recorded using magnetic resonance imaging, were used to simulate and analyze the acoustics of vowel nasalization. Computer vocal tract models and susceptance plots were used to study the three most important sources of acoustic variability involved in the production of nasalized vowels: velar coupling area, asymmetry of nasal passages, and the sinus cavities. Analysis of the susceptance plots of the pharyngeal and oral cavities, $-(B_p + B_o)$, and the nasal cavity, $B_n$, helped in understanding the movement of poles and zeros with varying coupling areas. Simulations using two nasal passages clearly showed the introduction of extra pole-zero pairs due to the asymmetry between the passages. Simulations with the inclusion of maxillary and sphenoidal sinuses showed that each sinus can potentially introduce one pole-zero pair in the spectrum. Further, the right maxillary sinus introduced a pole-zero pair at the lowest frequency. The effective frequencies of these poles and zeros due to the sinuses in the sum of the oral and nasal cavity outputs change with a change in the configuration of the oral cavity, which may happen due to a change in the coupling area, or in the vowel being articulated.
Affiliation(s)
- Tarun Pruthi
- Speech Communication Laboratory, Institute of Systems Research and Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, USA.
|
48
|
Abstract
Vocal tract shaping patterns based on articulatory fleshpoint data from four speakers in the University of Wisconsin x-ray microbeam (XRMB) database [J. Westbury, UW-Madison, (1994)] were determined with a principal component analysis (PCA). Midsagittal cross-distance functions representative of approximately the front 6 cm of the oral cavity for each of 11 vowels and vowel-vowel (VV) sequences were obtained from the pellet positions and the hard palate profile for the four speakers. A PCA was independently performed on each speaker's set of cross-distance functions representing static vowels only, and again with time-dependent cross-distance functions representing vowels and VV sequences. In all cases, results indicated that the first two orthogonal components (referred to as modes) accounted for more than 97% of the variance in each speaker's set of cross-distance functions. In addition, the shape of each mode was shown to be similar across the speakers suggesting that the modes represent common patterns of vocal tract deformation. Plots of the resulting time-dependent coefficient records showed that the four speakers activated each mode similarly during production of the vowel sequences. Finally, a procedure was described for using the time-dependent mode coefficients obtained from the XRMB data as input for an area function model of the vocal tract.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
|
49
|
Lowell SY, Story BH. Simulated effects of cricothyroid and thyroarytenoid muscle activation on adult-male vocal fold vibration. J Acoust Soc Am 2006; 120:386-97. [PMID: 16875234 DOI: 10.1121/1.2204442] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 05/11/2023]
Abstract
Adjustments to cricothyroid and thyroarytenoid muscle activation are critical to the control of fundamental frequency and aerodynamic aspects of vocal fold vibration in humans. The aerodynamic and physical effects of these muscles are not well understood and are difficult to study in vivo. Knowledge of the contributions of these two muscles is essential to understanding both normal and disordered voice physiology. In this study, a three-mass model for voice simulation in adult males was used to produce systematic changes to cricothyroid and thyroarytenoid muscle activation levels. Predicted effects on fundamental frequency, aerodynamic quantities, and physical quantities of vocal fold vibration were assessed. Certain combinations of these muscle activations resulted in aerodynamic and physical characteristics of vibration that might increase the mechanical stress placed on the vocal fold tissue.
Affiliation(s)
- Soren Y Lowell
- Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-210071, USA
|
50
|
Farinella KA, Hixon TJ, Hoit JD, Story BH, Jones PA. Listener perception of respiratory-induced voice tremor. Am J Speech Lang Pathol 2006; 15:72-84. [PMID: 16533094 DOI: 10.1044/1058-0360(2006/008)] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/18/2005] [Revised: 08/29/2005] [Accepted: 11/10/2005] [Indexed: 05/07/2023]
Abstract
PURPOSE The purpose of this study was to determine the relation of respiratory oscillation to the perception of voice tremor. METHOD Forced oscillation of the respiratory system was used to simulate variations in alveolar pressure such as are characteristic of voice tremor of respiratory origin. Five healthy men served as speakers, and 6 clinically experienced women served as listeners. Speakers produced utterances while forced sinusoidal pressure changes were applied to the surface of the respiratory system. Utterances included vowels and sentences produced using usual loudness, pitch, quality, and rate, and vowels produced using different loudness, pitch, and quality. Perceptual tasks included detection threshold for voice tremor and pair comparison judgments in which listeners identified the sample with the greater magnitude of voice tremor. RESULTS The mean detection threshold for voice tremor was 1.37 cmH₂O (SD = 0.47) for vowel utterances and 2.16 cmH₂O (SD = 1.52) for sentence utterances. Tremor magnitude was judged to be different for vowel and sentence utterances, but not for different vowels. Results revealed differential effects for loudness, pitch, and quality. CONCLUSIONS These findings offer implications for the evaluation and management of voice tremor of respiratory causation.
|