1
|
Kraxberger F, Näger C, Laudato M, Sundström E, Becker S, Mihaescu M, Kniesburges S, Schoder S. On the Alignment of Acoustic and Coupled Mechanic-Acoustic Eigenmodes in Phonation by Supraglottal Duct Variations. Bioengineering (Basel) 2023; 10:1369. [PMID: 38135960 PMCID: PMC10740796 DOI: 10.3390/bioengineering10121369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 11/15/2023] [Accepted: 11/23/2023] [Indexed: 12/24/2023] Open
Abstract
Sound generation in human phonation and the underlying fluid-structure-acoustic interaction that describes the sound production mechanism are not fully understood. A previous experimental study, with a silicone vocal fold model connected to a straight vocal tract pipe of fixed length, showed that vibroacoustic coupling can cause a deviation in the vocal fold vibration frequency. This occurred when the fundamental frequency of the vocal fold motion was close to the lowest acoustic resonance frequency of the pipe. What is not fully understood is how the vibroacoustic coupling is influenced by a varying vocal tract length. Presuming that this effect is a pure coupling of the acoustical effects, a numerical simulation model is established based on the computation of the mechanical-acoustic eigenvalue. By varying the pipe length, the lowest acoustic resonance frequency was adjusted in the experiments and likewise in the simulation setup. In doing so, the evolution of the vocal folds' coupled eigenvalues and eigenmodes is investigated, which confirms the experimental findings. Finally, it was shown that for normal phonation conditions, the mechanical mode is the most efficient vibration pattern whenever the acoustic resonance of the pipe (lowest formant) is far away from the vocal folds' vibration frequency. Whenever the lowest formant is slightly lower than the mechanical vocal fold eigenfrequency, the coupled vocal fold motion pattern at the formant frequency dominates.
Affiliation(s)
- Florian Kraxberger
- Institute of Fundamentals and Theory in Electrical Engineering (IGTE), Graz University of Technology, Inffeldgasse 18/I, 8010 Graz, Austria
- Christoph Näger
- Institute of Fluid Mechanics (LSTM), Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 4, 91058 Erlangen, Germany
- Marco Laudato
- Department of Engineering Mechanics, FLOW Research Center, KTH Royal Institute of Technology, Osquars Backe 18, 10044 Stockholm, Sweden
- Elias Sundström
- Department of Engineering Mechanics, FLOW Research Center, KTH Royal Institute of Technology, Osquars Backe 18, 10044 Stockholm, Sweden
- Stefan Becker
- Institute of Fluid Mechanics (LSTM), Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 4, 91058 Erlangen, Germany
- Mihai Mihaescu
- Department of Engineering Mechanics, FLOW Research Center, KTH Royal Institute of Technology, Osquars Backe 18, 10044 Stockholm, Sweden
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstraße 1, 91054 Erlangen, Germany
- Stefan Schoder
- Institute of Fundamentals and Theory in Electrical Engineering (IGTE), Graz University of Technology, Inffeldgasse 18/I, 8010 Graz, Austria
|
2
|
Näger C, Kniesburges S, Tur B, Schoder S, Becker S. An Investigation of Acoustic Back-Coupling in Human Phonation on a Synthetic Larynx Model. Bioengineering (Basel) 2023; 10:1343. [PMID: 38135934 PMCID: PMC10740801 DOI: 10.3390/bioengineering10121343] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/28/2023] [Revised: 11/12/2023] [Accepted: 11/19/2023] [Indexed: 12/24/2023] Open
Abstract
In the human phonation process, acoustic standing waves in the vocal tract can influence the fluid flow through the glottis as well as vocal fold oscillation. To investigate the amount of acoustic back-coupling, the supraglottal flow field has been recorded via high-speed particle image velocimetry (PIV) in a synthetic larynx model for several configurations with different vocal tract lengths. Based on the obtained velocity fields, acoustic source terms were computed. Additionally, the sound radiation into the far field was recorded via microphone measurements and the vocal fold oscillation via high-speed camera recordings. The PIV measurements revealed that near a vocal tract resonance frequency fR, the vocal fold oscillation frequency fo (and therefore also the flow field's fundamental frequency) jumps onto fR. This is accompanied by a substantial relative increase in aeroacoustic sound generation efficiency. Furthermore, the measurements show that fo-fR-coupling increases vocal efficiency, signal-to-noise ratio, harmonics-to-noise ratio and cepstral peak prominence. At the same time, the glottal volume flow needed for stable vocal fold oscillation decreases strongly. All of this results in an improved voice quality and phonation efficiency so that a person phonating with fo-fR-coupling can phonate longer and with better voice quality.
Affiliation(s)
- Christoph Näger
- Institute of Fluid Mechanics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 4, 91058 Erlangen, Germany
- Stefan Kniesburges
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Bogac Tur
- Division of Phoniatrics and Pediatric Audiology, Department of Otorhinolaryngology, Head & Neck Surgery, University Hospital Erlangen, Medical School, Friedrich-Alexander-Universität Erlangen-Nürnberg, Waldstrasse 1, 91054 Erlangen, Germany
- Stefan Schoder
- Aeroacoustics and Vibroacoustics Group, Institute of Fundamentals and Theory in Electrical Engineering, Graz University of Technology, Inffeldgasse 16, 8010 Graz, Austria
- Stefan Becker
- Institute of Fluid Mechanics, Friedrich-Alexander-Universität Erlangen-Nürnberg, Cauerstraße 4, 91058 Erlangen, Germany
|
3
|
Kröger BJ. Computer-Implemented Articulatory Models for Speech Production: A Review. Front Robot AI 2022; 9:796739. [PMID: 35494539 PMCID: PMC9040071 DOI: 10.3389/frobt.2022.796739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2021] [Accepted: 02/21/2022] [Indexed: 11/24/2022] Open
Abstract
Modeling speech production and speech articulation is still an evolving research topic. Some current core questions are: What is the underlying (neural) organization for controlling speech articulation? How can speech articulators like the lips and tongue, and their movements, be modeled in an efficient but also biologically realistic way? How can high-quality articulatory-acoustic models be developed that lead to high-quality articulatory speech synthesis? On the one hand, computer modeling will help us to uncover the underlying biological as well as acoustic-articulatory concepts of speech production; on the other hand, further modeling efforts will help us to reach the goal of high-quality articulatory-acoustic speech synthesis based on more detailed knowledge of vocal tract acoustics and speech articulation. Currently, articulatory models are not able to reach the quality level of corpus-based speech synthesis. Moreover, biomechanical and neuromuscular approaches are complex and still not usable for sentence-level speech synthesis. This paper lists many computer-implemented articulatory models and provides criteria for dividing articulatory models into different categories. A recent major research question, i.e., how to control articulatory models in a neurobiologically adequate manner, is discussed in detail. It can be concluded that there is a strong need to further develop articulatory-acoustic models in order to test quantitative neurobiologically based control concepts for speech articulation as well as to uncover the remaining details in human articulatory and acoustic signal generation. Furthermore, these efforts may help us to approach the goal of establishing high-quality articulatory-acoustic as well as neurobiologically grounded speech synthesis.
|
4
|
Story BH, Bunton K. The relation of velopharyngeal coupling area to the identification of stop versus nasal consonants in North American English based on speech generated by acoustically driven vocal tract modulations. The Journal of the Acoustical Society of America 2021; 150:3618. [PMID: 34852618 DOI: 10.1121/10.0007223] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/09/2021] [Accepted: 10/23/2021] [Indexed: 06/13/2023]
Abstract
The purpose of this study was to determine the threshold of velopharyngeal coupling area at which listeners switch from identifying a consonant as a stop to a nasal in North American English, based on V1CV2 stimuli generated with a speech production model that encodes phonetic segments as relative acoustic targets. Each V1CV2 was synthesized with a set of velopharyngeal coupling functions whose area ranged from 0 to 0.1 cm². Results show that consonants were identified by listeners as a stop when the coupling area was less than 0.035-0.057 cm², depending on place of articulation and final vowel. The smallest coupling area (0.035 cm²) at which the stop-to-nasal switch occurred was found for an alveolar consonant in the /ɑCi/ context, whereas the largest (0.057 cm²) was for a bilabial in /ɑCɑ/. For each stimulus, the balance of oral versus nasal acoustic energy was characterized by the peak nasalance during the consonant. Stimuli with peak nasalance below 40% were mostly identified by listeners as stops, whereas those above 40% were identified as nasals. This study was intended to be a precursor to further investigations using the same model but scaled to represent the developing speech production system of male and female talkers.
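The 40% peak-nasalance boundary reported in this abstract can be illustrated with a small sketch. It assumes the common definition of nasalance as nasal energy divided by the sum of nasal and oral energy, expressed in percent; the abstract does not state the formula used in the study. The function names, the frame representation, and the treatment of a value of exactly 40% are hypothetical choices made for illustration only.

```python
from typing import Iterable, Tuple


def nasalance_percent(nasal_energy: float, oral_energy: float) -> float:
    """Nasalance in percent, assuming nasal / (nasal + oral) * 100."""
    total = nasal_energy + oral_energy
    if total <= 0:
        raise ValueError("total acoustic energy must be positive")
    return 100.0 * nasal_energy / total


def peak_nasalance(frames: Iterable[Tuple[float, float]]) -> float:
    """Largest nasalance over (nasal_energy, oral_energy) frames of the consonant."""
    return max(nasalance_percent(nasal, oral) for nasal, oral in frames)


def likely_percept(peak: float, threshold: float = 40.0) -> str:
    """Apply the reported 40% boundary; a value of exactly 40% is
    assigned to "stop" here, which is an arbitrary choice."""
    return "nasal" if peak > threshold else "stop"
```

With hypothetical frames `[(1.0, 3.0), (1.0, 1.0)]`, the peak nasalance is 50%, which falls on the nasal side of the boundary. The listener data in the study are graded ("mostly identified"), so a hard threshold like this is only a rough summary.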
Affiliation(s)
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Kate Bunton
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
|
5
|
Story BH, Bunton K. Identification of voiced stop consonants produced by acoustically driven vocal tract modulations. JASA Express Letters 2021; 1:085203. [PMID: 36154248 DOI: 10.1121/10.0005917] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/16/2023]
Abstract
A recently developed speech production model, in which speech segments are specified by relative acoustic events called resonance deflection patterns, was used to generate speech signals that were presented to listeners in a perceptual test. The purpose was to determine the effect of variations of the magnitude and polarity of the third resonance deflection on identification of the consonant in a V1CV2 disyllable while the deflections of the first and second resonances were held constant. Results showed that listeners' identification changed from /d/ to /ɡ/ when the polarity of the third resonance deflection switched from positive to negative.
Affiliation(s)
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
- Kate Bunton
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721-0071, USA
|
6
|
Bodaghi D, Jiang W, Xue Q, Zheng X. Effect of Supraglottal Acoustics on Fluid-Structure Interaction During Human Voice Production. J Biomech Eng 2021; 143:1094015. [PMID: 33399816 DOI: 10.1115/1.4049497] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/22/2020] [Indexed: 11/08/2022]
Abstract
A hydrodynamic/acoustic splitting method was used to examine the effect of supraglottal acoustics on fluid-structure interactions during human voice production in a two-dimensional computational model. The accuracy of the method in simulating compressible flows in typical human airway conditions was verified by comparing it to full compressible flow simulations. The method was coupled with a three-mass model of vocal fold lateral motion to simulate fluid-structure interactions during human voice production. By separating the acoustic perturbation components of the airflow, the method allows isolation of the role of supraglottal acoustics in fluid-structure interactions. The results showed that an acoustic resonance between a higher harmonic of the sound source and the first formant of the supraglottal tract occurred during normal human phonation when the fundamental frequency was much lower than the formants. The resonance resulted in an acoustic pressure perturbation at the glottis which was of the same order as the incompressible flow pressure and was found to affect vocal fold vibrations and the glottal flow rate waveform. Specifically, the acoustic perturbation delayed the opening of the glottis, reduced the vertical phase difference of vocal fold vibrations, and decreased the flow rate and maximum flow deceleration rate (MFDR) at the glottal exit; yet it had little effect on glottal opening. The results imply that the sound generation in the glottis and acoustic resonance in the supraglottal tract are coupled processes during human voice production, and that computer modeling of vocal fold vibrations needs to include supraglottal acoustics for accurate predictions.
Affiliation(s)
- Dariush Bodaghi
- Department of Mechanical Engineering, University of Maine, 204 Crosby Hall, Orono, ME 04473
- Weili Jiang
- Department of Mechanical Engineering, University of Maine, 204 Crosby Hall, Orono, ME 04473
- Qian Xue
- Department of Mechanical Engineering, University of Maine, Room 213, Boardman Hall, Orono, ME 04473
- Xudong Zheng
- Department of Mechanical Engineering, University of Maine, Room 213 A, Boardman Hall, Orono, ME 04473
|
7
|
Bergevin C, Narayan C, Williams J, Mhatre N, Steeves JK, Bernstein JG, Story B. Overtone focusing in biphonic Tuvan throat singing. eLife 2020; 9:50476. [PMID: 32048990 PMCID: PMC7064340 DOI: 10.7554/elife.50476] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/24/2019] [Accepted: 01/31/2020] [Indexed: 11/13/2022] Open
Abstract
Khoomei is a unique singing style originating from the Republic of Tuva in central Asia. Singers produce two pitches simultaneously: a booming low-frequency rumble alongside a hovering high-pitched whistle-like tone. The biomechanics of this biphonation are not well understood. Here, we use sound analysis, dynamic magnetic resonance imaging, and vocal tract modeling to demonstrate how biphonation is achieved by modulating vocal tract morphology. Tuvan singers show remarkable control in shaping their vocal tract to narrowly focus the harmonics (or overtones) emanating from their vocal cords. The biphonic sound is a combination of the fundamental pitch and a focused filter state, which is at the higher pitch (1-2 kHz) and formed by merging two formants, thereby greatly enhancing sound production in a very narrow frequency range. Most importantly, we demonstrate that this biphonation is a phenomenon arising from linear filtering rather than from a nonlinear source.
Affiliation(s)
- Christopher Bergevin
- Physics and Astronomy, York University, Toronto, Canada; Centre for Vision Research, York University, Toronto, Canada; Fields Institute for Research in Mathematical Sciences, Toronto, Canada; Kavli Institute of Theoretical Physics, University of California, Santa Barbara, United States
- Chandan Narayan
- Languages, Literatures and Linguistics, York University, Toronto, Canada
- Joy Williams
- York MRI Facility, York University, Toronto, Canada
- Jennifer Ke Steeves
- Centre for Vision Research, York University, Toronto, Canada; Psychology, York University, Toronto, Canada
- Joshua Gw Bernstein
- National Military Audiology & Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, United States
- Brad Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, United States
|
8
|
Story BH, Bunton K. A model of speech production based on the acoustic relativity of the vocal tract. The Journal of the Acoustical Society of America 2019; 146:2522. [PMID: 31671993 PMCID: PMC7064311 DOI: 10.1121/1.5127756] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/20/2019] [Revised: 09/10/2019] [Accepted: 09/12/2019] [Indexed: 06/10/2023]
Abstract
A model is described in which the effects of articulatory movements to produce speech are generated by specifying relative acoustic events along a time axis. These events consist of directional changes of the vocal tract resonance frequencies that, when associated with a temporal event function, are transformed via acoustic sensitivity functions, into time-varying modulations of the vocal tract shape. Because the time course of the events may be considerably overlapped in time, coarticulatory effects are automatically generated. Production of sentence-level speech with the model is demonstrated with audio samples and vocal tract animations.
Affiliation(s)
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
- Kate Bunton
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
|
9
|
Qureshi TM, Syed KS. Improved vocal tract model for the elongation of segment lengths in a real time. Comput Speech Lang 2019. [DOI: 10.1016/j.csl.2019.02.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
|
10
|
Ogata K, Kodama T, Hayakawa T, Aoki R. Inverse estimation of the vocal tract shape based on a vocal tract mapping interface. The Journal of the Acoustical Society of America 2019; 145:1961. [PMID: 31046355 DOI: 10.1121/1.5095409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2018] [Accepted: 03/08/2019] [Indexed: 06/09/2023]
Abstract
This paper describes the inverse estimation of the vocal tract shape for vowels by using a vocal tract mapping interface. In prior research, an interface capable of generating a vocal tract shape by clicking on its window was developed. The vocal tract shapes for five vowels are located at the vertices of a pentagonal chart and a different shape that corresponds to an arbitrary mouse-pointer position on the interface window is calculated by interpolation. In this study, an attempt was made to apply the interface to the inverse estimation of vocal tract shapes based on formant frequencies. A target formant frequency data set was searched based on the geometry of the interface window by using a coarse to fine algorithm. It was revealed that the estimated vocal tract shapes obtained from the mapping interface were close to those from magnetic resonance imaging data in another study and to lip area data captured using video recordings. The results of experiments to evaluate the estimated vocal tract shapes showed that each subject demonstrated unique trajectories on the interface window corresponding to the estimated vocal tract shapes. These results suggest the usefulness of inverse estimation using the interface.
Affiliation(s)
- Kohichi Ogata
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto, 860-8555, Japan
- Tayuto Kodama
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto, 860-8555, Japan
- Tomohiro Hayakawa
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto, 860-8555, Japan
- Riku Aoki
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto, 860-8555, Japan
|
11
|
Fujimura S, Kojima T, Okanoue Y, Shoji K, Inoue M, Hori R. Discrimination of "hot potato voice" caused by upper airway obstruction utilizing a support vector machine. Laryngoscope 2018; 129:1301-1307. [PMID: 30485441 DOI: 10.1002/lary.27584] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Accepted: 08/28/2018] [Indexed: 11/09/2022]
Abstract
OBJECTIVES/HYPOTHESIS: "Hot potato voice" (HPV) is a thick, muffled voice caused by pharyngeal or laryngeal diseases characterized by severe upper airway obstruction, including acute epiglottitis and peritonsillitis. To develop a method for determining upper-airway emergency based on this important vocal feature, we investigated the acoustic characteristics of HPV using a physical, articulatory speech synthesis model. The results of the simulation were then applied to design a computerized recognition framework using a mel-frequency cepstral coefficient domain support vector machine (SVM).
STUDY DESIGN: Quasi-experimental research design.
METHODS: Changes in the voice spectral envelope caused by upper airway obstructions were analyzed using a hybrid time-frequency model of articulatory speech synthesis. We evaluated variations in the formant structure and thresholds of critical vocal tract area functions that triggered HPV. The SVMs were trained using a dataset of 2,200 synthetic voice samples generated by an articulatory synthesizer. Voice classification experiments on test datasets of real patient voices were then performed.
RESULTS: On phonation of the Japanese vowel /e/, the frequency of the second formant fell and coalesced with that of the first formant as the area function of the oropharynx decreased. Changes in higher-order formants varied according to constriction location. The highest accuracy afforded by the SVM classifier trained with synthetic data was 88.3%.
CONCLUSIONS: HPV caused by upper airway obstruction has a highly characteristic spectral envelope. Based on this distinctive voice feature, our SVM classifier, which was trained using synthetic data, was able to diagnose upper-airway obstructions with a high degree of accuracy.
LEVEL OF EVIDENCE: 2c. Laryngoscope, 129:1301-1307, 2019.
Affiliation(s)
- Tsuyoshi Kojima
- Department of Otolaryngology, Tenri Hospital, Tenri, Nara, Japan
- Yusuke Okanoue
- Department of Otolaryngology, Tenri Hospital, Tenri, Nara, Japan
- Kazuhiko Shoji
- Department of Otolaryngology, Tenri Hospital, Tenri, Nara, Japan
- Masato Inoue
- Department of Electrical Engineering and Bioscience, School of Advanced Science and Engineering, Waseda University, Shinjuku, Tokyo, Japan
- Ryusuke Hori
- Department of Otolaryngology, Tenri Hospital, Tenri, Nara, Japan
|
12
|
Sharma P, Abrol V, Nivedita, Sao AK. Reducing footprint of unit selection based text-to-speech system using compressed sensing and sparse representation. Comput Speech Lang 2018. [DOI: 10.1016/j.csl.2018.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 10/16/2022]
|
13
|
Story BH, Vorperian HK, Bunton K, Durtschi RB. An age-dependent vocal tract model for males and females based on anatomic measurements. The Journal of the Acoustical Society of America 2018; 143:3079. [PMID: 29857736 PMCID: PMC5966313 DOI: 10.1121/1.5038264] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/30/2017] [Revised: 04/29/2018] [Accepted: 05/01/2018] [Indexed: 05/29/2023]
Abstract
The purpose of this study was to take a first step toward constructing a developmental and sex-specific version of a parametric vocal tract area function model representative of male and female vocal tracts ranging in age from infancy to 12 yrs, as well as adults. Anatomic measurements collected from a large imaging database of male and female children and adults provided the dataset from which length warping and cross-dimension scaling functions were derived, and applied to the adult-based vocal tract model to project it backward along an age continuum. The resulting model was assessed qualitatively by projecting hypothetical vocal tract shapes onto midsagittal images from the cohort of children, and quantitatively by comparison of formant frequencies produced by the model to those reported in the literature. An additional validation of modeled vocal tract shapes was made possible by comparison to cross-sectional area measurements obtained for children and adults using acoustic pharyngometry. This initial attempt to generate a sex-specific developmental vocal tract model paves a path to study the relation of vocal tract dimensions to documented prepubertal acoustic differences.
Affiliation(s)
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Houri K Vorperian
- Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue # 429, Madison, Wisconsin 53705, USA
- Kate Bunton
- Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85718, USA
- Reid B Durtschi
- Vocal Tract Development Lab, Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue # 429, Madison, Wisconsin 53705, USA
|
14
|
Elie B, Laprie Y. Acoustic impact of the gradual glottal abduction degree on the production of fricatives: A numerical study. The Journal of the Acoustical Society of America 2017; 142:1303. [PMID: 28964087 DOI: 10.1121/1.5000232] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/07/2023]
Abstract
The paper presents a numerical study about the acoustic impact of the gradual glottal opening on the production of fricatives. Sustained fricatives are simulated by using classic lumped circuit element methods to compute the propagation of the acoustic wave along the vocal tract. A recent glottis model is connected to the wave solver to simulate a partial abduction of the vocal folds during their self-oscillating cycles. Area functions of fricatives at the three places of articulation of French have been extracted from static MRI acquisitions. Simulations highlight the existence of three distinct regimes, named A, B, and C, depending on the degree of abduction of the glottis. They are characterized by the frication noise level: A exhibits a low frication noise level, B, which is a transitional unstable regime, is a mixed noise/voice signal, and C contains only frication noise. They have significant impacts on the first spectral moments. Simulations show that their boundaries depend on articulatory and glottal configurations. The transition regime B is shown to be unstable: it requires very specific configurations in comparison with other regimes, and acoustic features are very sensitive to small perturbations of the glottal configuration abduction in this regime.
Affiliation(s)
- Benjamin Elie
- Laboratoire Lorrain de Recherche en Informatique et ses Applications, l'Institut National de Recherche en Informatique et en Automatique, Centre National de la Recherche Scientifique, Université de Lorraine, Vandoeuvre-les-Nancy, France
- Yves Laprie
- Laboratoire Lorrain de Recherche en Informatique et ses Applications, l'Institut National de Recherche en Informatique et en Automatique, Centre National de la Recherche Scientifique, Université de Lorraine, Vandoeuvre-les-Nancy, France
|
15
|
Horáček J, Radolf V, Laukkanen AM. Low frequency mechanical resonance of the vocal tract in vocal exercises that apply tubes. Biomed Signal Process Control 2017. [DOI: 10.1016/j.bspc.2017.02.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/30/2022]
|
16
|
Pasch B, Tokuda IT, Riede T. Grasshopper mice employ distinct vocal production mechanisms in different social contexts. Proc Biol Sci 2017; 284:20171158. [PMID: 28724740 PMCID: PMC5543235 DOI: 10.1098/rspb.2017.1158] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 05/25/2017] [Accepted: 06/20/2017] [Indexed: 01/09/2023] Open
Abstract
Functional changes in vocal organ morphology and motor control facilitate the evolution of acoustic signal diversity. Although many rodents produce vocalizations in a variety of social contexts, few studies have explored the underlying production mechanisms. Here, we describe mechanisms of audible and ultrasonic vocalizations (USVs) produced by grasshopper mice (genus Onychomys). Grasshopper mice are predatory rodents of the desert that produce both loud, long-distance advertisement calls and USVs in close-distance mating contexts. Using live-animal recording in normal air and heliox, laryngeal and vocal tract morphological investigations, and biomechanical modelling, we found that grasshopper mice employ two distinct vocal production mechanisms. In heliox, changes in higher-harmonic amplitudes of long-distance calls indicate an airflow-induced tissue vibration mechanism, whereas changes in fundamental frequency of USVs support a whistle mechanism. Vocal membranes and a thin lamina propria aid in the production of long-distance calls by increasing glottal efficiency and permitting high frequencies, respectively. In addition, tuning of fundamental frequency to the second resonance of a bell-shaped vocal tract increases call amplitude. Our findings indicate that grasshopper mice can dynamically adjust motor control to suit the social context and have novel morphological adaptations that facilitate long-distance communication.
Affiliation(s)
- Bret Pasch
- Department of Biological Sciences, Northern Arizona University, 617 S. Beaver Street, Flagstaff, AZ 86011, USA
- Isao T Tokuda
- Department of Mechanical Engineering, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
- Tobias Riede
- Department of Physiology, Midwestern University, 19555 North 59th Avenue, Glendale, AZ 85308, USA
|
17
|
Story BH, Bunton K. An acoustically-driven vocal tract model for stop consonant production. Speech Communication 2017; 87:1-17. [PMID: 28093574 PMCID: PMC5234468 DOI: 10.1016/j.specom.2016.12.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/06/2023]
Abstract
The purpose of this study was to further develop a multi-tier model of the vocal tract area function in which the modulations of shape to produce speech are generated by the product of a vowel substrate and a consonant superposition function. The new approach consists of specifying input parameters for a target consonant as a set of directional changes in the resonance frequencies of the vowel substrate. Using calculations of acoustic sensitivity functions, these "resonance deflection patterns" are transformed into time-varying deformations of the vocal tract shape without any direct specification of location or extent of the consonant constriction along the vocal tract. The configurations of the constrictions and expansions generated by this process were shown to be physiologically realistic and to produce speech sounds that are easily identifiable as the target consonants. This model is a useful enhancement for area function-based synthesis and can serve as a tool for understanding how the vocal tract is shaped by a talker during speech production.
|
18
|
Jiang W, Zheng X, Xue Q. Computational Modeling of Fluid-Structure-Acoustics Interaction during Voice Production. Front Bioeng Biotechnol 2017; 5:7. [PMID: 28243588 PMCID: PMC5304452 DOI: 10.3389/fbioe.2017.00007] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 11/01/2016] [Accepted: 01/27/2017] [Indexed: 11/13/2022] Open
Abstract
The paper presented a three-dimensional, first-principle based fluid-structure-acoustics interaction computer model of voice production, which employed more realistic human laryngeal and vocal tract geometries. Self-sustained vibrations, the important convergent-divergent vibration pattern of the vocal folds, and entrainment of the two dominant vibratory modes were captured. Voice quality-associated parameters including the frequency, open quotient, skewness quotient, and flow rate of the glottal flow waveform were found to be well within the normal physiological ranges. The analogy between the vocal tract and a quarter-wave resonator was demonstrated. The acoustic perturbed flux and pressure inside the glottis were found to be of the same order as their incompressible counterparts, suggesting strong source-filter interactions during voice production. Such a high-fidelity computational model will be useful for investigating a variety of pathological conditions that involve complex vibrations, such as vocal fold paralysis, vocal nodules, and vocal polyps. The model is also an important step toward a patient-specific surgical planning tool that can serve as a no-risk trial-and-error platform for different procedures, such as injection of biomaterials and thyroplastic medialization.
Affiliation(s)
- Weili Jiang, Mechanical Engineering Department, University of Maine, Orono, ME, USA
- Xudong Zheng, Mechanical Engineering Department, University of Maine, Orono, ME, USA
- Qian Xue, Mechanical Engineering Department, University of Maine, Orono, ME, USA
|
19
|
Neely KD, Bunton K, Story BH. A Modeling Study of the Effects of Vocal Tract Movement Duration and Magnitude on the F2 Trajectory in CV Words. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2016; 59:1327-1334. [PMID: 27768174 PMCID: PMC5399760 DOI: 10.1044/2016_jslhr-s-14-0331] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/25/2014] [Revised: 09/11/2015] [Accepted: 02/19/2016] [Indexed: 06/06/2023]
Abstract
PURPOSE This study used a computational vocal tract model to investigate the relationship of diphthong duration and vocal tract movement magnitude to measures of the F2 trajectory in CV words. METHOD Three words (bough, boy, and buy) were simulated on the basis of an adult female vocal tract model, in which the model parameters were estimated from audio recordings of a female talker. Model parameters were then modified to generate 35 simulations of each word corresponding to 7 different durations and 5 movement magnitude settings. In addition, these simulations were repeated with vocal tract lengths representative of an adult male and an approximately 6-year-old child. RESULTS On the basis of univariate analysis, measures of frequency predicted changes in magnitude, and temporal measures predicted changes in speaking rate consistent with the hypothesis. The combined effects of duration and magnitude showed that F2 was more sensitive to changes in magnitude at shorter word durations compared with longer word durations. This finding held across words and vocal tract length. CONCLUSIONS Results suggest that there is an interaction between duration and magnitude that affects the slope of the F2 trajectory. The next step is to relate kinematics to F2 trajectory output using real speakers.
|
20
|
Arnela M, Dabbaghchian S, Blandin R, Guasch O, Engwall O, Van Hirtum A, Pelorson X. Influence of vocal tract geometry simplifications on the numerical simulation of vowel sounds. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:1707. [PMID: 27914393 DOI: 10.1121/1.4962488] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 06/06/2023]
Abstract
For many years, the vocal tract shape has been approximated by one-dimensional (1D) area functions to study the production of voice. More recently, 3D approaches allow one to deal with the complex 3D vocal tract, although area-based 3D geometries of circular cross-section are still in use. However, little is known about the influence of performing such a simplification, and some alternatives may exist between these two extreme options. To this aim, several vocal tract geometry simplifications for vowels [ɑ], [i], and [u] are investigated in this work. Six cases are considered, consisting of realistic, elliptical, and circular cross-sections interpolated through a bent or straight midline. For frequencies below 4-5 kHz, the influence of bending and cross-sectional shape has been found weak, while above these values simplified bent vocal tracts with realistic cross-sections are necessary to correctly emulate higher-order mode propagation. To perform this study, the finite element method (FEM) has been used. FEM results have also been compared to a 3D multimodal method and to a classical 1D frequency domain model.
Affiliation(s)
- Marc Arnela, GTM-Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins 30, Barcelona, E-08022, Catalonia, Spain
- Saeed Dabbaghchian, Department of Speech, Music and Hearing, School of Computer Science & Communication, KTH Royal Institute of Technology, Stockholm, Sweden
- Rémi Blandin, GIPSA-lab, Unité Mixte de Recherche au Centre National de la Recherche Scientifique 5216, Grenoble Campus, St. Martin d'Heres, F-38402, France
- Oriol Guasch, GTM-Grup de recerca en Tecnologies Mèdia, La Salle, Universitat Ramon Llull, C/Quatre Camins 30, Barcelona, E-08022, Catalonia, Spain
- Olov Engwall, Department of Speech, Music and Hearing, School of Computer Science & Communication, Kungliga Tekniska högskolan Royal Institute of Technology, Stockholm, Sweden
- Annemie Van Hirtum, GIPSA-lab, Unité Mixte de Recherche au Centre National de la Recherche Scientifique 5216, Grenoble Campus, St. Martin d'Heres, F-38402, France
- Xavier Pelorson, GIPSA-lab, Unité Mixte de Recherche au Centre National de la Recherche Scientifique 5216, Grenoble Campus, St. Martin d'Heres, F-38402, France
|
21
|
Hueber T, Bailly G. Statistical conversion of silent articulation into audible speech using full-covariance HMM. COMPUT SPEECH LANG 2016. [DOI: 10.1016/j.csl.2015.03.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 10/23/2022]
|
22
|
|
23
|
Koda H, Tokuda IT, Wakita M, Ito T, Nishimura T. The source-filter theory of whistle-like calls in marmosets: Acoustic analysis and simulation of helium-modulated voices. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2015; 137:3068-3076. [PMID: 26093398 DOI: 10.1121/1.4921607] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 06/04/2023]
Abstract
Whistle-like high-pitched "phee" calls are often used as long-distance vocal advertisements by small-bodied marmosets and tamarins in the dense forests of South America. While the source-filter theory proposes that vibration of the vocal fold is modified independently from the resonance of the supralaryngeal vocal tract (SVT) in human speech, a source-filter coupling that constrains the vibration frequency to SVT resonance effectively produces loud tonal sounds in some musical instruments. Here, a combined approach of acoustic analyses and simulation with helium-modulated voices was used to show that phee calls are produced principally with the same mechanism as in human speech. The animal keeps the fundamental frequency (f0) close to the first formant (F1) of the SVT, to amplify f0. Although f0 and F1 are primarily independent, the degree of their tuning can be strengthened further by a flexible source-filter interaction, the variable strength of which depends upon the cross-sectional area of the laryngeal cavity. The results highlight the evolutionary antiquity and universality of the source-filter model in primates, but the study can also explore the diversification of vocal physiology, including source-filter interaction and its anatomical basis in non-human primates.
Affiliation(s)
- Hiroki Koda, Primate Research Institute, Kyoto University, Inuyama, Aichi 484-8506, Japan
- Isao T Tokuda, Department of Mechanical Engineering, Ritsumeikan University, Kusatsu, Shiga 525-8577, Japan
- Masumi Wakita, Primate Research Institute, Kyoto University, Inuyama, Aichi 484-8506, Japan
- Tsuyoshi Ito, Department of Human Biology and Anatomy, Graduate School of Medicine, University of the Ryukyus, Nishihara, Okinawa 903-0215, Japan
- Takeshi Nishimura, Primate Research Institute, Kyoto University, Inuyama, Aichi 484-8506, Japan
|
24
|
Fleischer M, Pinkert S, Mattheus W, Mainka A, Mürbe D. Formant frequencies and bandwidths of the vocal tract transfer function are affected by the mechanical impedance of the vocal tract wall. Biomech Model Mechanobiol 2014; 14:719-33. [PMID: 25416844 PMCID: PMC4490178 DOI: 10.1007/s10237-014-0632-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/28/2014] [Accepted: 11/05/2014] [Indexed: 11/08/2022]
Abstract
The acoustical properties of the vocal tract, the air-filled cavity between the vocal folds and the mouth opening, are determined by its individual geometry, the physical properties of the air and of its boundaries. In this article, we address the necessity of complex impedance boundary conditions at the mouth opening and at the border of the acoustical domain inside the human vocal tract. Using finite element models based on MRI data for spoken and sung vowels /a/, /i/ and // and comparison of the transfer characteristics by analysis of acoustical data using an inverse filtering method, the global wall impedance showed a frequency-dependent behaviour and depends on the produced vowel and therefore on the individual vocal tract geometry. The values of the normalised inertial component (represented by the imaginary part of the impedance) ranged from 250 g/m² at frequencies higher than about 3 kHz up to about 2.5×10⁵ g/m² in the mid-frequency range around 1.5–3 kHz. In contrast, the normalised dissipation (represented by the real part of the impedance) ranged from 65 to 4.5×10⁵ Ns/m³. These results indicate that structures enclosing the vocal tract (e.g. oral and pharyngeal mucosa and muscle tissues), especially their mechanical properties, influence the transfer of the acoustical energy and the position and bandwidth of the formant frequencies. It implies that the timbre characteristics of vowel sounds are likely to be tuned by specific control of relaxation and strain of the surrounding structures of the vocal tract.
Affiliation(s)
- Mario Fleischer, Department of Otorhinolaryngology, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
|
25
|
Takemoto H, Adachi S, Mokhtari P, Kitamura T. Acoustic interaction between the right and left piriform fossae in generating spectral dips. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2013; 134:2955-2964. [PMID: 24116431 DOI: 10.1121/1.4818744] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/02/2023]
Abstract
It is known that the right and left piriform fossae generate two deep dips on speech spectra and that acoustic interaction exists in generating the dips: if only one piriform fossa is modified, both the dips change in frequency and amplitude. In the present study, using a simple geometrical model and measured vocal tract shapes, the acoustic interaction was examined by the finite-difference time-domain method. As a result, one of the two dips was lower in frequency than the two independent dips that appeared when either of the piriform fossae was occluded, and the other dip was higher in frequency than the two dips. At the lower dip frequency, the piriform fossae resonated almost in opposite phase, while at the higher dip frequency, they resonated almost in phase. These facts indicate that the piriform fossae and the lower part of the pharynx can be modeled as a coupled two-oscillator system whose two normal vibration modes generate the two spectral dips. When the piriform fossae were identical, only the higher dip appeared. This is because the lower mode is not acoustically coupled to the main vocal tract enough to generate an absorption dip.
Affiliation(s)
- Hironori Takemoto, National Institute of Information and Communications Technology, 3-5, Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan
|
26
|
Story BH. Phrase-level speech simulation with an airway modulation model of speech production. COMPUT SPEECH LANG 2013; 27:989-1010. [PMID: 23503742 DOI: 10.1016/j.csl.2012.10.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Abstract
Artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The development of an airway modulation model is described that simulates the time-varying changes of the glottis and vocal tract, as well as acoustic wave propagation, during speech production. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener. The primary components of the model are introduced and simulation of words and phrases is demonstrated.
Affiliation(s)
- Brad H Story, Speech Acoustics Laboratory, Dept. of Speech, Language, and Hearing Sciences, University of Arizona, 1131 E. 2nd St., P.O. Box 210071, Tucson, AZ, 85721, United States
|
27
|
Koda H, Nishimura T, Tokuda IT, Oyakawa C, Nihonmatsu T, Masataka N. Soprano singing in gibbons. AMERICAN JOURNAL OF PHYSICAL ANTHROPOLOGY 2012; 149:347-55. [DOI: 10.1002/ajpa.22124] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 04/06/2012] [Accepted: 07/05/2012] [Indexed: 11/05/2022]
|
28
|
Samlan RA, Story BH. Relation of structural and vibratory kinematics of the vocal folds to two acoustic measures of breathy voice based on computational modeling. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2011; 54:1267-83. [PMID: 21498582 PMCID: PMC3184371 DOI: 10.1044/1092-4388(2011/10-0195)] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 05/05/2023]
Abstract
PURPOSE To relate vocal fold structure and kinematics to 2 acoustic measures: cepstral peak prominence (CPP) and the amplitude of the first harmonic relative to the second (H1-H2). METHOD The authors used a computational, kinematic model of the medial surfaces of the vocal folds to specify features of vocal fold structure and vibration in a manner consistent with breathy voice. Four model parameters were altered: degree of vocal fold adduction, surface bulging, vibratory nodal point, and supraglottal constriction. CPP and H1-H2 were measured from simulated glottal area, glottal flow, and acoustic waveforms and were related to the underlying vocal fold kinematics. RESULTS CPP decreased with increased separation of the vocal processes, whereas the nodal point location had little effect. H1-H2 increased as a function of separation of the vocal processes in the range of 1.0 mm to 1.5 mm and decreased with separation > 1.5 mm. CONCLUSIONS CPP is generally a function of vocal process separation. H1*-H2* (see paragraph 6 of article text for an explanation of the asterisks) will increase or decrease with vocal process separation on the basis of vocal fold shape, pivot point for the rotational mode, and supraglottal vocal tract shape, limiting its utility as an indicator of breathy voice. Future work will relate the perception of breathiness to vocal fold kinematics and acoustic measures.
Affiliation(s)
- Robin A Samlan, Speech Acoustics Laboratory, University of Arizona, Tucson, USA.
|
29
|
Turicchia L, Sarpeshkar R. An articulatory silicon vocal tract for speech and hearing prostheses. IEEE TRANSACTIONS ON BIOMEDICAL CIRCUITS AND SYSTEMS 2011; 5:339-346. [PMID: 23851948 DOI: 10.1109/tbcas.2011.2159858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
We describe the concept of a bioinspired feedback loop that combines a cochlear processor with an integrated-circuit vocal tract to create what we call a speech-locked loop. We discuss how the speech-locked loop can be applied in hearing prostheses, such as cochlear implants, to help improve speech recognition in noise. We also investigate speech-coding strategies for brain-machine-interface-based speech prostheses and present an articulatory speech-synthesis system by using an integrated-circuit vocal tract that models the human vocal tract. Our articulatory silicon vocal tract makes the transmission of low bit-rate speech-coding parameters feasible over a bandwidth-constrained body sensor network. To the best of our knowledge, this is the first articulatory speech-prosthesis system reported to date. We also present a speech-prosthesis simulator as a means to generate realistic articulatory parameter sequences.
|
30
|
Kaburagi T, Yamada N, Fukui T, Minamiya E. A methodological and preliminary study on the acoustic effect of a trumpet player's vocal tract. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 130:536-545. [PMID: 21786919 DOI: 10.1121/1.3596471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
A methodological study is presented to examine the acoustic role of the vocal tract in playing the trumpet. Preliminary results obtained for one professional player are also shown to demonstrate the effectiveness of the method. Images of the vocal tract with a resolution of 0.5 mm (2 mm in thickness) were recorded with magnetic resonance imaging to observe the tongue posture and estimate the vocal-tract area function during actual performance. The input impedance was then calculated for the player's air column including both the supra- and subglottal tracts using an acoustic tube model including the effect of wall losses. Finally, a time-domain blowing simulation by Adachi and Sato [J. Acoust. Soc. Am. 99, 1200-1209 (1996)] was performed with a model of the lips. In this simulation, the oscillating frequency of the lips was slightly affected by using different shapes of the vocal tract measured for the player. In particular, when the natural frequency of the lips was gradually increased, the transition to the higher mode occurred at different frequencies for different vocal-tract shapes. Furthermore, simulation results showed that the minimum blowing pressure required to attain the lip oscillation can be reduced by adjusting the vocal-tract shape properly.
Affiliation(s)
- Tokihiko Kaburagi, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Fukuoka, 815-8540 Japan.
|
31
|
Panchapagesan S, Alwan A. A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:2144-2162. [PMID: 21476670 PMCID: PMC3188964 DOI: 10.1121/1.3514544] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/02/2009] [Revised: 10/05/2010] [Accepted: 10/19/2010] [Indexed: 05/30/2023]
Abstract
In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
Affiliation(s)
- Sankaran Panchapagesan, Department of Electrical Engineering, University of California, Los Angeles, California 90095, USA.
|
32
|
Kaburagi T. Voice production model integrating boundary-layer analysis of glottal flow and source-filter coupling. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:1554-1567. [PMID: 21428519 DOI: 10.1121/1.3533732] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/30/2023]
Abstract
A voice production model is created in this work by considering essential aerodynamic and acoustic phenomena in human voice production. A precise flow analysis is performed based on a boundary-layer approximation and the viscous-inviscid interaction between the boundary layer and the core flow. This flow analysis can supply information on the separation point of the glottal flow and the thickness of the boundary layer, both of which strongly depend on the glottal configuration and yield an effective prediction of the flow behavior. When the flow analysis is combined with the modified two-mass model of the vocal fold [Pelorson et al. (1994). J. Acoust. Soc. Am. 96, 3416-3431], the resulting acoustic wave travels through the vocal tract and a pressure change develops in the vicinity of the glottis. This change can affect the glottal flow and the motion of the vocal folds, causing source-filter coupling. The property of the acoustic feedback is explicitly expressed in the frequency domain by using an acoustic tube model, allowing a clear interpretation of the coupling. Numerical experiments show that the vocal-tract input impedance and frequency responses representing the source-filter coupling have dominant peaks corresponding to the fourth and fifth formants. Results of time-domain simulations also suggest the importance of these high-frequency peaks in voice production.
Affiliation(s)
- Tokihiko Kaburagi, Faculty of Design, Department of Communication Design Science, Kyushu University, 4-9-1 Shiobaru, Fukuoka 815-8540, Japan.
|
33
|
Ho JC, Zañartu M, Wodicka GR. An anatomically based, time-domain acoustic model of the subglottal system for speech production. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:1531-47. [PMID: 21428517 DOI: 10.1121/1.3543971] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 05/21/2023]
Abstract
A time-domain model of sound wave propagation in the branching airways of the subglottal system is presented. The model is formulated as an extension to an acoustic transmission-line modeling scheme originally developed for simulating the supraglottal system in the time-domain during speech production [Maeda (1982). Speech Commun. 1, 199-229; Mokhtari et al. (2008). Speech Commun. 50, 179-190]. The approach allows for predictions of time-varying acoustic pressure and volume velocity at any point along the various generations of subglottal airways from trachea to alveoli. In addition, the model can be configured so that its overall structure simulates different geometric forms, including airways that branch in a symmetric or asymmetric pattern. Three subglottal configurations, two symmetric and one asymmetric, were represented based on reported anatomical dimensions of the subglottal airways. Estimates of the acoustic input impedances of these subglottal configurations revealed resonant characteristics similar to those found in the previous studies. Simulations of voiced sound propagation into the subglottal airways, achieved by coupling the subglottal model to a two-mass vocal fold model and a supraglottal tract configured for different vowels, yielded predictions of time-domain sound pressure waveforms below the vocal folds that compare favorably to previous measurements in human subjects.
Affiliation(s)
- Julio C Ho, Weldon School of Biomedical Engineering, Purdue University, 206 South Martin Jischke Drive, West Lafayette, Indiana 47907, USA.
|
34
|
Inohara K, Sumita YI, Ohbayashi N, Ino S, Kurabayashi T, Ifukube T, Taniguchi H. Standardization of Thresholding for Binary Conversion of Vocal Tract Modeling in Computed Tomography. J Voice 2010; 24:503-9. [DOI: 10.1016/j.jvoice.2008.10.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/15/2008] [Accepted: 10/31/2008] [Indexed: 10/20/2022]
|
35
|
Titze IR, Worley AS. Modeling source-filter interaction in belting and high-pitched operatic male singing. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:1530. [PMID: 19739766 PMCID: PMC2757425 DOI: 10.1121/1.3160296] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.9] [Indexed: 05/09/2023]
Abstract
Nonlinear source-filter theory is applied to explain some acoustic differences between two contrasting male singing productions at high pitches: operatic style versus jazz belt or theater belt. Several stylized vocal tract shapes (caricatures) are discussed that form the bases of these styles. It is hypothesized that operatic singing uses vowels that are modified toward an inverted megaphone mouth shape for transitioning into the high-pitch range. This allows all the harmonics except the fundamental to be "lifted" over the first formant. Belting, on the other hand, uses vowels that are consistently modified toward the megaphone (trumpet-like) mouth shape. Both the fundamental and the second harmonic are then kept below the first formant. The vocal tract shapes provide collective reinforcement to multiple harmonics in the form of inertive supraglottal reactance and compliant subglottal reactance. Examples of lip openings from four well-known artists are used to infer vocal tract area functions and the corresponding reactances.
Affiliation(s)
- Ingo R Titze, National Center for Voice and Speech, The Denver Center for the Performing Arts, Denver, CO 80204, USA
|
36
|
|
37
|
Story BH. Vocal tract modes based on multiple area function sets from one speaker. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 125:EL141-EL147. [PMID: 19354352 PMCID: PMC2677261 DOI: 10.1121/1.3082263] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/07/2008] [Revised: 01/02/2009] [Accepted: 01/08/2009] [Indexed: 05/26/2023]
Abstract
The purpose of this study was to derive vocal tract modes from a wider range of vowel area functions for a specific speaker than has been previously reported. Area functions from Story et al. [(1996). J. Acoust. Soc. Am. 100, 537-554] and Story [(2008). J. Acoust. Soc. Am. 123, 327-335] were combined in a composite set from which modes were derived with principal component analysis. Along with scaling coefficients, these modes were used to generate a [F1, F2] formant space. In comparison to formant spaces similarly generated based on the two area function sets alone, the combined version provides a wider range of both F1 and F2 values. This new set of modes may be useful for inverse mapping of formant frequencies to area functions or for modeling of vocal tract shape changes.
Affiliation(s)
- Brad H Story
- Department of Speech, Language, and Hearing Sciences, Speech Acoustics Laboratory, University of Arizona, Tucson, Arizona 85721, USA.
|
38
|
Turicchia L, Sarpeshkar R. An analog integrated-circuit vocal tract. IEEE Transactions on Biomedical Circuits and Systems 2008; 2:316-327. [PMID: 23853134 DOI: 10.1109/tbcas.2008.2005296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 06/02/2023]
Abstract
We present the first experimental integrated-circuit vocal tract by mapping fluid volume velocity to current, fluid pressure to voltage, and linear and nonlinear mechanical impedances to linear and nonlinear electrical impedances. The 275 μW analog vocal tract chip includes a 16-stage cascade of two-port pi-elements that forms a tunable transmission line, electronically variable impedances, and a current source as the glottal source. A nonlinear resistor models laminar and turbulent flow in the vocal tract. The measured SNR at the output of the analog vocal tract is 64, 66, and 63 dB for the first three formant resonances of a vocal tract with uniform cross-sectional area. The analog vocal tract can be used with auditory processors in a feedback speech locked loop (analogous to a phase locked loop) to implement speech recognition that is potentially robust in noise. Our use of a physiological model of the human vocal tract enables the analog vocal tract chip to synthesize speech signals of interest, using articulatory parameters that are intrinsically compact and linearly interpolatable.
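For orientation, the formant resonances of a vocal tract with uniform cross-sectional area can be estimated with the textbook quarter-wavelength formula for a lossless tube that is closed at the glottis and open at the lips. The sketch below is only an illustration of that idealization, not the chip's circuit model; the tube length (17.5 cm), the speed of sound (350 m/s), and the function name are assumptions that do not come from the abstract.

```python
def uniform_tube_formants(length_m, n_formants=3, sound_speed=350.0):
    """Resonances (Hz) of an idealized lossless uniform tube,
    closed at one end (glottis) and open at the other (lips):
    F_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * sound_speed / (4.0 * length_m)
        for n in range(1, n_formants + 1)
    ]


# Assumed values: 17.5 cm tract length, 350 m/s speed of sound.
formants = uniform_tube_formants(0.175)
for i, f in enumerate(formants, start=1):
    print(f"F{i} = {f:.0f} Hz")
```

With these assumed values the first three resonances fall near 500, 1500, and 2500 Hz, the odd-multiple spacing expected of a quarter-wave resonator.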
|
39
|
Riede T, Tokuda IT, Munger JB, Thomson SL. Mammalian laryngeal air sacs add variability to the vocal tract impedance: physical and computational modeling. The Journal of the Acoustical Society of America 2008; 124:634-47. [PMID: 18647005 PMCID: PMC2677336 DOI: 10.1121/1.2924125] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 05/11/2023]
Abstract
Cavities branching off the main vocal tract are ubiquitous in nonhumans. Mammalian air sacs exist in human relatives, including all four great apes, but only a substantially reduced version exists in humans. The present paper focuses on acoustical functions of the air sacs. The hypotheses are investigated on whether the air sacs affect amplitude of utterances and/or position of formants. A multilayer synthetic model of the vocal folds coupled with a vocal tract model was utilized. As an air sac model, four configurations were considered: open and closed uniform tube-like side branches, a rigid cavity, and an inflatable cavity. Results suggest that some air sac configurations can enhance the sound level. Furthermore, an air sac model introduces one or more additional resonance frequencies, shifting formants of the main vocal tract to some extent but not as strongly as previously suggested. In addition, dynamic range of vocalization can be extended by the air sacs. A new finding is also an increased variability of the vocal tract impedance, leading to strong nonlinear source-filter interaction effects. The experiments demonstrated that air-sac-like structures can destabilize the sound source. The results were validated by a transmission line computational model.
Affiliation(s)
- Tobias Riede
- Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi-shi, Ishikawa 923-1292, Japan.
|
40
|
Titze IR. Nonlinear source-filter coupling in phonation: theory. The Journal of the Acoustical Society of America 2008; 123:2733-49. [PMID: 18529191 PMCID: PMC2811547 DOI: 10.1121/1.2832337] [Citation(s) in RCA: 225] [Impact Index Per Article: 14.1] [Indexed: 05/05/2023]
Abstract
A theory of interaction between the source of sound in phonation and the vocal tract filter is developed. The degree of interaction is controlled by the cross-sectional area of the laryngeal vestibule (epilarynx tube), which raises the inertive reactance of the supraglottal vocal tract. Both subglottal and supraglottal reactances can enhance the driving pressures of the vocal folds and the glottal flow, thereby increasing the energy level at the source. The theory predicts that instabilities in vibration modes may occur when harmonics pass through formants during pitch or vowel changes. Unlike in most musical instruments (e.g., woodwinds and brasses), a stable harmonic source spectrum is not obtained by tuning harmonics to vocal tract resonances, but rather by placing harmonics into favorable reactance regions. This allows for positive reinforcement of the harmonics by supraglottal inertive reactance (and to a lesser degree by subglottal compliant reactance) without the risk of instability. The traditional linear source-filter theory is encumbered with possible inconsistencies in the glottal flow spectrum, which is shown to be influenced by interaction. In addition, the linear theory does not predict bifurcations in the dynamical behavior of vocal fold vibration due to acoustic loading by the vocal tract.
Affiliation(s)
- Ingo R Titze
- Department of Speech Pathology and Audiology, The University of Iowa, Iowa City, Iowa 52242, USA.
|
41
|
Story BH. Comparison of magnetic resonance imaging-based vocal tract area functions obtained from the same speaker in 1994 and 2002. The Journal of the Acoustical Society of America 2008; 123:327-35. [PMID: 18177162 PMCID: PMC2377017 DOI: 10.1121/1.2805683] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 05/21/2023]
Abstract
A new set of area functions for vowels has been obtained with magnetic resonance imaging from the same speaker as that previously reported in 1996 [Story et al., J. Acoust. Soc. Am. 100, 537-554 (1996)]. The new area functions were derived from image data collected in 2002, whereas the previously reported area functions were based on magnetic resonance images obtained in 1994. When compared, the new area function sets indicated a tendency toward a constricted pharyngeal region and expanded oral cavity relative to the previous set. Based on calculated formant frequencies and sensitivity functions, these morphological differences were shown to have the primary acoustic effect of systematically shifting the second formant (F2) downward in frequency. Multiple instances of target vocal tract shapes from a specific speaker provide additional sampling of the possible area functions that may be produced during speech production. This may be of benefit for understanding intraspeaker variability in vowel production and for further development of speech synthesizers and speech models that utilize area function information.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
|
42
|
Story BH. A comparison of vocal tract perturbation patterns based on statistical and acoustic considerations. The Journal of the Acoustical Society of America 2007; 122:EL107-14. [PMID: 17902738 PMCID: PMC2278006 DOI: 10.1121/1.2771369] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/17/2023]
Abstract
The purpose of this study was to investigate the relation between vocal tract deformation patterns obtained from statistical analyses of a set of area functions representative of a vowel repertoire, and the acoustic properties of a neutral vocal tract shape. Acoustic sensitivity functions were calculated for a mean area function based on seven different speakers. Specific linear combinations of the sensitivity functions corresponding to the first two formant frequencies were shown to possess essentially the same amplitude variation along the vocal tract length as the statistically derived deformation patterns reported in previous studies.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
|
43
|
Story BH. Time dependence of vocal tract modes during production of vowels and vowel sequences. The Journal of the Acoustical Society of America 2007; 121:3770-89. [PMID: 17552726 PMCID: PMC2310171 DOI: 10.1121/1.2730621] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/15/2023]
Abstract
Vocal tract shaping patterns based on articulatory fleshpoint data from four speakers in the University of Wisconsin x-ray microbeam (XRMB) database [J. Westbury, UW-Madison, (1994)] were determined with a principal component analysis (PCA). Midsagittal cross-distance functions representative of approximately the front 6 cm of the oral cavity for each of 11 vowels and vowel-vowel (VV) sequences were obtained from the pellet positions and the hard palate profile for the four speakers. A PCA was independently performed on each speaker's set of cross-distance functions representing static vowels only, and again with time-dependent cross-distance functions representing vowels and VV sequences. In all cases, results indicated that the first two orthogonal components (referred to as modes) accounted for more than 97% of the variance in each speaker's set of cross-distance functions. In addition, the shape of each mode was shown to be similar across the speakers suggesting that the modes represent common patterns of vocal tract deformation. Plots of the resulting time-dependent coefficient records showed that the four speakers activated each mode similarly during production of the vowel sequences. Finally, a procedure was described for using the time-dependent mode coefficients obtained from the XRMB data as input for an area function model of the vocal tract.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
|
44
|
Birkholz P, Jackel D, Kroger BJ. Simulation of Losses Due to Turbulence in the Time-Varying Vocal System. IEEE Transactions on Audio, Speech, and Language Processing 2007. [DOI: 10.1109/tasl.2006.889731] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
|
45
|
Mullen J, Howard DM, Murphy DT. Real-Time Dynamic Articulations in the 2-D Waveguide Mesh Vocal Tract Model. IEEE Transactions on Audio, Speech, and Language Processing 2007. [DOI: 10.1109/tasl.2006.876751] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Indexed: 11/05/2022]
|
46
|
Kaburagi T, Kim J. Generation of the vocal tract spectrum from the underlying articulatory mechanism. The Journal of the Acoustical Society of America 2007; 121:456-68. [PMID: 17297800 DOI: 10.1121/1.2384847] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/13/2023]
Abstract
A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared to the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.
Affiliation(s)
- Tokihiko Kaburagi
- Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka, 815-8540 Japan
|
47
|
Pincas J, Jackson PJB. Amplitude modulation of turbulence noise by voicing in fricatives. The Journal of the Acoustical Society of America 2006; 120:3966-77. [PMID: 17225423 DOI: 10.1121/1.2358004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/13/2023]
Abstract
The two principal sources of sound in speech, voicing and frication, occur simultaneously in voiced fricatives as well as at the vowel-fricative boundary in phonologically voiceless fricatives. Instead of simply overlapping, the two sources interact. This paper is an acoustic study of one such interaction effect: the amplitude modulation of the frication component when voicing is present. Corpora of sustained and fluent-speech English fricatives were recorded and analyzed using a signal-processing technique designed to extract estimates of modulation depth. Results reveal a pattern, consistent across speaking style, speaker, and place of articulation, for modulation at fo to rise at low voicing strengths and subsequently saturate. Voicing strength needed to produce saturation varied 60-66 dB across subjects and experimental conditions. Modulation depths at saturation varied little across speakers but significantly for place of articulation (with [z] showing particularly strong modulation) clustering at approximately 0.4-0.5 (a 40%-50% fluctuation above and below unmodulated amplitude); spectral analysis of modulating signals revealed weak but detectable modulation at the second and third harmonics (i.e., 2fo and 3fo).
Affiliation(s)
- Jonathan Pincas
- Centre for Vision, Speech & Signal Processing, University of Surrey, Guildford, GU2 7XH, United Kingdom.
|
48
|
Mullen J, Howard D, Murphy D. Waveguide physical modeling of vocal tract acoustics: flexible formant bandwidth control from increased model dimensionality. IEEE Transactions on Audio, Speech, and Language Processing 2006. [DOI: 10.1109/tsa.2005.858052] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/05/2022]
|
49
|
Story BH. Technique for "tuning" vocal tract area functions based on acoustic sensitivity functions. The Journal of the Acoustical Society of America 2006; 119:715-8. [PMID: 16521730 DOI: 10.1121/1.2151802] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 05/07/2023]
Abstract
A technique for modifying vocal tract area functions is developed by using sum and difference combinations of acoustic sensitivity functions to perturb an initial vocal tract configuration. First, sensitivity functions [e.g., Fant and Pauli, Proc. Speech Comm. Sem. 74, 1975] are calculated for a given area function, at its specific formant frequencies. The sensitivity functions are then multiplied by scaling coefficients that are determined from the difference between a desired set of formant frequencies and those supported by the current area function. The scaled sensitivity functions are then summed together to generate a perturbation of the area function. This produces a new area function whose associated formant frequencies are closer to the desired values than the previous one. This process is repeated iteratively until the coefficients are equal to zero or are below a threshold value.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
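The iterative procedure summarized in this abstract can be written compactly as follows. This is a reading of the abstract rather than the paper's exact formulation: the symbols ($A_k$ for the area function at iteration $k$, $S_n$ for the sensitivity function of formant $n$, $F_n^{\ast}$ for the desired formant frequency, $z_n$ for the scaling coefficient, $\varepsilon$ for the stopping threshold) and, in particular, the gain $\alpha$ and the normalization by $F_n$ are assumptions introduced here for illustration.

```latex
% Scaling coefficient from the formant mismatch
% (the gain \alpha and the division by F_n are assumed, not from the abstract)
z_n^{(k)} = \alpha \, \frac{F_n^{\ast} - F_n^{(k)}}{F_n^{(k)}}

% Perturbation of the area function by the summed, scaled sensitivity functions
A_{k+1}(x) = A_k(x) + \sum_{n} z_n^{(k)} \, S_n^{(k)}(x)

% Stopping rule: repeat until every coefficient is zero or below a threshold
\max_n \left| z_n^{(k)} \right| < \varepsilon
```

At each step the sensitivity functions and formant frequencies are recomputed for the updated area function, so both $S_n^{(k)}$ and $F_n^{(k)}$ carry the iteration index.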
|
50
|
Story BH. Synergistic modes of vocal tract articulation for American English vowels. The Journal of the Acoustical Society of America 2005; 118:3834-59. [PMID: 16419828 DOI: 10.1121/1.2118367] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/06/2023]
Abstract
The purpose of this study was to investigate the spatial similarity of vocal tract shaping patterns across speakers and the similarity of their acoustic effects. Vocal tract area functions for 11 American English vowels were obtained from six speakers, three female and three male, using magnetic resonance imaging (MRI). Each speaker's set of area functions was then decomposed into mean area vectors and representative modes (eigenvectors) using principal components analysis (PCA). Three modes accounted for more than 90% of the variance in the original data sets for each speaker. The general shapes of the first two modes were found to be highly correlated across all six speakers. To demonstrate the acoustic effects of each mode, both in isolation and combined, a mapping between the mode scaling coefficients and [F1, F2] pairs was generated for each speaker. The mappings were unique for all six speakers in terms of the exact shape of the [F1, F2] vowel space, but the general effect of the modes was the same in each case. The results support the idea that the modes provide a common system for perturbing a unique underlying neutral vocal tract shape.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
|