1
Perrine BL, Scherer RC. Using a vertical three-mass computational model of the vocal folds to match human phonation of three adult males. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 154:1505-1525. [PMID: 37695295 PMCID: PMC10497319 DOI: 10.1121/10.0020847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 08/10/2023] [Accepted: 08/19/2023] [Indexed: 09/12/2023]
Abstract
Computer models of phonation are used to study various parameters that are difficult to control, measure, and observe in human subjects. Imitating human phonation by varying the prephonatory conditions of computer models offers insight into the variations that occur across human phonatory production. In the present study, a vertical three-mass computer model of phonation [Perrine, Scherer, Fulcher, and Zhai (2020). J. Acoust. Soc. Am. 147, 1727-1737], driven by empirical pressures from a physical model of the vocal folds (model M5), with a vocal tract following the design of Ishizaka and Flanagan [(1972). Bell Sys. Tech. J. 51, 1233-1268] was used to match prolonged vowels produced by three male subjects using various pitch and loudness levels. The prephonatory conditions of tissue mass and tension, subglottal pressure, glottal diameter and angle, posterior glottal gap, false vocal fold gap, and vocal tract cross-sectional areas were varied in the model to match the model output with the fundamental frequency, alternating current airflow, direct current airflow, skewing quotient, open quotient, maximum flow negative derivative, and the first three formant frequencies from the human production. Parameters were matched between the model and human subjects with an average overall percent mismatch of 4.40% (standard deviation = 6.75%), suggesting a reasonable ability of the simple low dimensional model to mimic these variables.
Affiliation(s)
- Brittany L Perrine
- Department of Communication Sciences and Disorders, Baylor University, One Bear Place #97332, Waco, Texas 76798, USA
- Ronald C Scherer
- Department of Communication Sciences and Disorders, Bowling Green State University, Ridge Street, Bowling Green, Ohio 43403, USA
2
Gorman E, Kirkham S. Dynamic acoustic-articulatory relations in back vowel fronting: Examining the effects of coda consonants in two dialects of British English. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 148:724. [PMID: 32872991 DOI: 10.1121/10.0001721] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/17/2020] [Accepted: 07/23/2020] [Indexed: 06/11/2023]
Abstract
This study examines dynamic acoustic-articulatory relations in back vowels, focusing on the effect of different coda consonants on acoustic-articulatory dynamics in the production of vowel contrast. This paper specifically investigates the contribution of the tongue and the lips in modifying F2 in the foot-goose contrast in English, using synchronized acoustic and electromagnetic articulography data collected from 16 speakers. The vowels foot and goose were elicited in pre-coronal and pre-lateral contexts from two dialects that are reported to be at different stages of back vowel fronting: Southern Standard British English and West Yorkshire English. The results suggest similar acoustic and articulatory patterns in pre-coronal vowels, but there is stronger evidence of vowel contrast in articulation than acoustics for pre-lateral vowels. The lip protrusion data do not help to resolve these differences, suggesting that the complex gestural makeup of a vowel-lateral sequence problematizes straightforward accounts of acoustic-articulatory relations. Further analysis reveals greater between-speaker variability in lingual advancement than F2 in pre-lateral vowels.
Affiliation(s)
- Emily Gorman
- Department of Linguistics and English Language, Lancaster University, County South, Lancaster, LA1 4YL, United Kingdom
- Sam Kirkham
- Department of Linguistics and English Language, Lancaster University, County South, Lancaster, LA1 4YL, United Kingdom
3
Abstract
This article presents a novel method for emotion recognition from speech based on a committee of classifiers. Different classification methods were juxtaposed in order to compare several alternative approaches for final voting. The research is conducted on three different types of Polish emotional speech: acted out with the same content, acted out with different content, and spontaneous. A pool of descriptors commonly used for emotional speech recognition, expanded with sets of various perceptual coefficients, is used as input features. The research shows that the presented approach improves performance with respect to a single classifier.
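The abstract does not say which voting rules were compared. As a minimal sketch of one common choice, plurality (majority) voting over the labels returned by the committee members might look like the following; the function name, the tie-breaking rule, and the example labels are assumptions made for illustration only.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label chosen by most committee members.

    Ties are broken in favour of the label that appears first in
    `predictions` (an arbitrary choice made for this sketch).
    """
    if not predictions:
        raise ValueError("committee produced no predictions")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical outputs of three classifiers for one utterance.
committee_output = ["anger", "joy", "anger"]
winner = majority_vote(committee_output)
```

Weighted or confidence-based voting would replace the plain counts with per-classifier weights; the structure stays the same.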
4
Ogata K, Kodama T, Hayakawa T, Aoki R. Inverse estimation of the vocal tract shape based on a vocal tract mapping interface. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:1961. [PMID: 31046355 DOI: 10.1121/1.5095409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2018] [Accepted: 03/08/2019] [Indexed: 06/09/2023]
Abstract
This paper describes the inverse estimation of the vocal tract shape for vowels by using a vocal tract mapping interface. In prior research, an interface capable of generating a vocal tract shape by clicking on its window was developed. The vocal tract shapes for five vowels are located at the vertices of a pentagonal chart and a different shape that corresponds to an arbitrary mouse-pointer position on the interface window is calculated by interpolation. In this study, an attempt was made to apply the interface to the inverse estimation of vocal tract shapes based on formant frequencies. A target formant frequency data set was searched based on the geometry of the interface window by using a coarse to fine algorithm. It was revealed that the estimated vocal tract shapes obtained from the mapping interface were close to those from magnetic resonance imaging data in another study and to lip area data captured using video recordings. The results of experiments to evaluate the estimated vocal tract shapes showed that each subject demonstrated unique trajectories on the interface window corresponding to the estimated vocal tract shapes. These results suggest the usefulness of inverse estimation using the interface.
Affiliation(s)
- Kohichi Ogata
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto, 860-8555, Japan
- Tayuto Kodama
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto, 860-8555, Japan
- Tomohiro Hayakawa
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto, 860-8555, Japan
- Riku Aoki
- Graduate School of Science and Technology, Kumamoto University, 2-39-1 Kurokami, Chuo-ku, Kumamoto, 860-8555, Japan
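The coarse-to-fine search mentioned in the abstract above can be illustrated with a generic sketch: evaluate a cost on a coarse grid over the interface window, then repeat on a smaller window centred on the best point. Everything below is an assumption made for illustration, including the linear `toy_formants` mapping, the target values, and the grid sizes; the paper's interface derives formants from interpolated vocal tract shapes, which is not reproduced here.

```python
def coarse_to_fine_search(cost, x_range, y_range, grid=5, levels=6):
    """Minimize cost(x, y) on a rectangle by repeatedly refining a grid."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    x_lo, x_hi, y_lo, y_hi = x_min, x_max, y_min, y_max
    best = None
    for _ in range(levels):
        dx = (x_hi - x_lo) / (grid - 1)
        dy = (y_hi - y_lo) / (grid - 1)
        candidates = [
            (cost(x_lo + i * dx, y_lo + j * dy), x_lo + i * dx, y_lo + j * dy)
            for i in range(grid)
            for j in range(grid)
        ]
        best = min(candidates)
        _, bx, by = best
        # Next level: a window one grid step wide on each side of the winner,
        # clamped to the original rectangle.
        x_lo, x_hi = max(x_min, bx - dx), min(x_max, bx + dx)
        y_lo, y_hi = max(y_min, by - dy), min(y_max, by + dy)
    return best[1], best[2], best[0]

def toy_formants(x, y):
    # Hypothetical stand-in for "window position -> vocal tract shape -> formants".
    return 300.0 + 500.0 * x, 900.0 + 1400.0 * y

TARGET = (650.0, 1740.0)  # hypothetical target F1 and F2 in Hz

def formant_cost(x, y):
    # Sum of squared relative formant errors.
    f1, f2 = toy_formants(x, y)
    return ((f1 - TARGET[0]) / TARGET[0]) ** 2 + ((f2 - TARGET[1]) / TARGET[1]) ** 2

x_best, y_best, err = coarse_to_fine_search(formant_cost, (0.0, 1.0), (0.0, 1.0))
```

With the toy mapping the exact solution is (0.7, 0.6), and the search narrows to within about one final grid step of it. A real formant mapping may have several local minima, in which case a finer first-level grid would be the safer choice.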
5
Lammert AC, Narayanan SS. On Short-Time Estimation of Vocal Tract Length from Formant Frequencies. PLoS One 2015; 10:e0132193. [PMID: 26177102 PMCID: PMC4503663 DOI: 10.1371/journal.pone.0132193] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 11/21/2014] [Accepted: 06/10/2015] [Indexed: 11/19/2022]
Abstract
Vocal tract length is highly variable across speakers and determines many aspects of the acoustic speech signal, making it an essential parameter to consider for explaining behavioral variability. A method for accurate estimation of vocal tract length from formant frequencies would afford normalization of interspeaker variability and facilitate acoustic comparisons across speakers. A framework for considering estimation methods is developed from the basic principles of vocal tract acoustics, and an estimation method is proposed that follows naturally from this framework. The proposed method is evaluated using acoustic characteristics of simulated vocal tracts ranging from 14 to 19 cm in length, as well as real-time magnetic resonance imaging data with synchronous audio from five speakers whose vocal tracts range from 14.5 to 18.0 cm in length. Evaluations show improvements in accuracy over previously proposed methods, with 0.631 and 1.277 cm root mean square error on simulated and human speech data, respectively. Empirical results show that the effectiveness of the proposed method is based on emphasizing higher formant frequencies, which seem less affected by speech articulation. Theoretical predictions of formant sensitivity reinforce this empirical finding. Moreover, theoretical insights are explained regarding the reason for differences in formant sensitivity.
Affiliation(s)
- Adam C. Lammert
- Computer Science Department, Swarthmore College, Swarthmore, PA, United States of America
- Shrikanth S. Narayanan
- Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, CA, United States of America
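For orientation, the textbook starting point for estimating vocal tract length from formants is the uniform tube closed at the glottis and open at the lips, whose n-th resonance is F_n = (2n - 1)c / (4L). The sketch below inverts that relation per formant and takes a weighted mean. It is not the estimator proposed in the paper; the speed of sound, the function names, and the weights are assumptions. Giving more weight to higher formants simply mirrors the abstract's observation that they seem less affected by articulation.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed nominal value for air in the vocal tract

def length_from_formant(freq_hz, n, c=SPEED_OF_SOUND_CM_S):
    """Tube length (cm) implied by the n-th resonance of a uniform tube."""
    return (2 * n - 1) * c / (4.0 * freq_hz)

def estimate_vtl(formants_hz, weights=None, c=SPEED_OF_SOUND_CM_S):
    """Weighted mean of the per-formant length estimates."""
    if weights is None:
        weights = [1.0] * len(formants_hz)
    lengths = [
        length_from_formant(f, n, c) for n, f in enumerate(formants_hz, start=1)
    ]
    return sum(w * length for w, length in zip(weights, lengths)) / sum(weights)

# Neutral-vowel formants of an idealized 17.5 cm tube.
vtl_cm = estimate_vtl([500.0, 1500.0, 2500.0])

# Same tube with a perturbed F1; zero weight on F1 ignores the perturbation.
weighted_cm = estimate_vtl([480.0, 1500.0, 2500.0], weights=[0.0, 1.0, 1.0])
```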
6
Story BH. Structure, Movement, Sound, and Perception. PERSPECTIVES ON SPEECH SCIENCE AND OROFACIAL DISORDERS 2014; 24:7-20. [PMID: 25383138 PMCID: PMC4222052 DOI: 10.1044/ssod24.1.7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Models that take the form of artificial talkers and speech synthesis systems have long been used as a means of understanding both speech production and speech perception. The article begins with a brief history of two artificial speaking devices that exemplify the representation of speech production as a system of modulations. The development of a recent airway modulation model is then described that simulates the time-varying changes of the vocal tract and acoustic wave propagation. The result is a type of artificial talker that can be used to study various aspects of how sound is generated by humans and how that sound is perceived by a listener.
Affiliation(s)
- Brad H Story
- Speech, Language, and Hearing Sciences, University of Arizona
7
Gopinath B, Sondhi MM. Determination of the Shape of the Human Vocal Tract from Acoustical Measurements. ACTA ACUST UNITED AC 2013. [DOI: 10.1002/j.1538-7305.1970.tb01820.x] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Indexed: 11/12/2022]
8
Sahrawat R, Robb MP, Kirk R, Beckert L. Effects of inhaled corticosteroids on voice production in healthy adults. LOGOP PHONIATR VOCO 2013; 39:108-16. [PMID: 23570418 DOI: 10.3109/14015439.2013.777110] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Abstract
The isolated effects of inhaled corticosteroids (ICS) on voice production were examined in 30 healthy adults with no known pre-existing airway disease. All participants followed a daily ICS treatment regime of 500 μg in the morning and evening over a 6-day period. Sustained vowels and connected speech samples were audio recorded before, during, and after the ICS regime. Each participant's audio recorded samples were acoustically analysed. Results revealed that ICS has a short-term detrimental effect on various acoustic properties of voice. These effects were more evident in connected speech compared to isolated vowel productions. All acoustic parameters returned to normalcy after discontinuing the ICS. The study provides insight as to the influence of ICS on healthy voice production.
Affiliation(s)
- Ramesh Sahrawat
- University of Canterbury, Health Sciences Centre , Private Bag 4800, Christchurch, 8140 New Zealand
9
Rasetshwane DM, Neely ST, Allen JB, Shera CA. Reflectance of acoustic horns and solution of the inverse problem. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:1863-73. [PMID: 22423684 PMCID: PMC3316681 DOI: 10.1121/1.3681923] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 06/29/2011] [Revised: 12/19/2011] [Accepted: 12/28/2011] [Indexed: 05/26/2023]
Abstract
A method is described for solving the inverse problem of determining the profile of an acoustic horn when time-domain reflectance (TDR) is known only at the entrance. The method involves recasting Webster's horn equation in terms of forward and backward propagating wave variables. An essential feature of this method is a requirement that the backward propagating wave be continuous at the wave-front at all locations beyond the entrance. Derivation of the inverse solution raises questions about the meaning of causality in the context of wave propagation in non-uniform tubes. Exact reflectance expressions are presented for infinite exponential, conical and parabolic horns based on exact solutions of the horn equation. Diameter functions obtained with the inverse solution are a good match to all three horn profiles.
Affiliation(s)
- Daniel M Rasetshwane
- Boys Town National Research Hospital, 555 North 30th Street, Omaha, Nebraska 68131, USA.
10
Rasetshwane DM, Neely ST. Inverse solution of ear-canal area function from reflectance. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 130:3873-81. [PMID: 22225043 PMCID: PMC3253594 DOI: 10.1121/1.3654019] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 05/13/2023]
Abstract
A number of acoustical applications require the transformation of acoustical quantities, such as impedance and pressure that are measured at the entrance of the ear canal, to quantities at the eardrum. This transformation often requires knowledge of the shape of the ear canal. Previous attempts to measure ear-canal area functions were either invasive, non-reproducible, or could only measure the area function up to a point mid-way along the canal. A method to determine the area function of the ear canal from measurements of acoustic impedance at the entrance of the ear canal is described. The method is based on a solution to the inverse problem in which measurements of impedance are used to calculate reflectance, which is then used to determine the area function of the canal. The mean ear-canal area function determined using this method is similar to mean ear-canal area functions measured by other researchers using different techniques. The advantage of the proposed method over previous methods is that it is non-invasive, fast, and reproducible.
Affiliation(s)
- Daniel M Rasetshwane
- Boys Town National Research Hospital, 555 North 30th Street, Omaha, Nebraska 68131, USA.
11
Panchapagesan S, Alwan A. A study of acoustic-to-articulatory inversion of speech by analysis-by-synthesis using chain matrices and the Maeda articulatory model. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:2144-2162. [PMID: 21476670 PMCID: PMC3188964 DOI: 10.1121/1.3514544] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/02/2009] [Revised: 10/05/2010] [Accepted: 10/19/2010] [Indexed: 05/30/2023]
Abstract
In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
Affiliation(s)
- Sankaran Panchapagesan
- Department of Electrical Engineering, University of California, Los Angeles, California 90095, USA.
12
Honorof DN, Weihing J, Fowler CA. Articulatory events are imitated under rapid shadowing. JOURNAL OF PHONETICS 2011; 39:18-38. [PMID: 23418398 PMCID: PMC3571117 DOI: 10.1016/j.wocn.2010.10.007] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 06/01/2023]
Abstract
We tested the hypothesis that rapid shadowers imitate the articulatory gestures that structure acoustic speech signals (not just acoustic patterns in the signals themselves), overcoming highly practiced motor routines and phonological conditioning in the process. In a first experiment, acoustic evidence indicated that participants reproduced allophonic differences between American English /l/ types (light and dark) in the absence of the positional variation cues more typically present with lateral allophony. However, imitative effects were small. In a second experiment, varieties of /l/ with exaggerated light/dark differences were presented by ear. Acoustic measures indicated that all participants reproduced differences between /l/ types; larger average imitative effects obtained. Finally, we examined evidence for imitation in articulation. Participants ranged in behavior from one who did not imitate to another who reproduced distinctions among light laterals, dark laterals and /w/, but displayed a slight but inconsistent tendency toward enhancing imitation of lingual gestures through a slight lip protrusion. Overall, results indicated that most rapid shadowers need not substitute familiar allophones as they imitate reorganized gestural constellations even in the absence of explicit instruction to imitate, but that the extent of the imitation is small. Implications for theories of speech perception are discussed.
Affiliation(s)
- Douglas N. Honorof
- Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT 06511, USA
- Jeffrey Weihing
- Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT 06511, USA
- Department of Communication Sciences, University of Connecticut, Storrs, CT 06269, USA
- Carol A. Fowler
- Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT 06511, USA
- Department of Psychology, University of Connecticut, Storrs, CT 06269, USA
13
Abstract
The area function of the vocal tract in all of its spatial detail is not directly computable from the speech signal. But is partial, yet phonetically distinctive, information about articulation recoverable from the acoustic signal that arrives at the listener's ear? The answer to this question is important for phonetics, because various theories of speech perception predict different answers. Some theories assume that recovery of articulatory information must be possible, while others assume that it is impossible. However, neither type of theory provides firm evidence showing that distinctive articulatory information is or is not extractable from the acoustic signal. The present study focuses on vowel gestures and examines whether linguistically significant information, such as the constriction location, constriction degree, and rounding, is contained in the speech signal, and whether such information is recoverable from formant parameters. Perturbation theory and linear prediction were combined, in a manner similar to that in Mokhtari (1998) [Mokhtari, P. (1998). An acoustic-phonetic and articulatory study of speech-speaker dichotomy. Doctoral dissertation, University of New South Wales], to assess the accuracy of recovery of information about vowel constrictions. Distinctive constriction information estimated from the speech signal for ten American English vowels was compared to the constriction information derived from simultaneously collected X-ray microbeam articulatory data for 39 speakers [Westbury (1994). X-ray microbeam speech production database user's handbook. University of Wisconsin, Madison, WI]. The recovery of distinctive articulatory information relies on a novel technique that uses formant frequencies and amplitudes, and does not depend on a principal components analysis of the articulatory data, as do most other inversion techniques. These results provide evidence that distinctive articulatory information for vowels can be recovered from the acoustic signal.
14
McGowan RS, Berger MA. Acoustic-articulatory mapping in vowels by locally weighted regression. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:2011-2032. [PMID: 19813812 PMCID: PMC2771059 DOI: 10.1121/1.3184581] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/10/2008] [Revised: 05/05/2009] [Accepted: 06/30/2009] [Indexed: 05/28/2023]
Abstract
A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method uses principal components analysis on the articulatory and acoustic variables, and mapping between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979). J. Am. Stat. Assoc. 74, 829-836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties than the inverse mappings indicating that this method is better suited for the forward mappings than the inverse mappings, at least for the data chosen for the current study. Some preliminary results on sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented.
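To make the idea of locally varying slopes concrete, here is a minimal one-dimensional sketch of locally weighted linear regression with a tricube kernel, in the spirit of Cleveland's loess. It is an illustration under simplifying assumptions: a single predictor, no robustness iterations, and hypothetical names and data. The study itself maps between principal components of multivariate articulatory and acoustic data.

```python
import math

def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def local_linear_fit(x0, xs, ys, frac=0.5):
    """Fit a weighted straight line around x0; return (prediction, slope)."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))      # points in the local window
    h = sorted(abs(x - x0) for x in xs)[k - 1]   # bandwidth: k-th nearest distance
    if h == 0.0:
        raise ValueError("local window has zero width")
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("no points received positive weight")
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return my + slope * (x0 - mx), slope

# Hypothetical data lying exactly on y = 2x + 1.
xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
prediction, slope = local_linear_fit(2.5, xs, ys)
```

Because each query point gets its own slope and constant, collecting them over many query points gives the kind of slope distribution the abstract describes, whereas a global regression yields only one slope.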
15
Story BH. A comparison of vocal tract perturbation patterns based on statistical and acoustic considerations. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2007; 122:EL107-14. [PMID: 17902738 PMCID: PMC2278006 DOI: 10.1121/1.2771369] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 05/17/2023]
Abstract
The purpose of this study was to investigate the relation between vocal tract deformation patterns obtained from statistical analyses of a set of area functions representative of a vowel repertoire, and the acoustic properties of a neutral vocal tract shape. Acoustic sensitivity functions were calculated for a mean area function based on seven different speakers. Specific linear combinations of the sensitivity functions corresponding to the first two formant frequencies were shown to possess essentially the same amplitude variation along the vocal tract length as the statistically derived deformation patterns reported in previous studies.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech, Language, and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA
16
Marshall DC, Lee JD, Austria RA. Alerts for in-vehicle information systems: annoyance, urgency, and appropriateness. HUMAN FACTORS 2007; 49:145-57. [PMID: 17315851 DOI: 10.1518/001872007779598145] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 05/14/2023]
Abstract
OBJECTIVE: This study assesses the influence of the auditory characteristics of alerts on perceived urgency and annoyance and whether these perceptions depend on the context in which the alert is received. BACKGROUND: Alert parameters systematically affect perceived urgency, and mapping the urgency of a situation to the perceived urgency of an alert is a useful design consideration. Annoyance associated with environmental noise has been thoroughly studied, but little research has addressed whether alert parameters differentially affect annoyance and urgency. METHOD: Three 2³ × 3 mixed within/between factorial experiments, with a total of 72 participants, investigated nine alert parameters in three driving contexts. These parameters were formant (similar to harmonic series), pulse duration, interpulse interval, alert onset and offset, burst duty cycle, alert duty cycle, interburst period, and sound type. Imagined collision warning, navigation alert, and E-mail notification scenarios defined the driving context. RESULTS: All parameters influenced both perceived urgency and annoyance (p < .05), with pulse duration, interpulse interval, alert duty cycle, and sound type influencing urgency substantially more than annoyance. There was a strong relationship between perceived urgency and rated appropriateness for high-urgency driving scenarios and a strong relationship between annoyance and rated appropriateness for low-urgency driving scenarios. CONCLUSION: Sound parameters differentially affect annoyance and urgency. Also, urgency and annoyance differentially affect perceived appropriateness of warnings. APPLICATION: Annoyance may merit as much attention as urgency in the design of auditory warnings, particularly in systems that alert drivers to relatively low-urgency situations.
17
Story BH. Technique for "tuning" vocal tract area functions based on acoustic sensitivity functions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2006; 119:715-8. [PMID: 16521730 DOI: 10.1121/1.2151802] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 05/07/2023]
Abstract
A technique for modifying vocal tract area functions is developed by using sum and difference combinations of acoustic sensitivity functions to perturb an initial vocal tract configuration. First, sensitivity functions [e.g., Fant and Pauli, Proc. Speech Comm. Sem. 74, 1975] are calculated for a given area function, at its specific formant frequencies. The sensitivity functions are then multiplied by scaling coefficients that are determined from the difference between a desired set of formant frequencies and those supported by the current area function. The scaled sensitivity functions are then summed together to generate a perturbation of the area function. This produces a new area function whose associated formant frequencies are closer to the desired values than the previous one. This process is repeated iteratively until the coefficients are equal to zero or are below a threshold value.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
18
Forbes BJ, Pike ER, Sharp DB, Aktosun T. Inverse potential scattering in duct acoustics. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2006; 119:65-73. [PMID: 16454265 DOI: 10.1121/1.2139618] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/06/2023]
Abstract
The inverse problem of the noninvasive measurement of the shape of an acoustical duct in which one-dimensional wave propagation can be assumed is examined within the theoretical framework of the governing Klein-Gordon equation. Previous deterministic methods developed over the last 40 years have all required direct measurement of the reflectance or input impedance but now, by application of the methods of inverse quantum scattering to the acoustical system, it is shown that the reflectance can be algorithmically derived from the radiated wave. The potential and area functions of the duct can subsequently be reconstructed. The results are discussed with particular reference to acoustic pulse reflectometry.
Affiliation(s)
- Barbara J Forbes
- Phonologica Ltd., PO. Box 43925, London NW2 1DJ, United Kingdom.
19
Story BH. Synergistic modes of vocal tract articulation for American English vowels. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2005; 118:3834-59. [PMID: 16419828 DOI: 10.1121/1.2118367] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/06/2023]
Abstract
The purpose of this study was to investigate the spatial similarity of vocal tract shaping patterns across speakers and the similarity of their acoustic effects. Vocal tract area functions for 11 American English vowels were obtained from six speakers, three female and three male, using magnetic resonance imaging (MRI). Each speaker's set of area functions was then decomposed into mean area vectors and representative modes (eigenvectors) using principal components analysis (PCA). Three modes accounted for more than 90% of the variance in the original data sets for each speaker. The general shapes of the first two modes were found to be highly correlated across all six speakers. To demonstrate the acoustic effects of each mode, both in isolation and combined, a mapping between the mode scaling coefficients and [F1, F2] pairs was generated for each speaker. The mappings were unique for all six speakers in terms of the exact shape of the [F1, F2] vowel space, but the general effect of the modes was the same in each case. The results support the idea that the modes provide a common system for perturbing a unique underlying neutral vocal tract shape.
Affiliation(s)
- Brad H Story
- Speech Acoustics Laboratory, Department of Speech and Hearing Sciences, University of Arizona, Tucson, Arizona 85721, USA.
20
Ru P, Chi T, Shamma S. The synergy between speech production and perception. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2003; 113:498-515. [PMID: 12558287 DOI: 10.1121/1.1525288] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 05/24/2023]
Abstract
Speech intelligibility is known to be relatively unaffected by certain deformations of the acoustic spectrum. These include translations, stretching or contracting dilations, and shearing of the spectrum (represented along the logarithmic frequency axis). It is argued here that such robustness reflects a synergy between vocal production and auditory perception. Thus, on the one hand, it is shown that these spectral distortions are produced by common and unavoidable variations among different speakers pertaining to the length, cross-sectional profile, and losses of their vocal tracts. On the other hand, it is argued that these spectral changes leave the auditory cortical representation of the spectrum largely unchanged except for translations along one of its representational axes. These assertions are supported by analyses of production and perception models. On the production side, a simplified sinusoidal model of the vocal tract is developed which analytically relates a few "articulatory" parameters, such as the extent and location of the vocal tract constriction, to the spectral peaks of the acoustic spectra synthesized from it. The model is evaluated by comparing the identification of synthesized sustained vowels to labeled natural vowels extracted from the TIMIT corpus. On the perception side a "multiscale" model of sound processing is utilized to elucidate the effects of the deformations on the representation of the acoustic spectrum in the primary auditory cortex. Finally, the implications of these results for the perception of generally identifiable classes of sound sources beyond the specific case of speech and the vocal tract are discussed.
Affiliation(s)
- Powen Ru
- Center for Auditory and Acoustics Research, Institute for Systems Research, Electrical and Computer Engineering Department, University of Maryland, College Park, Maryland 20742, USA
21
Oldham DJ. A rapid technique to determine the internal area function of finite-length ducts using maximum length sequence analysis. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2000; 108:44-52. [PMID: 10923869 DOI: 10.1121/1.429528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/23/2023]
Abstract
This paper describes a rapid technique for reconstruction of the internal area function of a duct using blockage-induced eigenvalue shifts determined from eigenfrequencies measured under two sets of duct termination boundary conditions. A single broad band maximum length sequence (MLS) measurement of short duration is utilized to obtain the transfer function of the duct, which in turn can be utilized to determine its eigenvalue shifts and subsequently its internal area function using an inverse perturbation technique. The reconstruction results display the same order of accuracy as those obtained previously using swept sine measurements of extended duration. An expression for the determination of the area function is presented utilizing resonant frequency information alone, thus rendering duct length determination unnecessary. A computational routine further simplifies the process such that the accuracy of the technique could be ascertained for a range of configurations including longer ducts and ducts that initially have nonuniform internal cross section over their length. Development of a relationship between obstacle length and wavelength of the lowest eigenfrequency required for successful reconstruction is also described. This is an important result for longer ducts where measurement of lower eigenfrequencies may present problems using standard measurement equipment.
22
Savariaux C, Perrier P, Orliaguet JP, Schwartz JL. Compensation strategies for the perturbation of French [u] using a lip tube. II. Perceptual analysis. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 1999; 106:381-393. [PMID: 10420629 DOI: 10.1121/1.427063] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.3] [Indexed: 05/23/2023]
Abstract
A perceptual analysis of the French vowel [u] produced by 10 speakers under normal and perturbed conditions (Savariaux et al., 1995) is presented which aims at characterizing in the perceptual domain the task of a speaker for this vowel, and, then, at understanding the strategies developed by the speakers to deal with the lip perturbation. Identification and rating tests showed that the French [u] is perceptually fairly well described in the [F1, (F2-F0)] plane, and that the parameter (((F2-F0) + F1)/2) (all frequencies in bark) provides a good overall correlate of the "grave" feature classically used to describe the vowel [u] in all languages. This permitted reanalysis of the behavior of the speakers during the perturbation experiment. Three of them succeed in producing a good [u] in spite of the lip tube, thanks to a combination of limited changes on F1 and (F2-F0), but without producing the strong backward movement of the tongue, which would be necessary to keep the [F1,F2] pattern close to the one measured in normal speech. The only speaker who strongly moved his tongue back and maintained F1 and F2 at low values did not produce a perceptually well-rated [u], but additional tests demonstrate that this gesture allowed him to preserve the most important phonetic features of the French [u], which is primarily a back and rounded vowel. It is concluded that speech production is clearly guided by perceptual requirements, and that the speakers have a good representation of them, even if they are not all able to meet them in perturbed conditions.
Affiliation(s)
- C Savariaux
- Institut de la Communication Parlée, UPRESA CNRS 5009, Grenoble, France.
23
Schroeter J, Sondhi MM. Techniques for estimating vocal-tract shapes from the speech signal. IEEE Transactions on Speech and Audio Processing 1994. [DOI: 10.1109/89.260356] [Citation(s) in RCA: 104] [Impact Index Per Article: 3.4] [Indexed: 11/06/2022]
24
Marshall I, Rogers M, Drummond G. Acoustic reflectometry for airway measurement. Principles, limitations and previous work. CLINICAL PHYSICS AND PHYSIOLOGICAL MEASUREMENT : AN OFFICIAL JOURNAL OF THE HOSPITAL PHYSICISTS' ASSOCIATION, DEUTSCHE GESELLSCHAFT FUR MEDIZINISCHE PHYSIK AND THE EUROPEAN FEDERATION OF ORGANISATIONS FOR MEDICAL PHYSICS 1991; 12:131-41. [PMID: 1855359 DOI: 10.1088/0143-0815/12/2/002] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
Abstract
Acoustic pulse reflectometry is a relatively recent technique which allows the non-invasive measurement of human airways. The technique consists of guiding an acoustic impulse through the subject's mouth and into the airway. Suitable analysis of the resulting reflection (the 'echo') allows a reconstruction of the area-distance function. The non-invasive nature of the technique offers significant advantages over the established methods of x-ray cephalometry and CT scanning, and makes it very attractive for the investigation of ENT problems and sleep apnoea, and in the anaesthetic management of patients. This paper describes the theory and limitations of acoustic reflectometry, discusses previous work, and suggests some modifications: it is currently being implemented clinically.
Affiliation(s)
- I Marshall
- Department of Medical Physics and Medical Engineering, Western General Hospital, Edinburgh, UK
25
Milenkovic P. Vocal tract area functions from two point acoustic measurements with formant frequency constraints. IEEE Transactions on Acoustics, Speech, and Signal Processing 1984. [DOI: 10.1109/tassp.1984.1164455] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
26
Levitt H. Computer applications in audiology and rehabilitation of the hearing impaired. JOURNAL OF COMMUNICATION DISORDERS 1980; 13:471-481. [PMID: 6450225 DOI: 10.1016/0021-9924(80)90046-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 05/21/2023]
Abstract
Computers are playing an ever-increasing role in audiology. The potential applications of computers in audiology, speech pathology, and aural rehabilitation are described. These include the use of computers in adaptive testing, speech analysis and synthesis, automatic extraction of speech features, supplemental speechreading aids, analysis and synthesis of speechreading cues, language analysis, and computer simulation of the effect of specific speech training strategies.
27
28
Wakita H, Gray A. Numerical determination of the lip impedance and vocal tract area functions. IEEE Transactions on Acoustics, Speech, and Signal Processing 1975. [DOI: 10.1109/tassp.1975.1162733] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
29
Wakita H. Direct estimation of the vocal tract shape by inverse filtering of acoustic speech waveforms. IEEE Transactions on Audio and Electroacoustics 1973. [DOI: 10.1109/tau.1973.1162506] [Citation(s) in RCA: 153] [Impact Index Per Article: 2.9] [Indexed: 11/10/2022]
30
31
Some experiments with computer synthesized speech. Behav Res Methods 1970. [DOI: 10.3758/bf03205713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]