1
Serrurier A, Neuschaefer-Rube C. Morphological and acoustic modeling of the vocal tract. The Journal of the Acoustical Society of America 2023; 153:1867. [PMID: 37002095 DOI: 10.1121/10.0017356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 02/07/2023] [Indexed: 05/18/2023]
Abstract
In speech production, the anatomical morphology forms the substrate on which the speakers build their articulatory strategy to reach specific articulatory-acoustic goals. The aim of this study is to characterize morphological inter-speaker variability by building a shape model of the full vocal tract including hard and soft structures. Static magnetic resonance imaging data from 41 speakers articulating altogether 1947 phonemes were considered, and the midsagittal articulator contours were manually outlined. A phoneme-independent average-articulation representative of morphology was calculated as the speaker mean articulation. A principal component analysis-driven shape model was derived from average-articulations, leading to five morphological components, which explained 87% of the variance. Almost three-quarters of the variance was related to independent variations of the horizontal oral and vertical pharyngeal lengths, the latter capturing male-female differences. The three additional components captured shape variations related to head tilt and palate shape. Plane wave propagation acoustic simulations were run to characterize morphological components. A lengthening of 1 cm of the vocal tract in the vertical or horizontal directions led to a decrease in formant values of 7%-8%. Further analyses are required to analyze three-dimensional variability and to understand the morphological-acoustic relationships per phoneme. Average-articulations and model code are publicly available (https://github.com/tonioser/VTMorphologicalModel).
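As a rough plausibility check on the reported 7%-8% formant decrease per centimeter, the sketch below uses the textbook quarter-wavelength model of a uniform tube closed at the glottis and open at the lips, F_n = (2n - 1)c / (4L). This is an idealization and not the plane wave propagation simulation used in the study; the speed of sound (350 m/s), the function names, and the tube lengths are illustrative assumptions.

```python
def tube_formants(length_m, n_formants=3, c=350.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Assumption: idealized quarter-wavelength resonator, F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


def relative_decrease(length_m, delta_m=0.01, n_formants=3):
    """Fractional drop of each resonance when the tube is lengthened by delta_m."""
    before = tube_formants(length_m, n_formants)
    after = tube_formants(length_m + delta_m, n_formants)
    return [1.0 - a / b for a, b in zip(after, before)]


if __name__ == "__main__":
    # Hypothetical tube lengths (m); not speaker data from the study.
    for length in (0.13, 0.15, 0.17):
        drops = relative_decrease(length)
        print(f"L = {length:.2f} m: decrease per formant = "
              + ", ".join(f"{100 * d:.1f}%" for d in drops))
```

In this idealized tube every resonance falls by the same fraction, delta / (L + delta), so the size of the effect depends only on the starting length; a real vocal tract with a non-uniform area function will deviate from this.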
Affiliation(s)
- Antoine Serrurier
  - Clinic for Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital and Medical Faculty of the RWTH Aachen University, 52057 Aachen, Germany
- Christiane Neuschaefer-Rube
  - Clinic for Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital and Medical Faculty of the RWTH Aachen University, 52057 Aachen, Germany
2
Barbier G, Merzouki R, Bal M, Baum SR, Shiller DM. Visual feedback of the tongue influences speech adaptation to a physical modification of the oral cavity. The Journal of the Acoustical Society of America 2021; 150:718. [PMID: 34470311 DOI: 10.1121/10.0005520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/04/2021] [Accepted: 06/15/2021] [Indexed: 06/13/2023]
Abstract
Studies examining sensorimotor adaptation of speech to changing sensory conditions have demonstrated a central role for both auditory and somatosensory feedback in speech motor learning. The potential influence of visual feedback of oral articulators, which is not typically available during speech production but may nonetheless enhance oral motor control, remains poorly understood. The present study explores the influence of ultrasound visual feedback of the tongue on adaptation of speech production (focusing on the sound /s/) to a physical perturbation of the oral articulators (prosthesis altering the shape of the hard palate). Two visual feedback groups were tested that differed in the two-dimensional plane being imaged (coronal or sagittal) during practice producing /s/ words, along with a no-visual-feedback control group. Participants in the coronal condition were found to adapt their speech production across a broader range of acoustic spectral moments and syllable contexts than the no-feedback controls. In contrast, the sagittal group showed reduced adaptation compared to no-feedback controls. The results indicate that real-time visual feedback of the tongue is spontaneously integrated during speech motor adaptation, with effects that can enhance or interfere with oral motor learning depending on compatibility of the visual articulatory information with requirements of the speaking task.
Affiliation(s)
- Guillaume Barbier
  - École d'Orthophonie et d'Audiologie, Université de Montréal, Case Postale 6128, Succursale Centre-Ville, Montréal, Québec H3C 3J7, Canada
- Ryme Merzouki
  - École d'Orthophonie et d'Audiologie, Université de Montréal, Case Postale 6128, Succursale Centre-Ville, Montréal, Québec H3C 3J7, Canada
- Mathilde Bal
  - École d'Orthophonie et d'Audiologie, Université de Montréal, Case Postale 6128, Succursale Centre-Ville, Montréal, Québec H3C 3J7, Canada
- Shari R Baum
  - School of Communication Sciences and Disorders, McGill University, 2001 McGill College Avenue, Suite 800, Montréal, Québec H3A 1G1, Canada
- Douglas M Shiller
  - École d'Orthophonie et d'Audiologie, Université de Montréal, Case Postale 6128, Succursale Centre-Ville, Montréal, Québec H3C 3J7, Canada
3
Dromey C, Richins M, Low T. Kinematic and Acoustic Changes to Vowels and Diphthongs in Bite Block Speech. Journal of Speech, Language, and Hearing Research: JSLHR 2021; 64:1794-1801. [PMID: 33979206 DOI: 10.1044/2021_jslhr-20-00630] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
Purpose: We examined the effect of bite block insertion (BBI) on lingual movements and formant frequencies in corner vowel and diphthong production in a sentence context. Method: Twenty young adults produced the corner vowels (/u/, /ɑ/, /æ/, /i/) and the diphthong /ɑɪ/ in sentence contexts before and after BBI. An electromagnetic articulograph measured the movements of the tongue back, middle, and front. Results: There were significant decreases in the acoustic vowel articulation index and vowel space area following BBI. The kinematic vowel articulation index decreased significantly for the back and middle of the tongue but not for the front. There were no significant acoustic changes post-BBI for the diphthong, other than a longer transition duration. Diphthong kinematic changes after BBI included smaller movements for the back and middle of the tongue, but not the front. Conclusions: BBI led to a smaller acoustic working space for the corner vowels. The adjustments made by the front of the tongue were sufficient to compensate for the BBI perturbation in the diphthong, resulting in unchanged formant trajectories. The back and middle of the tongue were likely biomechanically restricted in their displacement by the fixation of the jaw, whereas the tongue front showed greater movement flexibility.
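The vowel space area mentioned above is commonly computed as the area of the polygon spanned by the corner vowels in the F1-F2 plane. The sketch below shows one way to do that with the shoelace formula. It is an illustration only, not the procedure reported by the authors; the formant values and the function name `polygon_area` are hypothetical.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    `points` is a list of (x, y) vertices listed in order around the perimeter.
    """
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


if __name__ == "__main__":
    # Hypothetical (F1, F2) values in Hz, ordered /i/, /ae/, /a/, /u/.
    # These numbers are made up for illustration and are not study data.
    corner_vowels = [(300.0, 2300.0), (700.0, 1800.0), (750.0, 1100.0), (350.0, 900.0)]
    print(f"Vowel space area: {polygon_area(corner_vowels):.0f} Hz^2")
```

The vertex order matters: listing the vowels out of perimeter order produces a self-intersecting polygon and a misleading area.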
Affiliation(s)
- Christopher Dromey
  - Department of Communication Disorders, Brigham Young University, Provo, UT
- Michelle Richins
  - Department of Communication Disorders, Brigham Young University, Provo, UT
- Tanner Low
  - Department of Communication Disorders, Brigham Young University, Provo, UT
4
Bakst S. Palate shape influence depends on the segment: Articulatory and acoustic variability in American English /ɹ/ and /s/. The Journal of the Acoustical Society of America 2021; 149:960. [PMID: 33639819 DOI: 10.1121/10.0003379] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2020] [Accepted: 01/05/2021] [Indexed: 06/12/2023]
Abstract
This ultrasound and acoustics study of American English /ɹ/ and /s/ investigates whether variability in production as measured in the midsagittal plane is related to individual differences in the shape of the hard palate in the coronal plane. Both token-to-token variability and variability between different phonetic contexts were investigated. While no direct relationship was found between palate flatness and articulatory variability, a secondary analysis revealed that speakers' articulatory variability for one segment was related to their variability in the other. Speakers with flatter palates tended towards lower articulatory variability scores, but speakers with more domed palates showed both high and low variability scores.
Affiliation(s)
- Sarah Bakst
  - Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, Wisconsin 53703, USA
5
Barbier G, Baum SR, Ménard L, Shiller DM. Sensorimotor adaptation across the speech production workspace in response to a palatal perturbation. The Journal of the Acoustical Society of America 2020; 147:1163. [PMID: 32113266 DOI: 10.1121/10.0000672] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2019] [Accepted: 01/15/2020] [Indexed: 06/10/2023]
Abstract
Talkers have been shown to adapt the production of multiple vowel sounds simultaneously in response to altered auditory feedback. The present study extends this work by exploring the adaptation of speech production to a physical alteration of the vocal tract involving a palatal prosthesis that impacts both somatosensory and auditory feedback during the production of a range of consonants and vowels. Acoustic and kinematic measures of the tongue were used to examine the impact of the physical perturbation across the various speech sounds, and to assess learned changes following 20 min of speech practice involving the production of complex, variable sentences. As in prior studies, acoustic analyses showed perturbation and adaptation effects primarily for sounds directly involving interaction with the palate. Analyses of tongue kinematics, however, revealed systematic, robust effects of the perturbation and subsequent motor learning across the full range of speech sounds. The results indicate that speakers are able to reconfigure oral motor patterns during the production of multiple speech sounds spanning the articulatory workspace following a physical alteration of the vocal tract.
Affiliation(s)
- Guillaume Barbier
  - École d'orthophonie et d'audiologie, Université de Montréal, Case Postale 6128, Succursale Centre-Ville, Montréal, Québec, H3C 3J7, Canada
- Shari R Baum
  - School of Communication Sciences and Disorders, McGill University, 2001 McGill College Avenue, Suite 800, Montréal, Québec, H3A 1G1, Canada
- Lucie Ménard
  - Département de linguistique, Université du Québec à Montréal, Case Postale 8888, Succursale Centre-Ville, Montréal, Québec, H3C 3P8, Canada
- Douglas M Shiller
  - École d'orthophonie et d'audiologie, Université de Montréal, Case Postale 6128, Succursale Centre-Ville, Montréal, Québec, H3C 3J7, Canada
6
Mirchandani B, Perrier P, Grosgogeat B, Jeannin C. Study of tongue-palate pressure patterns during the hold phase in the production of French denti-alveolar and velar stops. Clinical Linguistics & Phonetics 2019; 34:54-71. [PMID: 31112658 DOI: 10.1080/02699206.2019.1610978] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/31/2019] [Revised: 04/18/2019] [Accepted: 04/21/2019] [Indexed: 06/09/2023]
Abstract
The hold phase of the stop consonants is crucial for a successful production of the release and the acoustic burst. At the same time, it is associated with weak acoustic energy and minimal movement, so that conventional acoustic and kinematic approaches are poorly suited to investigating motor control. This paper presents an innovative experimental method to study speech motor control during this phase, based on meticulous measurement of the time variation of the mechanical pressure exerted by the tongue against the palate and also characterizing tongue-palate interaction. The concept is based on using miniature transducers with enhanced response characteristics inserted in different locations of the complete denture of edentulous subjects without perturbing the articulation. The study was done with a French-speaking adult whose maxillary denture was duplicated and mounted with six strain gauge transducers. The experiment was done with denti-alveolar and velar stop consonants with two vowel contexts. The results illustrate the potential of such a device to analyse speech motor control when contact constrains tongue movements.
Affiliation(s)
- Bharat Mirchandani
  - GIPSA-lab, CNRS, Grenoble INP, Université Grenoble Alpes, Grenoble, France
  - Laboratoire des Multimatériaux et Interfaces, UMR CNRS 5615, Department of Prosthodontics, Faculty of Odontology, Université Claude Bernard Lyon 1, Lyon, France
- Pascal Perrier
  - GIPSA-lab, CNRS, Grenoble INP, Université Grenoble Alpes, Grenoble, France
- Brigitte Grosgogeat
  - Laboratoire des Multimatériaux et Interfaces, UMR CNRS 5615, Department of Prosthodontics, Faculty of Odontology, Université Claude Bernard Lyon 1, Lyon, France
- Christophe Jeannin
  - Laboratoire des Multimatériaux et Interfaces, UMR CNRS 5615, Department of Prosthodontics, Faculty of Odontology, Université Claude Bernard Lyon 1, Lyon, France
  - Hospices Civils de Lyon, Lyon, France
7
Li M, Kim J, Lammert A, Ghosh PK, Ramanarayanan V, Narayanan S. Speaker verification based on the fusion of speech acoustics and inverted articulatory signals. Comput Speech Lang 2015; 36:196-211. [PMID: 28496292 DOI: 10.1016/j.csl.2015.05.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 10/23/2022]
Abstract
We propose a practical, feature-level and score-level fusion approach by combining acoustic and estimated articulatory information for both text independent and text dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with the conventional acoustic features. On text independent speaker verification, we find that concatenating articulatory features obtained from measured speech production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves the performance dramatically. However, since directly measuring articulatory data is not feasible in many real world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature level and score level fusion methods and find that the overall system performance is significantly enhanced even with estimated articulatory features. Such a performance boost could be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation contain important information, we included inverted articulatory trajectories in text dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help to reject wrong password trials and improve the performance after score level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR 2015 database, respectively, for the aforementioned two tasks. Experimental results show that we achieve more than 15% relative equal error rate reduction for both speaker verification tasks.
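To make the terms score-level fusion and relative equal error rate (EER) reduction concrete, the sketch below fuses two score streams with a weighted sum and estimates the EER by sweeping a decision threshold. It is a minimal illustration under stated assumptions, not the system described in the paper: the weighted-sum rule, the fusion weight, the function names, and all scores are hypothetical.

```python
def fuse_scores(acoustic, articulatory, weight=0.5):
    """Weighted-sum score-level fusion (assumed rule; the paper's may differ)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(acoustic, articulatory)]


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: the error rate at the threshold where the false
    acceptance rate and the false rejection rate are closest to each other."""
    best_gap, best_eer = None, None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.15 means a 15% relative improvement."""
    return (baseline_eer - new_eer) / baseline_eer


if __name__ == "__main__":
    # Made-up verification scores, for illustration only.
    acoustic_target, acoustic_impostor = [0.6, 0.4, 0.8], [0.5, 0.3, 0.2]
    artic_target, artic_impostor = [0.7, 0.9, 0.6], [0.2, 0.4, 0.1]
    baseline = equal_error_rate(acoustic_target, acoustic_impostor)
    fused = equal_error_rate(fuse_scores(acoustic_target, artic_target),
                             fuse_scores(acoustic_impostor, artic_impostor))
    print("baseline EER:", baseline, "fused EER:", fused)
```

Feature-level fusion, which the abstract also evaluates, would instead concatenate the acoustic and articulatory feature vectors before modeling; that variant is not shown here.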
Affiliation(s)
- Ming Li
  - Sun Yat-Sen University Carnegie Mellon University Joint Institute of Engineering, Sun Yat-Sen University, China
  - Sun Yat-Sen University Carnegie Mellon University Shunde International Joint Research Institute, Shunde, China
  - School of Mobile Information Engineering, Sun Yat-Sen University, China
- Jangwon Kim
  - Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA
- Adam Lammert
  - Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA
- Prasanta Kumar Ghosh
  - Department of Electrical Engineering, Indian Institute of Science (IISc), Bangalore, India
- Vikram Ramanarayanan
  - Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA
- Shrikanth Narayanan
  - Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA
8
Lammert A, Proctor M, Narayanan S. Interspeaker variability in hard palate morphology and vowel production. Journal of Speech, Language, and Hearing Research: JSLHR 2013; 56:S1924-S1933. [PMID: 24687447 DOI: 10.1044/1092-4388(2013/12-0211)] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 06/03/2023]
Abstract
PURPOSE Differences in vocal tract morphology have the potential to explain interspeaker variability in speech production. The potential acoustic impact of hard palate shape was examined in simulation, in addition to the interplay among morphology, articulation, and acoustics in real vowel production data. METHOD High-front vowel production from 5 speakers of American English was examined using midsagittal real-time magnetic resonance imaging data with synchronized audio. Relationships among hard palate morphology, tongue shaping, and formant frequencies were analyzed. Simulations were performed to determine the acoustical properties of vocal tracts whose area functions are altered according to prominent hard palate variations. RESULTS Simulations revealed that altering the height and position of the palatal dome alters formant frequencies. Examinations of real speech data showed that palatal morphology is not significantly correlated with any formant frequency but is correlated with major aspects of lingual articulation. CONCLUSION Certain differences in hard palate morphology can substantially affect vowel acoustics, but those effects are not noticeable in real speech. Speakers adapt their lingual articulation to accommodate palate shape differences with the potential to substantially affect formant frequencies, while ignoring palate shape differences with relatively little acoustic impact, lending support for acoustic goals of vowel production.
9
Lammert A, Proctor M, Narayanan S. Morphological variation in the adult hard palate and posterior pharyngeal wall. Journal of Speech, Language, and Hearing Research: JSLHR 2013; 56:521-30. [PMID: 23690566 PMCID: PMC3885355 DOI: 10.1044/1092-4388(2012/12-0059)] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 05/06/2023]
Abstract
PURPOSE Adult human vocal tracts display considerable morphological variation across individuals, but the nature and extent of this variation have not been extensively studied for many vocal tract structures. There exists a need to analyze morphological variation and, even more basically, to develop a methodology for morphological analysis of the vocal tract. Such analysis will facilitate fundamental characterization of the speech production system, with broad implications from modeling to explaining interspeaker variability. METHOD A data-driven methodology to automatically analyze the extent and variety of morphological variation is proposed and applied to a diverse subject pool of 36 adults. Analysis is focused on two key aspects of vocal tract structure: the midsagittal shape of the hard palate and the posterior pharyngeal wall. RESULTS Palatal morphology varies widely in its degree of concavity but also in anteriority and sharpness. Pharyngeal wall morphology, by contrast, varies mostly in terms of concavity alone. The distribution of morphological characteristics is complex, and analysis suggests that certain variations may be categorical in nature. CONCLUSION Major modes of morphological variation are identified, including their relative magnitude, distribution, and categorical nature. Implications of these findings for speech articulation strategies and speech acoustics are discussed.
Affiliation(s)
- Adam Lammert
  - School of Humanities and Languages, University of Western Sydney, Penrith, Australia.
10
Ng IW, Ono T, Inoue-Arai MS, Honda EI, Kurabayashi T, Moriyama K. Differential articulatory movements during Japanese /s/ and /t/ as revealed by MR image sequences with tooth visualization. Arch Oral Biol 2011; 57:749-59. [PMID: 22138260 DOI: 10.1016/j.archoralbio.2011.11.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/25/2011] [Revised: 10/17/2011] [Accepted: 11/03/2011] [Indexed: 10/14/2022]
Abstract
OBJECTIVE To evaluate the spatio-temporal relationships between articulators in the anterior oral cavity, during the production of Japanese fricative and plosive articulation using our proposed method for tooth visualization in MR image sequences. DESIGN Ten healthy adults without malocclusion participated in the study. Customized maxillary and mandibular plates with space around the central incisors that was to be filled with MR-compatible contrast medium were made. During image-acquisition by a cine magnetic resonance imaging (MRI) technique, the subjects repeated vowel-consonant-vowel syllables (/asa/ and /ata/) without wearing the plates. The subjects then wore the plates for tooth imaging. All data were acquired in the midsagittal plane. Tooth boundaries were superimposed using landmarks. Several parameters and spatio-temporal changes in the centre of gravity (CoG) of the tongue were measured. RESULTS During /t/, the duration and amount of tongue-to-palate/incisor contact were significantly greater and the radius of the inscribed circle between the tongue-maxillary incisor-mandibular incisor was significantly shorter than those during /s/. /t/ also had a more anteriorly located CoG of the tongue than /s/ during maximum constriction. The spatio-temporal changes in the CoG of the tongue were significantly different between /asa/ and /ata/. CONCLUSIONS We conclude that increased tongue-to-palate/incisor contact and greater anterior closure are necessary for the production of Japanese /t/ compared to /s/. With the use of this new method for tooth visualization in MR image sequences, it should be possible to evaluate the interaction of teeth and other articulators during speech.
Affiliation(s)
- Inn Wo Ng
  - Maxillofacial Orthognathics, Graduate School, Tokyo Medical and Dental University, Tokyo, Japan