1
Corsini A, Tomassini A, Pastore A, Delis I, Fadiga L, D'Ausilio A. Speech perception difficulty modulates theta-band encoding of articulatory synergies. J Neurophysiol 2024; 131:480-491. [PMID: 38323331 DOI: 10.1152/jn.00388.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Revised: 01/04/2024] [Accepted: 01/25/2024] [Indexed: 02/08/2024] Open
Abstract
The human brain tracks available speech acoustics and extrapolates missing information such as the speaker's articulatory patterns. However, the extent to which articulatory reconstruction supports speech perception remains unclear. This study explores the relationship between articulatory reconstruction and task difficulty. Participants listened to sentences and performed a speech-rhyming task. Real kinematic data of the speaker's vocal tract were recorded via electromagnetic articulography (EMA) and aligned to corresponding acoustic outputs. We extracted articulatory synergies from the EMA data with principal component analysis (PCA) and employed partial information decomposition (PID) to separate the electroencephalographic (EEG) encoding of acoustic and articulatory features into unique, redundant, and synergistic atoms of information. We median-split sentences into easy (ES) and hard (HS) based on participants' performance and found that greater task difficulty involved greater encoding of unique articulatory information in the theta band. We conclude that fine-grained articulatory reconstruction plays a complementary role in the encoding of speech acoustics, lending further support to the claim that motor processes support speech perception.
NEW & NOTEWORTHY: Top-down processes originating from the motor system contribute to speech perception through the reconstruction of the speaker's articulatory movement. This study investigates the role of such articulatory simulation under variable task difficulty. We show that more challenging listening tasks lead to increased encoding of articulatory kinematics in the theta band and suggest that, in such situations, fine-grained articulatory reconstruction complements acoustic encoding.
Affiliation(s)
- Alessandro Corsini
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
  - Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- Alice Tomassini
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
  - Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- Aldo Pastore
  - Laboratorio NEST, Scuola Normale Superiore, Pisa, Italy
- Ioannis Delis
  - School of Biomedical Sciences, University of Leeds, Leeds, United Kingdom
- Luciano Fadiga
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
  - Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
- Alessandro D'Ausilio
  - Center for Translational Neurophysiology of Speech and Communication, Istituto Italiano di Tecnologia, Ferrara, Italy
  - Department of Neuroscience and Rehabilitation, Università di Ferrara, Ferrara, Italy
2
Chiu C, Weng Y, Chen BW. Tongue Postures and Tongue Centers: A Study of Acoustic-Articulatory Correspondences Across Different Head Angles. Front Psychol 2022; 12:768754. [PMID: 35111103 PMCID: PMC8801537 DOI: 10.3389/fpsyg.2021.768754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2021] [Accepted: 12/24/2021] [Indexed: 11/13/2022] Open
Abstract
Recent research on body and head positions has shown that postural changes may induce varying degrees of change in acoustic speech signals and articulatory gestures. While the preservation of formant profiles across different postures is suitably accounted for by the two-tube model and perturbation theory, it remains unclear whether this preservation results from the accommodation of tongue postures. Specifically, whether the tongue accommodates the changes in head angle to maintain the target acoustics is yet to be determined. The present study examines vowel acoustics and their correspondence with the articulatory maneuvers of the tongue, including both tongue postures and movements of the tongue center, across different head angles. The results show that vowel acoustics, including pitch and formants, are largely unaffected by upward or downward tilting of the head. These preserved acoustics may be attributed to the lingual gestures that compensate for the effects of gravity. Our results also reveal that the tongue postures in response to head movements appear to be vowel-dependent, and the tongue center may serve as an underlying drive that covaries with the head angle changes. These results imply a close relationship between vowel acoustics and tongue postures as well as a target-oriented strategy for different head angles.
3
Alexander R, Sorensen T, Toutios A, Narayanan S. A modular architecture for articulatory synthesis from gestural specification. The Journal of the Acoustical Society of America 2019; 146:4458. [PMID: 31893678 PMCID: PMC7043897 DOI: 10.1121/1.5139413] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/08/2019] [Revised: 09/19/2019] [Accepted: 11/11/2019] [Indexed: 06/10/2023]
Abstract
This paper proposes a modular architecture for articulatory synthesis from a gestural specification comprising relatively simple models for the vocal tract, the glottis, aero-acoustics, and articulatory control. The vocal tract module combines a midsagittal statistical analysis articulatory model, derived by factor analysis of air-tissue boundaries in real-time magnetic resonance imaging data, with an αβ model for converting midsagittal section to area function specifications. The aero-acoustics and glottis models were based on a software implementation of classic work by Maeda. The articulatory control module uses dynamical systems, which implement articulatory gestures, to animate the statistical articulatory model, inspired by the task dynamics model. Results on synthesizing vowel-consonant-vowel sequences with plosive consonants, using models that were built on data from, and simulate the behavior of, two different speakers are presented.
Affiliation(s)
- Rachel Alexander
  - Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Tanner Sorensen
  - Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Asterios Toutios
  - Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
- Shrikanth Narayanan
  - Signal Analysis & Interpretation Laboratory (SAIL), University of Southern California, Los Angeles, California 90007, USA
4
Parrell B, Ramanarayanan V, Nagarajan S, Houde J. The FACTS model of speech motor control: Fusing state estimation and task-based control. PLoS Comput Biol 2019; 15:e1007321. [PMID: 31479444 PMCID: PMC6743785 DOI: 10.1371/journal.pcbi.1007321] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 04/04/2019] [Revised: 09/13/2019] [Accepted: 08/02/2019] [Indexed: 11/18/2022] Open
Abstract
We present a new computational model of speech motor control: the Feedback-Aware Control of Tasks in Speech or FACTS model. FACTS employs a hierarchical state feedback control architecture to control a simulated vocal tract and produce intelligible speech. The model includes higher-level control of speech tasks and lower-level control of speech articulators. The task controller is modeled as a dynamical system governing the creation of desired constrictions in the vocal tract, after Task Dynamics. Both the task and articulatory controllers rely on an internal estimate of the current state of the vocal tract to generate motor commands. This estimate is derived, based on efference copy of applied controls, from a forward model that predicts both the next vocal tract state and the expected auditory and somatosensory feedback. A comparison between predicted feedback and actual feedback is then used to update the internal state prediction. FACTS is able to qualitatively replicate many characteristics of the human speech system: the model is robust to noise in both the sensory and motor pathways, is relatively unaffected by a loss of auditory feedback but is more significantly impacted by the loss of somatosensory feedback, and responds appropriately to externally-imposed alterations of auditory and somatosensory feedback. The model also replicates previously hypothesized trade-offs between reliance on auditory and somatosensory feedback and shows for the first time how this relationship may be mediated by acuity in each sensory domain. These results have important implications for our understanding of the speech motor control system in humans.
Affiliation(s)
- Benjamin Parrell
  - Department of Communication Sciences and Disorders, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Vikram Ramanarayanan
  - Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, California, United States of America
  - Educational Testing Service R&D, San Francisco, California, United States of America
- Srikantan Nagarajan
  - Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, California, United States of America
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, San Francisco, California, United States of America
- John Houde
  - Department of Otolaryngology - Head and Neck Surgery, University of California, San Francisco, San Francisco, California, United States of America
5
Sorensen T, Toutios A, Goldstein L, Narayanan S. Task-dependence of articulator synergies. The Journal of the Acoustical Society of America 2019; 145:1504. [PMID: 31067947 PMCID: PMC6910022 DOI: 10.1121/1.5093538] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/18/2018] [Revised: 02/15/2019] [Accepted: 02/19/2019] [Indexed: 06/09/2023]
Abstract
In speech production, the motor system organizes articulators such as the jaw, tongue, and lips into synergies whose function is to produce speech sounds by forming constrictions at the phonetic places of articulation. The present study tests whether synergies for different constriction tasks differ in terms of inter-articulator coordination. The test is conducted on utterances [ɑpɑ], [ɑtɑ], [ɑiɑ], and [ɑkɑ] with a real-time magnetic resonance imaging biomarker that is computed using a statistical model of the forward kinematics of the vocal tract. The present study is the first to estimate the forward kinematics of the vocal tract from speech production data. Using the imaging biomarker, the study finds that the jaw contributes least to the velar stop for [k], more to pharyngeal approximation for [ɑ], still more to palatal approximation for [i], and most to the coronal stop for [t]. Additionally, the jaw contributes more to the coronal stop for [t] than to the bilabial stop for [p]. Finally, the study investigates how this pattern of results varies by participant. The study identifies differences in inter-articulator coordination by constriction task, which support the claim that inter-articulator coordination differs depending on the active articulator synergy.
Affiliation(s)
- Tanner Sorensen
  - Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
- Asterios Toutios
  - Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
- Louis Goldstein
  - Department of Linguistics, University of Southern California, Los Angeles, California 90089, USA
- Shrikanth Narayanan
  - Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
6
Ramanarayanan V, Tilsen S, Proctor M, Töger J, Goldstein L, Nayak KS, Narayanan S. Analysis of speech production real-time MRI. Comput Speech Lang 2018. [DOI: 10.1016/j.csl.2018.04.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Indexed: 12/27/2022]
8
Toutios A, Narayanan SS. Advances in real-time magnetic resonance imaging of the vocal tract for speech science and technology research. APSIPA Transactions on Signal and Information Processing 2016; 5:e6. [PMID: 27833745 PMCID: PMC5100697 DOI: 10.1017/atsip.2016.5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 06/06/2023]
Abstract
Real-time magnetic resonance imaging (rtMRI) of the moving vocal tract during running speech production is an important emerging tool for speech production research providing dynamic information of a speaker's upper airway from the entire mid-sagittal plane or any other scan plane of interest. There have been several advances in the development of speech rtMRI and corresponding analysis tools, and their application to domains such as phonetics and phonological theory, articulatory modeling, and speaker characterization. An important recent development has been the open release of a database that includes speech rtMRI data from five male and five female speakers of American English each producing 460 phonetically balanced sentences. The purpose of the present paper is to give an overview and outlook of the advances in rtMRI as a tool for speech research and technology development.
Affiliation(s)
- Asterios Toutios
  - Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA
- Shrikanth S Narayanan
  - Signal Analysis and Interpretation Laboratory (SAIL), University of Southern California (USC), 3740 McClintock Avenue, Los Angeles, CA 90089, USA
9
Sattar F, Rudzicz F. Principal differential analysis for detection of bilabial closure gestures from articulatory data. Comput Speech Lang 2016. [DOI: 10.1016/j.csl.2015.07.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]