1. Shahid MS, French AP, Valstar MF, Yakubov GE. Research in methodologies for modelling the oral cavity. Biomed Phys Eng Express 2024; 10:032001. [PMID: 38350128] [DOI: 10.1088/2057-1976/ad28cc]
Abstract
The paper aims to explore the current state of understanding surrounding in silico oral modelling. This involves exploring methodologies, technologies and approaches pertaining to the modelling of the whole oral cavity: both internally and externally visible structures that may be relevant or appropriate to oral actions. Such a model could be referred to as a 'complete model', one that considers a full set of facial features (i.e. not only the mouth) as well as synergistic stimuli such as audio and facial thermal data. 3D modelling technologies capable of accurately and efficiently capturing a complete representation of the mouth for an individual have broad applications in the study of oral actions, owing to their cost-effectiveness and time efficiency. This review delves into the field of clinical phonetics to classify oral actions pertaining to both speech and non-speech movements, identifying how the various vocal organs play a role in the articulatory and masticatory processes. Vitally, it provides a summation of 12 articulatory recording methods, forming a tool researchers can use to identify which recording method is appropriate for their work. After addressing the cost and resource-intensive limitations of existing methods, a new system of modelling is proposed that leverages external-to-internal correlation modelling techniques to create more efficient models of the oral cavity. The vision is that the outcomes will be applicable to a broad spectrum of oral functions related to physiology, health and wellbeing, including speech, oral processing of foods and dental health. The applications may span from speech correction to designing foods for the ageing population, whilst in the dental field information about patients' oral actions could become part of creating a personalised dental treatment plan.
Affiliation(s)
- Andrew P French
- School of Computer Science, University of Nottingham, NG8 1BB, United Kingdom
- School of Biosciences, University of Nottingham, LE12 5RD, United Kingdom
- Michel F Valstar
- School of Computer Science, University of Nottingham, NG8 1BB, United Kingdom
- Gleb E Yakubov
- School of Biosciences, University of Nottingham, LE12 5RD, United Kingdom
2. Kuo C, Berry J. The Relationship Between Acoustic and Kinematic Vowel Space Areas With and Without Normalization for Speakers With and Without Dysarthria. Am J Speech Lang Pathol 2023; 32:1923-1937. [PMID: 37105919] [PMCID: PMC10561967] [DOI: 10.1044/2023_ajslp-22-00158]
Abstract
PURPOSE Few studies have reported on the vowel space area (VSA) in both acoustic and kinematic domains. This study examined acoustic and kinematic VSAs for speakers with and without dysarthria and evaluated the effects of normalization on acoustic and kinematic VSAs and on the relationship between these measures. METHOD Vowel data from 12 speakers with and without dysarthria, presenting with a range of speech abilities, were examined. The speakers included four speakers with Parkinson's disease (PD), four speakers with brain injury (BI), and four neurotypical (NT) speakers. Speech acoustic and kinematic data were acquired simultaneously using electromagnetic articulography during a passage reading task. Raw and normalized VSAs calculated from the corner vowels /i/, /æ/, /ɑ/, and /u/ were evaluated. Normalization was achieved through z-score transformations of the acoustic and kinematic data. The effect of normalization on variability within and across groups was evaluated. Regression analysis was used across speakers to assess the association between acoustic and kinematic VSAs for both raw and normalized data. RESULTS When the speakers were evaluated as three groups (i.e., PD, BI, and NT), normalization reduced the standard deviations within each group and changed the relative differences in average magnitude between groups. Regression analysis revealed a significant relationship between normalized, but not raw, acoustic and kinematic VSAs after the exclusion of an outlier speaker. CONCLUSIONS Normalization reduces variability across speakers within groups and changes average magnitudes, affecting speaker-group comparisons. Normalization also influences the correlation between acoustic and kinematic measures. Further investigation of the impact of normalization techniques on acoustic and kinematic measures is warranted. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.22669747.
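For readers who want to reproduce the basic measure, a minimal sketch of the VSA computation follows: the quadrilateral area enclosed by the four corner vowels (shoelace formula), before and after z-score normalization. The corner-vowel formant values are hypothetical placeholders, and the z-scoring here uses only the four corner means for brevity; the study normalized the full acoustic and kinematic data sets, so this illustrates the idea rather than the authors' implementation.

```python
import numpy as np

def polygon_area(points):
    """Shoelace formula: area of a simple polygon from ordered vertices."""
    x, y = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Hypothetical mean (F1, F2) values in Hz for the corner vowels,
# ordered /i/ -> /ae/ -> /a/ -> /u/ so they trace the quadrilateral.
corners = [(300, 2300),   # /i/
           (700, 1800),   # /ae/
           (750, 1100),   # /a/
           (320, 900)]    # /u/

raw_vsa = polygon_area(corners)  # in Hz^2

# z-score normalization: standardize each formant dimension, then
# recompute the area (now unitless). Four corner means are used here
# for brevity; per-speaker normalization would use all vowel tokens.
pts = np.asarray(corners, dtype=float)
z_pts = (pts - pts.mean(axis=0)) / pts.std(axis=0)
norm_vsa = polygon_area(z_pts)

print(f"raw VSA: {raw_vsa:.0f} Hz^2  normalized VSA: {norm_vsa:.2f}")
```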
Affiliation(s)
- Christina Kuo
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA
- Jeffrey Berry
- Department of Speech Pathology and Audiology, Marquette University, Milwaukee, WI
3. Nault DR, Mitsuya T, Purcell DW, Munhall KG. Perturbing the consistency of auditory feedback in speech. Front Hum Neurosci 2022; 16:905365. [PMID: 36092651] [PMCID: PMC9453207] [DOI: 10.3389/fnhum.2022.905365]
Abstract
Sensory information, including auditory feedback, is used by talkers to maintain fluent speech articulation. Current models of speech motor control posit that speakers continually adjust their motor commands based on discrepancies between the sensory predictions made by a forward model and the sensory consequences of their speech movements. Here, in two within-subject design experiments, we used a real-time formant manipulation system to explore how reliant speech articulation is on the accuracy or predictability of auditory feedback information. This involved introducing random formant perturbations during vowel production that varied systematically in their location in formant space (Experiment 1) and in their temporal consistency (Experiment 2). Our results indicate that, on average, speakers' responses to auditory feedback manipulations varied based on the relevance and degree of the error that was introduced in the various feedback conditions. In Experiment 1, speakers' average production was not reliably influenced by random perturbations to the first (F1) and second (F2) formants, introduced on every utterance at various locations of formant space, that averaged 0 Hz overall. However, when perturbations were applied that had a mean of +100 Hz in F1 and -125 Hz in F2, speakers demonstrated reliable compensatory responses that reflected the average magnitude of the applied perturbations. In Experiment 2, speakers did not significantly compensate for perturbations of varying magnitudes that were held constant for one and three trials at a time. Speakers' average productions did, however, significantly deviate from a control condition when perturbations were held constant for six trials. Within the context of these conditions, our findings provide evidence that the control of speech movements is, at least in part, dependent upon the reliability and stability of the sensory information that it receives over time.
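A minimal sketch of how perturbation schedules like those described above might be generated is given below. The uniform distribution, the spread of the random draws, and the trial counts are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_schedule(mean_f1, mean_f2, spread, n_trials):
    """Per-utterance (F1, F2) perturbations in Hz drawn around a mean
    shift; the uniform spread is an assumed distribution."""
    f1 = rng.uniform(mean_f1 - spread, mean_f1 + spread, n_trials)
    f2 = rng.uniform(mean_f2 - spread, mean_f2 + spread, n_trials)
    return np.column_stack([f1, f2])

def held_schedule(block_len, spread, n_trials):
    """Experiment 2 style: each random perturbation is held constant
    for a block of consecutive trials (e.g., 1, 3, or 6)."""
    n_blocks = -(-n_trials // block_len)       # ceiling division
    blocks = rng.uniform(-spread, spread, n_blocks)
    return np.repeat(blocks, block_len)[:n_trials]

zero_mean = random_schedule(0, 0, 200, 100)    # averages to ~0 Hz
biased = random_schedule(100, -125, 200, 100)  # +100 Hz F1, -125 Hz F2
held6 = held_schedule(6, 200, 100)             # constant for 6 trials

print(zero_mean.mean(axis=0), biased.mean(axis=0), held6[:12])
```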
Affiliation(s)
- Daniel R. Nault
- Department of Psychology, Queen’s University, Kingston, ON, Canada
- Takashi Mitsuya
- School of Communication Sciences and Disorders, Western University, London, ON, Canada
- National Centre for Audiology, Western University, London, ON, Canada
- David W. Purcell
- School of Communication Sciences and Disorders, Western University, London, ON, Canada
- National Centre for Audiology, Western University, London, ON, Canada
- Kevin G. Munhall
- Department of Psychology, Queen’s University, Kingston, ON, Canada
4. Modern Responses to Traditional Pitfalls in Gender Affirming Behavioral Voice Modification. Otolaryngol Clin North Am 2022; 55:727-738. [PMID: 35752493] [DOI: 10.1016/j.otc.2022.05.001]
Abstract
Gender-affirming behavioral voice modification has primarily been directed by cisgender clinicians who do not actively live or master the process of voice modification themselves but instead observe it from the outside looking in. The lack of a "lived experience" among cisgender instructors naturally leaves gaps and oversights that may reduce the effective potential of voice training. Input from transgender people who have learned voice modification techniques is key to providing the best possible care. Ear training, direct vocal modeling, and mastery of gender-modification techniques are crucial elements that are less emphasized in the current system.
5. Roessig S, Winter B, Mücke D. Tracing the Phonetic Space of Prosodic Focus Marking. Front Artif Intell 2022; 5:842546. [PMID: 35664509] [PMCID: PMC9160369] [DOI: 10.3389/frai.2022.842546]
Abstract
Focus is known to be expressed by a wide range of phonetic cues, but only a few studies have explicitly compared different phonetic variables within the same experiment. We therefore present results from an analysis of 19 phonetic variables conducted on a German data set that comprises the opposition of unaccented (background) vs. accented (in focus), as well as different focus types with the nuclear accent on the same syllable (broad, narrow, and contrastive focus). The phonetic variables are measures of the acoustic and articulographic signals of a target syllable. Overall, our results provide the highest number of reliable effects and the largest effect sizes for accentuation (unaccented vs. accented), while the differentiation of focus types with accented target syllables (broad, narrow, and contrastive focus) is more subtle. The most important phonetic variables across all conditions are measures of the fundamental frequency. The articulatory variables and their corresponding acoustic formants reveal lower tongue positions for both vowels /o, a/ and larger lip openings for the vowel /a/ under increased prosodic prominence, with the strongest effects for accentuation. While duration exhibits consistently mid-ranked results for both accentuation and the differentiation of focus types, measures related to intensity are particularly important for accentuation. Furthermore, voice quality and spectral tilt are affected by accentuation but also play a role in the differentiation of focus types. Our results confirm that focus is realized via multiple phonetic cues. Additionally, the present analysis allows a comparison of the relative importance of different measures, contributing to a better understanding of the phonetic space of focus marking.
Affiliation(s)
- Simon Roessig
- IfL-Phonetik, University of Cologne, Cologne, Germany
- Bodo Winter
- Department of English Language and Linguistics, University of Birmingham, Birmingham, United Kingdom
- Doris Mücke
- IfL-Phonetik, University of Cologne, Cologne, Germany
6. Tilsen S, Kim SE, Wang C. Localizing category-related information in speech with multi-scale analyses. PLoS One 2021; 16:e0258178. [PMID: 34597350] [PMCID: PMC8486085] [DOI: 10.1371/journal.pone.0258178]
Abstract
Measurements of the physical outputs of speech (vocal tract geometry and acoustic energy) are high-dimensional, but linguistic theories posit a low-dimensional set of categories such as phonemes and phrase types. How can it be determined when and where in high-dimensional articulatory and acoustic signals there is information related to theoretical categories? For a variety of reasons, it is problematic to directly quantify mutual information between hypothesized categories and signals. To address this issue, a multi-scale analysis method is proposed for localizing category-related information in an ensemble of speech signals using machine learning algorithms. By analyzing how classification accuracy on unseen data varies as the temporal extent of the training input is systematically restricted, inferences can be drawn regarding the temporal distribution of category-related information. The method can also be used to investigate redundancy between subsets of signal dimensions. Two types of theoretical categories are examined in this paper: phonemic/gestural categories and syntactic relative-clause categories. Two different machine learning algorithms were also examined: linear discriminant analysis and neural networks with long short-term memory units. Both algorithms detected category-related information earlier and later in the signals than would be expected given standard theoretical assumptions about when linguistic categories should influence speech. The neural network algorithm was able to identify category-related information to a greater extent than the discriminant analyses.
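The core of the multi-scale method, restricting the temporal extent of the classifier's input and tracking held-out accuracy, can be illustrated with a short sketch. The data here are synthetic stand-ins with category information injected into a known window, and scikit-learn's linear discriminant analysis stands in for the paper's classifiers; the window size and step are arbitrary choices.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic ensemble: 200 tokens x 50 time samples x 4 channels, with
# category-related information injected only in samples 20-30.
n_tokens, n_time, n_chan = 200, 50, 4
labels = rng.integers(0, 2, n_tokens)
signals = rng.normal(size=(n_tokens, n_time, n_chan))
signals[:, 20:30, 0] += 0.8 * labels[:, None]

# Slide a window over time, train on only that temporal extent, and
# track cross-validated accuracy; above-chance windows are inferred
# to carry category-related information.
win = 10
for start in range(0, n_time - win + 1, 5):
    X = signals[:, start:start + win, :].reshape(n_tokens, -1)
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean()
    print(f"samples {start:2d}-{start + win:2d}: accuracy = {acc:.2f}")
```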
Affiliation(s)
- Sam Tilsen
- Department of Linguistics, Cornell University, Ithaca, New York, United States of America
- Seung-Eun Kim
- Department of Linguistics, Cornell University, Ithaca, New York, United States of America
- Claire Wang
- Department of Linguistics, Cornell University, Ithaca, New York, United States of America
7. The Role of Acoustic Similarity and Non-Native Categorisation in Predicting Non-Native Discrimination: Brazilian Portuguese Vowels by English vs. Spanish Listeners. Languages 2021. [DOI: 10.3390/languages6010044]
Abstract
This study tests whether Australian English (AusE) and European Spanish (ES) listeners differ in their categorisation and discrimination of Brazilian Portuguese (BP) vowels. In particular, we investigate two theoretically relevant measures of vowel category overlap (acoustic vs. perceptual categorisation) as predictors of non-native discrimination difficulty. We also investigate whether the individual listener's own native vowel productions predict non-native vowel perception better than group averages. The results showed comparable performance for AusE and ES participants in their perception of the BP vowels. In particular, discrimination patterns were largely dependent on contrast-specific learning scenarios, which were similar across AusE and ES. We also found that acoustic similarity between individuals' own native productions and the BP stimuli was largely consistent with the participants' patterns of non-native categorisation. Furthermore, the results indicated that both acoustic and perceptual overlap successfully predict discrimination performance. However, accuracy in discrimination was better explained by perceptual similarity for ES listeners and by acoustic similarity for AusE listeners. Interestingly, we also found that for ES listeners, group averages explained discrimination accuracy better than predictions based on individual production data, whereas the AusE group showed no such difference.
8. Magnotti JF, Dzeda KB, Wegner-Clemens K, Rennig J, Beauchamp MS. Weak observer-level correlation and strong stimulus-level correlation between the McGurk effect and audiovisual speech-in-noise: A causal inference explanation. Cortex 2020; 133:371-383. [PMID: 33221701] [DOI: 10.1016/j.cortex.2020.10.002]
Abstract
The McGurk effect is a widely used measure of multisensory integration during speech perception. Two observations have raised questions about the validity of the effect as a tool for understanding speech perception. First, there is high variability in perception of the McGurk effect across different stimuli and observers. Second, across observers there is low correlation between McGurk susceptibility and recognition of visual speech paired with auditory speech-in-noise, another common measure of multisensory integration. Using the framework of the causal inference of multisensory speech (CIMS) model, we explored the relationship between the McGurk effect, syllable perception, and sentence perception in seven experiments with a total of 296 different participants. Perceptual reports revealed a relationship between the efficacy of different McGurk stimuli created from the same talker and perception of the auditory component of the McGurk stimuli presented in isolation, both with and without added noise. The CIMS model explained this strong stimulus-level correlation using the principles of noisy sensory encoding followed by optimal cue combination within a common representational space across speech types. Because the McGurk effect (but not speech-in-noise) requires the resolution of conflicting cues between modalities, there is an additional source of individual variability that can explain the weak observer-level correlation between McGurk and noisy speech. Power calculations show that detecting this weak correlation requires studies with many more participants than those conducted to date. Perception of the McGurk effect and other types of speech can be explained by a common theoretical framework that includes causal inference, suggesting that the McGurk effect is a valid and useful experimental tool.
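The "optimal cue combination" step that the abstract invokes is, in its simplest textbook form, inverse-variance-weighted averaging of the unisensory estimates. The sketch below illustrates that principle under independent Gaussian noise assumptions; it is not the full CIMS model, which additionally infers whether the auditory and visual cues share a common cause, and the numbers are hypothetical.

```python
def fuse(mu_a, var_a, mu_v, var_v):
    """Inverse-variance-weighted fusion of auditory and visual estimates
    under independent Gaussian noise (textbook optimal combination)."""
    w_a = (1 / var_a) / (1 / var_a + 1 / var_v)
    mu = w_a * mu_a + (1 - w_a) * mu_v
    var = 1 / (1 / var_a + 1 / var_v)  # fused estimate is more reliable
    return mu, var

# Hypothetical positions on a 1-D representational axis ("ba" = 0,
# "da" = 1): a "ba"-like auditory cue combined with conflicting,
# noisier visual evidence yields an intermediate percept.
mu, var = fuse(mu_a=0.1, var_a=0.04, mu_v=0.9, var_v=0.09)
print(f"fused percept: {mu:.2f} (variance {var:.3f})")
```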
9. Krivokapić J, Styler W, Parrell B. Pause Postures: The relationship between articulation and cognitive processes during pauses. J Phon 2020; 79:100953. [PMID: 32218635] [PMCID: PMC7098615] [DOI: 10.1016/j.wocn.2019.100953]
Abstract
Studies examining the articulatory characteristics of pauses have identified language-specific postures of the vocal tract in inter-utterance pauses and different articulatory patterns in grammatical and non-grammatical pauses. Pause postures, specific articulatory movements that occur during pauses at strong prosodic boundaries, have been identified for Greek and German. However, the cognitive function of these articulations has not yet been examined. We begin to address this question by investigating the effects of (1) utterance type and (2) planning on pause posture occurrence and properties in American English. We first examine whether pause postures exist in American English. In an electromagnetic articulometry study, seven participants produced sentences varying in linguistic structure (stress, boundary, sentence type). To determine the presence of pause postures, as well as to lay the groundwork for their future automatic annotation and detection, a support vector machine classifier was built to identify them. Results show that pause postures exist for all speakers in this study but that their frequency of occurrence is speaker-dependent. Across participants, we find a stable relationship between the pause posture and other events (boundary tones and vowels) at prosodic boundaries, parallel to previous work on Greek. We find that the occurrence of pause postures is not systematically related to utterance type. Lastly, pause postures increase in frequency and duration as utterance length increases, suggesting that they are at least partially related to speech-planning processes.
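A minimal sketch of a pause-posture detector in the spirit of the classifier described above follows. The kinematic features, labels, and data are synthetic placeholders, not the study's annotations; it shows only the general shape of an SVM-based detection pipeline.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for per-pause kinematic features (hypothetical:
# lip aperture, jaw height, tongue-tip height, pause duration) and
# binary labels (1 = pause posture present).
n_pauses = 300
X = rng.normal(size=(n_pauses, 4))
y = rng.integers(0, 2, n_pauses)
X[y == 1, 0] += 1.2  # inject separability so the demo is non-trivial

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"cross-validated detection accuracy: {acc:.2f}")
```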
Affiliation(s)
- Jelena Krivokapić
- University of Michigan Department of Linguistics, 421 Lorch Hall, 611 Tappan Street, Ann Arbor, MI 48109-1220
- Haskins Laboratories, 300 George St 9th Fl, New Haven, CT 06511-6624
- Will Styler
- University of California, San Diego Department of Linguistics, 9500 Gilman Drive #0108, La Jolla, CA 92093-0108
- Benjamin Parrell
- University of Wisconsin-Madison Department of Communication Sciences and Disorders, Goodnight Hall, 1975 Willow Drive, Madison, WI 53706
10. Xu Y, Prom-on S. Economy of Effort or Maximum Rate of Information? Exploring Basic Principles of Articulatory Dynamics. Front Psychol 2019; 10:2469. [PMID: 31824364] [PMCID: PMC6886388] [DOI: 10.3389/fpsyg.2019.02469]
Abstract
Economy of effort, a popular notion in contemporary speech research, predicts that dynamic extremes such as the maximum speed of articulatory movement are avoided as much as possible and that approaching the dynamic extremes is necessary only when there is a need to enhance linguistic contrast, as in the case of stress or clear speech. Empirical data, however, do not always support these predictions. In the present study, we considered an alternative principle: maximum rate of information, which assumes that speech dynamics are ultimately driven by the pressure to transmit information as quickly and accurately as possible. For empirical data, we asked speakers of American English to produce repetitive syllable sequences such as wawawawawa as fast as possible by imitating recordings of the same sequences that had been artificially accelerated and to produce meaningful sentences containing the same syllables at normal and fast speaking rates. Analysis of formant trajectories shows that dynamic extremes in meaningful speech sometimes even exceeded those in the nonsense syllable sequences but that this happened more often in unstressed syllables than in stressed syllables. We then used a target approximation model based on a mass-spring system of varying orders to simulate the formant kinematics. The results show that the kind of formant kinematics found in the present study and in previous studies can only be generated by a dynamical system operating with maximal muscular force under strong time pressure and that the dynamics of this operation may hold the solution to the long-standing enigma of greater stiffness in unstressed than in stressed syllables. We conclude, therefore, that maximum rate of information can coherently explain both current and previous empirical data and could therefore be a fundamental principle of motor control in speech production.
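The target approximation idea can be sketched with the lowest-order case the abstract mentions: a critically damped mass-spring system driven toward a formant target, where a larger rate constant (stiffness) corresponds to faster target attainment under time pressure. The parameter values below are illustrative, not fitted values from the study.

```python
import numpy as np

def target_approximation(x0, target, lam, duration, dt=0.001):
    """Critically damped second-order mass-spring system driven toward a
    constant target: x'' = -2*lam*x' - lam**2 * (x - target). This is the
    lowest-order case; the study also explores higher-order systems."""
    n = int(duration / dt)
    x, v = float(x0), 0.0
    traj = np.empty(n)
    for i in range(n):
        a = -2 * lam * v - lam ** 2 * (x - target)
        v += a * dt
        x += v * dt
        traj[i] = x
    return traj

# F2 moving from a /w/-like value toward an /a/-like target: a larger
# rate constant lam (stiffness) reaches the target sooner, which is how
# strong time pressure shows up in the simulated kinematics.
slow = target_approximation(800.0, 1300.0, lam=30.0, duration=0.15)
fast = target_approximation(800.0, 1300.0, lam=60.0, duration=0.15)
print(f"F2 after 150 ms: slow = {slow[-1]:.0f} Hz, fast = {fast[-1]:.0f} Hz")
```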
Affiliation(s)
- Yi Xu
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, United Kingdom
- Santitham Prom-on
- Department of Computer Engineering, King Mongkut’s University of Technology Thonburi, Bangkok, Thailand