1. Yu Y, Lado A, Zhang Y, Magnotti JF, Beauchamp MS. Synthetic faces generated with the facial action coding system or deep neural networks improve speech-in-noise perception, but not as much as real faces. Front Neurosci 2024; 18:1379988. PMID: 38784097; PMCID: PMC11111898; DOI: 10.3389/fnins.2024.1379988.
Abstract
The prevalence of synthetic talking faces in both commercial and academic environments is increasing as the technology to generate them grows more powerful and available. While it has long been known that seeing the face of the talker improves human perception of speech-in-noise, recent studies have shown that synthetic talking faces generated by deep neural networks (DNNs) are also able to improve human perception of speech-in-noise. However, in previous studies the benefit provided by DNN synthetic faces was only about half that of real human talkers. We sought to determine whether synthetic talking faces generated by an alternative method would provide a greater perceptual benefit. The facial action coding system (FACS) is a comprehensive system for measuring visually discernible facial movements. Because the action units that comprise FACS are linked to specific muscle groups, synthetic talking faces generated by FACS might have greater verisimilitude than DNN synthetic faces, which do not reference an explicit model of the facial musculature. We tested the ability of human observers to identify speech-in-noise accompanied by a blank screen, the real face of the talker, or synthetic talking faces generated by either DNN or FACS. We replicated previous findings of a large benefit for seeing the face of a real talker for speech-in-noise perception and a smaller benefit for DNN synthetic faces. FACS faces also improved perception, but only to the same degree as DNN faces. Analysis at the phoneme level showed that the performance of DNN and FACS faces was particularly poor for phonemes that involve interactions between the teeth and lips, such as /f/, /v/, and /th/. Inspection of single video frames revealed that the characteristic visual features for these phonemes were weak or absent in synthetic faces. Modeling the real vs. synthetic difference showed that increasing the realism of a few phonemes could substantially increase the overall perceptual benefit of synthetic faces.
Affiliation(s)
- Yingjia Yu, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Anastasia Lado, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Yue Zhang, Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
- John F. Magnotti, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Michael S. Beauchamp, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
2. Magnotti JF, Lado A, Zhang Y, Maasø A, Nath A, Beauchamp MS. Repeatedly experiencing the McGurk effect induces long-lasting changes in auditory speech perception. Commun Psychol 2024; 2:25. PMID: 39242734; PMCID: PMC11332120; DOI: 10.1038/s44271-024-00073-w.
Abstract
In the McGurk effect, presentation of incongruent auditory and visual speech evokes a fusion percept different from either component modality. We show that repeatedly experiencing the McGurk effect for 14 days induces a change in auditory-only speech perception: the auditory component of the McGurk stimulus begins to evoke the fusion percept, even when presented on its own without accompanying visual speech. This perceptual change, termed fusion-induced recalibration (FIR), was talker-specific and syllable-specific and persisted for a year or more in some participants without any additional McGurk exposure. Participants who did not experience the McGurk effect did not experience FIR, showing that recalibration was driven by multisensory prediction error. A causal inference model of speech perception incorporating multisensory cue conflict accurately predicted individual differences in FIR. Just as the McGurk effect demonstrates that visual speech can alter the perception of auditory speech, FIR shows that these alterations can persist for months or years. The ability to induce seemingly permanent changes in auditory speech perception will be useful for studying plasticity in brain networks for language and may provide new strategies for improving language learning.
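The abstract does not specify the causal inference model's equations. As an illustration only, here is a minimal sketch of the standard Gaussian causal-inference computation (Körding-style, which speech variants of the model build on): the posterior probability that an auditory and a visual cue share a common cause. The function name and parameters are illustrative assumptions, not the authors' implementation:

```python
import math

def p_common_cause(xa, xv, sigma_a, sigma_v, sigma_p, prior_common=0.5):
    """Posterior probability that auditory cue xa and visual cue xv
    arose from a single common cause, under Gaussian cue likelihoods
    and a Gaussian prior N(0, sigma_p^2) over cause locations."""
    # Likelihood of the cue pair given one shared cause (C = 1)
    var_c = (sigma_a**2 * sigma_v**2 + sigma_a**2 * sigma_p**2
             + sigma_v**2 * sigma_p**2)
    like_c = math.exp(-((xa - xv)**2 * sigma_p**2
                        + xa**2 * sigma_v**2
                        + xv**2 * sigma_a**2) / (2 * var_c))
    like_c /= 2 * math.pi * math.sqrt(var_c)
    # Likelihood given two independent causes (C = 2)
    var_a, var_v = sigma_a**2 + sigma_p**2, sigma_v**2 + sigma_p**2
    like_i = (math.exp(-xa**2 / (2 * var_a)) / math.sqrt(2 * math.pi * var_a)
              * math.exp(-xv**2 / (2 * var_v)) / math.sqrt(2 * math.pi * var_v))
    # Bayes' rule over the two causal structures
    return (like_c * prior_common
            / (like_c * prior_common + like_i * (1 - prior_common)))
```

Larger audiovisual conflict (|xa − xv|) lowers the common-cause posterior; on this kind of account, individual differences in cue noise or the common-cause prior predict who fuses the McGurk stimulus and, in turn, who recalibrates.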
Affiliation(s)
- John F. Magnotti, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anastasia Lado, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yue Zhang, Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Arnt Maasø, Institute for Media and Communications, University of Oslo, Oslo, Norway
- Audrey Nath, Department of Neurosurgery, University of Texas Medical Branch, Galveston, TX, USA
- Michael S. Beauchamp, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
3. Yu Y, Lado A, Zhang Y, Magnotti JF, Beauchamp MS. The Effect on Speech-in-Noise Perception of Real Faces and Synthetic Faces Generated with either Deep Neural Networks or the Facial Action Coding System. bioRxiv [Preprint] 2024:2024.02.05.578468. PMID: 38370798; PMCID: PMC10871238; DOI: 10.1101/2024.02.05.578468.
Abstract
The prevalence of synthetic talking faces in both commercial and academic environments is increasing as the technology to generate them grows more powerful and available. While it has long been known that seeing the face of the talker improves human perception of speech-in-noise, recent studies have shown that synthetic talking faces generated by deep neural networks (DNNs) are also able to improve human perception of speech-in-noise. However, in previous studies the benefit provided by DNN synthetic faces was only about half that of real human talkers. We sought to determine whether synthetic talking faces generated by an alternative method would provide a greater perceptual benefit. The facial action coding system (FACS) is a comprehensive system for measuring visually discernible facial movements. Because the action units that comprise FACS are linked to specific muscle groups, synthetic talking faces generated by FACS might have greater verisimilitude than DNN synthetic faces, which do not reference an explicit model of the facial musculature. We tested the ability of human observers to identify speech-in-noise accompanied by a blank screen, the real face of the talker, or synthetic talking faces generated by either DNN or FACS. We replicated previous findings of a large benefit for seeing the face of a real talker for speech-in-noise perception and a smaller benefit for DNN synthetic faces. FACS faces also improved perception, but only to the same degree as DNN faces. Analysis at the phoneme level showed that the performance of DNN and FACS faces was particularly poor for phonemes that involve interactions between the teeth and lips, such as /f/, /v/, and /th/. Inspection of single video frames revealed that the characteristic visual features for these phonemes were weak or absent in synthetic faces. Modeling the real vs. synthetic difference showed that increasing the realism of a few phonemes could substantially increase the overall perceptual benefit of synthetic faces, providing a roadmap for improving communication in this rapidly developing domain.
Affiliation(s)
- Yingjia Yu, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania
- Anastasia Lado, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania
- Yue Zhang, Department of Neurosurgery, Baylor College of Medicine
- John F. Magnotti, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania
- Michael S. Beauchamp, Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania
4. Ito T, Ogane R. Repetitive Exposure to Orofacial Somatosensory Inputs in Speech Perceptual Training Modulates Vowel Categorization in Speech Perception. Front Psychol 2022; 13:839087. PMID: 35558689; PMCID: PMC9088678; DOI: 10.3389/fpsyg.2022.839087.
Abstract
Orofacial somatosensory inputs may play a role in the link between speech perception and production. Given that speech motor learning, which involves paired auditory and somatosensory inputs, results in changes to speech perceptual representations, somatosensory inputs may also be involved in learning or adaptive processes of speech perception. Here we show that repetitive pairing of somatosensory inputs and sounds, such as occurs during speech production and motor learning, can also induce a change in speech perception. We examined whether the category boundary between /ε/ and /a/ was changed as a result of perceptual training with orofacial somatosensory inputs. The experiment consisted of three phases: Baseline, Training, and Aftereffect. In all phases, a vowel identification test was used to identify the perceptual boundary between /ε/ and /a/. In the Baseline and Aftereffect phases, an adaptive method based on the maximum-likelihood procedure was applied to detect the category boundary using a small number of trials. In the Training phase, we used the method of constant stimuli in order to expose participants to stimulus variants that covered the range between /ε/ and /a/ evenly. In this phase, to mimic the sensory input that accompanies speech production and learning, somatosensory stimulation was applied in the upward direction in an experimental group whenever the stimulus sound was presented. A control group (CTL) followed the same training procedure in the absence of somatosensory stimulation. When we compared category boundaries prior to and following paired auditory-somatosensory training, the boundary for participants in the experimental group reliably shifted in the direction of /ε/, indicating that the participants perceived /a/ more often than /ε/ as a consequence of training. In contrast, the CTL group did not show any change. Although a limited number of participants were tested, the perceptual shift was reduced and almost eliminated 1 week later. Our data suggest that repetitive exposure to somatosensory inputs, in a task that simulates the sensory pairing that occurs during speech production, changes the perceptual system, and they support the idea that somatosensory inputs play a role in speech perceptual adaptation, probably contributing to the formation of sound representations for speech perception.
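The abstract's maximum-likelihood boundary estimation is not spelled out. As a rough illustration of the underlying idea, here is a minimal maximum-likelihood fit of a logistic psychometric function to vowel identification responses, returning the 50% category boundary. The function name, grid ranges, and data format are illustrative assumptions, not the authors' procedure:

```python
import numpy as np

def category_boundary(stimulus_levels, responses):
    """Estimate the 50% point of a logistic psychometric function by
    maximum likelihood over a grid of boundary (alpha) and slope (beta)
    candidates. stimulus_levels: continuum steps; responses: 0/1 choices
    (e.g., 1 = heard /a/, 0 = heard an 'eh'-like vowel)."""
    x = np.asarray(stimulus_levels, float)
    y = np.asarray(responses, float)
    alphas = np.linspace(x.min(), x.max(), 121)   # candidate boundaries
    betas = np.linspace(0.2, 8.0, 40)             # candidate slopes
    A, B = np.meshgrid(alphas, betas, indexing="ij")
    # p[i, j, k]: P(respond 1) under alpha_i, beta_j for trial k
    p = 1.0 / (1.0 + np.exp(-B[..., None] * (x - A[..., None])))
    p = np.clip(p, 1e-9, 1 - 1e-9)
    loglik = (y * np.log(p) + (1 - y) * np.log(1 - p)).sum(axis=-1)
    i, _ = np.unravel_index(np.argmax(loglik), loglik.shape)
    return alphas[i]                              # ML boundary estimate
```

Comparing this estimate before and after training is the kind of comparison the Baseline vs. Aftereffect contrast rests on; an adaptive procedure simply chooses each next stimulus to make this estimate converge in fewer trials.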
Affiliation(s)
- Takayuki Ito, Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France; Haskins Laboratories, New Haven, CT, United States
- Rintaro Ogane, Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France; Haskins Laboratories, New Haven, CT, United States
5. Chodroff E, Wilson C. Acoustic-phonetic and auditory mechanisms of adaptation in the perception of sibilant fricatives. Atten Percept Psychophys 2020; 82:2027-2048. PMID: 31875314; PMCID: PMC7297833; DOI: 10.3758/s13414-019-01894-2.
Abstract
Listeners are highly proficient at adapting to contextual variation when perceiving speech. In the present study, we examined the effects of brief speech and nonspeech contexts on the perception of sibilant fricatives. We explored three theoretically motivated accounts of contextual adaptation, based on phonetic cue calibration, phonetic covariation, and auditory contrast. Under the cue calibration account, listeners adapt by estimating a talker-specific average for each phonetic cue or dimension; under the cue covariation account, listeners adapt by exploiting consistencies in how the realization of speech sounds varies across talkers; under the auditory contrast account, adaptation results from (partial) masking of spectral components that are shared by adjacent stimuli. The spectral center of gravity, a phonetic cue to fricative identity, was manipulated for several types of context sound: /z/-initial syllables, /v/-initial syllables, and white noise matched in long-term average spectrum (LTAS) to the /z/-initial stimuli. Listeners' perception of the /s/-/ʃ/ contrast was significantly influenced by /z/-initial syllables and LTAS-matched white noise stimuli, but not by /v/-initial syllables. No significant difference in adaptation was observed between exposure to /z/-initial syllables and matched white noise stimuli, and speech did not have a considerable advantage over noise when the two were presented consecutively within a context. The pattern of findings is most consistent with the auditory contrast account of short-term perceptual adaptation. The cue covariation account makes accurate predictions for speech contexts, but not for nonspeech contexts or for the absence of a speech-versus-nonspeech difference.
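The spectral center of gravity manipulated here is a standard acoustic measure: the power-weighted mean frequency of a sound's spectrum. A minimal sketch (the function name and FFT-based estimator are illustrative choices, not the authors' exact analysis pipeline):

```python
import numpy as np

def spectral_center_of_gravity(signal, sample_rate):
    """Power-weighted mean frequency (Hz) of a signal's spectrum,
    a common acoustic cue to sibilant fricative identity."""
    power = np.abs(np.fft.rfft(signal)) ** 2            # power spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * power) / np.sum(power))
```

/s/ typically has a higher center of gravity than /ʃ/, so shifting this value along a continuum is how ambiguous fricative stimuli of the kind used here can be constructed.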
Affiliation(s)
- Eleanor Chodroff, Department of Language and Linguistic Science, University of York, Heslington, York, YO10 5DD, UK
- Colin Wilson, Department of Cognitive Science, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD, 21218, USA
6. Attentional resources contribute to the perceptual learning of talker idiosyncrasies in audiovisual speech. Atten Percept Psychophys 2019; 81:1006-1019. PMID: 30684204; DOI: 10.3758/s13414-018-01651-x.
Abstract
To recognize audiovisual speech, listeners evaluate and combine information obtained from the auditory and visual modalities. Listeners also use information from one modality to adjust their phonetic categories to a talker's idiosyncrasy encountered in the other modality. In this study, we examined whether the outcome of this cross-modal recalibration relies on attentional resources. In a standard recalibration experiment in Experiment 1, participants heard an ambiguous sound, disambiguated by the accompanying visual speech as either /p/ or /t/. Participants' primary task was to attend to the audiovisual speech while either monitoring a tone sequence for a target tone or ignoring the tones. Listeners subsequently categorized the steps of an auditory /p/-/t/ continuum more often in line with their exposure. The aftereffect of phonetic recalibration was reduced, but not eliminated, by attentional load during exposure. In Experiment 2, participants saw an ambiguous visual speech gesture that was disambiguated auditorily as either /p/ or /t/. At test, listeners categorized the steps of a visual /p/-/t/ continuum more often in line with the prior exposure. Imposing load in the auditory modality during exposure did not reduce the aftereffect of this type of cross-modal phonetic recalibration. Together, these results suggest that auditory attentional resources are needed for the processing of auditory speech and/or for the shifting of auditory phonetic category boundaries. Listeners thus need to dedicate attentional resources in order to accommodate talker idiosyncrasies in audiovisual speech.
7. Shahin AJ, Shen S, Kerlin JR. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech. Lang Cogn Neurosci 2017; 32:1102-1118. PMID: 28966930; PMCID: PMC5617130; DOI: 10.1080/23273798.2017.1283428.
Abstract
We examined the relationship between tolerance for audiovisual onset asynchrony (AVOA) and the spectrotemporal fidelity of the spoken words and the speaker's mouth movements. In two experiments that varied only in the temporal order of the sensory modalities, with visual speech leading (Experiment 1) or lagging (Experiment 2) the acoustic speech, participants watched intact and blurred videos of a speaker uttering trisyllabic words and nonwords that were noise-vocoded with 4, 8, 16, or 32 channels. They judged whether the speaker's mouth movements and the speech sounds were in-sync or out-of-sync. Individuals perceived synchrony (tolerated AVOA) on more trials when the acoustic speech was more speech-like (8 channels and higher vs. 4 channels) and when the visual speech was intact rather than blurred (Experiment 1 only). These findings suggest that enhanced spectrotemporal fidelity of the audiovisual (AV) signal prompts the brain to widen the window of integration, promoting the fusion of temporally distant AV percepts.
Affiliation(s)
- Antoine J. Shahin, Center for Mind and Brain, University of California, Davis, CA, 95618
- Stanley Shen, Center for Mind and Brain, University of California, Davis, CA, 95618
- Jess R. Kerlin, Center for Mind and Brain, University of California, Davis, CA, 95618
8. Rosenblum LD, Dorsi J, Dias JW. The Impact and Status of Carol Fowler's Supramodal Theory of Multisensory Speech Perception. Ecological Psychology 2016. DOI: 10.1080/10407413.2016.1230373.
9. Rosenblum LD, Dias JW, Dorsi J. The supramodal brain: implications for auditory perception. Journal of Cognitive Psychology 2016. DOI: 10.1080/20445911.2016.1181691.