1. Fajardo I, Gómez-Merino N, Ferrer A, Rodríguez-Ortiz IR. Hearing What You Can't See: Influence of Face Masks on Speech Perception and Eye Movement by Adults With Hearing Loss. J Speech Lang Hear Res 2024; 67:3841-3861. PMID: 39302873. DOI: 10.1044/2024_jslhr-22-00562.
Abstract
PURPOSE The aim of the study was to analyze how face masks influence speech perception and the time spent looking at the speaker's mouth and eyes by adults with and without hearing loss. METHOD Twenty participants with hearing loss and 20 without were asked to repeat Spanish words presented in various conditions, including different types of face masks (no mask, transparent window mask, and opaque FFP2 mask) and presentation modes (audiovisual, video only, and audio only). Recognition accuracy and the percentage of time spent looking at the speaker's eyes and mouth (dwell time) were measured. RESULTS In the audiovisual condition, participants with hearing loss had significantly better word recognition scores when the speaker wore no mask than when the speaker wore an opaque face mask, whereas there was no difference between the transparent mask and no mask conditions. For those with typical hearing, the type of face mask did not affect speech recognition. Audiovisual presentation consistently improved speech recognition for participants with hearing loss across all face mask conditions, but for those with typical hearing, it only improved recognition relative to the video-only mode; these participants showed a ceiling effect in the audiovisual and audio-only modes. Regarding eye movement patterns, participants spent less time looking at the speaker's mouth and more time looking at the eyes when the speaker wore an opaque mask than when the speaker wore no mask or a transparent mask. CONCLUSION The use of transparent face masks (ClearMask-type model) is recommended in contexts where face masks are still used (e.g., hospitals) to prevent the hindering effect of opaque masks (FFP2-type model) on speech perception among people with hearing loss, provided that any fogging of the transparent window is wiped off as needed and the light source is in front of the speaker to minimize shadows.
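The dwell-time measure described above is simply the share of eye-tracking samples falling in each area of interest (AOI). A minimal sketch, assuming a per-sample AOI label; the `samples` data frame and its column name are illustrative, not the authors' pipeline:

```python
# Hypothetical dwell-time computation from eye-tracking samples labelled by AOI.
import pandas as pd

def dwell_time_percent(samples: pd.DataFrame) -> pd.Series:
    """samples: one row per gaze sample, with an 'aoi' label
    ('eyes', 'mouth', or 'other') at a constant sampling rate."""
    counts = samples["aoi"].value_counts()
    total = len(samples)
    return 100.0 * counts.reindex(["eyes", "mouth"], fill_value=0) / total

# Toy example: 60% of samples on the mouth, 30% on the eyes.
samples = pd.DataFrame({"aoi": ["mouth"] * 6 + ["eyes"] * 3 + ["other"]})
print(dwell_time_percent(samples))
```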
Affiliation(s)
- Inmaculada Fajardo
- Departamento de Psicología Evolutiva y de la Educación and ERI-Lectura-Atypical Research Group, Universitat de València, Spain
- Red Lectin (Inclusive Reading Network: Network for Research and Innovation in Atypical Reading)
- Nadina Gómez-Merino
- Departamento de Psicología Evolutiva y de la Educación and ERI-Lectura-Atypical Research Group, Universitat de València, Spain
- Red Lectin (Inclusive Reading Network: Network for Research and Innovation in Atypical Reading)
- Antonio Ferrer
- Departamento de Psicología Evolutiva y de la Educación and ERI-Lectura-Atypical Research Group, Universitat de València, Spain
- Red Lectin (Inclusive Reading Network: Network for Research and Innovation in Atypical Reading)
- Isabel R Rodríguez-Ortiz
- Departamento de Psicología Evolutiva y de la Educación and Laboratorio de Diversidad, Cognición y Lenguaje, Universidad de Sevilla, Spain
- Red Lectin (Inclusive Reading Network: Network for Research and Innovation in Atypical Reading)
2. Gao M, Zhu W, Drewes J. The temporal dynamics of conscious and unconscious audio-visual semantic integration. Heliyon 2024; 10:e33828. PMID: 39055801. PMCID: PMC11269866. DOI: 10.1016/j.heliyon.2024.e33828.
Abstract
We compared the time course of cross-modal semantic effects induced by both naturalistic sounds and spoken words on the processing of visual stimuli, whether visible or suppressed from awareness through continuous flash suppression. We found that, under visible conditions, spoken words elicited audio-visual semantic effects over a longer range of SOAs (-1000, -500, -250 ms) than naturalistic sounds (-500, -250 ms). Performance was generally better with auditory primes, but more so with congruent stimuli. Spoken words presented in advance (-1000, -500 ms) outperformed naturalistic sounds; the opposite was true for (near-)simultaneous presentations. Congruent spoken words demonstrated superior categorization performance compared to congruent naturalistic sounds. The audio-visual semantic congruency effect still occurred with suppressed visual stimuli, although without significant variations in the temporal patterns between auditory types. These findings indicate that: (1) semantically congruent auditory input can enhance visual processing performance, even when the visual stimulus is imperceptible to conscious awareness; (2) the temporal dynamics are contingent on the auditory type only when the visual stimulus is visible; and (3) audiovisual semantic integration requires sufficient time for processing auditory information.
Affiliation(s)
- Mingjie Gao
- School of Information Science, Yunnan University, Kunming, China
- Weina Zhu
- School of Information Science, Yunnan University, Kunming, China
- Jan Drewes
- Institute of Brain and Psychological Sciences, Sichuan Normal University, Chengdu, China
3. Li S, Wang Y, Yu Q, Feng Y, Tang P. The Effect of Visual Articulatory Cues on the Identification of Mandarin Tones by Children With Cochlear Implants. J Speech Lang Hear Res 2024; 67:2106-2114. PMID: 38768072. DOI: 10.1044/2024_jslhr-23-00559.
Abstract
PURPOSE This study explored the facilitatory effect of visual articulatory cues on the identification of Mandarin lexical tones by children with cochlear implants (CIs) in both quiet and noisy environments. It also explored whether early implantation is associated with better use of visual cues in tonal identification. METHOD Participants included 106 children with CIs and 100 normal-hearing (NH) controls. A tonal identification task was employed using a two-alternative forced-choice picture-pointing paradigm. Participants' tonal identification accuracies were compared between audio-only (AO) and audiovisual (AV) modalities. Correlations between implantation ages and visual benefits (accuracy differences between AO and AV modalities) were also examined. RESULTS Children with CIs demonstrated improved identification accuracy from the AO to the AV modality in the noisy environment. Additionally, earlier implantation was significantly correlated with a greater visual benefit in noise. CONCLUSIONS These findings indicate that children with CIs benefited from visual cues in tonal identification in noise, and that early implantation enhanced this visual benefit. These results thus have practical implications for tone perception interventions for Mandarin-speaking children with CIs.
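To make the "visual benefit" measure concrete, here is a minimal sketch (with fabricated data, not the authors' analysis code) that computes the per-child AV-minus-AO accuracy difference and correlates it with age at implantation:

```python
# Illustrative only: visual benefit = AV accuracy minus AO accuracy per child,
# then a Pearson correlation with age at implantation. All values are simulated.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
age_at_implantation = rng.uniform(1.0, 6.0, size=30)   # years, illustrative
acc_ao = rng.uniform(0.5, 0.8, size=30)                # audio-only accuracy
acc_av = acc_ao + 0.2 - 0.02 * age_at_implantation     # toy audiovisual accuracy

visual_benefit = acc_av - acc_ao                       # AV minus AO
r, p = pearsonr(age_at_implantation, visual_benefit)
print(f"r = {r:.2f}, p = {p:.3f}")                     # earlier implantation -> larger benefit
```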
Affiliation(s)
- Shanpeng Li
- MIIT Key Lab for Language Information Processing and Applications, School of Foreign Studies, Nanjing University of Science and Technology, China
- Yinuo Wang
- Department of English, Linguistics and Theatre Studies, Faculty of Arts & Social Sciences, National University of Singapore
- Qianxi Yu
- MIIT Key Lab for Language Information Processing and Applications, School of Foreign Studies, Nanjing University of Science and Technology, China
- Yan Feng
- MIIT Key Lab for Language Information Processing and Applications, School of Foreign Studies, Nanjing University of Science and Technology, China
- Ping Tang
- MIIT Key Lab for Language Information Processing and Applications, School of Foreign Studies, Nanjing University of Science and Technology, China
4. Bujok R, Meyer AS, Bosker HR. Audiovisual Perception of Lexical Stress: Beat Gestures and Articulatory Cues. Lang Speech 2024:238309241258162. PMID: 38877720. DOI: 10.1177/00238309241258162.
Abstract
Human communication is inherently multimodal: not only auditory speech but also visual cues can be used to understand another talker. Most studies of audiovisual speech perception have focused on the perception of speech segments (i.e., speech sounds). However, less is known about the influence of visual information on the perception of suprasegmental aspects of speech, such as lexical stress. In two experiments, we investigated the influence of different visual cues (e.g., facial articulatory cues and beat gestures) on the audiovisual perception of lexical stress. We presented auditory lexical stress continua of disyllabic Dutch stress pairs together with videos of a speaker producing stress on the first or second syllable (e.g., articulating VOORnaam or voorNAAM). Moreover, we combined and fully crossed the face of the speaker producing lexical stress on either syllable with a gesturing body producing a beat gesture on either the first or second syllable. Results showed that people successfully used visual articulatory cues to stress in muted videos. However, in audiovisual conditions, we were not able to find an effect of visual articulatory cues. In contrast, we found that the temporal alignment of beat gestures with speech robustly influenced participants' perception of lexical stress. These results highlight the importance of considering suprasegmental aspects of language in multimodal contexts.
Affiliation(s)
- Ronny Bujok
- Max Planck Institute for Psycholinguistics, The Netherlands
- International Max Planck Research School for Language Sciences, MPI for Psycholinguistics, Max Planck Society, The Netherlands
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, The Netherlands
5. Perez ND, Kleiman MJ, Barenholtz E. Visual fixations during processing of time-compressed audiovisual presentations. Atten Percept Psychophys 2024; 86:367-372. PMID: 38175327. DOI: 10.3758/s13414-023-02838-7.
Abstract
Time compression is a technique that allows users to adjust the playback speed of audio recordings, but comprehension declines at higher speeds. Previous research has shown that under challenging auditory conditions people have a greater tendency to fixate regions closer to a speaker's mouth. In the current study, we investigated whether there is a similar tendency to fixate the mouth region for time-compressed stimuli. Participants were presented with a brief audiovisual lecture at different speeds while eye fixations were recorded, and comprehension was tested. Relative to the eye-centered fixations observed for the normal-speed lecture, participants in the 50% compressed condition looked more at the nose, and those in the 75% compressed condition looked more toward the mouth. Greater compression decreased comprehension, but audiovisual information did not reduce this deficit. These results indicate that people seek out audiovisual information to overcome time compression, demonstrating the flexibility of the multimodal attentional system.
Affiliation(s)
- Nicole D Perez
- Division of Undergraduate Studies, Florida Atlantic University, 777 Glades Rd., Boca Raton, FL, 33433, USA
- Michael J Kleiman
- Comprehensive Center for Brain Health, University of Miami Miller School of Medicine, Miami, FL, USA
- Elan Barenholtz
- Department of Psychology, Center for Complex Systems and Brain Sciences, Florida Atlantic University, Boca Raton, FL, USA
6. Dorsi J, Lacey S, Sathian K. Multisensory and lexical information in speech perception. Front Hum Neurosci 2024; 17:1331129. PMID: 38259332. PMCID: PMC10800662. DOI: 10.3389/fnhum.2023.1331129.
Abstract
Both multisensory and lexical information are known to influence the perception of speech. However, an open question remains: is either source more fundamental to perceiving speech? In this perspective, we review the literature and argue that multisensory information plays a more fundamental role in speech perception than lexical information. Three sets of findings support this conclusion: first, reaction times and electroencephalographic signal latencies indicate that the effects of multisensory information on speech processing seem to occur earlier than the effects of lexical information. Second, non-auditory sensory input influences the perception of features that differentiate phonetic categories; thus, multisensory information determines what lexical information is ultimately processed. Finally, there is evidence that multisensory information helps form some lexical information as part of a phenomenon known as sound symbolism. These findings support a framework of speech perception that, while acknowledging the influential roles of both multisensory and lexical information, holds that multisensory information is more fundamental to the process.
Affiliation(s)
- Josh Dorsi
- Department of Neurology, Penn State College of Medicine, Hershey, PA, United States
- Simon Lacey
- Department of Neurology, Penn State College of Medicine, Hershey, PA, United States
- Department of Neural and Behavioral Sciences, Penn State College of Medicine, Hershey, PA, United States
- Department of Psychology, Penn State Colleges of Medicine and Liberal Arts, Hershey, PA, United States
- K. Sathian
- Department of Neurology, Penn State College of Medicine, Hershey, PA, United States
- Department of Neural and Behavioral Sciences, Penn State College of Medicine, Hershey, PA, United States
- Department of Psychology, Penn State Colleges of Medicine and Liberal Arts, Hershey, PA, United States
7. Mitchel AD, Lusk LG, Wellington I, Mook AT. Segmenting Speech by Mouth: The Role of Oral Prosodic Cues for Visual Speech Segmentation. Lang Speech 2023; 66:819-832. PMID: 36448317. DOI: 10.1177/00238309221137607.
Abstract
Adults are able to use visual prosodic cues in the speaker's face to segment speech. Furthermore, eye-tracking data suggest that learners will shift their gaze to the mouth during visual speech segmentation. Although these findings suggest that the mouth may be viewed more than the eyes or nose during visual speech segmentation, no study has examined the direct functional importance of individual features; thus, it is unclear which visual prosodic cues are important for word segmentation. In this study, we examined the impact of first removing (Experiment 1) and then isolating (Experiment 2) individual facial features on visual speech segmentation. Segmentation performance was above chance in all conditions except for when the visual display was restricted to the eye region (eyes only condition in Experiment 2). This suggests that participants were able to segment speech when they could visually access the mouth but not when the mouth was completely removed from the visual display, providing evidence that visual prosodic cues conveyed by the mouth are sufficient and likely necessary for visual speech segmentation.
Affiliation(s)
- Laina G Lusk
- Bucknell University, USA; Children's Hospital of Philadelphia, USA
- Ian Wellington
- Bucknell University, USA; University of Connecticut, USA
8. Alemi R, Wolfe J, Neumann S, Manning J, Towler W, Koirala N, Gracco VL, Deroche M. Audiovisual integration in children with cochlear implants revealed through EEG and fNIRS. Brain Res Bull 2023; 205:110817. PMID: 37989460. DOI: 10.1016/j.brainresbull.2023.110817.
Abstract
Sensory deprivation can offset the balance of audio versus visual information in multimodal processing. Such a phenomenon could persist for children born deaf, even after they receive cochlear implants (CIs), and could potentially explain why one modality is given priority over the other. Here, we recorded cortical responses to a single speaker uttering two syllables, presented in audio-only (A), visual-only (V), and audio-visual (AV) modes. Electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) were successively recorded in seventy-five school-aged children. Twenty-five were children with normal hearing (NH) and fifty wore CIs, among whom 26 had relatively high language abilities (HL) comparable to those of NH children, while 24 others had low language abilities (LL). In the EEG data, visual-evoked potentials were captured in occipital regions in response to V and AV stimuli, and they were accentuated in the HL group compared to the LL group (the NH group being intermediate). Close to the vertex, auditory-evoked potentials were captured in response to A and AV stimuli and reflected a differential treatment of the two syllables, but only in the NH group. None of the EEG metrics revealed any interaction between group and modality. In the fNIRS data, each modality induced a corresponding activity in visual or auditory regions, but no group difference was observed in A, V, or AV stimulation. The present study did not reveal any sign of abnormal AV integration in children with CIs. An efficient multimodal integrative network (at least for rudimentary speech materials) is clearly not a sufficient condition for good language and literacy.
Affiliation(s)
- Razieh Alemi
- Department of Psychology, Concordia University, 7141 Sherbrooke St. West, Montreal, Quebec H4B 1R6, Canada
- Jace Wolfe
- Oberkotter Foundation, Oklahoma City, OK, USA
- Sara Neumann
- Hearts for Hearing Foundation, 11500 Portland Av., Oklahoma City, OK 73120, USA
- Jacy Manning
- Hearts for Hearing Foundation, 11500 Portland Av., Oklahoma City, OK 73120, USA
- Will Towler
- Hearts for Hearing Foundation, 11500 Portland Av., Oklahoma City, OK 73120, USA
- Nabin Koirala
- Haskins Laboratories, 300 George St., New Haven, CT 06511, USA
- Mickael Deroche
- Department of Psychology, Concordia University, 7141 Sherbrooke St. West, Montreal, Quebec H4B 1R6, Canada
9. Datta Choudhary Z, Bruder G, Welch GF. Visual Facial Enhancements Can Significantly Improve Speech Perception in the Presence of Noise. IEEE Trans Vis Comput Graph 2023; 29:4751-4760. PMID: 37782611. DOI: 10.1109/tvcg.2023.3320247.
Abstract
Human speech perception is generally optimal in quiet environments; however, it becomes more difficult and error-prone in the presence of noise, such as other humans speaking nearby or ambient noise. In such situations, human speech perception is improved by speech reading, i.e., watching the movements of a speaker's mouth and face, either consciously as done by people with hearing loss or subconsciously by other humans. While previous work focused largely on speech perception of two-dimensional videos of faces, there is a gap in the research field focusing on facial features as seen in head-mounted displays, including the impacts of display resolution, and the effectiveness of visually enhancing a virtual human face on speech perception in the presence of noise. In this paper, we present a comparative user study (N = 21) in which we investigated an audio-only condition compared to two levels of head-mounted display resolution (1832×1920 or 916×960 pixels per eye) and two levels of the native or visually enhanced appearance of a virtual human, the latter consisting of an up-scaled facial representation and simulated lipstick (lip coloring) added to increase contrast. To understand effects on speech perception in noise, we measured participants' speech reception thresholds (SRTs) for each audio-visual stimulus condition. These thresholds indicate the decibel levels of the speech signal that are necessary for a listener to receive the speech correctly 50% of the time. First, we show that the display resolution significantly affected participants' ability to perceive the speech signal in noise, which has practical implications for the field, especially in social virtual environments. Second, we show that our visual enhancement method was able to compensate for limited display resolution and was generally preferred by participants. Specifically, our participants indicated that they benefited from the head scaling more than the added facial contrast from the simulated lipstick. We discuss relationships, implications, and guidelines for applications that aim to leverage such enhancements.
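As a concrete illustration of the SRT idea described above, the sketch below runs a generic 1-up/1-down adaptive staircase that converges on the signal-to-noise ratio giving roughly 50% correct. The step size, trial count, and simulated listener are assumptions, not the study's actual procedure:

```python
# Generic adaptive staircase for a speech reception threshold (SRT) estimate.
import random

def run_staircase(trial_fn, start_snr_db=0.0, step_db=2.0, n_trials=30):
    """trial_fn(snr_db) -> True if the sentence was repeated correctly."""
    snr, history = start_snr_db, []
    for _ in range(n_trials):
        correct = trial_fn(snr)
        history.append(snr)
        # Correct -> make it harder (lower SNR); incorrect -> make it easier (raise SNR).
        snr += -step_db if correct else step_db
    return sum(history[-10:]) / 10.0   # crude SRT estimate from the last trials

def toy_listener(snr_db):
    # Simulated listener whose true 50%-correct point lies at -6 dB SNR.
    p_correct = 1.0 / (1.0 + 10 ** (-(snr_db + 6.0) / 3.0))
    return random.random() < p_correct

print(f"Estimated SRT ~ {run_staircase(toy_listener):.1f} dB SNR")
```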
10. Croom K, Rumschlag JA, Erickson MA, Binder DK, Razak KA. Developmental delays in cortical auditory temporal processing in a mouse model of Fragile X syndrome. J Neurodev Disord 2023; 15:23. PMID: 37516865. PMCID: PMC10386252. DOI: 10.1186/s11689-023-09496-8.
Abstract
BACKGROUND Autism spectrum disorders (ASD) encompass a wide array of debilitating symptoms, including sensory dysfunction and delayed language development. Auditory temporal processing is crucial for speech perception and language development. Abnormal development of temporal processing may account for the language impairments associated with ASD. Very little is known about the development of temporal processing in any animal model of ASD. METHODS In the current study, we quantify auditory temporal processing throughout development in the Fmr1 knock-out (KO) mouse model of Fragile X syndrome (FXS), a leading genetic cause of intellectual disability and ASD-associated behaviors. Using epidural electrodes in awake and freely moving wildtype (WT) and KO mice, we recorded auditory event-related potentials (ERPs) and auditory temporal processing with a gap-in-noise auditory steady-state response (gap-ASSR) paradigm. Mice were recorded at three different ages in a cross-sectional design: postnatal day (p)21, p30, and p60. Recordings were obtained from both auditory and frontal cortices. The gap-ASSR requires underlying neural generators to synchronize responses to gaps of different widths embedded in noise, providing an objective measure of temporal processing across genotypes and age groups. RESULTS We present evidence that the frontal, but not auditory, cortex shows significant temporal processing deficits at p21 and p30, with poor ability to phase lock to rapid gaps in noise. Temporal processing was similar in both genotypes in adult mice. ERP amplitudes were larger in Fmr1 KO mice in both auditory and frontal cortex, consistent with ERP data in humans with FXS. CONCLUSIONS These data indicate cortical region-specific delays in temporal processing development in Fmr1 KO mice. Developmental delays in the ability of the frontal cortex to follow rapid changes in sounds may shape language delays in FXS, and more broadly in ASD.
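The phase locking that the gap-ASSR indexes is typically quantified as inter-trial phase coherence (ITPC) at the gap-presentation rate. A hedged sketch with simulated epochs; the 40 Hz target rate, sampling rate, and data are illustrative assumptions rather than the paper's parameters:

```python
# Inter-trial phase coherence (ITPC) at one target frequency, from epoched EEG.
import numpy as np

def itpc_at_frequency(epochs, sfreq, target_hz):
    """epochs: (n_trials, n_samples) array; returns ITPC in [0, 1]."""
    n_samples = epochs.shape[1]
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / sfreq)
    bin_idx = int(np.argmin(np.abs(freqs - target_hz)))
    spectra = np.fft.rfft(epochs, axis=1)[:, bin_idx]
    phases = spectra / np.abs(spectra)    # unit-length phase vector per trial
    return float(np.abs(phases.mean()))   # 1 = perfect phase locking across trials

# Simulated trials: a weak 40 Hz component phase-locked across trials, plus noise.
sfreq = 1000
t = np.arange(0, 1.0, 1.0 / sfreq)
epochs = 0.5 * np.sin(2 * np.pi * 40 * t) + np.random.randn(100, t.size)
print(f"ITPC at 40 Hz: {itpc_at_frequency(epochs, sfreq, 40):.2f}")
```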
Affiliation(s)
- Katilynne Croom
- Graduate Neuroscience Program, University of California, Riverside, USA
- Jeffrey A Rumschlag
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, Charleston, USA
- Devin K Binder
- Graduate Neuroscience Program, University of California, Riverside, USA
- Biomedical Sciences, School of Medicine, University of California, Riverside, USA
- Khaleel A Razak
- Graduate Neuroscience Program, University of California, Riverside, USA
- Department of Psychology, University of California, Riverside, USA
11. Mitsven SG, Perry LK, Jerry CM, Messinger DS. Classroom language during COVID-19: Associations between mask-wearing and objectively measured teacher and preschooler vocalizations. Front Psychol 2022; 13. PMID: 36438361. PMCID: PMC9682284. DOI: 10.3389/fpsyg.2022.874293.
Abstract
During the COVID-19 pandemic, mask-wearing in classrooms has become commonplace. However, there are few data on the effect of face masks on children's language input and production in educational contexts, such as preschool classrooms, which over half of United States children attend. Leveraging repeated objective measurements, we longitudinally examined child and teacher speech-related vocalizations in two cohorts of 3.5- to 4.5-year-old children enrolled in the same oral language classroom, which included children with and without hearing loss. Cohort 1 was observed before COVID-19 (no face masks, N = 20) and Cohort 2 was observed during COVID-19 (with face masks; N = 15). Vocalization data were collected using child-worn audio recorders over 12 observations spanning two successive school years, yielding a mean of 9.09 hours of audio recording per child. During COVID-19, teachers produced a higher number of words per minute than teachers observed prior to COVID-19; however, their vocalizations contained fewer unique phonemes than teacher vocalizations prior to COVID-19. Children observed during COVID-19 did not exhibit deficits in the duration, rate, or phonemic diversity of their vocalizations compared to children observed prior to COVID-19; in fact, they produced vocalizations that were longer in duration than those of children observed prior to COVID-19. During COVID-19 (but not before), children who were exposed to a higher number of words per minute from teachers produced more speech-related vocalizations per minute themselves. Overall, children with hearing loss were exposed to teacher vocalizations that were longer in duration, more teacher words per minute, and more phonemically diverse teacher speech than children with typical hearing. In terms of production, children with hearing loss produced vocalizations that were longer in duration than the vocalizations of children with typical hearing. Among children observed during COVID-19, children with hearing loss exhibited a higher vocalization rate than children with typical hearing. These results suggest that children's language production is largely unaffected by mask use in the classroom and that children can benefit from the language they are exposed to despite teacher mask-wearing.
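For illustration only (this is not the study's recording/processing pipeline), the two teacher-speech measures mentioned above, words per minute and phonemic diversity, could be computed from a time-stamped phonemic transcript roughly like this:

```python
# Hypothetical helper: words per minute and unique-phoneme count from
# time-stamped, phonemically transcribed utterances (tuples are assumed).
def speech_measures(utterances):
    """utterances: list of (start_s, end_s, words, phonemes) tuples."""
    total_minutes = sum(end - start for start, end, _, _ in utterances) / 60.0
    n_words = sum(len(words) for _, _, words, _ in utterances)
    unique_phonemes = {p for _, _, _, phones in utterances for p in phones}
    return n_words / total_minutes, len(unique_phonemes)

demo = [(0.0, 3.0, ["look", "at", "the", "dog"],
         ["L", "UH", "K", "AE", "T", "DH", "AH", "D", "AO", "G"])]
wpm, n_phonemes = speech_measures(demo)
print(f"{wpm:.0f} words/min, {n_phonemes} unique phonemes")
```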
12. Zhang F, Lei J, Gong H, Wu H, Chen L. The development of speechreading skills in Chinese students with hearing impairment. Front Psychol 2022; 13:1020211. PMID: 36405128. PMCID: PMC9674306. DOI: 10.3389/fpsyg.2022.1020211.
Abstract
The developmental trajectory of speechreading skills is poorly understood, and existing research has revealed rather inconsistent results. In this study, 209 Chinese students with hearing impairment between 7 and 20 years old were asked to complete the Chinese Speechreading Test, which targets three linguistic levels (i.e., words, phrases, and sentences). Both response time and accuracy data were collected and analyzed. Results revealed (i) no developmental change in speechreading accuracy between ages 7 and 14, after which accuracy either plateaued or declined, and (ii) no significant developmental pattern in speechreading speed across all ages. Results also showed that, across all age groups, speechreading accuracy was higher for phrases than for words and sentences, and overall speechreading speed decreased in the order of phrases, words, and sentences. These findings suggest that the development of speechreading in Chinese is not a continuous, linear process.
Affiliation(s)
- Fen Zhang
- Central China Normal University, Wuhan, China
- Huina Gong
- Central China Normal University, Wuhan, China
- Hui Wu
- Shandong University, Jinan, China
- Liang Chen
- University of Georgia, Athens, GA, United States
13. Modulation transfer functions for audiovisual speech. PLoS Comput Biol 2022; 18:e1010273. PMID: 35852989. PMCID: PMC9295967. DOI: 10.1371/journal.pcbi.1010273.
Abstract
Temporal synchrony between facial motion and acoustic modulations is a hallmark feature of audiovisual speech. The moving face and mouth during natural speech is known to be correlated with low-frequency acoustic envelope fluctuations (below 10 Hz), but the precise rates at which envelope information is synchronized with motion in different parts of the face are less clear. Here, we used regularized canonical correlation analysis (rCCA) to learn speech envelope filters whose outputs correlate with motion in different parts of the speaker's face. We leveraged recent advances in video-based 3D facial landmark estimation allowing us to examine statistical envelope-face correlations across a large number of speakers (∼4000). Specifically, rCCA was used to learn modulation transfer functions (MTFs) for the speech envelope that significantly predict correlation with facial motion across different speakers. The AV analysis revealed bandpass speech envelope filters at distinct temporal scales. A first set of MTFs showed peaks around 3-4 Hz and were correlated with mouth movements. A second set of MTFs captured envelope fluctuations in the 1-2 Hz range correlated with more global face and head motion. These two distinctive timescales emerged only as a property of natural AV speech statistics across many speakers. A similar analysis of fewer speakers performing a controlled speech task highlighted only the well-known temporal modulations around 4 Hz correlated with orofacial motion. The different bandpass ranges of AV correlation align notably with the average rates at which syllables (3-4 Hz) and phrases (1-2 Hz) are produced in natural speech. Whereas periodicities at the syllable rate are evident in the envelope spectrum of the speech signal itself, slower 1-2 Hz regularities thus only become prominent when considering crossmodal signal statistics. This may indicate a motor origin of temporal regularities at the timescales of syllables and phrases in natural speech.
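The core computation is a canonical correlation between envelope features and facial-motion features. The sketch below uses scikit-learn's plain CCA as a stand-in for the regularized rCCA in the paper, with simulated data, so it only illustrates the idea:

```python
# Canonical correlation between speech-envelope bands and facial-motion features
# (simulated data; plain CCA instead of the paper's regularized variant).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n_frames = 2000
envelope_bands = rng.standard_normal((n_frames, 8))          # envelope in 8 modulation bands
mouth_motion = (envelope_bands[:, :2] @ rng.standard_normal((2, 5))
                + 0.5 * rng.standard_normal((n_frames, 5)))   # motion partly driven by envelope

cca = CCA(n_components=2)
env_scores, motion_scores = cca.fit_transform(envelope_bands, mouth_motion)
for k in range(2):
    r = np.corrcoef(env_scores[:, k], motion_scores[:, k])[0, 1]
    print(f"Canonical correlation {k + 1}: {r:.2f}")
```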
14. Goldenberg D, Tiede MK, Bennett RT, Whalen DH. Congruent aero-tactile stimuli bias perception of voicing continua. Front Hum Neurosci 2022; 16:879981. PMID: 35911601. PMCID: PMC9334670. DOI: 10.3389/fnhum.2022.879981.
Abstract
Multimodal integration is the formation of a coherent percept from different sensory inputs such as vision, audition, and somatosensation. Most research on multimodal integration in speech perception has focused on audio-visual integration. In recent years, audio-tactile integration has also been investigated, and it has been established that puffs of air applied to the skin and timed with listening tasks shift the perception of voicing by naive listeners. The current study has replicated and extended these findings by testing the effect of air puffs on gradations of voice onset time along a continuum rather than the voiced and voiceless endpoints of the original work. Three continua were tested: bilabial (“pa/ba”), velar (“ka/ga”), and a vowel continuum (“head/hid”) used as a control. The presence of air puffs was found to significantly increase the likelihood of choosing voiceless responses for the two VOT continua but had no effect on choices for the vowel continuum. Analysis of response times revealed that the presence of air puffs lengthened responses for intermediate (ambiguous) stimuli and shortened them for endpoint (non-ambiguous) stimuli. The slowest response times were observed for the intermediate steps for all three continua, but for the bilabial continuum this effect interacted with the presence of air puffs: responses were slower in the presence of air puffs, and faster in their absence. This suggests that during integration auditory and aero-tactile inputs are weighted differently by the perceptual system, with the latter exerting greater influence in those cases where the auditory cues for voicing are ambiguous.
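A standard way to quantify the reported shift toward voiceless responses is to fit a logistic psychometric function to each condition and compare the fitted category boundaries. A minimal sketch with invented response proportions (not the study's data or analysis code):

```python
# Logistic psychometric fits for a VOT continuum, with and without air puffs.
import numpy as np
from scipy.optimize import curve_fit

def logistic(vot_ms, boundary, slope):
    return 1.0 / (1.0 + np.exp(-(vot_ms - boundary) / slope))

vot_steps = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)            # VOT in ms, illustrative
p_voiceless_no_puff = np.array([0.02, 0.05, 0.20, 0.50, 0.80, 0.95, 0.99])
p_voiceless_puff    = np.array([0.05, 0.12, 0.35, 0.65, 0.90, 0.97, 0.99])

(b_no, _), _ = curve_fit(logistic, vot_steps, p_voiceless_no_puff, p0=[30.0, 5.0])
(b_puff, _), _ = curve_fit(logistic, vot_steps, p_voiceless_puff, p0=[30.0, 5.0])
print(f"Boundary shift with air puffs: {b_no - b_puff:.1f} ms toward shorter VOTs")
```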
Affiliation(s)
- Mark K. Tiede
- Haskins Laboratories, New Haven, CT, United States
- Ryan T. Bennett
- Department of Linguistics, University of California, Santa Cruz, Santa Cruz, CA, United States
- D. H. Whalen
- Haskins Laboratories, New Haven, CT, United States
- The Graduate Center, City University of New York (CUNY), New York, NY, United States
- Department of Linguistics, Yale University, New Haven, CT, United States
15. Brown VA, Dillman-Hasso NH, Li Z, Ray L, Mamantov E, Van Engen KJ, Strand JF. Revisiting the target-masker linguistic similarity hypothesis. Atten Percept Psychophys 2022; 84:1772-1787. PMID: 35474415. PMCID: PMC10701341. DOI: 10.3758/s13414-022-02486-3.
Abstract
The linguistic similarity hypothesis states that it is more difficult to segregate target and masker speech when they are linguistically similar. For example, recognition of English target speech should be more impaired by the presence of Dutch masking speech than Mandarin masking speech because Dutch and English are more linguistically similar than Mandarin and English. Across four experiments, English target speech was consistently recognized more poorly when presented in English masking speech than in silence, speech-shaped noise, or an unintelligible masker (i.e., Dutch or Mandarin). However, we found no evidence for graded masking effects: Dutch did not impair performance more than Mandarin in any experiment, despite 650 participants being tested. This general pattern was consistent when using both a cross-modal paradigm (in which target speech was lipread and maskers were presented aurally; Experiments 1a and 1b) and an auditory-only paradigm (in which both the targets and maskers were presented aurally; Experiments 2a and 2b). These findings suggest that the linguistic similarity hypothesis should be refined to reflect the existing evidence: There is greater release from masking when the masker language differs from the target speech than when it is the same as the target speech. However, evidence that unintelligible maskers impair speech identification to a greater extent when they are more linguistically similar to the target language remains elusive.
Affiliation(s)
- Violet A Brown
- Department of Psychological and Brain Sciences, Washington University in St. Louis, One Brookings Drive, St. Louis, MO, 63130, USA
- Naseem H Dillman-Hasso
- Carleton College, Department of Psychology, One North College St, Northfield, MN, 55057, USA
- ZhaoBin Li
- Carleton College, Department of Psychology, One North College St, Northfield, MN, 55057, USA
- Lucia Ray
- Carleton College, Department of Psychology, One North College St, Northfield, MN, 55057, USA
- Ellen Mamantov
- Carleton College, Department of Psychology, One North College St, Northfield, MN, 55057, USA
- Kristin J Van Engen
- Department of Psychological and Brain Sciences, Washington University in St. Louis, One Brookings Drive, St. Louis, MO, 63130, USA
- Julia F Strand
- Carleton College, Department of Psychology, One North College St, Northfield, MN, 55057, USA
16. Trudeau-Fisette P, Arnaud L, Ménard L. Visual Influence on Auditory Perception of Vowels by French-Speaking Children and Adults. Front Psychol 2022; 13:740271. PMID: 35282186. PMCID: PMC8913716. DOI: 10.3389/fpsyg.2022.740271.
Abstract
Audiovisual interaction in speech perception is well defined in adults. Despite the large body of evidence suggesting that children are also sensitive to visual input, very few empirical studies have been conducted. To further investigate whether visual inputs influence auditory perception of phonemes in preschoolers in the same way as in adults, we conducted an audiovisual identification test. The auditory stimuli (/e/-/ø/ continuum) were presented either in an auditory condition only or simultaneously with a visual presentation of the articulation of the vowel /e/ or /ø/. The results suggest that, although all participants experienced visual influence on auditory perception, substantial individual differences exist in the 5- to 6-year-old group. While additional work is required to confirm this hypothesis, we suggest that auditory and visual systems are developing at that age and that multisensory phonological categorization of the rounding contrast took place only in children whose sensory systems and sensorimotor representations were mature.
Affiliation(s)
- Paméla Trudeau-Fisette
- Laboratoire de Phonétique, Université du Québec à Montréal, Montreal, QC, Canada
- Centre for Research on Brain, Language and Music, Montreal, QC, Canada
- Laureline Arnaud
- Centre for Research on Brain, Language and Music, Montreal, QC, Canada
- Integrated Program in Neuroscience, McGill University, Montreal, QC, Canada
- Lucie Ménard
- Laboratoire de Phonétique, Université du Québec à Montréal, Montreal, QC, Canada
- Centre for Research on Brain, Language and Music, Montreal, QC, Canada
17. Peelle JE, Spehar B, Jones MS, McConkey S, Myerson J, Hale S, Sommers MS, Tye-Murray N. Increased Connectivity among Sensory and Motor Regions during Visual and Audiovisual Speech Perception. J Neurosci 2022; 42:435-442. PMID: 34815317. PMCID: PMC8802926. DOI: 10.1523/jneurosci.0114-21.2021.
Abstract
In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is degraded. Here, we used fMRI to monitor brain activity while adult humans (n = 60) were presented with visual-only, auditory-only, and audiovisual words. The audiovisual words were presented in quiet and in several signal-to-noise ratios. As expected, audiovisual speech perception recruited both auditory and visual cortex, with some evidence for increased recruitment of premotor cortex in some conditions (including in substantial background noise). We then investigated neural connectivity using psychophysiological interaction analysis with seed regions in both primary auditory cortex and primary visual cortex. Connectivity between auditory and visual cortices was stronger in audiovisual conditions than in unimodal conditions, including a wide network of regions in posterior temporal cortex and prefrontal cortex. In addition to whole-brain analyses, we also conducted a region-of-interest analysis on the left posterior superior temporal sulcus (pSTS), implicated in many previous studies of audiovisual speech perception. We found evidence for both activity and effective connectivity in pSTS for visual-only and audiovisual speech, although these were not significant in whole-brain analyses. Together, our results suggest a prominent role for cross-region synchronization in understanding both visual-only and audiovisual speech that complements activity in integrative brain regions like pSTS. SIGNIFICANCE STATEMENT: In everyday conversation, we usually process the talker's face as well as the sound of the talker's voice. Access to visual speech information is particularly useful when the auditory signal is hard to understand (e.g., in background noise). Prior work has suggested that specialized regions of the brain may play a critical role in integrating information from visual and auditory speech. Here, we show that a complementary mechanism relying on synchronized brain activity among sensory and motor regions may also play a critical role. These findings encourage reconceptualizing audiovisual integration in the context of coordinated network activity.
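The psychophysiological interaction (PPI) analysis mentioned above boils down to adding a seed-by-task interaction regressor to a general linear model; a significant interaction weight indicates condition-dependent coupling. Below is a textbook-style sketch with simulated timecourses (no HRF convolution or nuisance regressors), not the authors' fMRI pipeline:

```python
# Simplified PPI regression: target ~ intercept + seed + task + seed*task.
import numpy as np

rng = np.random.default_rng(2)
n_scans = 300
seed = rng.standard_normal(n_scans)                    # seed-region (e.g., A1) timecourse
task = np.tile(np.repeat([0.0, 1.0], 15), 10)          # blocked task-vs-rest regressor
ppi = seed * task                                      # interaction (PPI) term
target = 0.4 * seed + 0.3 * task + 0.6 * ppi + rng.standard_normal(n_scans)

X = np.column_stack([np.ones(n_scans), seed, task, ppi])
betas, *_ = np.linalg.lstsq(X, target, rcond=None)
print(f"PPI (interaction) beta: {betas[3]:.2f}")       # stronger coupling during the task
```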
Affiliation(s)
- Jonathan E Peelle
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Brent Spehar
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Michael S Jones
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Sarah McConkey
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
- Joel Myerson
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Sandra Hale
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Mitchell S Sommers
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130
- Nancy Tye-Murray
- Department of Otolaryngology, Washington University in St. Louis, St. Louis, Missouri 63110
18. Clerc O, Fort M, Schwarzer G, Krasotkina A, Vilain A, Méary D, Lœvenbruck H, Pascalis O. Can language modulate perceptual narrowing for faces? Other-race face recognition in infants is modulated by language experience. Int J Behav Dev 2021. DOI: 10.1177/01650254211053054.
Abstract
Between 6 and 9 months, while infants' ability to discriminate faces within their own racial group is maintained, discrimination of faces within other-race groups declines to a point where 9-month-old infants fail to discriminate other-race faces. Such face perception narrowing can be overcome in various ways at 9 or 12 months of age, such as presenting faces with emotional expressions. Can language itself modulate face narrowing? Many adult studies suggest that language has an impact on the recognition of individuals. For example, adults remember faces previously paired with their native language more accurately than faces paired with a non-native language. We have previously found that from 9 months of age, own-race faces associated with the native language can be learned and recognized, whereas own-race faces associated with a non-native language cannot. Based on the language familiarity effect, we hypothesized that the native language could restore recognition of other-race faces after perceptual narrowing has happened. We tested 9- and 12-month-old Caucasian infants. During a familiarization phase, infants were shown still photographs of an Asian face while audio was played either in the native or in the non-native language. Immediately after the familiarization, the familiar face and a novel one were displayed side by side for the recognition test. We compared the proportional looking time to the new face to the chance level. Both 9- and 12-month-old infants exhibited recognition memory for the other-race face when familiarized with non-native speech, but not with their native speech. Native language did not facilitate recognition of other-race faces after 9 months of age, but a non-native language did, suggesting that 9- and 12-month-olds already have expectations about which language an individual should speak (or at least not speak). Our results confirm the strong links between face and speech processing during infancy.
Affiliation(s)
- Olivier Clerc
- LPNC, Université Grenoble Alpes, Grenoble, France
- LPNC, CNRS, Grenoble, France
- Mathilde Fort
- LPNC, Université Grenoble Alpes, Grenoble, France
- Centre de Recherche en NeuroSciences de Lyon, CRNL UMR 5292, Université Lyon 1, Lyon, France
- Gudrun Schwarzer
- Department of Developmental Psychology, Justus-Liebig-University Giessen, Germany
- Anna Krasotkina
- Department of Developmental Psychology, Justus-Liebig-University Giessen, Germany
- Anne Vilain
- Gipsa-Lab, Département Parole et Cognition, CNRS UMR 5216 & Université Grenoble Alpes, Grenoble, France
- David Méary
- LPNC, Université Grenoble Alpes, Grenoble, France
- LPNC, CNRS, Grenoble, France
- Hélène Lœvenbruck
- LPNC, Université Grenoble Alpes, Grenoble, France
- LPNC, CNRS, Grenoble, France
- Olivier Pascalis
- LPNC, Université Grenoble Alpes, Grenoble, France
- LPNC, CNRS, Grenoble, France
19. The other-race effect on the McGurk effect in infancy. Atten Percept Psychophys 2021; 83:2924-2936. PMID: 34386882. PMCID: PMC8460584. DOI: 10.3758/s13414-021-02342-w.
Abstract
This study investigated the difference in the McGurk effect between own-race-face and other-race-face stimuli among Japanese infants from 5 to 9 months of age. The McGurk effect results from infants using information from a speaker's face in audiovisual speech integration. We hypothesized that the McGurk effect varies with the speaker's race because of the other-race effect, which indicates an advantage for own-race faces in our face processing system. Experiment 1 demonstrated the other-race effect on audiovisual speech integration, such that infants aged 5–6 months and 8–9 months are likely to perceive the McGurk effect when observing an own-race-face speaker, but not when observing an other-race-face speaker. Experiment 2 found the other-race effect on audiovisual speech integration regardless of irrelevant speech identity cues. Experiment 3 confirmed the infants' ability to differentiate the two auditory syllables. These results showed that infants are likely to integrate a voice with an own-race face, but not with an other-race face. This implies a role of experience with own-race faces in the development of audiovisual speech integration. Our findings also contribute to the discussion of whether perceptual narrowing is a modality-general, pan-sensory process.
20. Ceuleers D, Dhooge I, Degeest S, Van Steen H, Keppler H, Baudonck N. The Effects of Age, Gender and Test Stimuli on Visual Speech Perception: A Preliminary Study. Folia Phoniatr Logop 2021; 74:131-140. PMID: 34348290. DOI: 10.1159/000518205.
Abstract
INTRODUCTION To the best of our knowledge, there is a lack of reliable, validated, and standardized (Dutch) measuring instruments to document visual speech perception in a structured way. This study aimed to: (1) evaluate the effects of age, gender, and the word list used on visual speech perception, examined with a first version of the Dutch Test for (Audio-)Visual Speech Perception at the word level (TAUVIS-words), and (2) assess the internal reliability of the TAUVIS-words. METHODS Thirty-nine normal-hearing adults divided into the following 3 age categories were included: (1) younger adults, age 18-39 years; (2) middle-aged adults, age 40-59 years; and (3) older adults, age >60 years. The TAUVIS-words consist of 4 word lists, i.e., 2 monosyllabic word lists (MS 1 and MS 2) and 2 polysyllabic word lists (PS 1 and PS 2). A first exploration of the effects of age, gender, and test stimuli (i.e., the word list used) on visual speech perception was conducted using the TAUVIS-words. A mixed-design analysis of variance (ANOVA) was conducted to analyze the results statistically. Lastly, the internal reliability of the TAUVIS-words was assessed by calculating Cronbach's α. RESULTS The results revealed a significant effect of the word list used. More specifically, the score for MS 1 was significantly better compared to that for PS 2, and the score for PS 1 was significantly better compared to that for PS 2. Furthermore, a significant main effect of gender was found: women scored significantly better than men. The effect of age was not significant. The TAUVIS word lists were found to have good internal reliability. CONCLUSION This study was a first exploration of the effects of age, gender, and test stimuli on visual speech perception using the TAUVIS-words. Further research is necessary to optimize and validate the TAUVIS-words, making use of a larger study sample.
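For reference, Cronbach's α is computed from the item variances and the variance of the total score. The sketch below applies the standard formula to a random, purely illustrative score matrix (not the TAUVIS data):

```python
# Cronbach's alpha: alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)).
import numpy as np

def cronbach_alpha(scores):
    """scores: (n_participants, n_items) array of item scores."""
    n_items = scores.shape[1]
    item_var_sum = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (n_items / (n_items - 1)) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(3)
ability = rng.standard_normal((39, 1))                    # shared "true score" per participant
scores = ability + 0.5 * rng.standard_normal((39, 20))    # 20 correlated items
print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")
```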
Affiliation(s)
- Dorien Ceuleers
- Department of Rehabilitation Sciences, Ghent University, Ghent, Belgium
- Ingeborg Dhooge
- Department of Otorhinolaryngology, Ghent University Hospital, Ghent, Belgium; Department of Ear, Nose, and Throat, Ghent University, Ghent, Belgium
- Sofie Degeest
- Department of Rehabilitation Sciences, Ghent University, Ghent, Belgium
- Hannah Keppler
- Department of Rehabilitation Sciences, Ghent University, Ghent, Belgium; Department of Otorhinolaryngology, Ghent University Hospital, Ghent, Belgium
- Nele Baudonck
- Department of Otorhinolaryngology, Ghent University Hospital, Ghent, Belgium
21. Chen H, Du J, Hu Y, Dai LR, Yin BC, Lee CH. Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement. Neural Netw 2021; 143:171-182. PMID: 34157642. DOI: 10.1016/j.neunet.2021.06.003.
Abstract
In this paper, we propose a visual embedding approach to improve embedding aware speech enhancement (EASE) by synchronizing visual lip frames at the phone and place-of-articulation levels. We first extract visual embedding from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE). Next, we extract audio-visual embedding from noisy speech and lip frames in an information intersection manner, utilizing the complementarity of audio and visual features for multi-modal EASE (MEASE). Experiments on the TCD-TIMIT corpus corrupted by simulated additive noises show that our proposed subword-based VEASE approach is more effective than conventional embedding at the word level. Moreover, visual embedding at the articulation place level, leveraging a high correlation between place of articulation and lip shapes, demonstrates even better performance than that at the phone level. Finally, the experiments establish that the proposed MEASE framework, incorporating both audio and visual embeddings, yields significantly better speech quality and intelligibility than those obtained with the best visual-only and audio-only EASE systems.
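To make the fusion idea concrete, here is a deliberately simplified PyTorch sketch of embedding-aware audio-visual mask estimation: a per-frame visual embedding (e.g., from a pre-trained lip or articulation-place recognizer) is concatenated with noisy spectral features, and a recurrent network predicts a time-frequency mask. Layer sizes, the GRU layout, and the masking scheme are assumptions for illustration, not the MEASE architecture itself:

```python
# Schematic audio-visual mask estimator (illustrative, not the authors' model).
import torch
import torch.nn as nn

class AVMaskEstimator(nn.Module):
    def __init__(self, n_freq=257, visual_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq + visual_dim, hidden, num_layers=2, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_spec, visual_emb):
        # noisy_spec: (batch, frames, n_freq); visual_emb: (batch, frames, visual_dim)
        fused, _ = self.rnn(torch.cat([noisy_spec, visual_emb], dim=-1))
        return self.mask(fused) * noisy_spec      # masked magnitude spectrogram

model = AVMaskEstimator()
enhanced = model(torch.rand(2, 100, 257), torch.rand(2, 100, 128))
print(enhanced.shape)                             # torch.Size([2, 100, 257])
```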
Affiliation(s)
- Hang Chen
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China
- Jun Du
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China
- Yu Hu
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China
- Li-Rong Dai
- National Engineering Laboratory for Speech and Language Information Processing, University of Science and Technology of China, Hefei, Anhui, China
- Bao-Cai Yin
- iFlytek Research, iFlytek Co., Ltd., Hefei, Anhui, China
- Chin-Hui Lee
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, USA
22. Singh L, Tan A, Quinn PC. Infants recognize words spoken through opaque masks but not through clear masks. Dev Sci 2021; 24:e13117. PMID: 33942441. PMCID: PMC8236912. DOI: 10.1111/desc.13117.
Abstract
COVID-19 has modified numerous aspects of children's social environments. Many children are now spoken to through a mask. There is little empirical evidence attesting to the effects of masked language input on language processing. In addition, not much is known about the effects of clear masks (i.e., transparent face shields) versus opaque masks on language comprehension in children. In the current study, 2-year-old infants were tested on their ability to recognize familiar spoken words in three conditions: words presented with no mask, words presented through a clear mask, and words presented through an opaque mask. Infants were able to recognize familiar words presented without a mask and when hearing words through opaque masks, but not when hearing words through clear masks. Findings suggest that the ability of infants to recover spoken language input through masks varies depending on the surface properties of the mask.
Affiliation(s)
- Leher Singh
- Department of Psychology, National University of Singapore, Singapore
- Agnes Tan
- Department of Psychology, National University of Singapore, Singapore
- Paul C Quinn
- Department of Psychological and Brain Sciences, University of Delaware, Newark, Delaware, USA
23.
Abstract
Visual speech cues play an important role in speech recognition, and the McGurk effect is a classic demonstration of this. In the original McGurk & Macdonald (Nature, 264, 746-748, 1976) experiment, 98% of participants reported an illusory "fusion" percept of /d/ when listening to the spoken syllable /b/ and watching the visual speech movements for /g/. However, more recent work shows that subject and task differences influence the proportion of fusion responses. In the current study, we varied task (forced-choice vs. open-ended), stimulus set (including /d/ exemplars vs. not), and data collection environment (lab vs. Mechanical Turk) to investigate the robustness of the McGurk effect. Across experiments, using the same stimuli to elicit the McGurk effect, we found fusion responses ranging from 10% to 60%, thus showing large variability in the likelihood of experiencing the McGurk effect across factors that are unrelated to the perceptual information provided by the stimuli. Rather than a robust perceptual illusion, we therefore argue that the McGurk effect exists only for some individuals under specific task situations. Significance: This series of studies re-evaluates the classic McGurk effect, which shows the relevance of visual cues on speech perception. We highlight the importance of taking into account subject variables and task differences, and challenge future researchers to think carefully about the perceptual basis of the McGurk effect, how it is defined, and what it can tell us about audiovisual integration in speech.
Collapse
|
24
|
Ujiie Y, Takahashi K. Weaker McGurk Effect for Rubin's Vase-Type Speech in People With High Autistic Traits. Multisens Res 2021; 34:1-17. [PMID: 33873157 DOI: 10.1163/22134808-bja10047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2020] [Accepted: 04/05/2021] [Indexed: 11/19/2022]
Abstract
While visual information from facial speech modulates auditory speech perception, it is less influential on audiovisual speech perception among autistic individuals than among typically developed individuals. In this study, we investigated the relationship between autistic traits (Autism-Spectrum Quotient; AQ) and the influence of visual speech on the recognition of Rubin's vase-type speech stimuli with degraded facial speech information. Participants were 31 university students (13 males and 18 females; mean age: 19.2, SD: 1.13 years) who reported normal (or corrected-to-normal) hearing and vision. All participants completed three speech recognition tasks (visual, auditory, and audiovisual stimuli) and the AQ-Japanese version. The results showed that accuracies of speech recognition for visual (i.e., lip-reading) and auditory stimuli were not significantly related to participants' AQ. In contrast, audiovisual speech perception was less influenced by facial speech among individuals with high rather than low autistic traits. The weaker influence of visual information on audiovisual speech perception in autism spectrum disorder (ASD) was robust regardless of the clarity of the visual information, suggesting a difficulty in the process of audiovisual integration rather than in the visual processing of facial speech.
Collapse
Affiliation(s)
- Yuta Ujiie
- Graduate School of Psychology, Chukyo University, 101-2 Yagoto Honmachi, Showa-ku, Nagoya-shi, Aichi, 466-8666, Japan
- Japan Society for the Promotion of Science, Kojimachi Business Center Building, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Research and Development Initiative, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan
| | - Kohske Takahashi
- School of Psychology, Chukyo University, 101-2 Yagoto Honmachi, Showa-ku, Nagoya-shi, Aichi, 466-8666, Japan
| |
Collapse
|
25
|
Dorn K, Cauvet E, Weinert S. A cross‐linguistic study of multisensory perceptual narrowing in German and Swedish infants during the first year of life. INFANT AND CHILD DEVELOPMENT 2021. [DOI: 10.1002/icd.2217] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Affiliation(s)
- Katharina Dorn
- Department of Developmental Psychology, Otto‐Friedrich University Bamberg, Germany
| | - Elodie Cauvet
- Department of Women's and Children's Health, Karolinska Institute of Neurodevelopmental Disorders (KIND), Stockholm, Sweden
| | - Sabine Weinert
- Department of Developmental Psychology, Otto‐Friedrich University Bamberg, Germany
| |
Collapse
|
26
|
Dorman MF, Natale SC, Agrawal S. The Benefit of Remote and On-Ear Directional Microphone Technology Persists in the Presence of Visual Information. J Am Acad Audiol 2020; 32:39-44. [PMID: 33296930 DOI: 10.1055/s-0040-1718893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
Abstract
BACKGROUND Both the Roger remote microphone and on-ear, adaptive beamforming technologies (e.g., Phonak UltraZoom) have been shown to improve speech understanding in noise for cochlear implant (CI) listeners when tested in audio-only (A-only) test environments. PURPOSE Our aim was to determine if adult and pediatric CI recipients benefited from these technologies in a more common environment, one in which both audio and visual cues were available and when overall performance was high. STUDY SAMPLE Ten adult CI listeners (Experiment 1) and seven pediatric CI listeners (Experiment 2) were tested. DESIGN Adults were tested in quiet and in two levels of noise (level 1 and level 2) in A-only and audio-visual (AV) environments. There were four device conditions: (1) an ear canal-level, omnidirectional microphone (T-mic) in quiet, (2) the T-mic in noise, (3) an adaptive directional mic (UltraZoom) in noise, and (4) a wireless, remote mic (Roger Pen) in noise. Pediatric listeners were tested in quiet and in level 1 noise in A-only and AV environments. The test conditions were: (1) a behind-the-ear level omnidirectional mic (processor mic) in quiet, (2) the processor mic in noise, (3) the T-mic in noise, and (4) the Roger Pen in noise. DATA COLLECTION AND ANALYSES In each test condition, sentence understanding was assessed (percent correct) and ease of listening ratings were obtained. The sentence understanding data were entered into repeated-measures analyses of variance. RESULTS For both adult and pediatric listeners in the AV test conditions in level 1 noise, performance with the Roger Pen was significantly higher than with the T-mic. For both populations, performance in level 1 noise with the Roger Pen approached the level of baseline performance in quiet. Ease of listening in noise was rated higher in the Roger Pen conditions than in the T-mic or processor mic conditions in both A-only and AV test conditions. CONCLUSION The Roger remote mic and on-ear directional mic technologies benefit both speech understanding and ease of listening in a realistic laboratory test environment and are likely to do the same in real-world listening environments.
Collapse
Affiliation(s)
- Michael F Dorman
- Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona
| | - Sarah Cook Natale
- Department of Speech and Hearing Science, Arizona State University, Tempe, Arizona
| | | |
Collapse
|
27
|
Maran T, Furtner M, Liegl S, Ravet‐Brown T, Haraped L, Sachse P. Visual Attention in Real‐World Conversation: Gaze Patterns Are Modulated by Communication and Group Size. APPLIED PSYCHOLOGY-AN INTERNATIONAL REVIEW-PSYCHOLOGIE APPLIQUEE-REVUE INTERNATIONALE 2020. [DOI: 10.1111/apps.12291] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Thomas Maran
- University of Innsbruck, Austria
- LeadershipWerk, Liechtenstein
| | | | | | | | | | | |
Collapse
|
28
|
Abstract
A speech signal carries information about meaning and about the talker conveying that meaning. It is now known that these two dimensions are related. There is evidence that gaining experience with a particular talker in one modality not only facilitates better phonetic perception in that modality, but also transfers across modalities to allow better phonetic perception in the other. This finding suggests that experience with a talker provides familiarity with some amodal properties of their articulation such that the experience can be shared across modalities. The present study investigates if experience with talker-specific articulatory information can also support cross-modal talker learning. In Experiment 1 we show that participants can learn to identify ten novel talkers from point-light and sinewave speech, expanding on prior work. Point-light and sinewave speech also supported similar talker identification accuracies, and similar patterns of talker confusions were found across stimulus types. Experiment 2 showed these stimuli could also support cross-modal talker matching, further expanding on prior work. Finally, in Experiment 3 we show that learning to identify talkers in one modality (visual-only point-light speech) facilitates learning of those same talkers in another modality (auditory-only sinewave speech). These results suggest that some of the information for talker identity takes a modality-independent form.
Collapse
|
29
|
He Y, Wu S, Chen C, Fan L, Li K, Wang G, Wang H, Zhou Y. Organized Resting-state Functional Dysconnectivity of the Prefrontal Cortex in Patients with Schizophrenia. Neuroscience 2020; 446:14-27. [PMID: 32858143 DOI: 10.1016/j.neuroscience.2020.08.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2020] [Revised: 07/23/2020] [Accepted: 08/16/2020] [Indexed: 12/25/2022]
Abstract
Schizophrenia has prominent functional dysconnectivity, especially in the prefrontal cortex (PFC). However, it is unclear whether in the same group of patients with schizophrenia, PFC functional dysconnectivity appears in an organized manner or is stochastically located in different subregions. By investigating the resting-state functional connectivity (rsFC) of each PFC subregion from the Brainnetome atlas in 40 schizophrenia patients and 40 healthy subjects, we found 24 altered connections in schizophrenia, and the connections were divided into four categories by a clustering analysis: increased connections within the PFC, increased connections between the inferior PFC and the thalamus/striatum, reduced connections between the PFC and the motor control areas, and reduced connections between the orbital PFC and the emotional perception regions. In addition, the four categories of rsFC showed distinct cognitive engagement patterns. Our findings suggest that PFC subregions have specific functional dysconnectivity patterns in schizophrenia and may reflect heterogeneous symptoms and cognitive deficits in schizophrenia.
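The pipeline summarized in this abstract (per-subject resting-state functional connectivity followed by clustering of the group-different connections) can be sketched in a few lines. The following is not the authors' code; it assumes ROI time series have already been extracted, and the uncorrected edge-wise t-test, Ward linkage, and four-cluster solution are illustrative stand-ins for the study's actual statistics.

```python
import numpy as np
from scipy.stats import ttest_ind
from scipy.cluster.hierarchy import linkage, fcluster

def fisher_fc(ts):
    """Fisher z-transformed ROI-to-ROI correlation matrix for one subject.

    ts: array of shape (timepoints, n_rois).
    """
    r = np.corrcoef(ts.T)
    np.fill_diagonal(r, 0.0)  # ignore self-connections before the z-transform
    return np.arctanh(r)

def altered_connections(ts_patients, ts_controls, alpha=0.001, n_clusters=4):
    """Group-different edges, grouped by their subject-wise connectivity profiles."""
    fc_p = np.stack([fisher_fc(ts) for ts in ts_patients])   # (n_patients, R, R)
    fc_c = np.stack([fisher_fc(ts) for ts in ts_controls])   # (n_controls, R, R)
    tvals, pvals = ttest_ind(fc_p, fc_c, axis=0)             # edge-wise two-sample t-test
    iu = np.triu_indices(tvals.shape[0], k=1)
    edges = [(i, j) for i, j in zip(*iu) if pvals[i, j] < alpha]
    if len(edges) < n_clusters:
        return edges, None
    # Cluster the altered edges by their connectivity values across all subjects.
    profiles = np.array([np.concatenate([fc_p[:, i, j], fc_c[:, i, j]]) for i, j in edges])
    labels = fcluster(linkage(profiles, method="ward"), n_clusters, criterion="maxclust")
    return edges, labels

# Toy usage with random data: 20 subjects per group, 100 timepoints, 30 ROIs.
rng = np.random.default_rng(0)
patients = [rng.standard_normal((100, 30)) for _ in range(20)]
controls = [rng.standard_normal((100, 30)) for _ in range(20)]
edges, labels = altered_connections(patients, controls, alpha=0.05)
print(len(edges), "edges; cluster labels:", labels)
```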
Collapse
Affiliation(s)
- Yuwen He
- CAS Key Laboratory of Behavioral Science & Magnetic Resonance Imaging Research Center, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
| | - Shihao Wu
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
| | - Cheng Chen
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
| | - Lingzhong Fan
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
| | - Kaixin Li
- Harbin University of Science and Technology, Harbin 150080, China
| | - Gaohua Wang
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
| | - Huiling Wang
- Department of Psychiatry, Renmin Hospital of Wuhan University, Wuhan 430060, China
| | - Yuan Zhou
- CAS Key Laboratory of Behavioral Science & Magnetic Resonance Imaging Research Center, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China; Department of Psychology, University of the Chinese Academy of Sciences, Beijing 100049, China.
| |
Collapse
|
30
|
Ullas S, Formisano E, Eisner F, Cutler A. Audiovisual and lexical cues do not additively enhance perceptual adaptation. Psychon Bull Rev 2020; 27:707-715. [PMID: 32319002 PMCID: PMC7398951 DOI: 10.3758/s13423-020-01728-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/02/2022]
Abstract
When listeners experience difficulty in understanding a speaker, lexical and audiovisual (or lipreading) information can be a helpful source of guidance. These two types of information embedded in speech can also guide perceptual adjustment, also known as recalibration or perceptual retuning. With retuning or recalibration, listeners can use these contextual cues to temporarily or permanently reconfigure internal representations of phoneme categories to adjust to and understand novel interlocutors more easily. These two types of perceptual learning, previously investigated in large part separately, are highly similar in allowing listeners to use speech-external information to make phoneme boundary adjustments. This study explored whether the two sources may work in conjunction to induce adaptation, thus emulating real life, in which listeners are indeed likely to encounter both types of cue together. Listeners who received combined audiovisual and lexical cues showed perceptual learning effects similar to listeners who only received audiovisual cues, while listeners who received only lexical cues showed weaker effects compared with the two other groups. The combination of cues did not lead to additive retuning or recalibration effects, suggesting that lexical and audiovisual cues operate differently with regard to how listeners use them for reshaping perceptual categories. Reaction times did not significantly differ across the three conditions, so none of the forms of adjustment were either aided or hindered by processing time differences. Mechanisms underlying these forms of perceptual learning may diverge in numerous ways despite similarities in experimental applications.
Collapse
Affiliation(s)
- Shruti Ullas
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands.
| | - Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6200 MD, Maastricht, The Netherlands
| | - Frank Eisner
- Donders Centre for Cognition, Radboud University Nijmegen, 6500 AH, Nijmegen, The Netherlands
| | - Anne Cutler
- MARCS Institute and ARC Centre of Excellence for the Dynamics of Language, Western Sydney University, Penrith, NSW, 2751, Australia
| |
Collapse
|
31
|
Georgiou GP. Speech perception in visually impaired individuals might be diminished as a consequence of monomodal cue acquisition. Med Hypotheses 2020; 143:110088. [PMID: 32679427 DOI: 10.1016/j.mehy.2020.110088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2020] [Revised: 07/03/2020] [Accepted: 07/05/2020] [Indexed: 10/23/2022]
Abstract
Several studies suggest that both the auditory and the visual modalities are important for speech perception, while absence of the visual modality (e.g., due to visual impairment) causes perceptual deficits. By contrast, a body of research indicates that visual cues are not mandatory for the perception of speech movements and thus visually impaired individuals may demonstrate enhanced perceptual abilities. The present paper supports the hypothesis that second language speech perception in individuals with visual impairments might be diminished in comparison to speech perception in individuals with normal sight. Although there is evidence against this hypothesis, we assume that most of the earlier work did not take into account several factors which may affect speech perception, while research on second language phone perception by individuals with visual impairments is limited.
Collapse
Affiliation(s)
- Georgios P Georgiou
- Department of General and Russian Linguistics, People's Friendship University of Russia (RUDN University), Moscow, Russia; Department of Languages and Literature, University of Nicosia, Nicosia, Cyprus.
| |
Collapse
|
32
|
Templeton JM, Poellabauer C, Schneider S. Enhancement of Neurocognitive Assessments Using Smartphone Capabilities: Systematic Review. JMIR Mhealth Uhealth 2020; 8:e15517. [PMID: 32442150 PMCID: PMC7381077 DOI: 10.2196/15517] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2019] [Revised: 11/26/2019] [Accepted: 03/23/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Comprehensive exams such as the Dean-Woodcock Neuropsychological Assessment System, the Global Deterioration Scale, and the Boston Diagnostic Aphasia Examination are the gold standard for doctors and clinicians in the preliminary assessment and monitoring of neurocognitive function in conditions such as neurodegenerative diseases and acquired brain injuries (ABIs). In recent years, there has been an increased focus on implementing these exams on mobile devices to benefit from their configurable built-in sensors, in addition to scoring, interpretation, and storage capabilities. As smartphones become more accepted in health care among both users and clinicians, the ability to use device information (eg, device position, screen interactions, and app usage) for subject monitoring also increases. Sensor-based assessments (eg, functional gait using a mobile device's accelerometer and/or gyroscope or collection of speech samples using recordings from the device's microphone) include the potential for enhanced information for diagnoses of neurological conditions; mapping the development of these conditions over time; and monitoring efficient, evidence-based rehabilitation programs. OBJECTIVE This paper provides an overview of neurocognitive conditions and relevant functions of interest, analysis of recent results using smartphone and/or tablet built-in sensor information for the assessment of these different neurocognitive conditions, and how human-device interactions and the assessment and monitoring of these neurocognitive functions can be enhanced for both the patient and health care provider. METHODS This survey presents a review of current mobile technological capabilities to enhance the assessment of various neurocognitive conditions, including both neurodegenerative diseases and ABIs. It explores how device features can be configured for assessments as well as the enhanced capability and data monitoring that will arise due to the addition of these features. It also recognizes the challenges that will be apparent with the transfer of these current assessments to mobile devices. RESULTS Built-in sensor information on mobile devices is found to provide information that can enhance neurocognitive assessment and monitoring across all functional categories. Configurations of positional sensors (eg, accelerometer, gyroscope, and GPS), media sensors (eg, microphone and camera), inherent sensors (eg, device timer), and participatory user-device interactions (eg, screen interactions, metadata input, app usage, and device lock and unlock) are all helpful for assessing these functions for the purposes of training, monitoring, diagnosis, or rehabilitation. CONCLUSIONS This survey discusses some of the many opportunities and challenges of implementing configured built-in sensors on mobile devices to enhance assessments and monitoring of neurocognitive functions as well as disease progression across neurodegenerative and acquired neurological conditions.
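To make the sensor-based assessments described above concrete, here is a minimal sketch of extracting crude gait features from a phone's accelerometer stream. It is not tied to any specific instrument reviewed in the paper; the sampling rate, the 0.5-3 Hz gait band, and the peak-based step count are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def gait_features(acc_xyz, fs=100.0):
    """Estimate step count and dominant stride frequency from raw accelerometer data.

    acc_xyz: array of shape (n_samples, 3) in m/s^2; fs: sampling rate in Hz.
    """
    mag = np.linalg.norm(acc_xyz, axis=1)        # orientation-free acceleration magnitude
    mag = mag - mag.mean()                       # remove the gravity offset
    b, a = butter(4, [0.5 / (fs / 2), 3.0 / (fs / 2)], btype="band")
    walk = filtfilt(b, a, mag)                   # keep the typical gait band (0.5-3 Hz)
    peaks, _ = find_peaks(walk, distance=int(0.4 * fs))  # steps at least 0.4 s apart
    spectrum = np.abs(np.fft.rfft(walk))
    freqs = np.fft.rfftfreq(walk.size, d=1.0 / fs)
    dominant_hz = freqs[spectrum[1:].argmax() + 1]       # skip the DC bin
    return len(peaks), dominant_hz

# Toy usage: 10 s of synthetic walking at ~1.8 steps/s plus sensor noise.
fs = 100.0
t = np.arange(0, 10, 1 / fs)
acc = np.column_stack([0.1 * np.random.randn(t.size),
                       0.1 * np.random.randn(t.size),
                       9.81 + 1.5 * np.sin(2 * np.pi * 1.8 * t)])
print(gait_features(acc, fs))
```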
Collapse
Affiliation(s)
- John Michael Templeton
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States
| | - Christian Poellabauer
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, United States
| | - Sandra Schneider
- Department of Communicative Sciences and Disorders, Saint Mary's College, Notre Dame, IN, United States
| |
Collapse
|
33
|
Age-related hearing loss influences functional connectivity of auditory cortex for the McGurk illusion. Cortex 2020; 129:266-280. [PMID: 32535378 DOI: 10.1016/j.cortex.2020.04.022] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2019] [Revised: 03/30/2020] [Accepted: 04/09/2020] [Indexed: 01/23/2023]
Abstract
Age-related hearing loss affects hearing at high frequencies and is associated with difficulties in understanding speech. Increased audio-visual integration has recently been found in age-related hearing impairment, but the brain mechanisms that contribute to this effect remain unclear. We used functional magnetic resonance imaging in elderly subjects with normal hearing and mild to moderate uncompensated hearing loss. Audio-visual integration was studied using the McGurk task. In this task, an illusory fused percept can occur if incongruent auditory and visual syllables are presented. The paradigm included unisensory stimuli (auditory only, visual only), congruent audio-visual and incongruent (McGurk) audio-visual stimuli. An illusory percept was reported in over 60% of incongruent trials. These McGurk illusion rates were equal in both groups of elderly subjects and correlated positively with speech-in-noise perception and daily listening effort. Normal-hearing participants showed an increased neural response in left pre- and postcentral gyri and right middle frontal gyrus for incongruent stimuli (McGurk) compared to congruent audio-visual stimuli. Activation patterns, however, did not differ between groups. Task-modulated functional connectivity differed between groups, showing increased connectivity from auditory cortex to visual, parietal and frontal areas in hard of hearing participants as compared to normal-hearing participants when comparing incongruent stimuli (McGurk) with congruent audio-visual stimuli. These results suggest that changes in functional connectivity of auditory cortex rather than activation strength during processing of audio-visual McGurk stimuli accompany age-related hearing loss.
Collapse
|
34
|
Yuan Y, Wayland R, Oh Y. Visual analog of the acoustic amplitude envelope benefits speech perception in noise. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:EL246. [PMID: 32237828 DOI: 10.1121/10.0000737] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/14/2019] [Accepted: 01/29/2020] [Indexed: 06/11/2023]
Abstract
The nature of the visual input that integrates with the audio signal to yield speech processing advantages remains controversial. This study tests the hypothesis that the information extracted for audiovisual integration includes co-occurring suprasegmental dynamic changes in the acoustic and visual signal. English sentences embedded in multi-talker babble noise were presented to native English listeners in audio-only and audiovisual modalities. A significant intelligibility enhancement with the visual analogs congruent to the acoustic amplitude envelopes was observed. These results suggest that dynamic visual modulation provides speech rhythmic information that can be integrated online with the audio signal to enhance speech intelligibility.
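The stimulus manipulation described above pairs the audio with a visual analog that tracks the acoustic amplitude envelope. A minimal sketch of extracting such an envelope is shown below; it is not the authors' implementation, and the Hilbert-transform approach, the 10 Hz low-pass cutoff, and the mapping to a frame-by-frame "radius" are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def amplitude_envelope(audio, fs, cutoff_hz=10.0):
    """Smoothed amplitude envelope of a mono audio signal.

    The envelope is the magnitude of the analytic signal (Hilbert transform),
    low-pass filtered to keep only the slow amplitude modulations a visual
    analog could plausibly track.
    """
    env = np.abs(hilbert(audio))                 # instantaneous amplitude
    b, a = butter(4, cutoff_hz / (fs / 2.0))     # 4th-order low-pass
    return filtfilt(b, a, env)                   # zero-phase smoothing

# Toy usage: map the envelope of 1 s of amplitude-modulated audio onto the
# radius of a hypothetical visual shape, one value per frame at 60 fps.
fs = 16000
t = np.linspace(0, 1.0, fs, endpoint=False)
audio = np.sin(2 * np.pi * 150 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
env = amplitude_envelope(audio, fs)
frame_idx = np.linspace(0, fs - 1, 60).astype(int)
radii = 20 + 80 * env[frame_idx] / env.max()     # arbitrary pixel scaling
print(radii[:5])
```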
Collapse
Affiliation(s)
- Yi Yuan
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, Florida 32610, USA
| | - Ratree Wayland
- Department of Linguistics, University of Florida, Gainesville, Florida 32611, USA
| | - Yonghee Oh
- Department of Speech, Language, and Hearing Sciences, University of Florida, Gainesville, Florida 32610, USA
| |
Collapse
|
35
|
Liu L, Jaeger TF. Talker-specific pronunciation or speech error? Discounting (or not) atypical pronunciations during speech perception. J Exp Psychol Hum Percept Perform 2019; 45:1562-1588. [PMID: 31750716 DOI: 10.1037/xhp0000693] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Perceptual recalibration allows listeners to adapt to talker-specific pronunciations, such as atypical realizations of specific sounds. Such recalibration can facilitate robust speech recognition. However, indiscriminate recalibration following any atypically pronounced words also risks interpreting pronunciations as characteristic of a talker that are in reality due to incidental, short-lived factors (such as a speech error). We investigate whether the mechanisms underlying perceptual recalibration involve inferences about the causes for unexpected pronunciations. In 5 experiments, we ask whether perceptual recalibration is blocked if the atypical pronunciations of an unfamiliar talker can also be attributed to other incidental causes. We investigated 3 types of incidental causes for atypical pronunciations: the talker is intoxicated, the talker speaks unusually fast, or the atypical pronunciations occur only in the context of tongue twisters. In all 5 experiments, we find robust evidence for perceptual recalibration, but little evidence that the presence of incidental causes blocks perceptual recalibration. We discuss these results in light of other recent findings that incidental causes can block perceptual recalibration. (PsycINFO Database Record (c) 2019 APA, all rights reserved).
Collapse
Affiliation(s)
- Linda Liu
- Department of Brain and Cognitive Sciences, University of Rochester
| | - T Florian Jaeger
- Department of Brain and Cognitive Sciences, University of Rochester
| |
Collapse
|
36
|
Moradi S, Lidestam B, Ning Ng EH, Danielsson H, Rönnberg J. Perceptual Doping: An Audiovisual Facilitation Effect on Auditory Speech Processing, From Phonetic Feature Extraction to Sentence Identification in Noise. Ear Hear 2019; 40:312-327. [PMID: 29870521 PMCID: PMC6400397 DOI: 10.1097/aud.0000000000000616] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2017] [Accepted: 04/15/2018] [Indexed: 11/25/2022]
Abstract
OBJECTIVE We have previously shown that the gain provided by prior audiovisual (AV) speech exposure for subsequent auditory (A) sentence identification in noise is relatively larger than that provided by prior A speech exposure. We have called this effect "perceptual doping." Specifically, prior AV speech processing dopes (recalibrates) the phonological and lexical maps in the mental lexicon, which facilitates subsequent phonological and lexical access in the A modality, separately from other learning and priming effects. In this article, we use data from the n200 study and aim to replicate and extend the perceptual doping effect using two different A and two different AV speech tasks and a larger sample than in our previous studies. DESIGN The participants were 200 hearing aid users with bilateral, symmetrical, mild-to-severe sensorineural hearing loss. There were four speech tasks in the n200 study that were presented in both A and AV modalities (gated consonants, gated vowels, vowel duration discrimination, and sentence identification in noise tasks). The modality order of speech presentation was counterbalanced across participants: half of the participants completed the A modality first and the AV modality second (A1-AV2), and the other half completed the AV modality and then the A modality (AV1-A2). Based on the perceptual doping hypothesis, which assumes that the gain of prior AV exposure will be larger than that of prior A exposure for subsequent processing of speech stimuli, we predicted that the mean A scores in the AV1-A2 modality order would be better than the mean A scores in the A1-AV2 modality order. We therefore expected a significant difference in terms of the identification of A speech stimuli between the two modality orders (A1 versus A2). As prior A exposure provides a smaller gain than AV exposure, we also predicted that the difference in AV speech scores between the two modality orders (AV1 versus AV2) may not be statistically significantly different. RESULTS In the gated consonant and vowel tasks and the vowel duration discrimination task, there were significant differences in A performance of speech stimuli between the two modality orders. The participants' mean A performance was better in the AV1-A2 than in the A1-AV2 modality order (i.e., after AV processing). In terms of mean AV performance, no significant difference was observed between the two orders. In the sentence identification in noise task, a significant difference in the A identification of speech stimuli between the two orders was observed (A1 versus A2). In addition, a significant difference in the AV identification of speech stimuli between the two orders was also observed (AV1 versus AV2). This finding was most likely because of a procedural learning effect due to the greater complexity of the sentence materials or a combination of procedural learning and perceptual learning due to the presentation of sentential materials in noisy conditions. CONCLUSIONS The findings of the present study support the perceptual doping hypothesis, as prior AV relative to A speech exposure resulted in a larger gain for the subsequent processing of speech stimuli. For complex speech stimuli that were presented in degraded listening conditions, a procedural learning effect (or a combination of procedural learning and perceptual learning effects) also facilitated the identification of speech stimuli, irrespective of whether the prior modality was A or AV.
Collapse
Affiliation(s)
- Shahram Moradi
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
| | - Björn Lidestam
- Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
| | - Elaine Hoi Ning Ng
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Oticon A/S, Smørum, Denmark
| | - Henrik Danielsson
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
| | - Jerker Rönnberg
- Linnaeus Centre HEAD, Swedish Institute for Disability Research, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
| |
Collapse
|
37
|
Abstract
Visual cues facilitate speech perception during face-to-face communication, particularly in noisy environments. These visual-driven enhancements arise from both automatic lip-reading behaviors and attentional tuning to auditory-visual signals. However, in crowded settings, such as a cocktail party, how do we accurately bind the correct voice to the correct face, enabling the benefit of visual cues on speech perception processes? Previous research has emphasized that spatial and temporal alignment of the auditory-visual signals determines which voice is integrated with which speaking face. Here, we present a novel illusion demonstrating that when multiple faces and voices are presented in the presence of ambiguous temporal and spatial information as to which pairs of auditory-visual signals should be integrated, our perceptual system relies on identity information extracted from each signal to determine pairings. Data from three experiments demonstrate that expectations about an individual’s voice (based on their identity) can change where individuals perceive that voice to arise from.
Collapse
Affiliation(s)
- David Brang
- Department of Psychology, University of Michigan, Ann Arbor, MI, USA
| |
Collapse
|
38
|
Deaf signers outperform hearing non-signers in recognizing happy facial expressions. PSYCHOLOGICAL RESEARCH 2019; 84:1485-1494. [DOI: 10.1007/s00426-019-01160-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Accepted: 02/25/2019] [Indexed: 01/21/2023]
|
39
|
Saunders JL, Wehr M. Mice can learn phonetic categories. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:1168. [PMID: 31067917 PMCID: PMC6910010 DOI: 10.1121/1.5091776] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/04/2018] [Revised: 01/26/2019] [Accepted: 02/04/2019] [Indexed: 06/09/2023]
Abstract
Speech is perceived as a series of relatively invariant phonemes despite extreme variability in the acoustic signal. To be perceived as nearly-identical phonemes, speech sounds that vary continuously over a range of acoustic parameters must be perceptually discretized by the auditory system. Such many-to-one mappings of undifferentiated sensory information to a finite number of discrete categories are ubiquitous in perception. Although many mechanistic models of phonetic perception have been proposed, they remain largely unconstrained by neurobiological data. Current human neurophysiological methods lack the necessary spatiotemporal resolution to provide it: speech is too fast, and the neural circuitry involved is too small. This study demonstrates that mice are capable of learning generalizable phonetic categories, and can thus serve as a model for phonetic perception. Mice learned to discriminate consonants and generalized consonant identity across novel vowel contexts and speakers, consistent with true category learning. A mouse model, given the powerful genetic and electrophysiological tools for probing neural circuits available for them, has the potential to powerfully augment a mechanistic understanding of phonetic perception.
Collapse
Affiliation(s)
- Jonny L Saunders
- University of Oregon, Institute of Neuroscience and Department of Psychology, Eugene, Oregon 97403, USA
| | - Michael Wehr
- University of Oregon, Institute of Neuroscience and Department of Psychology, Eugene, Oregon 97403, USA
| |
Collapse
|
40
|
Lei J, Gong H, Chen L. Enhanced Speechreading Performance in Young Hearing Aid Users in China. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:307-317. [PMID: 30950700 DOI: 10.1044/2018_jslhr-s-18-0153] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Purpose The study was designed primarily to determine if the use of hearing aids (HAs) in individuals with hearing impairment in China would affect their speechreading performance. Method Sixty-seven young adults with hearing impairment with HAs and 78 young adults with hearing impairment without HAs completed newly developed Chinese speechreading tests targeting 3 linguistic levels (i.e., words, phrases, and sentences). Results Groups with HAs were more accurate at speechreading than groups without HAs across the 3 linguistic levels. For both groups, speechreading accuracy was higher for phrases than words and sentences, and speechreading speed was slower for sentences than words and phrases. Furthermore, there was a positive correlation between years of HA use and the accuracy of speechreading performance; longer HA use was associated with more accurate speechreading. Conclusions Young HA users in China have enhanced speechreading performance compared with their peers with hearing impairment who are not HA users. This result argues against the perceptual dependence hypothesis that suggests greater dependence on visual information leads to improvement in visual speech perception.
Collapse
Affiliation(s)
- Jianghua Lei
- Department of Special Education, Central China Normal University, Wuhan
| | - Huina Gong
- Department of Special Education, Central China Normal University, Wuhan
| | - Liang Chen
- Department of Communication Sciences and Special Education, University of Georgia, Athens
| |
Collapse
|
41
|
Werchan DM, Baumgartner HA, Lewkowicz DJ, Amso D. The origins of cortical multisensory dynamics: Evidence from human infants. Dev Cogn Neurosci 2018; 34:75-81. [PMID: 30099263 PMCID: PMC6629259 DOI: 10.1016/j.dcn.2018.07.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2017] [Revised: 07/03/2018] [Accepted: 07/13/2018] [Indexed: 12/15/2022] Open
Abstract
Classic views of multisensory processing suggest that cortical sensory regions are specialized. More recent views argue that cortical sensory regions are inherently multisensory. To date, there are no published neuroimaging data that directly test these claims in infancy. Here we used fNIRS to show that temporal and occipital cortex are functionally coupled in 3.5-5-month-old infants (N = 65), and that the extent of this coupling during a synchronous, but not an asynchronous, audiovisual event predicted whether occipital cortex would subsequently respond to sound-only information. These data suggest that multisensory experience may shape cortical dynamics to adapt to the ubiquity of synchronous multisensory information in the environment, and invoke the possibility that adaptation to the environment can also reflect broadening of the computational range of sensory systems.
Collapse
Affiliation(s)
- Denise M Werchan
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, 190 Thayer St. Providence, RI, 02912, United States
| | - Heidi A Baumgartner
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, 190 Thayer St. Providence, RI, 02912, United States
| | - David J Lewkowicz
- Department of Communication Sciences and Disorders, Northeastern University, 360 Huntington Ave., Boston, MA, 02115, United States
| | - Dima Amso
- Department of Cognitive, Linguistic, and Psychological Sciences, Brown University, 190 Thayer St. Providence, RI, 02912, United States.
| |
Collapse
|
42
|
Devesse A, Dudek A, van Wieringen A, Wouters J. Speech intelligibility of virtual humans. Int J Audiol 2018; 57:908-916. [DOI: 10.1080/14992027.2018.1511922] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
Affiliation(s)
- Annelies Devesse
- KU Leuven – University of Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| | - Alexander Dudek
- KU Leuven – University of Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| | - Astrid van Wieringen
- KU Leuven – University of Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| | - Jan Wouters
- KU Leuven – University of Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| |
Collapse
|
43
|
Hillairet de Boisferon A, Tift AH, Minar NJ, Lewkowicz DJ. The redeployment of attention to the mouth of a talking face during the second year of life. J Exp Child Psychol 2018; 172:189-200. [PMID: 29627481 PMCID: PMC5920681 DOI: 10.1016/j.jecp.2018.03.009] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2017] [Revised: 03/18/2018] [Accepted: 03/19/2018] [Indexed: 11/16/2022]
Abstract
Previous studies have found that when monolingual infants are exposed to a talking face speaking in a native language, 8- and 10-month-olds attend more to the talker's mouth, whereas 12-month-olds no longer do so. It has been hypothesized that the attentional focus on the talker's mouth at 8 and 10 months of age reflects reliance on the highly salient audiovisual (AV) speech cues for the acquisition of basic speech forms and that the subsequent decline of attention to the mouth by 12 months of age reflects the emergence of basic native speech expertise. Here, we investigated whether infants may redeploy their attention to the mouth once they fully enter the word-learning phase. To test this possibility, we recorded eye gaze in monolingual English-learning 14- and 18-month-olds while they saw and heard a talker producing an English or Spanish utterance in either an infant-directed (ID) or adult-directed (AD) manner. Results indicated that the 14-month-olds attended more to the talker's mouth than to the eyes when exposed to the ID utterance and that the 18-month-olds attended more to the talker's mouth when exposed to the ID and the AD utterance. These results show that infants redeploy their attention to a talker's mouth when they enter the word acquisition phase and suggest that infants rely on the greater perceptual salience of redundant AV speech cues to acquire their lexicon.
Collapse
Affiliation(s)
- Anne Hillairet de Boisferon
- Department of Psychology, Florida Atlantic University and Florida Atlantic University High School Research Program, Boca Raton, FL 33314, USA
| | - Amy H Tift
- Department of Psychology, Florida Atlantic University and Florida Atlantic University High School Research Program, Boca Raton, FL 33314, USA
| | - Nicholas J Minar
- Institute for the Study of Child Development, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ 08901, USA
| | - David J Lewkowicz
- Department of Communication Sciences and Disorders, Northeastern University, Boston, MA 02115, USA.
| |
Collapse
|
44
|
Masapollo M, Polka L, Ménard L, Franklin L, Tiede M, Morgan J. Asymmetries in unimodal visual vowel perception: The roles of oral-facial kinematics, orientation, and configuration. J Exp Psychol Hum Percept Perform 2018; 44:1103-1118. [PMID: 29517257 PMCID: PMC6037555 DOI: 10.1037/xhp0000518] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Masapollo, Polka, and Ménard (2017) recently reported a robust directional asymmetry in unimodal visual vowel perception: Adult perceivers discriminate a change from an English /u/ viseme to a French /u/ viseme significantly better than a change in the reverse direction. This asymmetry replicates a frequent pattern found in unimodal auditory vowel perception that points to a universal bias favoring more extreme vocalic articulations, which lead to acoustic signals with increased formant convergence. In the present article, the authors report 5 experiments designed to investigate whether this asymmetry in the visual realm reflects a speech-specific or general processing bias. They successfully replicated the directional effect using Masapollo et al.'s dynamically articulating faces but failed to replicate the effect when the faces were shown under static conditions. Asymmetries also emerged during discrimination of canonically oriented point-light stimuli that retained the kinematics and configuration of the articulating mouth. In contrast, no asymmetries emerged during discrimination of rotated point-light stimuli or Lissajous patterns that retained the kinematics, but not the canonical orientation or spatial configuration, of the labial gestures. These findings suggest that the perceptual processes underlying asymmetries in unimodal visual vowel discrimination are sensitive to speech-specific motion and configural properties and raise foundational questions concerning the role of specialized and general processes in vowel perception.
Collapse
Affiliation(s)
- Matthew Masapollo
- Brown University
- McGill University
- Centre for Research on Brain, Language, and Music
| | - Linda Polka
- McGill University
- Centre for Research on Brain, Language, and Music
| | - Lucie Ménard
- Centre for Research on Brain, Language, and Music
- University of Quebec at Montreal
| | | | | | | |
Collapse
|
45
|
Rosemann S, Thiel CM. Audio-visual speech processing in age-related hearing loss: Stronger integration and increased frontal lobe recruitment. Neuroimage 2018; 175:425-437. [PMID: 29655940 DOI: 10.1016/j.neuroimage.2018.04.023] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2017] [Revised: 03/09/2018] [Accepted: 04/09/2018] [Indexed: 11/19/2022] Open
Abstract
Hearing loss is associated with difficulties in understanding speech, especially under adverse listening conditions. In these situations, seeing the speaker improves speech intelligibility in hearing-impaired participants. On the neuronal level, previous research has shown cross-modal plastic reorganization in the auditory cortex following hearing loss, leading to altered processing of auditory, visual and audio-visual information. However, how reduced auditory input affects audio-visual speech perception in hearing-impaired subjects is largely unknown. We here investigated the impact of mild to moderate age-related hearing loss on processing audio-visual speech using functional magnetic resonance imaging. Normal-hearing and hearing-impaired participants performed two audio-visual speech integration tasks: a sentence detection task inside the scanner and the McGurk illusion outside the scanner. Both tasks consisted of congruent and incongruent audio-visual conditions, as well as auditory-only and visual-only conditions. We found a significantly stronger McGurk illusion in the hearing-impaired participants, which indicates stronger audio-visual integration. Neurally, hearing loss was associated with an increased recruitment of frontal brain areas when processing incongruent audio-visual, auditory and also visual speech stimuli, which may reflect the increased effort to perform the task. Hearing loss modulated both the audio-visual integration strength measured with the McGurk illusion and brain activation in frontal areas in the sentence task, showing stronger integration and higher brain activation with increasing hearing loss. Incongruent compared to congruent audio-visual speech revealed an opposite brain activation pattern in left ventral postcentral gyrus in both groups, with higher activation in hearing-impaired participants in the incongruent condition. Our results indicate that even mild to moderate hearing loss impacts audio-visual speech processing, accompanied by changes in brain activation particularly involving frontal areas. These changes are modulated by the extent of hearing loss.
Collapse
Affiliation(s)
- Stephanie Rosemann
- Biological Psychology, Department of Psychology, Department for Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany.
| | - Christiane M Thiel
- Biological Psychology, Department of Psychology, Department for Medicine and Health Sciences, Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany; Cluster of Excellence "Hearing4all", Carl von Ossietzky Universität Oldenburg, Oldenburg, Germany
| |
Collapse
|
46
|
Kubicek C, Gervain J, Lœvenbruck H, Pascalis O, Schwarzer G. Goldilocks versus Goldlöckchen: Visual speech preference for same-rhythm-class languages in 6-month-old infants. INFANT AND CHILD DEVELOPMENT 2018. [DOI: 10.1002/icd.2084] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Affiliation(s)
- Claudia Kubicek
- Department of Developmental Psychology; Justus Liebig University Giessen; Giessen, Germany
| | - Judit Gervain
- CNRS, Université Paris Descartes; Sorbonne Paris Cité; Paris, France
| | - Hélène Lœvenbruck
- Laboratoire de Psychologie et NeuroCognition, UMR CNRS 5105; Université Grenoble Alpes; Grenoble, France
| | - Olivier Pascalis
- Laboratoire de Psychologie et NeuroCognition, UMR CNRS 5105; Université Grenoble Alpes; Grenoble, France
| | - Gudrun Schwarzer
- Department of Developmental Psychology; Justus Liebig University Giessen; Giessen, Germany
| |
Collapse
|
47
|
Irwin J, Avery T, Brancazio L, Turcios J, Ryherd K, Landi N. Electrophysiological Indices of Audiovisual Speech Perception: Beyond the McGurk Effect and Speech in Noise. Multisens Res 2018; 31:39-56. [PMID: 31264595 DOI: 10.1163/22134808-00002580] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2016] [Accepted: 05/15/2017] [Indexed: 11/19/2022]
Abstract
Visual information on a talker's face can influence what a listener hears. Commonly used approaches to study this include mismatched audiovisual stimuli (e.g., McGurk-type stimuli) or visual speech in auditory noise. In this paper we discuss potential limitations of these approaches and introduce a novel visual phonemic restoration method. This method always presents the same visual stimulus (e.g., /ba/) dubbed with either a matched auditory stimulus (/ba/) or one that has weakened consonantal information (and sounds more /a/-like). When this reduced auditory stimulus (or /a/) is dubbed with the visual /ba/, a visual influence will result in effectively 'restoring' the weakened auditory cues so that the stimulus is perceived as a /ba/. An oddball design was used in which participants were asked to detect the /a/ among a stream of more frequently occurring /ba/s in the presence of either a speaking face or a face with no visual speech. In addition, the same paradigm was presented for a second contrast in which participants detected /pa/ among /ba/s, a contrast which should be unaltered by the presence of visual speech. Behavioral and some ERP findings reflect the expected phonemic restoration for the /ba/ vs. /a/ contrast; specifically, we observed reduced accuracy and P300 response in the presence of visual speech. Further, we report an unexpected finding of reduced accuracy and P300 response for both speech contrasts in the presence of visual speech, suggesting overall modulation of the auditory signal in the presence of visual speech. Consistent with this, we observed a mismatch negativity (MMN) effect for the /ba/ vs. /pa/ contrast only, which was larger in the absence of visual speech. We discuss the potential utility of this paradigm for listeners who cannot respond actively, such as infants and individuals with developmental disabilities.
Collapse
Affiliation(s)
- Julia Irwin
- Haskins Laboratories, New Haven, CT, USA; Southern Connecticut State University, New Haven, CT, USA
| | - Trey Avery
- Haskins Laboratories, New Haven, CT, USA
| | - Lawrence Brancazio
- Haskins Laboratories, New Haven, CT, USA; Southern Connecticut State University, New Haven, CT, USA
| | - Jacqueline Turcios
- Haskins Laboratories, New Haven, CT, USA; Southern Connecticut State University, New Haven, CT, USA
| | - Kayleigh Ryherd
- Haskins Laboratories, New Haven, CT, USA; University of Connecticut, Storrs, CT, USA
| | - Nicole Landi
- Haskins Laboratories, New Haven, CT, USA; University of Connecticut, Storrs, CT, USA
| |
Collapse
|
48
|
Thye MD, Bednarz HM, Herringshaw AJ, Sartin EB, Kana RK. The impact of atypical sensory processing on social impairments in autism spectrum disorder. Dev Cogn Neurosci 2018; 29:151-167. [PMID: 28545994 PMCID: PMC6987885 DOI: 10.1016/j.dcn.2017.04.010] [Citation(s) in RCA: 239] [Impact Index Per Article: 34.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2016] [Revised: 02/25/2017] [Accepted: 04/18/2017] [Indexed: 02/03/2023] Open
Abstract
Altered sensory processing has been an important feature of the clinical descriptions of autism spectrum disorder (ASD). There is evidence that sensory dysregulation arises early in the progression of ASD and impacts social functioning. This paper reviews behavioral and neurobiological evidence that describes how sensory deficits across multiple modalities (vision, hearing, touch, olfaction, gustation, and multisensory integration) could impact social functions in ASD. Theoretical models of ASD and their implications for the relationship between sensory and social functioning are discussed. Furthermore, neural differences in anatomy, function, and connectivity of different regions underlying sensory and social processing are also discussed. We conclude that there are multiple mechanisms through which early sensory dysregulation in ASD could cascade into social deficits across development. Future research is needed to clarify these mechanisms, and specific focus should be given to distinguish between deficits in primary sensory processing and altered top-down attentional and cognitive processes.
Collapse
Affiliation(s)
- Melissa D Thye
- Department of Psychology, University of Alabama at Birmingham, Birmingham, AL 35233, United States
| | - Haley M Bednarz
- Department of Psychology, University of Alabama at Birmingham, Birmingham, AL 35233, United States
| | - Abbey J Herringshaw
- Department of Psychology, University of Alabama at Birmingham, Birmingham, AL 35233, United States
| | - Emma B Sartin
- Department of Psychology, University of Alabama at Birmingham, Birmingham, AL 35233, United States
| | - Rajesh K Kana
- Department of Psychology, University of Alabama at Birmingham, Birmingham, AL 35233, United States.
| |
Collapse
|
49
|
Do age and linguistic background alter the audiovisual advantage when listening to speech in the presence of energetic and informational masking? Atten Percept Psychophys 2017; 80:242-261. [DOI: 10.3758/s13414-017-1423-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
50
|
Minar NJ, Lewkowicz DJ. Overcoming the other-race effect in infancy with multisensory redundancy: 10-12-month-olds discriminate dynamic other-race faces producing speech. Dev Sci 2017; 21:e12604. [PMID: 28944541 DOI: 10.1111/desc.12604] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2016] [Accepted: 07/03/2017] [Indexed: 11/30/2022]
Abstract
We tested 4-6- and 10-12-month-old infants to investigate whether the often-reported decline in infant sensitivity to other-race faces may reflect responsiveness to static or dynamic/silent faces rather than a general process of perceptual narrowing. Across three experiments, we tested discrimination of either dynamic own-race or other-race faces which were either accompanied by a speech syllable, no sound, or a non-speech sound. Results indicated that 4-6- and 10-12-month-old infants discriminated own-race as well as other-race faces accompanied by a speech syllable, that only the 10-12-month-olds discriminated silent own-race faces, and that 4-6-month-old infants discriminated own-race and other-race faces accompanied by a non-speech sound but that 10-12-month-old infants only discriminated own-race faces accompanied by a non-speech sound. Overall, the results suggest that the ORE reported to date reflects infant responsiveness to static or dynamic/silent faces rather than a general process of perceptual narrowing.
Collapse
Affiliation(s)
- Nicholas J Minar
- Institute for the Study of Child Development, Rutgers Robert Wood Johnson Medical School, New Brunswick, NJ, USA
| | - David J Lewkowicz
- Department of Communication Sciences and Disorders, Northeastern University, Boston, MA, USA
| |
Collapse
|