1. Çetinçelik M, Jordan-Barros A, Rowland CF, Snijders TM. The effect of visual speech cues on neural tracking of speech in 10-month-old infants. Eur J Neurosci 2024;60:5381-5399. PMID: 39188179. DOI: 10.1111/ejn.16492.
Abstract
While infants' sensitivity to visual speech cues and the benefit of these cues have been well established by behavioural studies, there is little evidence on the effect of visual speech cues on infants' neural processing of continuous auditory speech. In this study, we investigated whether visual speech cues, such as the movements of the lips, jaw, and larynx, facilitate infants' neural speech tracking. Ten-month-old Dutch-learning infants watched videos of a speaker reciting passages in infant-directed speech while electroencephalography (EEG) was recorded. In the videos, either the full face of the speaker was displayed or the speaker's mouth and jaw were masked with a block, obstructing the visual speech cues. To assess neural tracking, speech-brain coherence (SBC) was calculated, focusing particularly on the stress and syllabic rates (1-1.75 and 2.5-3.5 Hz, respectively, in our stimuli). SBC was first compared with surrogate data overall, and differences in SBC between the two conditions were then tested at the frequencies of interest. Our results indicated that infants show significant tracking at both stress and syllabic rates. However, no differences were identified between the two conditions, meaning that infants' neural tracking was not further modulated by the presence of visual speech cues. Furthermore, we demonstrated that infants' neural tracking of low-frequency information is related to their subsequent vocabulary development at 18 months. Overall, this study provides evidence that infants' neural tracking of speech is not necessarily impaired when visual speech cues are not fully visible and that neural tracking may be a potential mechanism in successful language acquisition.
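For readers unfamiliar with the speech-brain coherence measure named above, the sketch below illustrates the general idea: magnitude-squared coherence between the speech amplitude envelope and an EEG channel, averaged within the stress-rate and syllable-rate bands and compared against circularly shifted surrogates. The sampling rate, single-channel setup, random stand-in signals, and surrogate scheme are illustrative assumptions, not the authors' analysis pipeline.

```python
# Minimal sketch of speech-brain coherence (SBC), assuming a single EEG channel,
# a common 250 Hz sampling rate, and random stand-in signals.
import numpy as np
from scipy.signal import coherence, hilbert

fs = 250                                      # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
speech = rng.standard_normal(fs * 60)         # stand-in for a 60-s speech waveform
eeg = rng.standard_normal(fs * 60)            # stand-in for one EEG channel

envelope = np.abs(hilbert(speech))            # broadband amplitude envelope

# Magnitude-squared coherence between the speech envelope and the EEG signal
freqs, coh = coherence(envelope, eeg, fs=fs, nperseg=fs * 4)

# Mean coherence within the stress-rate (1-1.75 Hz) and syllable-rate (2.5-3.5 Hz) bands
sbc_stress = coh[(freqs >= 1.0) & (freqs <= 1.75)].mean()
sbc_syllable = coh[(freqs >= 2.5) & (freqs <= 3.5)].mean()

# Surrogate comparison: circularly shift the EEG to destroy true speech-brain
# alignment while preserving its spectrum, and build a null distribution
surrogate_stress = []
for shift in rng.integers(fs * 5, fs * 55, size=200):
    _, coh_s = coherence(envelope, np.roll(eeg, shift), fs=fs, nperseg=fs * 4)
    surrogate_stress.append(coh_s[(freqs >= 1.0) & (freqs <= 1.75)].mean())

p_stress = np.mean(np.array(surrogate_stress) >= sbc_stress)  # empirical p-value
print(sbc_stress, sbc_syllable, p_stress)
```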
Affiliation(s)
- Melis Çetinçelik: Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Department of Experimental Psychology, Utrecht University, Utrecht, The Netherlands; Cognitive Neuropsychology Department, Tilburg University, Tilburg, The Netherlands
- Antonia Jordan-Barros: Centre for Brain and Cognitive Development, Department of Psychological Science, Birkbeck, University of London, London, UK; Experimental Psychology, University College London, London, UK
- Caroline F Rowland: Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Tineke M Snijders: Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Cognitive Neuropsychology Department, Tilburg University, Tilburg, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
2. Weng Y, Rong Y, Peng G. The development of audiovisual speech perception in Mandarin-speaking children: Evidence from the McGurk paradigm. Child Dev 2024;95:750-765. PMID: 37843038. DOI: 10.1111/cdev.14022.
Abstract
The developmental trajectory of audiovisual speech perception in Mandarin-speaking children remains understudied. This cross-sectional study of Mandarin-speaking 3- to 4-year-olds, 5- to 6-year-olds, 7- to 8-year-olds, and adults from Xiamen, China (n = 87, 44 males) investigated this issue using the McGurk paradigm with three levels of auditory noise. For the identification of congruent stimuli, 3- to 4-year-olds underperformed the older groups, whose performances were comparable. For the perception of incongruent stimuli, a developmental shift was observed: 3- to 4-year-olds made significantly more audio-dominant but fewer audiovisual-integrated responses to incongruent stimuli than the older groups. With increasing auditory noise, the difference between children and adults widened in identifying congruent stimuli but narrowed in perceiving incongruent ones. The findings regarding noise effects agree with the statistically optimal hypothesis.
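The response categories mentioned above (audio-dominant versus audiovisual-integrated) are typically assigned by comparing a listener's response with the auditory and visual components of an incongruent McGurk pair. The sketch below shows one generic coding scheme; the /ba/-audio plus /ga/-visual pairing and the category labels are assumptions for illustration, not the study's exact stimuli or coding rules.

```python
# Generic coding of responses to an incongruent McGurk pair (illustrative only).
def code_mcgurk_response(auditory: str, visual: str, response: str) -> str:
    """Classify a single response to an incongruent auditory/visual syllable pair."""
    if response == auditory:
        return "audio-dominant"
    if response == visual:
        return "visual-dominant"
    if (auditory, visual) == ("ba", "ga") and response in ("da", "tha"):
        return "audiovisual-integrated"   # classic fused percept
    return "other"

print(code_mcgurk_response("ba", "ga", "da"))  # -> audiovisual-integrated
print(code_mcgurk_response("ba", "ga", "ba"))  # -> audio-dominant
```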
Affiliation(s)
- Yi Weng: Department of Chinese and Bilingual Studies, Research Centre for Language, Cognition, and Neuroscience, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Yicheng Rong: Department of Chinese and Bilingual Studies, Research Centre for Language, Cognition, and Neuroscience, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Gang Peng: Department of Chinese and Bilingual Studies, Research Centre for Language, Cognition, and Neuroscience, The Hong Kong Polytechnic University, Hong Kong SAR, China
3. Çetinçelik M, Rowland CF, Snijders TM. Does the speaker's eye gaze facilitate infants' word segmentation from continuous speech? An ERP study. Dev Sci 2024;27:e13436. PMID: 37551932. DOI: 10.1111/desc.13436.
Abstract
The environment in which infants learn language is multimodal and rich with social cues. Yet, the effects of such cues, such as eye contact, on early speech perception have not been closely examined. This study assessed the role of ostensive speech, signalled through the speaker's eye gaze direction, on infants' word segmentation abilities. A familiarisation-then-test paradigm was used while electroencephalography (EEG) was recorded. Ten-month-old Dutch-learning infants were familiarised with audio-visual stories in which a speaker recited four sentences with one repeated target word. The speaker addressed them either with direct or with averted gaze while speaking. In the test phase following each story, infants heard familiar and novel words presented via audio only. Infants' familiarity with the words was assessed using event-related potentials (ERPs). As predicted, infants showed a negative-going ERP familiarity effect to the isolated familiarised words relative to the novel words over the left-frontal region of interest during the test phase. While the word familiarity effect did not differ as a function of the speaker's gaze over the left-frontal region of interest, there was also a (not predicted) positive-going early ERP familiarity effect over right fronto-central and central electrodes in the direct gaze condition only. This study provides electrophysiological evidence that infants can segment words from audio-visual speech, regardless of the ostensiveness of the speaker's communication. However, the speaker's gaze direction seems to influence the processing of familiar words.
Research highlights:
- We examined 10-month-old infants' ERP word familiarity response using audio-visual stories, in which a speaker addressed infants with direct or averted gaze while speaking.
- Ten-month-old infants can segment and recognise familiar words from audio-visual speech, indicated by their negative-going ERP response to familiar, relative to novel, words.
- This negative-going ERP word familiarity effect was present for isolated words over left-frontal electrodes regardless of whether the speaker offered eye contact while speaking.
- An additional positivity in response to familiar words was observed for direct gaze only, over right fronto-central and central electrodes.
Affiliation(s)
- Melis Çetinçelik: Max Planck Institute for Psycholinguistics, Nijmegen, Gelderland, The Netherlands
- Caroline F Rowland: Max Planck Institute for Psycholinguistics, Nijmegen, Gelderland, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Gelderland, The Netherlands
- Tineke M Snijders: Max Planck Institute for Psycholinguistics, Nijmegen, Gelderland, The Netherlands; Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Gelderland, The Netherlands; Cognitive Neuropsychology Department, Tilburg University, Tilburg, The Netherlands
4. Nematova S, Zinszer B, Jasinska KK. Exploring audiovisual speech perception in monolingual and bilingual children in Uzbekistan. J Exp Child Psychol 2024;239:105808. PMID: 37972516. DOI: 10.1016/j.jecp.2023.105808.
Abstract
This study aimed to investigate the development of audiovisual speech perception in monolingual Uzbek-speaking and bilingual Uzbek-Russian-speaking children, focusing on the impact of language experience on audiovisual speech perception and the role of visual phonetic (i.e., mouth movements corresponding to phonetic/lexical information) and temporal (i.e., timing of speech signals) cues. A total of 321 children aged 4 to 10 years in Tashkent, Uzbekistan, discriminated /ba/ and /da/ syllables across three conditions: auditory-only, audiovisual phonetic (i.e., sound accompanied by mouth movements), and audiovisual temporal (i.e., sound onset/offset accompanied by mouth opening/closing). Effects of modality (audiovisual phonetic, audiovisual temporal, or audio-only cues), age, group (monolingual or bilingual), and their interactions were tested using a Bayesian regression model. Overall, older participants performed better than younger participants. Participants performed better in the audiovisual phonetic modality compared with the auditory modality. However, no significant difference between monolingual and bilingual children was observed across all modalities. This finding stands in contrast to earlier studies. We attribute the contrasting findings of our study and the existing literature to the cross-linguistic similarity of the language pairs involved. When the languages spoken by bilinguals exhibit substantial linguistic similarity, there may be an increased necessity to disambiguate speech signals, leading to a greater reliance on audiovisual cues. The limited phonological similarity between Uzbek and Russian might have minimized bilinguals' need to rely on visual speech cues, contributing to the lack of group differences in our study.
Affiliation(s)
- Shakhlo Nematova: Department of Linguistics and Cognitive Science, University of Delaware, Newark, DE 19716, USA
- Benjamin Zinszer: Department of Psychology, Swarthmore College, Swarthmore, PA 19081, USA
- Kaja K Jasinska: Department of Applied Psychology and Human Development, University of Toronto, Toronto, ON M5S 1A1, Canada
5. Lalonde K, Peng ZE, Halverson DM, Dwyer GA. Children's use of spatial and visual cues for release from perceptual masking. J Acoust Soc Am 2024;155:1559-1569. PMID: 38393738. PMCID: PMC10890829. DOI: 10.1121/10.0024766.
Abstract
This study examined the role of visual speech in providing release from perceptual masking in children by comparing visual speech benefit across conditions with and without a spatial separation cue. Auditory-only and audiovisual speech recognition thresholds in a two-talker speech masker were obtained from 21 children with typical hearing (7-9 years of age) using a color-number identification task. The target was presented from a loudspeaker at 0° azimuth. Masker source location varied across conditions. In the spatially collocated condition, the masker was also presented from the loudspeaker at 0° azimuth. In the spatially separated condition, the masker was presented from the loudspeaker at 0° azimuth and a loudspeaker at -90° azimuth, with the signal from the -90° loudspeaker leading the signal from the 0° loudspeaker by 4 ms. The visual stimulus (static image or video of the target talker) was presented at 0° azimuth. Children achieved better thresholds when the spatial cue was provided and when the visual cue was provided. Visual and spatial cue benefit did not differ significantly depending on the presence of the other cue. Additional studies are needed to characterize how children's preferential use of visual and spatial cues varies depending on the strength of each cue.
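The two benefits compared above are usually expressed as threshold differences: spatial release from masking (collocated minus spatially separated threshold) and visual speech benefit (auditory-only minus audiovisual threshold). A minimal sketch with invented threshold values, not the study's data:

```python
# Invented speech recognition thresholds (SRTs, dB); lower (more negative) is better.
def benefit_db(threshold_without_cue: float, threshold_with_cue: float) -> float:
    """Benefit of a cue = threshold without the cue minus threshold with it."""
    return threshold_without_cue - threshold_with_cue

srt = {  # hypothetical SRTs in a two-talker masker
    ("collocated", "auditory-only"): -2.0,
    ("separated", "auditory-only"): -6.0,
    ("collocated", "audiovisual"): -7.0,
    ("separated", "audiovisual"): -10.0,
}

spatial_release = benefit_db(srt[("collocated", "auditory-only")],
                             srt[("separated", "auditory-only")])
visual_benefit = benefit_db(srt[("collocated", "auditory-only")],
                            srt[("collocated", "audiovisual")])
print(spatial_release, visual_benefit)  # 4.0 dB spatial release, 5.0 dB visual benefit
```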
Affiliation(s)
- Kaylah Lalonde: Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Z Ellen Peng: Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Destinee M Halverson: Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Grace A Dwyer: Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
6. Zeng B, Yu G, Hasshim N, Hong S. Primacy of mouth over eyes to perceive audiovisual Mandarin lexical tones. J Eye Mov Res 2023;16. PMID: 38585238. PMCID: PMC10997307. DOI: 10.16910/jemr.16.4.4.
Abstract
The visual cues of lexical tones are more implicit and much less investigated than those of consonants and vowels, and it is still unclear which facial areas contribute to lexical tone identification. This study investigated Chinese and English speakers' eye movements when they were asked to identify audiovisual Mandarin lexical tones. The Chinese and English speakers were presented with audiovisual clips of Mandarin monosyllables (for instance, /ă/, /à/, /ĭ/, /ì/) and were asked to identify whether the syllables carried a dipping tone (/ă/, /ĭ/) or a falling tone (/à/, /ì/). These audiovisual syllables were presented in clear, noisy and silent (absence of audio signal) conditions. An eye-tracker recorded the participants' eye movements. Results showed that the participants gazed more at the mouth than the eyes. In addition, when acoustic conditions became adverse, both the Chinese and English speakers increased their gaze duration at the mouth rather than at the eyes. The findings suggest that the mouth is the primary area that listeners utilise in their perception of audiovisual lexical tones. The similar eye movements between the Chinese and English speakers imply that the mouth acts as a perceptual cue that provides articulatory information, as opposed to social and pragmatic information.
Affiliation(s)
- Biao Zeng: University of South Wales, Pontypridd, UK
- Shanhu Hong: Quanzhou Preschool Education College, Quanzhou, China
7. Hong S, Wang R, Zeng B. Incongruent visual cues affect the perception of Mandarin vowel but not tone. Front Psychol 2023;13:971979. PMID: 36687891. PMCID: PMC9846355. DOI: 10.3389/fpsyg.2022.971979.
Abstract
Over recent decades, a large number of audiovisual speech studies have focused on the visual cues of consonants and vowels while neglecting those relating to lexical tones. In this study, we investigated whether incongruent audiovisual information interfered with the perception of lexical tones. We found that, for both Chinese and English speakers, incongruence between the auditory signal and the visemic mouth shape (i.e., visual form information) significantly slowed reaction times and reduced the identification accuracy of vowels. However, incongruent lip movements (i.e., visual timing information) did not interfere with the perception of auditory lexical tone. We conclude that, in contrast to vowel perception, auditory tone perception seems relatively impervious to visual congruence cues, at least under these restricted laboratory conditions. The salience of visual form and timing information is discussed based on this finding.
Affiliation(s)
- Shanhu Hong: Institute of Foreign Language and Tourism, Quanzhou Preschool Education College, Quanzhou, China; Department of Psychology, Bournemouth University, Poole, United Kingdom
- Rui Wang: School of Foreign Languages, Guangdong Pharmaceutical University, Guangzhou, China
- Biao Zeng (corresponding author): Department of Psychology, Bournemouth University, Poole, United Kingdom; EEG Lab, Department of Psychology, University of South Wales, Newport, United Kingdom
8. Zamuner TS, Rabideau T, McDonald M, Yeung HH. Developmental change in children's speech processing of auditory and visual cues: An eyetracking study. J Child Lang 2023;50:27-51. PMID: 36503546. DOI: 10.1017/s0305000921000684.
Abstract
This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between the apparent success of visual speech processing shown by young children in visual-looking tasks and the apparent difficulty with visual speech processing shown by older children on explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier for /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.
Affiliation(s)
- Margarethe McDonald: Department of Linguistics, University of Ottawa, Canada; School of Psychology, University of Ottawa, Canada
- H Henny Yeung: Department of Linguistics, Simon Fraser University, Canada; Integrative Neuroscience and Cognition Centre, UMR 8002, CNRS and University of Paris, France
9. Neurodevelopmental oscillatory basis of speech processing in noise. Dev Cogn Neurosci 2022;59:101181. PMID: 36549148. PMCID: PMC9792357. DOI: 10.1016/j.dcn.2022.101181.
Abstract
Humans' extraordinary ability to understand speech in noise relies on multiple processes that develop with age. Using magnetoencephalography (MEG), we characterize the underlying neuromaturational basis by quantifying how cortical oscillations in 144 participants (aged 5-27 years) track phrasal and syllabic structures in connected speech mixed with different types of noise. While the extraction of prosodic cues from clear speech was stable during development, its maintenance in a multi-talker background matured rapidly up to age 9 and was associated with speech comprehension. Furthermore, while the extraction of subtler information provided by syllables matured at age 9, its maintenance in noisy backgrounds progressively matured until adulthood. Altogether, these results highlight distinct behaviorally relevant maturational trajectories for the neuronal signatures of speech perception. In accordance with grain-size proposals, neuromaturational milestones are reached increasingly late for linguistic units of decreasing size, with further delays incurred by noise.
10. Lozano I, López Pérez D, Laudańska Z, Malinowska-Korczak A, Szmytke M, Radkowska A, Tomalski P. Changes in selective attention to articulating mouth across infancy: Sex differences and associations with language outcomes. Infancy 2022;27:1132-1153. DOI: 10.1111/infa.12496.
Affiliation(s)
- Itziar Lozano: Department of Cognitive Psychology and Neurocognitive Science, Faculty of Psychology, University of Warsaw, Warsaw, Poland; Faculty of Psychology, Universidad Autónoma de Madrid, Madrid, Spain
- David López Pérez: Neurocognitive Development Lab, Institute of Psychology, Polish Academy of Sciences, Warsaw, Poland
- Zuzanna Laudańska: Neurocognitive Development Lab, Institute of Psychology, Polish Academy of Sciences, Warsaw, Poland
- Anna Malinowska-Korczak: Neurocognitive Development Lab, Institute of Psychology, Polish Academy of Sciences, Warsaw, Poland
- Magdalena Szmytke: Neurocognitive Development Lab, Faculty of Psychology, University of Warsaw, Warsaw, Poland
- Alicja Radkowska: Neurocognitive Development Lab, Institute of Psychology, Polish Academy of Sciences, Warsaw, Poland; Neurocognitive Development Lab, Faculty of Psychology, University of Warsaw, Warsaw, Poland
- Przemysław Tomalski: Neurocognitive Development Lab, Institute of Psychology, Polish Academy of Sciences, Warsaw, Poland
11. Zhou P, Zong S, Xi X, Xiao H. Effect of wearing personal protective equipment on acoustic characteristics and speech perception during COVID-19. Appl Acoust 2022;197:108940. PMID: 35892074. PMCID: PMC9304077. DOI: 10.1016/j.apacoust.2022.108940.
Abstract
With the COVID-19 pandemic, the use of personal protective equipment (PPE) has become 'the new normal'. Both surgical masks and N95 masks with a face shield are widely used in healthcare settings to reduce virus transmission, but the use of these masks has a negative impact on speech perception. Transparent masks are therefore recommended to solve this dilemma. However, there is a lack of quantitative studies regarding the effect of PPE on speech perception. This study aims to compare the effect on speech perception of different types of PPE (surgical masks, N95 masks with face shield and transparent masks) in healthcare settings, for listeners with normal hearing in the audiovisual or auditory-only modality. Bamford-Kowal-Bench (BKB)-like Mandarin speech stimuli were digitally recorded by a G.R.A.S KEMAR manikin without and with masks (surgical masks, N95 masks with face shield and transparent masks). Two variants of video display were created (with or without visual cues) and tagged to the corresponding audio recordings. The speech recording and video were presented to listeners simultaneously in each of four conditions: unattenuated speech with visual cues (no mask); surgical-mask-attenuated speech without visual cues; N95-mask-with-face-shield-attenuated speech without visual cues; and transparent-mask-attenuated speech with visual cues. The signal-to-noise ratio yielding 50% correct scores (SNR50) was measured for each condition in the presence of four-talker babble. Twenty-four subjects completed the experiment. Acoustic spectra obtained from all types of masks were primarily attenuated at high frequencies, beyond 3 kHz, but to different extents. The mean SNR50 thresholds of the two auditory-only conditions (surgical mask and N95 mask with face shield) were higher than those of the audiovisual conditions (no mask and transparent mask). SNR50 thresholds in the surgical-mask conditions were significantly lower than those for the N95 masks with face shield. No significant difference was observed between the two audiovisual conditions. The results confirm that wearing a surgical mask or an N95 mask with face shield has a negative impact on speech perception, whereas wearing a transparent mask improved speech perception to a level similar to the unmasked condition for young normal-hearing listeners.
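One common way to obtain an SNR50 of the kind reported above is to fit a logistic psychometric function to proportion-correct scores measured at several signal-to-noise ratios and read off its midpoint. The sketch below uses made-up scores and does not reproduce the study's measurement procedure or data.

```python
# Fit a logistic psychometric function to illustrative proportion-correct scores
# and estimate the SNR giving 50% correct (SNR50).
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr, snr50, slope):
    """Logistic function from 0 to 1 with its midpoint at snr50."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - snr50)))

snr_db = np.array([-12.0, -9.0, -6.0, -3.0, 0.0, 3.0])         # test SNRs (dB)
prop_correct = np.array([0.08, 0.21, 0.45, 0.70, 0.88, 0.97])  # illustrative scores

(snr50, slope), _ = curve_fit(psychometric, snr_db, prop_correct, p0=[-5.0, 1.0])
print(f"SNR50 = {snr50:.1f} dB, slope = {slope:.2f} per dB")
```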
Affiliation(s)
- Peng Zhou: Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Shimin Zong: Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
- Xin Xi: Senior Department of Otolaryngology - Head & Neck Surgery, The Sixth Medical Center of PLA General Hospital, Beijing, China
- Hongjun Xiao: Department of Otorhinolaryngology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430022, China
12. Lalonde K, Buss E, Miller MK, Leibold LJ. Face Masks Impact Auditory and Audiovisual Consonant Recognition in Children With and Without Hearing Loss. Front Psychol 2022;13:874345. PMID: 35645844. PMCID: PMC9137424. DOI: 10.3389/fpsyg.2022.874345.
Abstract
Teachers and students are wearing face masks in many classrooms to limit the spread of the coronavirus. Face masks disrupt speech understanding by concealing lip-reading cues and reducing transmission of high-frequency acoustic speech content. Transparent masks provide greater access to visual speech cues than opaque masks but tend to cause greater acoustic attenuation. This study examined the effects of four types of face masks on auditory-only and audiovisual speech recognition in 18 children with bilateral hearing loss, 16 children with normal hearing, and 38 adults with normal hearing tested in their homes, as well as 15 adults with normal hearing tested in the laboratory. Stimuli simulated the acoustic attenuation and visual obstruction caused by four different face masks: hospital, fabric, and two transparent masks. Participants tested in their homes completed auditory-only and audiovisual consonant recognition tests with speech-spectrum noise at 0 dB SNR. Adults tested in the lab completed the same tests at 0 and/or -10 dB SNR. A subset of participants from each group completed a visual-only consonant recognition test with no mask. Consonant recognition accuracy and transmission of three phonetic features (place of articulation, manner of articulation, and voicing) were analyzed using linear mixed-effects models. Children with hearing loss identified consonants less accurately than children with normal hearing and adults with normal hearing tested at 0 dB SNR. However, all the groups were similarly impacted by face masks. Under auditory-only conditions, results were consistent with the pattern of high-frequency acoustic attenuation; hospital masks had the least impact on performance. Under audiovisual conditions, transparent masks had less impact on performance than opaque masks. High-frequency attenuation and visual obstruction had the greatest impact on place perception. The latter finding was consistent with the visual-only feature transmission data. These results suggest that the combination of noise and face masks negatively impacts speech understanding in children. The best mask for promoting speech understanding in noisy environments depends on whether visual cues will be accessible: hospital masks are best under auditory-only conditions, but well-fit transparent masks are best when listeners have a clear, consistent view of the talker's face.
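As a rough illustration of the linear mixed-effects modelling named above, the sketch below fits consonant-recognition accuracy as a function of mask type, modality, and listener group with a random intercept per participant. The column names, factor levels, and simulated data frame are assumptions for illustration only, not the authors' model specification.

```python
# Linear mixed-effects sketch: accuracy ~ mask * modality + group, random intercept per subject.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "accuracy": rng.uniform(0.3, 1.0, n),                          # proportion correct (simulated)
    "mask": rng.choice(["none", "hospital", "fabric", "clear"], n),
    "modality": rng.choice(["auditory", "audiovisual"], n),
    "group": rng.choice(["child_HL", "child_NH", "adult_NH"], n),  # listener groups
    "subject": rng.choice([f"s{i:02d}" for i in range(40)], n),
})

model = smf.mixedlm("accuracy ~ mask * modality + group", data=df, groups=df["subject"])
result = model.fit()
print(result.summary())
```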
Affiliation(s)
- Kaylah Lalonde: Audiovisual Speech Processing Laboratory, Boys Town National Research Hospital, Center for Hearing Research, Omaha, NE, United States
- Emily Buss: Speech Perception and Auditory Research at Carolina Laboratory, Department of Otolaryngology Head and Neck Surgery, University of North Carolina School of Medicine, Chapel Hill, NC, United States
- Margaret K. Miller: Human Auditory Development Laboratory, Boys Town National Research Hospital, Center for Hearing Research, Omaha, NE, United States
- Lori J. Leibold: Human Auditory Development Laboratory, Boys Town National Research Hospital, Center for Hearing Research, Omaha, NE, United States
13. Audiovisual speech recognition for Kannada language using feed forward neural network. Neural Comput Appl 2022. DOI: 10.1007/s00521-022-07249-7.
14. Cieśla K, Wolak T, Lorens A, Mentzel M, Skarżyński H, Amedi A. Effects of training and using an audio-tactile sensory substitution device on speech-in-noise understanding. Sci Rep 2022;12:3206. PMID: 35217676. PMCID: PMC8881456. DOI: 10.1038/s41598-022-06855-8.
Abstract
Understanding speech in background noise is challenging. Wearing face masks, imposed by the COVID-19 pandemic, makes it even harder. We developed a multi-sensory setup, including a sensory substitution device (SSD) that can deliver speech simultaneously through audition and as vibrations on the fingertips. The vibrations correspond to low frequencies extracted from the speech input. We trained two groups of non-native English speakers in understanding distorted speech in noise. After a short session (30-45 min) of repeating sentences, with or without concurrent matching vibrations, we showed a comparable mean group improvement of 14-16 dB in Speech Reception Threshold (SRT) in two test conditions, i.e., when the participants were asked to repeat sentences only from hearing and also when matching vibrations on the fingertips were present. This is a very strong effect, if one considers that a 10 dB difference corresponds to a doubling of the perceived loudness. The number of sentence repetitions needed to complete both types of training was comparable. Meanwhile, the mean group SNR for the audio-tactile training (14.7 ± 8.7 dB) was significantly lower (harder) than for the auditory training (23.9 ± 11.8 dB), which indicates a potential facilitating effect of the added vibrations. In addition, both before and after training, most of the participants (70-80%) showed better performance (by 4-6 dB on average) in speech-in-noise understanding when the audio sentences were accompanied by matching vibrations. This is the same magnitude of multisensory benefit that we reported, with no training at all, in our previous study using the same experimental procedures. After training, performance in this test condition was also best in both groups (SRT ~ 2 dB). The least significant effect of both training types was found in the third test condition, i.e., when participants were repeating sentences accompanied by non-matching tactile vibrations, and performance in this condition was also poorest after training. The results indicate that both types of training may remove some level of difficulty in sound perception, which might enable more proper use of speech inputs delivered via vibrotactile stimulation. We discuss the implications of these novel findings with respect to basic science. In particular, we show that even in adulthood, i.e., long after the classical "critical periods" of development have passed, a new pairing between a certain computation (here, speech processing) and an atypical sensory modality (here, touch) can be established and trained, and that this process can be rapid and intuitive. We further present possible applications of our training program and the SSD for auditory rehabilitation in patients with hearing (and sight) deficits, as well as for healthy individuals in suboptimal acoustic situations.
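The loudness rule of thumb invoked above (a 10 dB change roughly doubles perceived loudness) can be applied to the reported 14-16 dB SRT improvements as simple arithmetic; the snippet below is purely illustrative.

```python
# Rule-of-thumb conversion: loudness ratio ≈ 2 ** (delta_dB / 10).
for delta_db in (10.0, 14.0, 16.0):
    ratio = 2 ** (delta_db / 10.0)
    print(f"{delta_db:.0f} dB improvement ≈ {ratio:.1f}x perceived loudness")
```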
Affiliation(s)
- K Cieśla: The Baruch Ivcher Institute for Brain, Cognition & Technology, The Baruch Ivcher School of Psychology and the Ruth and Meir Rosental Brain Imaging Center, Reichman University, Herzliya, Israel; World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- T Wolak: World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- A Lorens: World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- M Mentzel: The Baruch Ivcher Institute for Brain, Cognition & Technology, The Baruch Ivcher School of Psychology and the Ruth and Meir Rosental Brain Imaging Center, Reichman University, Herzliya, Israel
- H Skarżyński: World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- A Amedi: The Baruch Ivcher Institute for Brain, Cognition & Technology, The Baruch Ivcher School of Psychology and the Ruth and Meir Rosental Brain Imaging Center, Reichman University, Herzliya, Israel
15. Gijbels L, Yeatman JD, Lalonde K, Lee AKC. Audiovisual Speech Processing in Relationship to Phonological and Vocabulary Skills in First Graders. J Speech Lang Hear Res 2021;64:5022-5040. PMID: 34735292. PMCID: PMC9150669. DOI: 10.1044/2021_jslhr-21-00196.
Abstract
Purpose: It is generally accepted that adults use visual cues to improve speech intelligibility in noisy environments, but findings regarding visual speech benefit in children are mixed. We explored factors that contribute to audiovisual (AV) gain in young children's speech understanding. We examined whether there is an AV benefit to speech-in-noise recognition in children in first grade and whether visual salience of phonemes influences their AV benefit. We explored whether individual differences in AV speech enhancement could be explained by vocabulary knowledge, phonological awareness, or general psychophysical testing performance.
Method: Thirty-seven first graders completed online psychophysical experiments. We used an online single-interval, four-alternative forced-choice picture-pointing task with age-appropriate consonant-vowel-consonant words to measure auditory-only, visual-only, and AV word recognition in noise at -2 and -8 dB SNR. We obtained standard measures of vocabulary and phonological awareness and included a general psychophysical test to examine correlations with AV benefits.
Results: We observed a significant overall AV gain among children in first grade. This effect was mainly attributed to the benefit at -8 dB SNR for visually distinct targets. Individual differences were not explained by any of the child variables. Boys showed lower auditory-only performance, leading to significantly larger AV gains.
Conclusions: This study shows an AV benefit of distinctive visual cues to word recognition in challenging noisy conditions in first graders. The cognitive and linguistic constraints of the task may have minimized the impact of individual differences in vocabulary and phonological awareness on AV benefit. The gender difference should be studied in a larger sample and age range.
Affiliation(s)
- Liesbeth Gijbels: Department of Speech & Hearing Sciences, University of Washington, Seattle; Institute for Learning & Brain Sciences, University of Washington, Seattle
- Jason D. Yeatman: Division of Developmental-Behavioral Pediatrics, School of Medicine, Stanford University, CA; Graduate School of Education, Stanford University, CA
- Kaylah Lalonde: Boys Town National Research Hospital, Center for Hearing Research, Omaha, NE
- Adrian K. C. Lee: Department of Speech & Hearing Sciences, University of Washington, Seattle; Institute for Learning & Brain Sciences, University of Washington, Seattle
16. Gijbels L, Cai R, Donnelly PM, Kuhl PK. Designing Virtual, Moderated Studies of Early Childhood Development. Front Psychol 2021;12:740290. PMID: 34707545. PMCID: PMC8542922. DOI: 10.3389/fpsyg.2021.740290.
Abstract
With increased public access to the Internet and digital tools, web-based research has gained prevalence over the past decades. However, digital adaptations for developmental research involving children have received relatively little attention. In 2020, as the COVID-19 pandemic led to reduced social contact and caused many developmental university research laboratories to close, the scientific community began to investigate online research methods that would allow continued work. Limited resources and documentation of factors that are essential for developmental research (e.g., caregiver involvement, informed assent, controlling environmental distractions at home for children) make the transition from in-person to online research especially difficult for developmental scientists. Recognizing this, we aim to contribute to the field by describing three separate moderated virtual behavioral assessments in children ranging from 4 to 13 years of age that were highly successful. The three studies encompass speech production, speech perception, and reading fluency. However varied the domains, age groups, and methodological approaches of the three studies, the success of our virtual adaptations shared commonalities in how to achieve informed consent, plan parental involvement, design studies that attract and hold children's attention, and ensure valid data collection. Our combined work suggests principles to facilitate future online developmental work. Considerations derived from these studies can serve as documented points of departure that inform and encourage additional virtual adaptations in this field.
Affiliation(s)
- Liesbeth Gijbels: Department of Speech & Hearing Sciences, University of Washington, Seattle, WA, United States; Institute for Learning & Brain Sciences, University of Washington, Seattle, WA, United States
- Ruofan Cai: Department of Speech & Hearing Sciences, University of Washington, Seattle, WA, United States; Institute for Learning & Brain Sciences, University of Washington, Seattle, WA, United States
- Patrick M. Donnelly: Department of Speech & Hearing Sciences, University of Washington, Seattle, WA, United States; Institute for Learning & Brain Sciences, University of Washington, Seattle, WA, United States
- Patricia K. Kuhl: Department of Speech & Hearing Sciences, University of Washington, Seattle, WA, United States; Institute for Learning & Brain Sciences, University of Washington, Seattle, WA, United States