1. Nematova S, Zinszer B, Jasinska KK. Exploring audiovisual speech perception in monolingual and bilingual children in Uzbekistan. J Exp Child Psychol 2024; 239:105808. [PMID: 37972516] [DOI: 10.1016/j.jecp.2023.105808]
Abstract
This study aimed to investigate the development of audiovisual speech perception in monolingual Uzbek-speaking and bilingual Uzbek-Russian-speaking children, focusing on the impact of language experience on audiovisual speech perception and the role of visual phonetic (i.e., mouth movements corresponding to phonetic/lexical information) and temporal (i.e., timing of speech signals) cues. A total of 321 children aged 4 to 10 years in Tashkent, Uzbekistan, discriminated /ba/ and /da/ syllables across three conditions: auditory-only, audiovisual phonetic (i.e., sound accompanied by mouth movements), and audiovisual temporal (i.e., sound onset/offset accompanied by mouth opening/closing). Effects of modality (audiovisual phonetic, audiovisual temporal, or audio-only cues), age, group (monolingual or bilingual), and their interactions were tested using a Bayesian regression model. Overall, older participants performed better than younger participants. Participants performed better in the audiovisual phonetic modality compared with the auditory modality. However, no significant difference between monolingual and bilingual children was observed across all modalities. This finding stands in contrast to earlier studies. We attribute the contrasting findings of our study and the existing literature to the cross-linguistic similarity of the language pairs involved. When the languages spoken by bilinguals exhibit substantial linguistic similarity, there may be an increased necessity to disambiguate speech signals, leading to a greater reliance on audiovisual cues. The limited phonological similarity between Uzbek and Russian might have minimized bilinguals' need to rely on visual speech cues, contributing to the lack of group differences in our study.
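The abstract does not give the model specification; as an illustration only, a trial-level Bayesian logistic regression of discrimination accuracy on modality, age, and group, with a modality-by-group interaction, could be sketched as follows. All variable names, priors, and simulated data are hypothetical (PyMC).

```python
# Illustrative sketch only: trial-level Bayesian logistic regression of
# discrimination accuracy on modality, age, and language group (PyMC).
# Variable names, priors, and the simulated data are hypothetical.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n = 600
modality = rng.integers(0, 3, n)   # 0 = auditory-only, 1 = AV phonetic, 2 = AV temporal
age = rng.uniform(4, 10, n)        # age in years
group = rng.integers(0, 2, n)      # 0 = monolingual, 1 = bilingual
correct = rng.integers(0, 2, n)    # placeholder trial outcomes

with pm.Model():
    b0 = pm.Normal("intercept", 0, 1.5)
    b_mod = pm.Normal("b_modality", 0, 1, shape=3)
    b_age = pm.Normal("b_age", 0, 1)
    b_grp = pm.Normal("b_group", 0, 1)
    b_mg = pm.Normal("b_modality_x_group", 0, 1, shape=3)
    eta = (b0 + b_mod[modality] + b_age * (age - age.mean())
           + b_grp * group + b_mg[modality] * group)
    pm.Bernoulli("obs", p=pm.math.sigmoid(eta), observed=correct)
    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```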
Affiliation(s)
- Shakhlo Nematova
- Department of Linguistics and Cognitive Science, University of Delaware, Newark, DE 19716, USA.
- Benjamin Zinszer
- Department of Psychology, Swarthmore College, Swarthmore, PA 19081, USA
- Kaja K Jasinska
- Department of Applied Psychology and Human Development, University of Toronto, Toronto, ON M5S 1A1, Canada
2. Lalonde K, Peng ZE, Halverson DM, Dwyer GA. Children's use of spatial and visual cues for release from perceptual masking. J Acoust Soc Am 2024; 155:1559-1569. [PMID: 38393738] [PMCID: PMC10890829] [DOI: 10.1121/10.0024766]
Abstract
This study examined the role of visual speech in providing release from perceptual masking in children by comparing visual speech benefit across conditions with and without a spatial separation cue. Auditory-only and audiovisual speech recognition thresholds in a two-talker speech masker were obtained from 21 children with typical hearing (7-9 years of age) using a color-number identification task. The target was presented from a loudspeaker at 0° azimuth. Masker source location varied across conditions. In the spatially collocated condition, the masker was also presented from the loudspeaker at 0° azimuth. In the spatially separated condition, the masker was presented from the loudspeaker at 0° azimuth and a loudspeaker at -90° azimuth, with the signal from the -90° loudspeaker leading the signal from the 0° loudspeaker by 4 ms. The visual stimulus (static image or video of the target talker) was presented at 0° azimuth. Children achieved better thresholds when the spatial cue was provided and when the visual cue was provided. Visual and spatial cue benefit did not differ significantly depending on the presence of the other cue. Additional studies are needed to characterize how children's preferential use of visual and spatial cues varies depending on the strength of each cue.
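The 4-ms lead from the -90° loudspeaker exploits the precedence effect, so the masker is heard toward the leading source. A minimal sketch of how such a lead/lag masker pair could be constructed (sample rate and waveform are placeholders, not the study's stimuli):

```python
# Minimal sketch: build lead/lag masker channels so the -90 degree
# loudspeaker leads the 0 degree loudspeaker by 4 ms (precedence effect).
import numpy as np

fs = 44100                                 # assumed sample rate (Hz)
masker = np.random.randn(fs)               # placeholder 1-s masker waveform
lead = int(round(0.004 * fs))              # 4 ms -> ~176 samples at 44.1 kHz

ch_minus90 = np.concatenate([masker, np.zeros(lead)])  # leading channel
ch_0 = np.concatenate([np.zeros(lead), masker])        # lags by 4 ms
```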
Affiliation(s)
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Z Ellen Peng
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Destinee M Halverson
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
- Grace A Dwyer
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
3. Gijbels L, Lee AKC, Yeatman JD. Children with developmental dyslexia have equivalent audiovisual speech perception performance but their perceptual weights differ. Dev Sci 2024; 27:e13431. [PMID: 37403418] [DOI: 10.1111/desc.13431]
Abstract
As reading is inherently a multisensory, audiovisual (AV) process in which visual symbols (i.e., letters) are connected to speech sounds, the question has been raised whether individuals with reading difficulties, like children with developmental dyslexia (DD), have broader impairments in multisensory processing. This question has been posed before, yet it remains unanswered due to (a) the complexity and contentious etiology of DD and (b) a lack of consensus on developmentally appropriate AV processing tasks. We created an ecologically valid task for measuring multisensory AV processing by leveraging the natural phenomenon that speech perception improves when listeners are provided visual information from mouth movements (particularly when the auditory signal is degraded). We designed this AV processing task with low cognitive and linguistic demands such that children with and without DD would have equal unimodal (auditory and visual) performance. We then collected data from a group of 135 children (ages 6.5-15) on an AV speech perception task to answer the following questions: (1) How do AV speech perception benefits manifest in children with and without DD? (2) Do all children use the same perceptual weights to create AV speech perception benefits? (3) What is the role of phonological processing in AV speech perception? We show that children with and without DD have equal AV speech perception benefits on this task, but that children with DD rely less on auditory processing in more difficult listening situations to create these benefits and weigh the two incoming information streams differently. Lastly, any reported differences in speech perception in children with DD might be better explained by differences in phonological processing than by differences in reading skills.
Research highlights:
- Children with versus without developmental dyslexia have equal audiovisual speech perception benefits, regardless of their phonological awareness or reading skills.
- Children with developmental dyslexia rely less on auditory performance to create audiovisual speech perception benefits.
- Individual differences in speech perception in children might be better explained by differences in phonological processing than by differences in reading skills.
Affiliation(s)
- Liesbeth Gijbels
- Department of Speech & Hearing Sciences, University of Washington, Seattle, Washington, USA
- Institute for Learning & Brain Sciences, University of Washington, Seattle, Washington, USA
- Adrian K C Lee
- Department of Speech & Hearing Sciences, University of Washington, Seattle, Washington, USA
- Institute for Learning & Brain Sciences, University of Washington, Seattle, Washington, USA
- Jason D Yeatman
- Division of Developmental-Behavioral Pediatrics, Stanford University School of Medicine, Stanford, California, USA
- Graduate School of Education, Stanford University, Stanford, California, USA
- Department of Psychology, Stanford University, Stanford, California, USA
4. Zamuner TS, Rabideau T, McDonald M, Yeung HH. Developmental change in children's speech processing of auditory and visual cues: An eyetracking study. J Child Lang 2023; 50:27-51. [PMID: 36503546] [DOI: 10.1017/s0305000921000684]
Abstract
This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between the apparent success of visual speech processing by young children in visual-looking tasks and the apparent difficulty of visual speech processing by older children on explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier on /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood.
Affiliation(s)
- Margarethe McDonald
- Department of Linguistics, University of Ottawa, Canada
- School of Psychology, University of Ottawa, Canada
- H Henny Yeung
- Department of Linguistics, Simon Fraser University, Canada
- Integrative Neuroscience and Cognition Centre, UMR 8002, CNRS and University of Paris, France
5. Lalonde K, Buss E, Miller MK, Leibold LJ. Face Masks Impact Auditory and Audiovisual Consonant Recognition in Children With and Without Hearing Loss. Front Psychol 2022; 13:874345. [PMID: 35645844] [PMCID: PMC9137424] [DOI: 10.3389/fpsyg.2022.874345]
Abstract
Teachers and students are wearing face masks in many classrooms to limit the spread of the coronavirus. Face masks disrupt speech understanding by concealing lip-reading cues and reducing transmission of high-frequency acoustic speech content. Transparent masks provide greater access to visual speech cues than opaque masks but tend to cause greater acoustic attenuation. This study examined the effects of four types of face masks on auditory-only and audiovisual speech recognition in 18 children with bilateral hearing loss, 16 children with normal hearing, and 38 adults with normal hearing tested in their homes, as well as 15 adults with normal hearing tested in the laboratory. Stimuli simulated the acoustic attenuation and visual obstruction caused by four different face masks: hospital, fabric, and two transparent masks. Participants tested in their homes completed auditory-only and audiovisual consonant recognition tests with speech-spectrum noise at 0 dB SNR. Adults tested in the lab completed the same tests at 0 and/or -10 dB SNR. A subset of participants from each group completed a visual-only consonant recognition test with no mask. Consonant recognition accuracy and transmission of three phonetic features (place of articulation, manner of articulation, and voicing) were analyzed using linear mixed-effects models. Children with hearing loss identified consonants less accurately than children with normal hearing and adults with normal hearing tested at 0 dB SNR. However, all the groups were similarly impacted by face masks. Under auditory-only conditions, results were consistent with the pattern of high-frequency acoustic attenuation; hospital masks had the least impact on performance. Under audiovisual conditions, transparent masks had less impact on performance than opaque masks. High-frequency attenuation and visual obstruction had the greatest impact on place perception. The latter finding was consistent with the visual-only feature transmission data. These results suggest that the combination of noise and face masks negatively impacts speech understanding in children. The best mask for promoting speech understanding in noisy environments depends on whether visual cues will be accessible: hospital masks are best under auditory-only conditions, but well-fit transparent masks are best when listeners have a clear, consistent view of the talker's face.
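The exact model formulas are not reported in this abstract; a minimal sketch of the kind of linear mixed-effects analysis described, with mask type and listener group as fixed effects and a random intercept per participant, might look like this (hypothetical column names and synthetic data, statsmodels):

```python
# Hedged sketch of a linear mixed-effects model of consonant recognition
# accuracy: fixed effects of mask type and listener group (and their
# interaction), random intercept per participant. Data are synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "participant": rng.integers(0, 40, n),
    "mask": rng.choice(["none", "hospital", "fabric", "clear"], n),
    "group": rng.choice(["CHL", "CNH_child", "CNH_adult"], n),
    "accuracy": rng.uniform(0.2, 1.0, n),  # placeholder proportion correct
})

model = smf.mixedlm("accuracy ~ C(mask) * C(group)", data=df,
                    groups=df["participant"])
print(model.fit().summary())
```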
Affiliation(s)
- Kaylah Lalonde
- Audiovisual Speech Processing Laboratory, Boys Town National Research Hospital, Center for Hearing Research, Omaha, NE, United States
- Emily Buss
- Speech Perception and Auditory Research at Carolina Laboratory, Department of Otolaryngology Head and Neck Surgery, University of North Carolina School of Medicine, Chapel Hill, NC, United States
- Margaret K. Miller
- Human Auditory Development Laboratory, Boys Town National Research Hospital, Center for Hearing Research, Omaha, NE, United States
- Lori J. Leibold
- Human Auditory Development Laboratory, Boys Town National Research Hospital, Center for Hearing Research, Omaha, NE, United States
6
|
Jessica Tan SH, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: The Relationship between Cortical Tracking of Continuous Auditory-Visual Speech and Gaze Behaviour in Infants, Children and Adults. Neuroimage 2022; 256:119217. [PMID: 35436614 DOI: 10.1016/j.neuroimage.2022.119217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2021] [Revised: 04/09/2022] [Accepted: 04/14/2022] [Indexed: 11/24/2022] Open
Abstract
An auditory-visual speech benefit, the benefit that visual speech cues bring to auditory speech perception, is experienced from early on in infancy and continues to be experienced to an increasing degree with age. While there is both behavioural and neurophysiological evidence for children and adults, only behavioural evidence exists for infants, as no neurophysiological study has provided a comprehensive examination of the auditory-visual speech benefit in infants. It is also surprising that most studies on auditory-visual speech benefit do not concurrently report looking behaviour, especially since the auditory-visual speech benefit rests on the assumption that listeners attend to a speaker's talking face and that there are meaningful individual differences in looking behaviour. To address these gaps, we simultaneously recorded electroencephalographic (EEG) and eye-tracking data of 5-month-olds, 4-year-olds and adults as they were presented with a speaker in auditory-only (AO), visual-only (VO), and auditory-visual (AV) modes. Cortical tracking analyses that involved forward encoding models of the speech envelope revealed that there was an auditory-visual speech benefit [i.e., AV > (A+V)], evident in 5-month-olds and adults but not 4-year-olds. Examination of cortical tracking accuracy in relation to looking behaviour showed that infants' relative attention to the speaker's mouth (vs. eyes) was positively correlated with cortical tracking accuracy of VO speech, whereas adults' attention to the display overall was negatively correlated with cortical tracking accuracy of VO speech. This study provides the first neurophysiological evidence of auditory-visual speech benefit in infants, and our results suggest ways in which current models of speech processing can be fine-tuned.
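Forward encoding models of this kind (temporal response functions) regress the EEG onto time-lagged copies of the speech envelope; cortical tracking accuracy is then the correlation between predicted and recorded EEG on held-out data. A minimal ridge-regression sketch, with placeholder signals and an assumed lag window (not the paper's pipeline):

```python
# Minimal sketch of a forward encoding model (temporal response function):
# ridge regression from time-lagged speech-envelope samples to one EEG
# channel; "tracking accuracy" is the held-out prediction correlation.
# Signals, sample rate, lag window, and ridge parameter are placeholders.
import numpy as np

fs = 64                                  # assumed post-downsampling rate (Hz)
env = np.random.randn(fs * 60)           # placeholder speech envelope (60 s)
eeg = np.random.randn(fs * 60)           # placeholder single-channel EEG

lags = np.arange(0, int(0.4 * fs))       # 0-400 ms lags
X = np.stack([np.roll(env, lag) for lag in lags], axis=1)  # wraparound ignored

half = len(env) // 2                     # train on first half, test on second
lam = 1.0                                # ridge parameter (to be cross-validated)
w = np.linalg.solve(X[:half].T @ X[:half] + lam * np.eye(len(lags)),
                    X[:half].T @ eeg[:half])
pred = X[half:] @ w
tracking = np.corrcoef(pred, eeg[half:])[0, 1]
print(tracking)
```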
Affiliation(s)
- S H Jessica Tan
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University
- Marina Kalashnikova
- The Basque Center on Cognition, Brain and Language; IKERBASQUE, Basque Foundation for Science
- Michael J Crosse
- Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham
- The MARCS Institute of Brain, Behaviour and Development, Western Sydney University
7. Gijbels L, Yeatman JD, Lalonde K, Lee AKC. Audiovisual Speech Processing in Relationship to Phonological and Vocabulary Skills in First Graders. J Speech Lang Hear Res 2021; 64:5022-5040. [PMID: 34735292] [PMCID: PMC9150669] [DOI: 10.1044/2021_jslhr-21-00196]
Abstract
PURPOSE It is generally accepted that adults use visual cues to improve speech intelligibility in noisy environments, but findings regarding visual speech benefit in children are mixed. We explored factors that contribute to audiovisual (AV) gain in young children's speech understanding. We examined whether there is an AV benefit to speech-in-noise recognition in children in first grade and whether the visual salience of phonemes influences their AV benefit. We explored whether individual differences in AV speech enhancement could be explained by vocabulary knowledge, phonological awareness, or general psychophysical testing performance. METHOD Thirty-seven first graders completed online psychophysical experiments. We used an online single-interval, four-alternative forced-choice picture-pointing task with age-appropriate consonant-vowel-consonant words to measure auditory-only, visual-only, and AV word recognition in noise at -2 and -8 dB SNR. We obtained standard measures of vocabulary and phonological awareness and included a general psychophysical test to examine correlations with AV benefits. RESULTS We observed a significant overall AV gain among children in first grade. This effect was mainly attributable to the benefit at -8 dB SNR for visually distinct targets. Individual differences were not explained by any of the child variables. Boys showed lower auditory-only performance, leading to significantly larger AV gains. CONCLUSIONS This study shows an AV benefit of distinctive visual cues to word recognition in challenging noisy conditions in first graders. The cognitive and linguistic constraints of the task may have minimized the impact of individual differences in vocabulary and phonological awareness on AV benefit. The gender difference should be studied in a larger sample and age range.
Affiliation(s)
- Liesbeth Gijbels
- Department of Speech & Hearing Sciences, University of Washington, Seattle
- Institute for Learning & Brain Sciences, University of Washington, Seattle
- Jason D. Yeatman
- Division of Developmental-Behavioral Pediatrics, School of Medicine, Stanford University, CA
- Graduate School of Education, Stanford University, CA
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
- Adrian K. C. Lee
- Department of Speech & Hearing Sciences, University of Washington, Seattle
- Institute for Learning & Brain Sciences, University of Washington, Seattle
8. Lalonde K, McCreery RW. Audiovisual Enhancement of Speech Perception in Noise by School-Age Children Who Are Hard of Hearing. Ear Hear 2021; 41:705-719. [PMID: 32032226] [PMCID: PMC7822589] [DOI: 10.1097/aud.0000000000000830]
Abstract
OBJECTIVES The purpose of this study was to examine age- and hearing-related differences in school-age children's benefit from visual speech cues. The study addressed three questions: (1) Do age and hearing loss affect degree of audiovisual (AV) speech enhancement in school-age children? (2) Are there age- and hearing-related differences in the mechanisms underlying AV speech enhancement in school-age children? (3) What cognitive and linguistic variables predict individual differences in AV benefit among school-age children? DESIGN Forty-eight children between 6 and 13 years of age (19 with mild to severe sensorineural hearing loss; 29 with normal hearing) and 14 adults with normal hearing completed measures of auditory and AV syllable detection and/or sentence recognition in a two-talker masker type and a spectrally matched noise. Children also completed standardized behavioral measures of receptive vocabulary, visuospatial working memory, and executive attention. Mixed linear modeling was used to examine effects of modality, listener group, and masker on sentence recognition accuracy and syllable detection thresholds. Pearson correlations were used to examine the relationship between individual differences in children's AV enhancement (AV-auditory-only) and age, vocabulary, working memory, executive attention, and degree of hearing loss. RESULTS Significant AV enhancement was observed across all tasks, masker types, and listener groups. AV enhancement of sentence recognition was similar across maskers, but children with normal hearing exhibited less AV enhancement of sentence recognition than adults with normal hearing and children with hearing loss. AV enhancement of syllable detection was greater in the two-talker masker than the noise masker, but did not vary significantly across listener groups. Degree of hearing loss positively correlated with individual differences in AV benefit on the sentence recognition task in noise, but not on the detection task. None of the cognitive and linguistic variables correlated with individual differences in AV enhancement of syllable detection or sentence recognition. CONCLUSIONS Although AV benefit to syllable detection results from the use of visual speech to increase temporal expectancy, AV benefit to sentence recognition requires that an observer extracts phonetic information from the visual speech signal. The findings from this study suggest that all listener groups were equally good at using temporal cues in visual speech to detect auditory speech, but that adults with normal hearing and children with hearing loss were better than children with normal hearing at extracting phonetic information from the visual signal and/or using visual speech information to access phonetic/lexical representations in long-term memory. These results suggest that standard, auditory-only clinical speech recognition measures likely underestimate real-world speech recognition skills of children with mild to severe hearing loss.
Affiliation(s)
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE, USA
- Ryan W. McCreery
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE, USA
9. Vos TG, Dillon MT, Buss E, Rooth MA, Bucker AL, Dillon S, Pearson A, Quinones K, Richter ME, Roth N, Young A, Dedmon MM. Influence of Protective Face Coverings on the Speech Recognition of Cochlear Implant Patients. Laryngoscope 2021; 131:E2038-E2043. [PMID: 33590898] [PMCID: PMC8014501] [DOI: 10.1002/lary.29447]
Abstract
Objectives The objectives were to characterize the effects of wearing face coverings on: 1) acoustic speech cues, and 2) speech recognition of patients with hearing loss who listen with a cochlear implant. Methods A prospective cohort study was performed in a tertiary referral center between July and September 2020. A female talker recorded sentences in three conditions: no face covering, N95 mask, and N95 mask plus a face shield. Spectral differences were analyzed between speech produced in each condition. Speech recognition in each condition was assessed for twenty-three adult patients with at least 6 months of cochlear implant use. Results Spectral analysis demonstrated preferential attenuation of high-frequency speech information in the N95 mask plus face shield condition compared to the other conditions. Speech recognition did not differ significantly between the uncovered (median 90% [IQR 89%-94%]) and N95 mask conditions (91% [IQR 86%-94%]; P = .253); however, speech recognition was significantly worse in the N95 mask plus face shield condition (64% [IQR 48%-75%]) compared to the uncovered (P < .001) or N95 mask (P < .001) conditions. Conclusions The type and combination of protective face coverings used have differential effects on attenuation of speech information, influencing speech recognition of patients with hearing loss. In the face of the COVID-19 pandemic, there is a need to protect patients and clinicians from the spread of disease while maximizing patient speech recognition. The disruptive effect of wearing a face shield in conjunction with a mask may prompt clinicians to consider alternative eye protection, such as goggles, in appropriate clinical situations. Level of Evidence: 3.
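One common way to quantify the reported spectral differences is to compare long-term power spectra between conditions; a hedged sketch using Welch's method (recordings, sample rate, and the 4-kHz cutoff are placeholders, not the study's data or analysis parameters):

```python
# Hypothetical sketch: long-term spectral attenuation of a face covering,
# estimated as the dB difference between Welch power spectra of recordings
# made with and without the covering. Signals here are placeholders.
import numpy as np
from scipy.signal import welch

fs = 44100                               # assumed sample rate (Hz)
uncovered = np.random.randn(fs * 10)     # placeholder 10-s recordings
covered = 0.5 * np.random.randn(fs * 10)

f, p_unc = welch(uncovered, fs=fs, nperseg=4096)
_, p_cov = welch(covered, fs=fs, nperseg=4096)
atten_db = 10 * np.log10(p_cov / p_unc)  # negative values = attenuation
print(atten_db[f > 4000].mean())         # e.g., mean high-frequency attenuation
```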
Affiliation(s)
- Teresa G Vos
- Otolaryngology and Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Margaret T Dillon
- Otolaryngology and Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Emily Buss
- Otolaryngology and Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Meredith A Rooth
- Otolaryngology and Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Andrea L Bucker
- Department of Audiology, UNC Health, Chapel Hill, North Carolina, U.S.A
- Sarah Dillon
- Department of Audiology, UNC Health, Chapel Hill, North Carolina, U.S.A
- Adrienne Pearson
- Department of Audiology, UNC Health, Chapel Hill, North Carolina, U.S.A
- Kristen Quinones
- Department of Audiology, UNC Health, Chapel Hill, North Carolina, U.S.A
- Margaret E Richter
- Otolaryngology and Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Division of Speech and Hearing Sciences, Department of Allied Health, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
- Noelle Roth
- Department of Audiology, UNC Health, Chapel Hill, North Carolina, U.S.A
- Allison Young
- Department of Audiology, UNC Health, Chapel Hill, North Carolina, U.S.A
- Matthew M Dedmon
- Otolaryngology and Head and Neck Surgery, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
10. Lalonde K, Werner LA. Development of the Mechanisms Underlying Audiovisual Speech Perception Benefit. Brain Sci 2021; 11:49. [PMID: 33466253] [PMCID: PMC7824772] [DOI: 10.3390/brainsci11010049]
Abstract
The natural environments in which infants and children learn speech and language are noisy and multimodal. Adults rely on the multimodal nature of speech to compensate for noisy environments during speech communication. Multiple mechanisms underlie mature audiovisual benefit to speech perception, including reduced uncertainty as to when auditory speech will occur, use of correlations between the amplitude envelope of auditory and visual signals in fluent speech, and use of visual phonetic knowledge for lexical access. This paper reviews evidence regarding infants' and children's use of temporal and phonetic mechanisms in audiovisual speech perception benefit. The ability to use temporal cues for audiovisual speech perception benefit emerges in infancy. Although infants are sensitive to the correspondence between auditory and visual phonetic cues, the ability to use this correspondence for audiovisual benefit may not emerge until age four. A more cohesive account of the development of audiovisual speech perception may follow from a more thorough understanding of the development of sensitivity to and use of various temporal and phonetic cues.
Affiliation(s)
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE 68131, USA
- Lynne A. Werner
- Department of Speech and Hearing Sciences, University of Washington, Seattle, WA 98105, USA
11. Halverson DM, Lalonde K. Does visual speech provide release from perceptual masking in children? J Acoust Soc Am 2020; 148:EL221. [PMID: 33003896] [PMCID: PMC7731949] [DOI: 10.1121/10.0001867]
Abstract
Adults benefit more from visual speech in speech maskers than in noise maskers because visual speech helps perceptually isolate target talkers from competing talkers. To investigate whether children use visual speech to perceptually isolate target talkers, this study compared children's speech recognition thresholds in auditory and audiovisual conditions across two maskers: two-talker speech and noise. Children demonstrated similar audiovisual benefit in both maskers. Individual differences in speechreading accuracy predicted audiovisual benefit in each masker to a similar degree. Results suggest that although visual speech improves children's masked speech recognition thresholds, children may use visual speech in different ways than adults.
Affiliation(s)
- Destinee M Halverson
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68104, USA
- Kaylah Lalonde
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68104, USA
12. Al-Salim S, Moeller MP, McGregor KK. Performance of Children With Hearing Loss on an Audiovisual Version of a Nonword Repetition Task. Lang Speech Hear Serv Sch 2020; 51:42-54. [PMID: 31913807] [DOI: 10.1044/2019_lshss-ochl-19-0016]
Abstract
Purpose The aims of this study were to (a) determine if a high-quality adaptation of an audiovisual nonword repetition task can be completed by children with wide-ranging hearing abilities and to (b) examine whether performance on that task is sensitive to child demographics, hearing status, language, working memory, and executive function abilities. Method An audiovisual version of a nonword repetition task was adapted and administered to 100 school-aged children grouped by hearing status: 35 with normal hearing, 22 with mild bilateral hearing loss, 17 with unilateral hearing loss, and 26 cochlear implant users. Participants also completed measures of vocabulary, working memory, and executive function. A generalized linear mixed-effects model was used to analyze performance on the nonword repetition task. Results All children were able to complete the nonword repetition task. Children with unilateral hearing loss and children with cochlear implants repeated nonwords with less accuracy than normal-hearing peers. After adjusting for the influence of vocabulary and working memory, main effects were found for syllable length and hearing status, but no interaction effect was observed. Conclusions The audiovisual nonword repetition task captured individual differences in the performance of children with wide-ranging hearing abilities. The task could act as a useful tool to aid in identifying children with unilateral or mild bilateral hearing loss who have language impairments beyond those imposed by the hearing loss.
Affiliation(s)
- Sarah Al-Salim
- Center for Childhood Deafness, Language & Learning, Boys Town National Research Hospital, Omaha, NE
- Mary Pat Moeller
- Center for Childhood Deafness, Language & Learning, Boys Town National Research Hospital, Omaha, NE
- Karla K McGregor
- Center for Childhood Deafness, Language & Learning, Boys Town National Research Hospital, Omaha, NE
13. Stawicki M, Majdak P, Başkent D. Ventriloquist Illusion Produced With Virtual Acoustic Spatial Cues and Asynchronous Audiovisual Stimuli in Both Young and Older Individuals. Multisens Res 2019; 32:745-770. [DOI: 10.1163/22134808-20191430]
Abstract
The ventriloquist illusion, the change in perceived location of an auditory stimulus when a synchronously presented but spatially discordant visual stimulus is added, has previously been shown in young healthy populations to be a robust paradigm that mainly relies on automatic processes. Here, we propose the ventriloquist illusion as a potential simple test to assess audiovisual (AV) integration in young and older individuals. We used a modified version of the illusion paradigm that was adaptive, nearly bias-free, relied on binaural stimulus representation using generic head-related transfer functions (HRTFs) instead of multiple loudspeakers, and was tested with synchronous and asynchronous presentation of AV stimuli (both tone and speech). The minimum audible angle (MAA), the smallest perceptible difference in angle between two sound sources, was compared with and without the visual stimuli in young and older adults with no or minimal sensory deficits. The illusion effect, measured by means of MAAs implemented with HRTFs, was observed with both synchronous and asynchronous visual stimuli, but only with the tone stimulus, not the speech stimulus. The patterns were similar between young and older individuals, indicating the versatility of the modified ventriloquist illusion paradigm.
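The paper's exact adaptive rule is not stated in this abstract; for illustration, a generic 2-down/1-up staircase, which converges near 70.7% correct, could estimate an MAA as in the following sketch (all values and the simulated listener are hypothetical):

```python
# Hedged sketch of a generic 2-down/1-up adaptive staircase for a minimum
# audible angle (MAA). The study's actual adaptive procedure may differ.
import random

angle, step = 16.0, 4.0        # degrees; assumed starting values
correct_run, reversals = 0, []
last_direction = None

def trial(angle):
    """Placeholder listener: more likely correct at larger angles."""
    return random.random() < min(0.99, 0.5 + angle / 20)

while len(reversals) < 8:
    if trial(angle):
        correct_run += 1
        if correct_run == 2:                       # two correct -> harder
            correct_run = 0
            if last_direction == "up":
                reversals.append(angle)
            angle, last_direction = max(angle - step, 0.5), "down"
    else:                                          # one error -> easier
        correct_run = 0
        if last_direction == "down":
            reversals.append(angle)
        angle, last_direction = angle + step, "up"

maa = sum(reversals[-6:]) / 6                      # mean of last reversals
print(round(maa, 1))
```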
Affiliation(s)
- Marnix Stawicki
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Graduate School of Medical Sciences, Research School of Behavioral and Cognitive Neurosciences (BCN), University of Groningen, Groningen, The Netherlands
- Piotr Majdak
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Graduate School of Medical Sciences, Research School of Behavioral and Cognitive Neurosciences (BCN), University of Groningen, Groningen, The Netherlands
14. Lalonde K, Werner LA. Infants and Adults Use Visual Cues to Improve Detection and Discrimination of Speech in Noise. J Speech Lang Hear Res 2019; 62:3860-3875. [PMID: 31618097] [PMCID: PMC7201336] [DOI: 10.1044/2019_jslhr-h-19-0106]
Abstract
Purpose This study assessed the extent to which 6- to 8.5-month-old infants and 18- to 30-year-old adults detect and discriminate auditory syllables in noise better in the presence of visual speech than in auditory-only conditions. In addition, we examined whether visual cues to the onset and offset of the auditory signal account for this benefit. Method Sixty infants and 24 adults were randomly assigned to speech detection or discrimination tasks and were tested using a modified observer-based psychoacoustic procedure. Each participant completed 1-3 conditions: auditory-only, with visual speech, and with a visual signal that only cued the onset and offset of the auditory syllable. Results Mixed linear modeling indicated that infants and adults benefited from visual speech on both tasks. Adults relied on the onset-offset cue for detection, but the same cue did not improve their discrimination. The onset-offset cue benefited infants for both detection and discrimination. Whereas the onset-offset cue improved detection similarly for infants and adults, the full visual speech signal benefited infants to a lesser extent than adults on the discrimination task. Conclusions These results suggest that infants' use of visual onset-offset cues is mature, but their ability to use more complex visual speech cues is still developing. Additional research is needed to explore differences in audiovisual enhancement (a) of speech discrimination across speech targets and (b) with increasingly complex tasks and stimuli.
Affiliation(s)
- Kaylah Lalonde
- Department of Speech & Hearing Sciences, University of Washington, Seattle
- Lynne A. Werner
- Department of Speech & Hearing Sciences, University of Washington, Seattle
15. Detection and Attention for Auditory, Visual, and Audiovisual Speech in Children with Hearing Loss. Ear Hear 2019; 41:508-520. [PMID: 31592903] [DOI: 10.1097/aud.0000000000000798]
Abstract
OBJECTIVES Efficient multisensory speech detection is critical for children, who must quickly detect/encode a rapid stream of speech to participate in conversations and have access to the audiovisual cues that underpin speech and language development, yet multisensory speech detection remains understudied in children with hearing loss (CHL). This research assessed detection, along with vigilant/goal-directed attention, for multisensory versus unisensory speech in CHL versus children with normal hearing (CNH). DESIGN Participants were 60 CHL who used hearing aids and communicated successfully aurally/orally and 60 age-matched CNH. Simple response times determined how quickly children could detect a preidentified easy-to-hear stimulus (70 dB SPL, utterance "buh" presented in auditory only [A], visual only [V], or audiovisual [AV] modes). The V mode comprised two facial conditions: static versus dynamic face. Faster detection for multisensory (AV) than unisensory (A or V) input indicates multisensory facilitation. We assessed mean responses and faster versus slower responses (defined by the first versus third quartiles of the response-time distributions), conceptualized as follows: faster responses (first quartile) reflect efficient detection with efficient vigilant/goal-directed attention, and slower responses (third quartile) reflect less efficient detection associated with attentional lapses. Finally, we studied associations between these results and personal characteristics of CHL. RESULTS Unisensory A versus V modes: Both groups showed better detection and attention for A than V input. The A input more readily captured children's attention and minimized attentional lapses, which supports A-bound processing even by CHL who were processing low-fidelity A input. CNH and CHL did not differ in ability to detect A input at a conversational speech level. Multisensory AV versus A modes: Both groups showed better detection and attention for AV than A input. The advantage for AV input was a facial effect (both static and dynamic faces), a pattern suggesting that communication is a social interaction that is more than just words. Attention did not differ between groups; detection was faster in CHL than CNH for AV input, but not for A input. Associations between personal characteristics/degree of hearing loss of CHL and results: CHL with the greatest deficits in detection of V input had the poorest word recognition skills, and CHL with the greatest reduction of attentional lapses from AV input had the poorest vocabulary skills. Both outcomes are consistent with the idea that CHL who are processing low-fidelity A input depend disproportionately on V and AV input to learn to identify words and associate them with concepts. As CHL aged, attention to V input improved. Degree of HL did not influence results. CONCLUSIONS Understanding speech, a daily challenge for CHL, is a complex task that demands efficient detection of and attention to AV speech cues. Our results support the clinical importance of multisensory approaches to understand and advance spoken communication by CHL.
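The quartile-based analysis can be made concrete with a short sketch (placeholder response times): the first quartile of each RT distribution indexes the faster, attention-efficient responses, and the third quartile the slower, lapse-prone responses.

```python
# Sketch of the quartile split described above: first-quartile RTs index
# efficient detection/attention, third-quartile RTs index attentional lapses.
import numpy as np

rts = np.random.lognormal(mean=-0.5, sigma=0.4, size=200)  # placeholder RTs (s)
q1, q3 = np.percentile(rts, [25, 75])
faster = rts[rts <= q1].mean()   # efficient detection with focused attention
slower = rts[rts >= q3].mean()   # detection degraded by attentional lapses
print(faster, slower)
```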
16. Psychobiological Responses Reveal Audiovisual Noise Differentially Challenges Speech Recognition. Ear Hear 2019; 41:268-277. [PMID: 31283529] [DOI: 10.1097/aud.0000000000000755]
Abstract
OBJECTIVES In noisy environments, listeners benefit from both hearing and seeing a talker, demonstrating audiovisual (AV) cues enhance speech-in-noise (SIN) recognition. Here, we examined the relative contribution of auditory and visual cues to SIN perception and the strategies used by listeners to decipher speech in noise interference(s). DESIGN Normal-hearing listeners (n = 22) performed an open-set speech recognition task while viewing audiovisual TIMIT sentences presented under different combinations of signal degradation including visual (AVn), audio (AnV), or multimodal (AnVn) noise. Acoustic and visual noises were matched in physical signal-to-noise ratio. Eyetracking monitored participants' gaze to different parts of a talker's face during SIN perception. RESULTS As expected, behavioral performance for clean sentence recognition was better for A-only and AV compared to V-only speech. Similarly, with noise in the auditory channel (AnV and AnVn speech), performance was aided by the addition of visual cues of the talker regardless of whether the visual channel contained noise, confirming a multimodal benefit to SIN recognition. The addition of visual noise (AVn) obscuring the talker's face had little effect on speech recognition by itself. Listeners' eye gaze fixations were biased toward the eyes (decreased at the mouth) whenever the auditory channel was compromised. Fixating on the eyes was negatively associated with SIN recognition performance. Eye gazes on the mouth versus eyes of the face also depended on the gender of the talker. CONCLUSIONS Collectively, results suggest listeners (1) depend heavily on the auditory over visual channel when seeing and hearing speech and (2) alter their visual strategy from viewing the mouth to viewing the eyes of a talker with signal degradations, which negatively affects speech perception.
17. Levy H, Konieczny L, Hanulíková A. Processing of unfamiliar accents in monolingual and bilingual children: effects of type and amount of accent experience. J Child Lang 2019; 46:368-392. [PMID: 30616700] [DOI: 10.1017/s030500091800051x]
Abstract
Substantial individual differences exist in regard to type and amount of experience with variable speech resulting from foreign or regional accents. Whereas prior experience helps with processing familiar accents, research on how experience with accented speech affects processing of unfamiliar accents is inconclusive, ranging from perceptual benefits to processing disadvantages. We examined how experience with accented speech modulates mono- and bilingual children's (mean age: 9;10) ease of speech comprehension for two unfamiliar accents in German, one foreign and one regional. More experience with regional accents helped children repeat sentences correctly in the regional condition and in the standard condition. More experience with foreign accents did not help in either accent condition. The results suggest that type and amount of accent experience co-determine processing ease of accented speech.
Affiliation(s)
- Helena Levy
- GRK 'Frequency effects in language', University of Freiburg, Germany
- Adriana Hanulíková
- University of Freiburg, Germany
- Freiburg Institute for Advanced Studies (FRIAS), Freiburg, Germany
18. Jerger S, Damian MF, Karl C, Abdi H. Developmental Shifts in Detection and Attention for Auditory, Visual, and Audiovisual Speech. J Speech Lang Hear Res 2018; 61:3095-3112. [PMID: 30515515] [PMCID: PMC6440305] [DOI: 10.1044/2018_jslhr-h-17-0343]
Abstract
PURPOSE Successful speech processing depends on our ability to detect and integrate multisensory cues, yet there is minimal research on multisensory speech detection and integration by children. To address this need, we studied the development of speech detection for auditory (A), visual (V), and audiovisual (AV) input. METHOD Participants were 115 typically developing children clustered into age groups between 4 and 14 years. Speech detection (quantified by response times [RTs]) was determined for 1 stimulus, /buh/, presented in A, V, and AV modes (articulating vs. static facial conditions). Performance was analyzed not only in terms of traditional mean RTs but also in terms of the faster versus slower RTs (defined by the 1st vs. 3rd quartiles of RT distributions). These time regions were conceptualized respectively as reflecting optimal detection with efficient focused attention versus less optimal detection with inefficient focused attention due to attentional lapses. RESULTS Mean RTs indicated better detection (a) of multisensory AV speech than A speech only in 4- to 5-year-olds and (b) of A and AV inputs than V input in all age groups. The faster RTs revealed that AV input did not improve detection in any group. The slower RTs indicated that (a) the processing of silent V input was significantly faster for the articulating than static face and (b) AV speech or facial input significantly minimized attentional lapses in all groups except 6- to 7-year-olds (a peaked U-shaped curve). Apparently, the AV benefit observed for mean performance in 4- to 5-year-olds arose from effects of attention. CONCLUSIONS The faster RTs indicated that AV input did not enhance detection in any group, but the slower RTs indicated that AV speech and dynamic V speech (mouthing) significantly minimized attentional lapses and thus did influence performance. Overall, A and AV inputs were detected consistently faster than V input; this result endorsed stimulus-bound auditory processing by these children.
Affiliation(s)
- Susan Jerger
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, Richardson
- Callier Center for Communication Disorders, Richardson, TX
- Markus F. Damian
- School of Experimental Psychology, University of Bristol, United Kingdom
- Cassandra Karl
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, Richardson
- Callier Center for Communication Disorders, Richardson, TX
- Hervé Abdi
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, Richardson
19. Jerger S, Damian MF, McAlpine RP, Abdi H. Visual speech fills in both discrimination and identification of non-intact auditory speech in children. J Child Lang 2018; 45:392-414. [PMID: 28724465] [PMCID: PMC5775942] [DOI: 10.1017/s0305000917000265]
Abstract
To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. bæz) coupled to non-intact (excised onsets) auditory speech (signified by /-b/æz). Children discriminated syllable pairs that differed in intactness (i.e. bæz:/-b/æz) and identified non-intact nonwords (/-b/æz). We predicted that visual speech would cause children to perceive the non-intact onsets as intact, resulting in more same responses for discrimination and more intact (i.e. bæz) responses for identification in the audiovisual than auditory mode. Visual speech for the easy-to-speechread /b/ but not for the difficult-to-speechread /g/ boosted discrimination and identification (about 35-45%) in children from four to fourteen years. The influence of visual speech on discrimination was uniquely associated with the influence of visual speech on identification and receptive vocabulary skills.
Affiliation(s)
- Susan Jerger
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, 800 W. Campbell Rd, Richardson, TX 75080
- Callier Center for Communication Disorders, 811 Synergy Park Blvd., Richardson, TX 75080
- Markus F. Damian
- University of Bristol, School of Experimental Psychology, 12a Priory Road, Room 1D20, Bristol BS8 1TU, United Kingdom
- Rachel P. McAlpine
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, 800 W. Campbell Rd, Richardson, TX 75080
- Callier Center for Communication Disorders, 811 Synergy Park Blvd., Richardson, TX 75080
- Hervé Abdi
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, 800 W. Campbell Rd, Richardson, TX 75080
20. Looking Behavior and Audiovisual Speech Understanding in Children With Normal Hearing and Children With Mild Bilateral or Unilateral Hearing Loss. Ear Hear 2017; 39:783-794. [PMID: 29252979] [DOI: 10.1097/aud.0000000000000534]
Abstract
OBJECTIVES Visual information from talkers facilitates speech intelligibility for listeners when audibility is challenged by environmental noise and hearing loss. Less is known about how listeners actively process and attend to visual information from different talkers in complex multi-talker environments. This study tracked looking behavior in children with normal hearing (NH), mild bilateral hearing loss (MBHL), and unilateral hearing loss (UHL) in a complex multi-talker environment to examine the extent to which children look at talkers and whether looking patterns relate to performance on a speech-understanding task. It was hypothesized that performance would decrease as perceptual complexity increased and that children with hearing loss would perform more poorly than their peers with NH. Children with MBHL or UHL were expected to demonstrate greater attention to individual talkers during multi-talker exchanges, indicating that they were more likely to attempt to use visual information from talkers to assist in speech understanding in adverse acoustics. It also was of interest to examine whether MBHL, versus UHL, would differentially affect performance and looking behavior. DESIGN Eighteen children with NH, eight children with MBHL, and 10 children with UHL participated (8-12 years). They followed audiovisual instructions for placing objects on a mat under three conditions: a single talker providing instructions via a video monitor, four possible talkers alternately providing instructions on separate monitors in front of the listener, and the same four talkers providing both target and nontarget information. Multi-talker background noise was presented at a 5 dB signal-to-noise ratio during testing. An eye tracker monitored looking behavior while children performed the experimental task. RESULTS Behavioral task performance was higher for children with NH than for either group of children with hearing loss. There were no differences in performance between children with UHL and children with MBHL. Eye-tracker analysis revealed that children with NH looked more at the screens overall than did children with MBHL or UHL, though individual differences were greater in the groups with hearing loss. Listeners in all groups spent a small proportion of time looking at relevant screens as talkers spoke. Although looking was distributed across all screens, there was a bias toward the right side of the display. There was no relationship between overall looking behavior and performance on the task. CONCLUSIONS The present study examined the processing of audiovisual speech in the context of a naturalistic task. Results demonstrated that children distributed their looking to a variety of sources during the task, but that children with NH were more likely to look at screens than were those with MBHL/UHL. However, all groups looked at the relevant talkers as they were speaking only a small proportion of the time. Despite variability in looking behavior, listeners were able to follow the audiovisual instructions and children with NH demonstrated better performance than children with MBHL/UHL. These results suggest that performance on some challenging multi-talker audiovisual tasks is not dependent on visual fixation to relevant talkers for children with NH or with MBHL/UHL.
21. Modeling the Development of Audiovisual Cue Integration in Speech Perception. Brain Sci 2017; 7(3):32. [PMID: 28335558] [PMCID: PMC5366831] [DOI: 10.3390/brainsci7030032]
Abstract
Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
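A minimal sketch of the core idea, not the paper's actual simulation code: fit an unsupervised Gaussian mixture to joint auditory-visual cue values sampled from two phonological categories, then read category posteriors for ambiguous tokens off the fitted model (all cue values below are synthetic).

```python
# Minimal sketch of GMM-based audiovisual cue learning (not the paper's code):
# each training token is a 2-D point (auditory cue, visual cue); an
# unsupervised Gaussian mixture recovers the two phonological categories,
# with per-dimension variances reflecting the reliability of each cue.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
cat_b = rng.normal([0.0, 0.0], [1.0, 0.5], size=(500, 2))  # e.g., /b/ tokens
cat_d = rng.normal([3.0, 2.0], [1.0, 0.5], size=(500, 2))  # e.g., /d/ tokens
X = np.vstack([cat_b, cat_d])

gmm = GaussianMixture(n_components=2, covariance_type="diag").fit(X)
print(gmm.means_)                       # learned category centers in cue space
print(gmm.predict_proba([[1.5, 1.0]]))  # posterior for an ambiguous AV token
```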
Collapse
|
22
|
Jerger S, Damian MF, McAlpine RP, Abdi H. Visual speech alters the discrimination and identification of non-intact auditory speech in children with hearing loss. Int J Pediatr Otorhinolaryngol 2017; 94:127-137. [PMID: 28167003 PMCID: PMC5308867 DOI: 10.1016/j.ijporl.2017.01.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/19/2016] [Revised: 01/05/2017] [Accepted: 01/06/2017] [Indexed: 11/18/2022]
Abstract
OBJECTIVES Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes, yet we have little evidence about the role of early auditory experience and visual speech in the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. METHODS Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets), for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to a non-intact onset/rhyme in the auditory track (/-B/aa or /-B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa : /-B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same, as opposed to different, responses in the audiovisual than auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /-B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz, as opposed to az, responses in the audiovisual than auditory mode. RESULTS Performance in the audiovisual mode showed more same responses for the intact vs. non-intact different pairs (e.g., Baa : /-B/aa) and more intact onset responses for nonword repetition (Baz for /-B/az). Thus, visual speech altered both discrimination and identification in the CHL, to a large extent for the /B/ onsets but only minimally for the /G/ onsets. The CHL identified the stimuli similarly to the CNH but did not discriminate the stimuli similarly. A bias-free measure of the children's discrimination skills (i.e., d' analysis) revealed that the CHL had greater difficulty discriminating intact from non-intact speech in both modes. As the degree of HL worsened, the ability to discriminate the intact vs. non-intact onsets in the auditory mode worsened. Discrimination ability in CHL significantly predicted their identification of the onsets, even after variation due to the other variables was controlled. CONCLUSIONS These results clearly established that visual speech can fill in non-intact auditory speech, and this effect, in turn, made the non-intact onsets more difficult to discriminate from intact speech and more likely to be perceived as intact. Such results 1) demonstrate the value of visual speech at multiple levels of linguistic processing and 2) support intervention programs that view visual speech as a powerful asset for developing spoken language in CHL.
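The bias-free discrimination measure mentioned above is the signal detection statistic d', the difference between the z-transformed hit and false-alarm rates. A minimal sketch follows, using a standard log-linear correction for extreme rates; the trial counts in the example are invented for illustration and are not the study's data.

```python
# Sketch of d' from discrimination counts, with a log-linear (Hautus)
# correction so hit/false-alarm rates of 0 or 1 stay finite under z-transform.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Example: "different" responses on 18/24 truly different pairs (hits)
# and on 6/24 identical pairs (false alarms).
print(round(d_prime(18, 6, 6, 18), 2))  # ~1.29
```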
Collapse
Affiliation(s)
- Susan Jerger
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, 800 W. Campbell Rd, Richardson, TX, 75080, USA; Callier Center for Communication Disorders, 811 Synergy Park Blvd., Richardson, TX, 75080, USA.
| | - Markus F Damian
- University of Bristol, School of Experimental Psychology, 12a Priory Road, Room 1D20, Bristol, BS8 1TU, United Kingdom.
| | - Rachel P McAlpine
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, 800 W. Campbell Rd, Richardson, TX, 75080, USA; Callier Center for Communication Disorders, 811 Synergy Park Blvd., Richardson, TX, 75080, USA.
| | - Hervé Abdi
- School of Behavioral and Brain Sciences, GR4.1, University of Texas at Dallas, 800 W. Campbell Rd, Richardson, TX, 75080, USA.
| |
Collapse
|
23
|
Lau BK, Ruggles DR, Katyal S, Engel SA, Oxenham AJ. Sustained Cortical and Subcortical Measures of Auditory and Visual Plasticity following Short-Term Perceptual Learning. PLoS One 2017; 12:e0168858. [PMID: 28107359 PMCID: PMC5249117 DOI: 10.1371/journal.pone.0168858] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2016] [Accepted: 11/06/2016] [Indexed: 12/02/2022] Open
Abstract
Short-term training can lead to improvements in behavioral discrimination of auditory and visual stimuli, as well as enhanced EEG responses to those stimuli. In the auditory domain, fluency with tonal languages and musical training have been associated with long-term cortical and subcortical plasticity, but less is known about the effects of shorter-term training. This study combined electroencephalography (EEG) and behavioral measures to investigate short-term learning and neural plasticity in both auditory and visual domains. Forty adult participants were divided into four groups. Three groups trained on one of three tasks, involving discrimination of auditory fundamental frequency (F0), auditory amplitude modulation rate (AM), or visual orientation (VIS). The fourth (control) group received no training. Pre- and post-training tests, as well as retention tests 30 days after training, involved behavioral discrimination thresholds; steady-state visual evoked potentials (SSVEPs) at the flicker frequencies of the visual stimuli; and auditory envelope-following responses evoked and measured simultaneously in response to the rapid stimulus F0 (the EFR, thought to reflect subcortical generators) and to slow amplitude modulation (the ASSR, thought to reflect cortical generators). Enhancement of the ASSR was observed in both auditory-trained groups, not only in the AM-trained group, whereas enhancement of the SSVEP was found only in the visually trained group. No evidence was found for changes in the EFR. The results suggest that some aspects of neural plasticity can develop rapidly and may generalize across tasks but not across modalities. Behaviorally, the pattern of learning was complex, with significant cross-task and cross-modal learning effects.
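Steady-state responses such as the SSVEP and ASSR are typically quantified as the spectral amplitude of the trial-averaged response at the exact stimulation frequency. The sketch below shows that computation on synthetic data; the sampling rate, 40 Hz response frequency, and noise level are assumptions for illustration, not the study's recording parameters.

```python
# Sketch: amplitude of a steady-state EEG response at the stimulation
# frequency, estimated from the FFT of the trial-averaged epoch.
import numpy as np

def steady_state_amplitude(epochs, fs, stim_freq):
    """epochs: (n_trials, n_samples) array; fs: sampling rate in Hz."""
    evoked = epochs.mean(axis=0)              # averaging suppresses non-phase-locked noise
    spectrum = np.fft.rfft(evoked) / evoked.size
    freqs = np.fft.rfftfreq(evoked.size, d=1.0 / fs)
    bin_idx = np.argmin(np.abs(freqs - stim_freq))
    return 2.0 * np.abs(spectrum[bin_idx])    # single-sided amplitude

# Synthetic check: a 40 Hz response of amplitude 1 buried in heavy noise.
fs, f0, n_trials, n_samples = 1000, 40.0, 200, 1000
t = np.arange(n_samples) / fs
rng = np.random.default_rng(1)
epochs = np.sin(2 * np.pi * f0 * t) + rng.normal(0, 5, (n_trials, n_samples))
print(steady_state_amplitude(epochs, fs, f0))  # ~1.0
```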
Collapse
Affiliation(s)
- Bonnie K. Lau
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Dorea R. Ruggles
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Sucharit Katyal
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Stephen A. Engel
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Andrew J. Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
Collapse
|
24
|
Imafuku M, Myowa M. Developmental change in sensitivity to audiovisual speech congruency and its relation to language in infants. Psychologia 2016. [DOI: 10.2117/psysoc.2016.163] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Affiliation(s)
- Masahiro Imafuku
- Graduate School of Education, Kyoto University
- Graduate School of Arts and Sciences, The University of Tokyo
- Japan Society for the Promotion of Science
| | | |
Collapse
|