1
O'Hanlon B, Plack CJ, Nuttall HE. Reassessing the Benefits of Audiovisual Integration to Speech Perception and Intelligibility. J Speech Lang Hear Res 2025; 68:26-39. [PMID: 39620981 PMCID: PMC11842087 DOI: 10.1044/2024_jslhr-24-00162]
Abstract
PURPOSE In difficult listening conditions, the visual system assists with speech perception through lipreading. Stimulus onset asynchrony (SOA) is used to investigate the interaction between the two modalities in speech perception. Previous estimates of audiovisual benefit and SOA integration period differ widely. A limitation of previous research is a lack of consideration of visemes (categories of phonemes defined by similar lip movements when produced by a speaker) to ensure that selected phonemes are visually distinct. This study aimed to reassess the benefits of audiovisual lipreading to speech perception when different viseme categories are selected as stimuli and presented in noise. The study also aimed to investigate the effects of SOA on these stimuli. METHOD Sixty participants were tested online and presented with audio-only and audiovisual stimuli containing the speaker's lip movements. The speech was presented either with or without noise and had six different SOAs (0, 200, 216.6, 233.3, 250, and 266.6 ms). Participants discriminated between speech syllables with button presses. RESULTS The benefit of visual information was weaker than that in previous studies. There was a significant increase in reaction times as SOA was introduced, but there were no significant effects of SOA on accuracy. Furthermore, exploratory analyses suggest that the effect was not equal across viseme categories: "Ba" was more difficult to recognize than "ka" in noise. CONCLUSION In summary, the findings suggest that the contributions of audiovisual integration to speech processing are weaker when considering visemes but are not sufficient to identify a full integration period. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.27641064.
Affiliation(s)
- Christopher J. Plack
- Department of Psychology, Lancaster University, United Kingdom
- Manchester Centre for Audiology and Deafness, The University of Manchester, United Kingdom
2
Yonemura Y, Katori Y. Dynamical predictive coding with reservoir computing performs noise-robust multi-sensory speech recognition. Front Comput Neurosci 2024; 18:1464603. [PMID: 39376576 PMCID: PMC11456454 DOI: 10.3389/fncom.2024.1464603]
Abstract
Multi-sensory integration is a perceptual process through which the brain synthesizes a unified perception by integrating inputs from multiple sensory modalities. A key issue is understanding how the brain performs multi-sensory integrations using a common neural basis in the cortex. A cortical model based on reservoir computing has been proposed to elucidate the role of recurrent connectivity among cortical neurons in this process. Reservoir computing is well-suited for time series processing, such as speech recognition. This inquiry focuses on extending a reservoir computing-based cortical model to encompass multi-sensory integration within the cortex. This research introduces a dynamical model of multi-sensory speech recognition, leveraging predictive coding combined with reservoir computing. Predictive coding offers a framework for the hierarchical structure of the cortex. The model integrates reliability weighting, derived from the computational theory of multi-sensory integration, to adapt to multi-sensory time series processing. The model addresses a multi-sensory speech recognition task, necessitating the management of complex time series. We observed that the reservoir effectively recognizes speech by extracting time-contextual information and weighting sensory inputs according to sensory noise. These findings indicate that the dynamic properties of recurrent networks are applicable to multi-sensory time series processing, positioning reservoir computing as a suitable model for multi-sensory integration.
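For readers who want a concrete picture of the approach summarised above, the following Python sketch illustrates two ingredients the abstract names, a recurrent reservoir and reliability weighting of sensory inputs; it is not the authors' model (the predictive-coding hierarchy is omitted), and all dimensions, weight scales, and the ridge-regression readout are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper)
n_res, n_audio, n_visual = 300, 20, 20

# Random input and recurrent weights; recurrent matrix rescaled to spectral radius 0.9
W_in_a = rng.uniform(-1.0, 1.0, (n_res, n_audio))
W_in_v = rng.uniform(-1.0, 1.0, (n_res, n_visual))
W_res = rng.uniform(-0.5, 0.5, (n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))

def run_reservoir(audio, visual, w_audio, w_visual, leak=0.3):
    """Drive a leaky-tanh reservoir with reliability-weighted audio/visual streams.

    audio  : (T, n_audio) array of auditory features
    visual : (T, n_visual) array of visual features
    w_audio, w_visual : scalar reliability weights (e.g. inverse noise variance)
    Returns the (T, n_res) matrix of reservoir states.
    """
    x = np.zeros(n_res)
    states = []
    for a_t, v_t in zip(audio, visual):
        drive = w_audio * (W_in_a @ a_t) + w_visual * (W_in_v @ v_t)
        x = (1.0 - leak) * x + leak * np.tanh(W_res @ x + drive)
        states.append(x.copy())
    return np.asarray(states)

def train_readout(states, targets, ridge=1e-2):
    """Ridge-regression readout mapping reservoir states to one-hot syllable targets."""
    return np.linalg.solve(states.T @ states + ridge * np.eye(n_res), states.T @ targets)
```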
Affiliation(s)
- Yoshihiro Yonemura
- Graduate School of Systems Information Science, Future University Hakodate, Hakodate, Hokkaido, Japan
- Yuichi Katori
- Graduate School of Systems Information Science, Future University Hakodate, Hakodate, Hokkaido, Japan
- International Research Center for Neurointelligence (IRCN), The University of Tokyo, Tokyo, Japan
3
Bolam J, Diaz JA, Andrews M, Coats RO, Philiastides MG, Astill SL, Delis I. A drift diffusion model analysis of age-related impact on multisensory decision-making processes. Sci Rep 2024; 14:14895. [PMID: 38942761 PMCID: PMC11213863 DOI: 10.1038/s41598-024-65549-5]
Abstract
Older adults (OAs) are typically slower and/or less accurate in forming perceptual choices relative to younger adults. Despite perceptual deficits, OAs gain from integrating information across senses, yielding multisensory benefits. However, the cognitive processes underlying these seemingly discrepant ageing effects remain unclear. To address this knowledge gap, 212 participants (18-90 years old) performed an online object categorisation paradigm, whereby age-related differences in Reaction Times (RTs) and choice accuracy between audiovisual (AV), visual (V), and auditory (A) conditions could be assessed. Whereas OAs were slower and less accurate across sensory conditions, they exhibited greater RT decreases between AV and V conditions, showing a larger multisensory benefit towards decisional speed. Hierarchical Drift Diffusion Modelling (HDDM) was fitted to participants' behaviour to probe age-related impacts on the latent multisensory decision formation processes. For OAs, HDDM demonstrated slower evidence accumulation rates across sensory conditions coupled with increased response caution for AV trials of higher difficulty. Notably, for trials of lower difficulty we found multisensory benefits in evidence accumulation that increased with age, but not for trials of higher difficulty, in which increased response caution was instead evident. Together, our findings reconcile age-related impacts on multisensory decision-making, indicating greater multisensory evidence accumulation benefits with age underlying enhanced decisional speed.
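Because the analysis above hinges on the drift diffusion framework, a minimal simulation may help make the parameters concrete: a higher drift rate yields faster choices (the reported multisensory benefit), while a wider boundary corresponds to greater response caution. The Python sketch below is a generic single-accumulator model with illustrative parameter values, not the hierarchical (HDDM) fit used in the study.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ddm(drift, boundary, non_decision, n_trials=2000, dt=0.001, noise=1.0):
    """Simulate a simple drift-diffusion process with symmetric absorbing boundaries.

    drift        : evidence accumulation rate (e.g. higher for audiovisual trials)
    boundary     : decision threshold (response caution)
    non_decision : encoding/motor time in seconds
    Returns reaction times (s) and choices (1 = upper boundary, 0 = lower boundary).
    """
    rts, choices = [], []
    for _ in range(n_trials):
        evidence, t = 0.0, 0.0
        while abs(evidence) < boundary:
            evidence += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
            t += dt
        rts.append(t + non_decision)
        choices.append(1 if evidence > 0 else 0)
    return np.array(rts), np.array(choices)

# Illustrative comparison: a multisensory benefit modelled as a higher drift rate
rt_av, _ = simulate_ddm(drift=2.0, boundary=1.2, non_decision=0.3)
rt_v, _ = simulate_ddm(drift=1.5, boundary=1.2, non_decision=0.3)
print(f"mean RT audiovisual: {rt_av.mean():.3f} s, visual-only: {rt_v.mean():.3f} s")
```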
Affiliation(s)
- Joshua Bolam
- School of Biomedical Sciences, University of Leeds, West Yorkshire, LS2 9JT, UK.
- Institute of Neuroscience, Trinity College Dublin, Dublin, D02 PX31, Ireland.
- Jessica A Diaz
- School of Biomedical Sciences, University of Leeds, West Yorkshire, LS2 9JT, UK
- School of Social Sciences, Birmingham City University, West Midlands, B15 3HE, UK
- Mark Andrews
- School of Social Sciences, Nottingham Trent University, Nottinghamshire, NG1 4FQ, UK
- Rachel O Coats
- School of Psychology, University of Leeds, West Yorkshire, LS2 9JT, UK
- Marios G Philiastides
- School of Neuroscience and Psychology, University of Glasgow, Lanarkshire, G12 8QB, UK
- Sarah L Astill
- School of Biomedical Sciences, University of Leeds, West Yorkshire, LS2 9JT, UK
- Ioannis Delis
- School of Biomedical Sciences, University of Leeds, West Yorkshire, LS2 9JT, UK.
4
Weglage A, Layer N, Meister H, Müller V, Lang-Roth R, Walger M, Sandmann P. Changes in visually and auditory attended audiovisual speech processing in cochlear implant users: A longitudinal ERP study. Hear Res 2024; 447:109023. [PMID: 38733710 DOI: 10.1016/j.heares.2024.109023]
Abstract
Limited auditory input, whether caused by hearing loss or by electrical stimulation through a cochlear implant (CI), can be compensated by the remaining senses. Specifically for CI users, previous studies reported not only improved visual skills, but also altered cortical processing of unisensory visual and auditory stimuli. However, in multisensory scenarios, it is still unclear how auditory deprivation (before implantation) and electrical hearing experience (after implantation) affect cortical audiovisual speech processing. Here, we present a prospective longitudinal electroencephalography (EEG) study which systematically examined the deprivation- and CI-induced alterations of cortical processing of audiovisual words by comparing event-related potentials (ERPs) in postlingually deafened CI users before and after implantation (five weeks and six months of CI use). A group of matched normal-hearing (NH) listeners served as controls. The participants performed a word-identification task with congruent and incongruent audiovisual words, focusing their attention on either the visual (lip movement) or the auditory speech signal. This allowed us to study the (top-down) attention effect on the (bottom-up) sensory cortical processing of audiovisual speech. When compared to the NH listeners, the CI candidates (before implantation) and the CI users (after implantation) exhibited enhanced lipreading abilities and an altered cortical response at the N1 latency range (90-150 ms) that was characterized by a decreased theta oscillation power (4-8 Hz) and a smaller amplitude in the auditory cortex. After implantation, however, the auditory-cortex response gradually increased and developed a stronger intra-modal connectivity. Nevertheless, task efficiency and activation in the visual cortex were significantly modulated in both groups by focusing attention on the visual as compared to the auditory speech signal, with the NH listeners additionally showing an attention-dependent decrease in beta oscillation power (13-30 Hz). In sum, these results suggest remarkable deprivation effects on audiovisual speech processing in the auditory cortex, which partially reverse after implantation. Although even experienced CI users still show distinct audiovisual speech processing compared to NH listeners, pronounced effects of (top-down) direction of attention on (bottom-up) audiovisual processing can be observed in both groups. However, NH listeners but not CI users appear to show enhanced allocation of cognitive resources in visually as compared to auditory attended audiovisual speech conditions, which supports our behavioural observations of poorer lipreading abilities and reduced visual influence on audition in NH listeners as compared to CI users.
Affiliation(s)
- Anna Weglage
- Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Centre, University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Germany.
- Natalie Layer
- Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Centre, University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Germany
- Hartmut Meister
- Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Centre, University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Germany; Jean-Uhrmacher-Institute for Clinical ENT Research, University of Cologne, Germany
- Verena Müller
- Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Centre, University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Germany
- Ruth Lang-Roth
- Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Centre, University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Germany
- Martin Walger
- Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Centre, University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Germany; Jean-Uhrmacher-Institute for Clinical ENT Research, University of Cologne, Germany
- Pascale Sandmann
- Department of Otolaryngology, Head and Neck Surgery, Carl von Ossietzky University of Oldenburg, Germany; Research Center Neurosensory Science University of Oldenburg, Germany; Cluster of Excellence "Hearing4all", University of Oldenburg, Germany
5
Ceuleers D, Keppler H, Degeest S, Baudonck N, Swinnen F, Kestens K, Dhooge I. Auditory, Visual, and Cognitive Abilities in Normal-Hearing Adults, Hearing Aid Users, and Cochlear Implant Users. Ear Hear 2024; 45:679-694. [PMID: 38192017 DOI: 10.1097/aud.0000000000001458]
Abstract
OBJECTIVES Speech understanding is considered a bimodal and bidirectional process, whereby visual information (i.e., speechreading) and also cognitive functions (i.e., top-down processes) are involved. Therefore, the purpose of the present study is twofold: (1) to investigate the auditory (A), visual (V), and cognitive (C) abilities in normal-hearing individuals, hearing aid (HA) users, and cochlear implant (CI) users, and (2) to determine an auditory, visual, cognitive (AVC)-profile providing a comprehensive overview of a person's speech processing abilities, containing a broader variety of factors involved in speech understanding. DESIGN Three matched groups of subjects participated in this study: (1) 31 normal-hearing adults (mean age = 58.76), (2) 31 adults with moderate to severe hearing loss using HAs (mean age = 59.31), (3) 31 adults with a severe to profound hearing loss using a CI (mean age = 58.86). The audiological assessments consisted of pure-tone audiometry and speech audiometry in quiet and in noise. For evaluation of the (audio-)visual speech processing abilities, the Test for (Audio) Visual Speech perception was used. The cognitive test battery consisted of the letter-number sequencing task, the letter detection test, and an auditory Stroop test, measuring working memory and processing speed, selective attention, and cognitive flexibility and inhibition, respectively. Differences between the three groups were examined using a one-way analysis of variance or Kruskal-Wallis test, depending on the normality of the variables. Furthermore, a principal component analysis was conducted to determine the AVC-profile. RESULTS Normal-hearing individuals scored better for both auditory and cognitive abilities compared to HA users and CI users, listening in a best aided condition. No significant differences were found for speech understanding in a visual condition, despite a larger audiovisual gain for the HA users and CI users. Furthermore, an AVC-profile was composed based on the different auditory, visual, and cognitive assessments. On the basis of that profile, it is possible to determine one comprehensive score for auditory, visual, and cognitive functioning. In the future, these scores could be used in auditory rehabilitation to determine specific strengths and weaknesses per individual patient for the different abilities related to the process of speech understanding in daily life. CONCLUSIONS It is suggested to evaluate individuals with hearing loss from a broader perspective, considering more than only the typical auditory abilities. Also, cognitive and visual abilities are important to take into account to have a more complete overview of the speech understanding abilities in daily life.
Affiliation(s)
- Dorien Ceuleers
- Department of Head and Skin, Ghent University, Ghent, Belgium
- Hannah Keppler
- Department of Otorhinolaryngology, Ghent University Hospital, Ghent, Belgium
- Department of Rehabilitation Sciences, Ghent University, Ghent, Belgium
- Sofie Degeest
- Department of Head and Skin, Ghent University, Ghent, Belgium
- Department of Otorhinolaryngology, Ghent University Hospital, Ghent, Belgium
- Department of Rehabilitation Sciences, Ghent University, Ghent, Belgium
- Nele Baudonck
- Department of Otorhinolaryngology, Ghent University Hospital, Ghent, Belgium
- Freya Swinnen
- Department of Otorhinolaryngology, Ghent University Hospital, Ghent, Belgium
- Katrien Kestens
- Department of Rehabilitation Sciences, Ghent University, Ghent, Belgium
- Ingeborg Dhooge
- Department of Head and Skin, Ghent University, Ghent, Belgium
- Department of Otorhinolaryngology, Ghent University Hospital, Ghent, Belgium
6
Schnepel P, Paricio-Montesinos R, Ezquerra-Romano I, Haggard P, Poulet JFA. Cortical cellular encoding of thermotactile integration. Curr Biol 2024; 34:1718-1730.e3. [PMID: 38582078 DOI: 10.1016/j.cub.2024.03.018]
Abstract
Recent evidence suggests that primary sensory cortical regions play a role in the integration of information from multiple sensory modalities. How primary cortical neurons integrate different sources of sensory information is unclear, partly because non-primary sensory input to a cortical sensory region is often weak or modulatory. To address this question, we take advantage of the robust representation of thermal (cooling) and tactile stimuli in mouse forelimb primary somatosensory cortex (fS1). Using a thermotactile detection task, we show that the perception of threshold-level cool or tactile information is enhanced when they are presented simultaneously, compared with presentation alone. To investigate the cortical cellular correlates of thermotactile integration, we performed in vivo extracellular recordings from fS1 in awake resting and anesthetized mice during unimodal and bimodal stimulation of the forepaw. Unimodal stimulation evoked thermal- or tactile-specific excitatory and inhibitory responses of fS1 neurons. The most prominent features of combined thermotactile stimulation are the recruitment of unimodally silent fS1 neurons, non-linear integration features, and response dynamics that favor longer response durations with additional spikes. Together, we identify quantitative and qualitative changes in cortical encoding that may underlie the improvement in perception of thermotactile surfaces during haptic exploration.
Affiliation(s)
- Philipp Schnepel
- Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin-Buch, Robert-Rössle-Strasse 10, 13125 Berlin, Germany; Neuroscience Research Center, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Ricardo Paricio-Montesinos
- Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin-Buch, Robert-Rössle-Strasse 10, 13125 Berlin, Germany; Neuroscience Research Center, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Ivan Ezquerra-Romano
- Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin-Buch, Robert-Rössle-Strasse 10, 13125 Berlin, Germany; Neuroscience Research Center, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany; Institute of Cognitive Neuroscience, University College London (UCL), London WC1N 3AZ, UK
- Patrick Haggard
- Institute of Cognitive Neuroscience, University College London (UCL), London WC1N 3AZ, UK
- James F A Poulet
- Max-Delbrück Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin-Buch, Robert-Rössle-Strasse 10, 13125 Berlin, Germany; Neuroscience Research Center, Charité-Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany.
7
Seijdel N, Schoffelen JM, Hagoort P, Drijvers L. Attention Drives Visual Processing and Audiovisual Integration During Multimodal Communication. J Neurosci 2024; 44:e0870232023. [PMID: 38199864 PMCID: PMC10919203 DOI: 10.1523/jneurosci.0870-23.2023]
Abstract
During communication in real-life settings, our brain often needs to integrate auditory and visual information and at the same time actively focus on the relevant sources of information, while ignoring interference from irrelevant events. The interaction between integration and attention processes remains poorly understood. Here, we use rapid invisible frequency tagging and magnetoencephalography to investigate how attention affects auditory and visual information processing and integration, during multimodal communication. We presented human participants (male and female) with videos of an actress uttering action verbs (auditory; tagged at 58 Hz) accompanied by two movie clips of hand gestures on both sides of fixation (attended stimulus tagged at 65 Hz; unattended stimulus tagged at 63 Hz). Integration difficulty was manipulated by a lower-order auditory factor (clear/degraded speech) and a higher-order visual semantic factor (matching/mismatching gesture). We observed an enhanced neural response to the attended visual information during degraded speech compared to clear speech. For the unattended information, the neural response to mismatching gestures was enhanced compared to matching gestures. Furthermore, signal power at the intermodulation frequencies of the frequency tags, indexing nonlinear signal interactions, was enhanced in the left frontotemporal and frontal regions. Focusing on the left inferior frontal gyrus, this enhancement was specific for the attended information, for those trials that benefitted from integration with a matching gesture. Together, our results suggest that attention modulates audiovisual processing and interaction, depending on the congruence and quality of the sensory input.
Affiliation(s)
- Noor Seijdel
- Neurobiology of Language Department - The Communicative Brain, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
- Jan-Mathijs Schoffelen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, 6525 HT, The Netherlands
- Peter Hagoort
- Neurobiology of Language Department - The Communicative Brain, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, 6525 HT, The Netherlands
- Linda Drijvers
- Neurobiology of Language Department - The Communicative Brain, Max Planck Institute for Psycholinguistics, Nijmegen 6525 XD, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, 6525 HT, The Netherlands
8
Le Rhun L, Llorach G, Delmas T, Suied C, Arnal LH, Lazard DS. A standardised test to evaluate audio-visual speech intelligibility in French. Heliyon 2024; 10:e24750. [PMID: 38312568 PMCID: PMC10835303 DOI: 10.1016/j.heliyon.2024.e24750]
Abstract
Objective Lipreading, which plays a major role in the communication of the hearing impaired, lacked a French standardised tool. Our aim was to create and validate an audio-visual (AV) version of the French Matrix Sentence Test (FrMST). Design Video recordings were created by dubbing the existing audio files. Sample Thirty-five young, normal-hearing participants were tested in auditory and visual modalities alone (Ao, Vo) and in AV conditions, in quiet, noise, and open and closed-set response formats. Results Lipreading ability (Vo) ranged from 1% to 77% word comprehension. The absolute AV benefit was 9.25 dB SPL in quiet and 4.6 dB SNR in noise. The response format did not influence the results in the AV noise condition, except during the training phase. Lipreading ability and AV benefit were significantly correlated. Conclusions The French video material achieved similar AV benefits as those described in the literature for AV MST in other languages. For clinical purposes, we suggest targeting SRT80 to avoid ceiling effects, and performing two training lists in the AV condition in noise, followed by one AV list in noise, one Ao list in noise and one Vo list, in a randomised order, in open- or closed-set format.
Affiliation(s)
- Loïc Le Rhun
- Institut Pasteur, Université Paris Cité, Inserm UA06, Institut de l’Audition, Paris, France
- Gerard Llorach
- Auditory Signal Processing, Dept. of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Tanguy Delmas
- Institut Pasteur, Université Paris Cité, Inserm UA06, Institut de l’Audition, Paris, France
- ECLEAR, Audition Lefeuvre – Audition Marc Boulet, Athis-Mons, France
- Clara Suied
- Institut de Recherche Biomédicale des Armées, Département Neurosciences et Sciences Cognitives, Brétigny-sur-Orge, France
- Luc H. Arnal
- Institut Pasteur, Université Paris Cité, Inserm UA06, Institut de l’Audition, Paris, France
- Diane S. Lazard
- Institut Pasteur, Université Paris Cité, Inserm UA06, Institut de l’Audition, Paris, France
- Princess Grace Hospital, ENT & Maxillo-facial Surgery Department, Monaco
- Institut Arthur Vernes, ENT Surgery Department, Paris, France
9
Drouin JR, Rojas JA. Influence of face masks on recalibration of phonetic categories. Atten Percept Psychophys 2023; 85:2700-2717. [PMID: 37188863 PMCID: PMC10185375 DOI: 10.3758/s13414-023-02715-3]
Abstract
Previous research demonstrates listeners dynamically adjust phonetic categories in line with lexical context. While listeners show flexibility in adapting speech categories, recalibration may be constrained when variability can be attributed externally. It has been hypothesized that when listeners attribute atypical speech input to a causal factor, phonetic recalibration is attenuated. The current study investigated this theory directly by examining the influence of face masks, an external factor that affects both visual and articulatory cues, on the magnitude of phonetic recalibration. Across four experiments, listeners completed a lexical decision exposure phase in which they heard an ambiguous sound in either /s/-biasing or /ʃ/-biasing lexical contexts, while simultaneously viewing a speaker with a mask off, mask on the chin, or mask over the mouth. Following exposure, all listeners completed an auditory phonetic categorization test along an /ʃ/-/s/ continuum. In Experiment 1 (when no face mask was present during exposure trials), Experiment 2 (when the face mask was on the chin), Experiment 3 (when the face mask was on the mouth during ambiguous items), and Experiment 4 (when the face mask was on the mouth during the entire exposure phase), listeners showed a robust and equivalent phonetic recalibration effect. Recalibration manifested as greater proportion /s/ responses for listeners in the /s/-biased exposure group, relative to listeners in the /ʃ/-biased exposure group. Results support the notion that listeners do not causally attribute face masks with speech idiosyncrasies, which may reflect a general speech learning adjustment during the COVID-19 pandemic.
Affiliation(s)
- Julia R Drouin
- Division of Speech and Hearing Sciences, University of North Carolina School of Medicine, Chapel Hill, NC, USA.
- Department of Communication Sciences and Disorders, California State University Fullerton, Fullerton, CA, USA.
- Jose A Rojas
- Department of Communication Sciences and Disorders, California State University Fullerton, Fullerton, CA, USA
10
Layer N, Abdel-Latif KHA, Radecke JO, Müller V, Weglage A, Lang-Roth R, Walger M, Sandmann P. Effects of noise and noise reduction on audiovisual speech perception in cochlear implant users: An ERP study. Clin Neurophysiol 2023; 154:141-156. [PMID: 37611325 DOI: 10.1016/j.clinph.2023.07.009]
Abstract
OBJECTIVE Hearing with a cochlear implant (CI) is difficult in noisy environments, but the use of noise reduction algorithms, specifically ForwardFocus, can improve speech intelligibility. The current event-related potentials (ERP) study examined the electrophysiological correlates of this perceptual improvement. METHODS Ten bimodal CI users performed a syllable-identification task in auditory and audiovisual conditions, with syllables presented from the front and stationary noise presented from the sides. Brainstorm was used for spatio-temporal evaluation of ERPs. RESULTS CI users revealed an audiovisual benefit as reflected by shorter response times and greater activation in temporal and occipital regions at P2 latency. However, in auditory and audiovisual conditions, background noise hampered speech processing, leading to longer response times and delayed auditory-cortex-activation at N1 latency. Nevertheless, activating ForwardFocus resulted in shorter response times, reduced listening effort and enhanced superior-frontal-cortex-activation at P2 latency, particularly in audiovisual conditions. CONCLUSIONS ForwardFocus enhances speech intelligibility in audiovisual speech conditions by potentially allowing the reallocation of attentional resources to relevant auditory speech cues. SIGNIFICANCE This study shows for CI users that background noise and ForwardFocus differentially affect spatio-temporal cortical response patterns, both in auditory and audiovisual speech conditions.
Affiliation(s)
- Natalie Layer
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany.
- Jan-Ole Radecke
- Dept. of Psychiatry and Psychotherapy, University of Lübeck, Germany; Center for Brain, Behaviour and Metabolism (CBBM), University of Lübeck, Germany
- Verena Müller
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany
- Anna Weglage
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany
- Ruth Lang-Roth
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany
- Martin Walger
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany; Jean-Uhrmacher-Institute for Clinical ENT Research, University of Cologne, Germany
- Pascale Sandmann
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany; Department of Otolaryngology, Head and Neck Surgery, University of Oldenburg, Oldenburg, Germany
11
Tiippana K, Ujiie Y, Peromaa T, Takahashi K. Investigation of Cross-Language and Stimulus-Dependent Effects on the McGurk Effect with Finnish and Japanese Speakers and Listeners. Brain Sci 2023; 13:1198. [PMID: 37626554 PMCID: PMC10452414 DOI: 10.3390/brainsci13081198]
Abstract
In the McGurk effect, perception of a spoken consonant is altered when an auditory (A) syllable is presented with an incongruent visual (V) syllable (e.g., A/pa/V/ka/ is often heard as /ka/ or /ta/). The McGurk effect provides a measure for visual influence on speech perception, becoming stronger the lower the proportion of auditory correct responses. Cross-language effects are studied to understand processing differences between one's own and foreign languages. Regarding the McGurk effect, it has sometimes been found to be stronger with foreign speakers. However, other studies have shown the opposite, or no difference between languages. Most studies have compared English with other languages. We investigated cross-language effects with native Finnish and Japanese speakers and listeners. Both groups of listeners had 49 participants. The stimuli (/ka/, /pa/, /ta/) were uttered by two female and male Finnish and Japanese speakers and presented in A, V and AV modality, including a McGurk stimulus A/pa/V/ka/. The McGurk effect was stronger with Japanese stimuli in both groups. Differences in speech perception were prominent between individual speakers but less so between native languages. Unisensory perception correlated with McGurk perception. These findings suggest that stimulus-dependent features contribute to the McGurk effect. This may have a stronger influence on syllable perception than cross-language factors.
Affiliation(s)
- Kaisa Tiippana
- Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Yuta Ujiie
- Department of Psychology, College of Contemporary Psychology, Rikkyo University, Saitama 352-8558, Japan
- Research Organization of Open Innovation and Collaboration, Ritsumeikan University, Osaka 567-8570, Japan
- Tarja Peromaa
- Department of Psychology and Logopedics, University of Helsinki, 00014 Helsinki, Finland
- Kohske Takahashi
- College of Comprehensive Psychology, Ritsumeikan University, Osaka 567-8570, Japan
12
Sun Y, Fu Q. How do irrelevant stimuli from another modality influence responses to the targets in a same-different task. Conscious Cogn 2023; 107:103455. [PMID: 36586291 DOI: 10.1016/j.concog.2022.103455]
Abstract
It remains unclear whether multisensory interaction can implicitly occur at the abstract level. To address this issue, a same-different task was used to select comparable images and sounds in Experiment 1. Then, the stimuli with various levels of discrimination difficulty were adopted in a modified same-different task in Experiments 2, 3, and 4. The results showed that only when the irrelevant stimuli were easily distinguishable, a consistency effect could be observed in the testing phase. Moreover, when easily distinguishable irrelevant stimuli were simultaneously presented with difficult target stimuli, irrelevant auditory stimuli facilitated responses to visual targets whereas irrelevant visual stimuli interfered with responses to auditory targets in the training phase, indicating an asymmetry in the roles of the visual and auditory modalities in abstract multisensory integration. The results suggested that abstract multisensory information could be implicitly integrated and the inverse effectiveness principle might not apply to high-level processing of abstract multisensory integration.
Affiliation(s)
- Ying Sun
- State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Qiufang Fu
- State Key Laboratory of Brain and Cognitive Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing, China.
13
Sharma S, Mens LHM, Snik AFM, van Opstal AJ, van Wanrooij MM. Hearing Asymmetry Biases Spatial Hearing in Bimodal Cochlear-Implant Users Despite Bilateral Low-Frequency Hearing Preservation. Trends Hear 2023; 27:23312165221143907. [PMID: 36605011 PMCID: PMC9829999 DOI: 10.1177/23312165221143907]
Abstract
Many cochlear implant users with binaural residual (acoustic) hearing benefit from combining electric and acoustic stimulation (EAS) in the implanted ear with acoustic amplification in the other. These bimodal EAS listeners can potentially use low-frequency binaural cues to localize sounds. However, their hearing is generally asymmetric for mid- and high-frequency sounds, perturbing or even abolishing binaural cues. Here, we investigated the effect of a frequency-dependent binaural asymmetry in hearing thresholds on sound localization by seven bimodal EAS listeners. Frequency dependence was probed by presenting sounds with power in low-, mid-, high-, or mid-to-high-frequency bands. Frequency-dependent hearing asymmetry was present in the bimodal EAS listening condition (when using both devices) but was also induced by independently switching devices on or off. Using both devices, hearing was near symmetric for low frequencies, asymmetric for mid frequencies with better hearing thresholds in the implanted ear, and monaural for high frequencies with no hearing in the non-implanted ear. Results show that sound-localization performance was poor in general. Typically, localization was strongly biased toward the better hearing ear. We observed that hearing asymmetry was a good predictor for these biases. Notably, even when hearing was symmetric a preferential bias toward the ear using the hearing aid was revealed. We discuss how frequency dependence of any hearing asymmetry may lead to binaural cues that are spatially inconsistent as the spectrum of a sound changes. We speculate that this inconsistency may prevent accurate sound-localization even after long-term exposure to the hearing asymmetry.
Affiliation(s)
- Snandan Sharma
- Department of Biophysics, Radboud University, Donders Institute for Brain, Cognition and Behavior, Nijmegen, The Netherlands
- Lucas H.M. Mens
- Department of Otorhinolaryngology, Radboud University Medical Centre, Donders Institute for Brain, Cognition and Behavior, Nijmegen, The Netherlands
- Ad F.M. Snik
- Department of Biophysics, Radboud University, Donders Institute for Brain, Cognition and Behavior, Nijmegen, The Netherlands
- A. John van Opstal
- Department of Biophysics, Radboud University, Donders Institute for Brain, Cognition and Behavior, Nijmegen, The Netherlands
- Marc M. van Wanrooij
- Department of Biophysics, Radboud University, Donders Institute for Brain, Cognition and Behavior, Nijmegen, The Netherlands
14
Ross LA, Molholm S, Butler JS, Del Bene VA, Foxe JJ. Neural correlates of multisensory enhancement in audiovisual narrative speech perception: a fMRI investigation. Neuroimage 2022; 263:119598. [PMID: 36049699 DOI: 10.1016/j.neuroimage.2022.119598]
Abstract
This fMRI study investigated the effect of seeing articulatory movements of a speaker while listening to a naturalistic narrative stimulus. It had the goal to identify regions of the language network showing multisensory enhancement under synchronous audiovisual conditions. We expected this enhancement to emerge in regions known to underlie the integration of auditory and visual information such as the posterior superior temporal gyrus as well as parts of the broader language network, including the semantic system. To this end we presented 53 participants with a continuous narration of a story in auditory alone, visual alone, and both synchronous and asynchronous audiovisual speech conditions while recording brain activity using BOLD fMRI. We found multisensory enhancement in an extensive network of regions underlying multisensory integration and parts of the semantic network as well as extralinguistic regions not usually associated with multisensory integration, namely the primary visual cortex and the bilateral amygdala. Analysis also revealed involvement of thalamic brain regions along the visual and auditory pathways more commonly associated with early sensory processing. We conclude that under natural listening conditions, multisensory enhancement not only involves sites of multisensory integration but many regions of the wider semantic network and includes regions associated with extralinguistic sensory, perceptual and cognitive processing.
Affiliation(s)
- Lars A Ross
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; Department of Imaging Sciences, University of Rochester Medical Center, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.
- Sophie Molholm
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA
- John S Butler
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; School of Mathematical Sciences, Technological University Dublin, Kevin Street Campus, Dublin, Ireland
- Victor A Del Bene
- The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA; University of Alabama at Birmingham, Heersink School of Medicine, Department of Neurology, Birmingham, Alabama, 35233, USA
- John J Foxe
- The Frederick J. and Marion A. Schindler Cognitive Neurophysiology Laboratory, The Ernest J. Del Monte Institute for Neuroscience, Department of Neuroscience, University of Rochester School of Medicine and Dentistry, Rochester, New York, 14642, USA; The Cognitive Neurophysiology Laboratory, Departments of Pediatrics and Neuroscience, Albert Einstein College of Medicine & Montefiore Medical Center, Bronx, New York, 10461, USA.
15
Schwarz J, Li KK, Sim JH, Zhang Y, Buchanan-Worster E, Post B, Gibson JL, McDougall K. Semantic Cues Modulate Children’s and Adults’ Processing of Audio-Visual Face Mask Speech. Front Psychol 2022; 13:879156. [PMID: 35928422 PMCID: PMC9343587 DOI: 10.3389/fpsyg.2022.879156]
Abstract
During the COVID-19 pandemic, questions have been raised about the impact of face masks on communication in classroom settings. However, it is unclear to what extent visual obstruction of the speaker’s mouth or changes to the acoustic signal lead to speech processing difficulties, and whether these effects can be mitigated by semantic predictability, i.e., the availability of contextual information. The present study investigated the acoustic and visual effects of face masks on speech intelligibility and processing speed under varying semantic predictability. Twenty-six children (aged 8-12) and twenty-six adults performed an internet-based cued shadowing task, in which they had to repeat aloud the last word of sentences presented in audio-visual format. The results showed that children and adults made more mistakes and responded more slowly when listening to face mask speech compared to speech produced without a face mask. Adults were only significantly affected by face mask speech when both the acoustic and the visual signal were degraded. While acoustic mask effects were similar for children, removal of visual speech cues through the face mask affected children to a lesser degree. However, high semantic predictability reduced audio-visual mask effects, leading to full compensation of the acoustically degraded mask speech in the adult group. Even though children did not fully compensate for face mask speech with high semantic predictability, overall, they still profited from semantic cues in all conditions. Therefore, in classroom settings, strategies that increase contextual information such as building on students’ prior knowledge, using keywords, and providing visual aids, are likely to help overcome any adverse face mask effects.
Affiliation(s)
- Julia Schwarz
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Katrina Kechun Li
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Jasper Hong Sim
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Yixin Zhang
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Elizabeth Buchanan-Worster
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
- Brechtje Post
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
- Kirsty McDougall
- Faculty of Modern and Medieval Languages and Linguistics, University of Cambridge, Cambridge, United Kingdom
16
Ręk P, Magrath RD. Reality and illusion: the assessment of angular separation of multi-modal signallers in a duetting bird. Proc Biol Sci 2022; 289:20220680. [PMID: 35858056 PMCID: PMC9277264 DOI: 10.1098/rspb.2022.0680]
Abstract
The spatial distribution of cooperating individuals plays a strategic role in territorial interactions of many group-living animals, and can indicate group cohesion. Vocalizations are commonly used to judge the distribution of signallers, but the spatial resolution of sounds is poor. Many species therefore accompany calls with movement; however, little is known about the role of audio-visual perception in natural interactions. We studied the effect of angular separation on the efficacy of multimodal duets in the Australian magpie-lark, Grallina cyanoleuca. We tested specifically whether conspicuous wing movements, which typically accompany duets, affect responses to auditory angular separation. Multimodal playbacks of duets using robotic models and speakers showed that birds relied primarily on acoustic cues when visual and auditory angular separations were congruent, but used both modalities to judge separation between the signallers when modalities were spatially incongruent. The visual component modified the effect of acoustic separation: robotic models that were apart weakened the response when speakers were together, while models that were together strengthened responses when speakers were apart. Our results show that responses are stronger when signallers are together, and suggest that males are able to bind information cross-modally on the senders' spatial location, which is consistent with a multisensory illusion.
Affiliation(s)
- Paweł Ręk
- Department of Behavioural Ecology, Institute of Environmental Biology, Faculty of Biology, Adam Mickiewicz University, 61-614 Poznan, Poland; Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, Australian Capital Territory 2614, Australia
- Robert D. Magrath
- Division of Ecology and Evolution, Research School of Biology, The Australian National University, Canberra, Australian Capital Territory 2614, Australia
17
Semantically congruent audiovisual integration with modal-based attention accelerates auditory short-term memory retrieval. Atten Percept Psychophys 2022; 84:1625-1634. [PMID: 35641858 DOI: 10.3758/s13414-021-02437-4]
Abstract
Evidence has shown that multisensory integration benefits to unisensory perception performance are asymmetric and that auditory perception performance can receive more multisensory benefits, especially when the attention focus is directed toward a task-irrelevant visual stimulus. At present, whether the benefits of semantically (in)congruent multisensory integration with modal-based attention for subsequent unisensory short-term memory (STM) retrieval are also asymmetric remains unclear. Using a delayed matching-to-sample paradigm, the present study investigated this issue by manipulating the attention focus during multisensory memory encoding. The results revealed that both visual and auditory STM retrieval reaction times were faster under semantically congruent multisensory conditions than under unisensory memory encoding conditions. We suggest that coherent multisensory representation formation might be optimized by restricted multisensory encoding and can be rapidly triggered by subsequent unisensory memory retrieval demands. Crucially, auditory STM retrieval is exclusively accelerated by semantically congruent multisensory memory encoding, indicating that the less effective sensory modality of memory retrieval relies more on the coherent prior formation of a multisensory representation optimized by modal-based attention.
18
The timecourse of multisensory speech processing in unilaterally stimulated cochlear implant users revealed by ERPs. Neuroimage Clin 2022; 34:102982. [PMID: 35303598 PMCID: PMC8927996 DOI: 10.1016/j.nicl.2022.102982]
Abstract
Both normal-hearing (NH) and cochlear implant (CI) users show a clear benefit in multisensory speech processing. Group differences in ERP topographies and cortical source activation suggest distinct audiovisual speech processing in CI users when compared to NH listeners. Electrical neuroimaging, including topographic and ERP source analysis, provides a suitable tool to study the timecourse of multisensory speech processing in CI users. A cochlear implant (CI) is an auditory prosthesis which can partially restore the auditory function in patients with severe to profound hearing loss. However, this bionic device provides only limited auditory information, and CI patients may compensate for this limitation by means of a stronger interaction between the auditory and visual system. To better understand the electrophysiological correlates of audiovisual speech perception, the present study used electroencephalography (EEG) and a redundant target paradigm. Postlingually deafened CI users and normal-hearing (NH) listeners were compared in auditory, visual and audiovisual speech conditions. The behavioural results revealed multisensory integration for both groups, as indicated by shortened response times for the audiovisual as compared to the two unisensory conditions. The analysis of the N1 and P2 event-related potentials (ERPs), including topographic and source analyses, confirmed a multisensory effect for both groups and showed a cortical auditory response which was modulated by the simultaneous processing of the visual stimulus. Nevertheless, the CI users in particular revealed a distinct pattern of N1 topography, pointing to a strong visual impact on auditory speech processing. Apart from these condition effects, the results revealed ERP differences between CI users and NH listeners, not only in N1/P2 ERP topographies, but also in the cortical source configuration. When compared to the NH listeners, the CI users showed an additional activation in the visual cortex at N1 latency, which was positively correlated with CI experience, and a delayed auditory-cortex activation with a reversed, rightward functional lateralisation. In sum, our behavioural and ERP findings demonstrate a clear audiovisual benefit for both groups, and a CI-specific alteration in cortical activation at N1 latency when auditory and visual input is combined. These cortical alterations may reflect a compensatory strategy to overcome the limited CI input, which allows the CI users to improve the lip-reading skills and to approximate the behavioural performance of NH listeners in audiovisual speech conditions. Our results are clinically relevant, as they highlight the importance of assessing the CI outcome not only in auditory-only, but also in audiovisual speech conditions.
19
Sou KL, Say A, Xu H. Unity Assumption in Audiovisual Emotion Perception. Front Neurosci 2022; 16:782318. [PMID: 35310087 PMCID: PMC8931414 DOI: 10.3389/fnins.2022.782318]
Abstract
We experience various sensory stimuli every day. How does this integration occur? What are the inherent mechanisms in this integration? The “unity assumption” proposes a perceiver’s belief of unity in individual unisensory information to modulate the degree of multisensory integration. However, this has yet to be verified or quantified in the context of semantic emotion integration. In the present study, we investigate the ability of subjects to judge the intensities and degrees of similarity in faces and voices of two emotions (angry and happy). We found more similar stimulus intensities to be associated with stronger likelihoods of the face and voice being integrated. More interestingly, multisensory integration in emotion perception was observed to follow a Gaussian distribution as a function of the emotion intensity difference between the face and voice—the optimal cut-off at about 2.50 points difference on a 7-point Likert scale. This provides a quantitative estimation of the multisensory integration function in audio-visual semantic emotion perception with regards to stimulus intensity. Moreover, to investigate the variation of multisensory integration across the population, we examined the effects of personality and autistic traits of participants. Here, we found no correlation of autistic traits with unisensory processing in a nonclinical population. Our findings shed light on the current understanding of multisensory integration mechanisms.
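The Gaussian relationship reported above (integration likelihood as a function of face-voice intensity difference on a 7-point scale) can be sketched as a simple curve fit. The Python example below uses hypothetical placeholder data rather than the study's values; only the functional form, a zero-centred Gaussian with a baseline, follows the description in the abstract.

```python
import numpy as np
from scipy.optimize import curve_fit

def integration_curve(delta, amplitude, width, baseline):
    """Likelihood of audiovisual integration as a zero-centred Gaussian of the
    face-voice emotion intensity difference (delta, in 7-point scale units)."""
    return baseline + amplitude * np.exp(-(delta ** 2) / (2.0 * width ** 2))

# Hypothetical example data: proportion of trials judged as integrated per intensity difference
delta = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0, 5.0])
p_integrated = np.array([0.92, 0.90, 0.84, 0.74, 0.61, 0.47, 0.34, 0.18, 0.12])

(amplitude, width, baseline), _ = curve_fit(integration_curve, delta, p_integrated, p0=[0.8, 2.0, 0.1])
print(f"fitted Gaussian width: {width:.2f} scale points, baseline: {baseline:.2f}")
```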
Affiliation(s)
- Ka Lon Sou: Psychology, School of Social Sciences, Nanyang Technological University, Singapore, Singapore; Humanities, Arts and Social Sciences, Singapore University of Technology and Design, Singapore, Singapore
- Ashley Say: Psychology, School of Social Sciences, Nanyang Technological University, Singapore, Singapore
- Hong Xu: Psychology, School of Social Sciences, Nanyang Technological University, Singapore, Singapore (corresponding author)
20
Senkowski D, Moran JK. Early evoked brain activity underlies auditory and audiovisual speech recognition deficits in schizophrenia. Neuroimage Clin 2022; 33:102909. [PMID: 34915330 PMCID: PMC8683777 DOI: 10.1016/j.nicl.2021.102909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2021] [Revised: 12/02/2021] [Accepted: 12/03/2021] [Indexed: 11/04/2022]
Abstract
Reduced N1 amplitudes reflect speech processing deficits in schizophrenia (SZ). Crossmodal N1 amplitude suppression in audiovisual speech is preserved in SZ. N1 amplitudes correlate with speech recognition performance in controls but not in SZ. Objectives People with schizophrenia (SZ) show deficits in auditory and audiovisual speech recognition. It is possible that these deficits are related to aberrant early sensory processing, combined with an impaired ability to utilize visual cues to improve speech recognition. In this electroencephalography study we tested this by having SZ and healthy controls (HC) identify different unisensory auditory and bisensory audiovisual syllables at different auditory noise levels. Methods SZ (N = 24) and HC (N = 21) identified one of three different syllables (/da/, /ga/, /ta/) at three different noise levels (no, low, high). Half the trials were unisensory auditory and the other half provided additional visual input of moving lips. Task-evoked mediofrontal N1 and P2 brain potentials triggered to the onset of the auditory syllables were derived and related to behavioral performance. Results In comparison to HC, SZ showed speech recognition deficits for unisensory and bisensory stimuli. These deficits were primarily found in the no-noise condition. Paralleling these observations, reduced N1 amplitudes to unisensory and bisensory stimuli in SZ were found in the no-noise condition. In HC, the N1 amplitudes were positively related to speech recognition performance, whereas no such relationship was found in SZ. Moreover, no group differences in multisensory speech recognition benefits and N1 suppression effects for bisensory stimuli were observed. Conclusion Our study suggests that reduced N1 amplitudes reflect early auditory and audiovisual speech processing deficits in SZ. The finding that the amplitude effects were confined to salient speech stimuli, together with the attenuated relationship with behavioral performance in patients compared to HC, indicates diminished decoding of the auditory speech signals in SZ. Our study also revealed relatively intact multisensory benefits in SZ, which implies that the observed auditory and audiovisual speech recognition deficits were primarily related to aberrant processing of the auditory syllables.
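As an illustration of the kind of measure discussed above, the sketch below derives N1 and P2 peak amplitudes from stimulus-locked, baseline-corrected epochs and notes how they could be related to recognition accuracy. The latency windows, sampling rate, and data are assumptions for illustration, not the authors' exact parameters.

```python
import numpy as np

fs = 500                                  # sampling rate (Hz), assumed
times = np.arange(-0.2, 0.6, 1 / fs)      # epoch from -200 to 600 ms

def peak_amplitude(epochs, window, polarity):
    """Mean over trials of the most extreme value in a latency window.
    epochs: (n_trials, n_samples) array for one channel, baseline-corrected."""
    mask = (times >= window[0]) & (times <= window[1])
    extreme = epochs[:, mask].min(axis=1) if polarity == "neg" else epochs[:, mask].max(axis=1)
    return extreme.mean()

# Hypothetical single-channel epochs (microvolts) for one participant.
rng = np.random.default_rng(1)
epochs = rng.normal(0, 5, (120, times.size))

n1 = peak_amplitude(epochs, (0.08, 0.15), "neg")   # N1: ~80-150 ms, negative-going
p2 = peak_amplitude(epochs, (0.15, 0.25), "pos")   # P2: ~150-250 ms, positive-going

# Across participants, N1 amplitude could then be correlated with per-cent
# correct syllable identification (e.g., with np.corrcoef or a rank correlation).
print(f"N1 = {n1:.2f} uV, P2 = {p2:.2f} uV")
```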
Affiliation(s)
- Daniel Senkowski: Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Psychiatry and Psychotherapy, Charitéplatz 1, 10117 Berlin, Germany
- James K Moran: Charité - Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Department of Psychiatry and Psychotherapy, Charitéplatz 1, 10117 Berlin, Germany
21
Cieśla K, Wolak T, Lorens A, Mentzel M, Skarżyński H, Amedi A. Effects of training and using an audio-tactile sensory substitution device on speech-in-noise understanding. Sci Rep 2022; 12:3206. [PMID: 35217676 PMCID: PMC8881456 DOI: 10.1038/s41598-022-06855-8] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Accepted: 01/28/2022] [Indexed: 11/09/2022] Open
Abstract
Understanding speech in background noise is challenging. Wearing face masks, as imposed during the COVID-19 pandemic, makes it even harder. We developed a multi-sensory setup, including a sensory substitution device (SSD) that can deliver speech simultaneously through audition and as vibrations on the fingertips. The vibrations correspond to low frequencies extracted from the speech input. We trained two groups of non-native English speakers in understanding distorted speech in noise. After a short session (30-45 min) of repeating sentences, with or without concurrent matching vibrations, we observed a comparable mean group improvement of 14-16 dB in Speech Reception Threshold (SRT) in two test conditions, i.e., when participants were asked to repeat sentences from hearing alone and when matching vibrations on the fingertips were also present. This is a very strong effect, considering that a 10 dB difference corresponds roughly to a doubling of perceived loudness. The number of sentence repetitions needed to complete both types of training was comparable. Meanwhile, the mean group SNR for the audio-tactile training (14.7 ± 8.7) was significantly lower (harder) than for the auditory training (23.9 ± 11.8), which indicates a potential facilitating effect of the added vibrations. In addition, both before and after training, most of the participants (70-80%) showed better performance (by a mean of 4-6 dB) in speech-in-noise understanding when the audio sentences were accompanied by matching vibrations. This is the same magnitude of multisensory benefit that we reported, with no training at all, in our previous study using the same experimental procedures. After training, performance in this test condition was also best in both groups (SRT ~ 2 dB). The smallest effect of both training types was found in the third test condition, i.e., when participants repeated sentences accompanied by non-matching tactile vibrations; performance in this condition was also poorest after training. The results indicate that both types of training may remove some level of difficulty in sound perception, which might enable more effective use of speech inputs delivered via vibrotactile stimulation. We discuss the implications of these novel findings with respect to basic science. In particular, we show that even in adulthood, i.e., long after the classical "critical periods" of development have passed, a new pairing between a certain computation (here, speech processing) and an atypical sensory modality (here, touch) can be established and trained, and that this process can be rapid and intuitive. We further present possible applications of our training program and the SSD for auditory rehabilitation in patients with hearing (and sight) deficits, as well as for healthy individuals in suboptimal acoustic situations.
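The statement that a 10 dB difference corresponds roughly to a doubling of perceived loudness follows from the standard sone scale (loudness approximately doubling per 10 phon above 40 phon). A small worked check, offered as a general illustration rather than as part of the study's analysis:

```python
# Sone-scale approximation: loudness in sones = 2 ** ((level_phon - 40) / 10),
# so each 10 dB (phon) step doubles perceived loudness at moderate levels.
def sones(level_phon: float) -> float:
    return 2 ** ((level_phon - 40) / 10)

for level in (50, 60, 70):
    print(level, "phon ->", sones(level), "sones")
# 50 phon -> 2 sones, 60 phon -> 4 sones, 70 phon -> 8 sones.
```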
Affiliation(s)
- K Cieśla: The Baruch Ivcher Institute for Brain, Cognition & Technology, The Baruch Ivcher School of Psychology and the Ruth and Meir Rosental Brain Imaging Center, Reichman University, Herzliya, Israel; World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- T Wolak: World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- A Lorens: World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- M Mentzel: The Baruch Ivcher Institute for Brain, Cognition & Technology, The Baruch Ivcher School of Psychology and the Ruth and Meir Rosental Brain Imaging Center, Reichman University, Herzliya, Israel
- H Skarżyński: World Hearing Centre, Institute of Physiology and Pathology of Hearing, Warsaw, Poland
- A Amedi: The Baruch Ivcher Institute for Brain, Cognition & Technology, The Baruch Ivcher School of Psychology and the Ruth and Meir Rosental Brain Imaging Center, Reichman University, Herzliya, Israel
22
Longin L, Deroy O. Augmenting perception: How artificial intelligence transforms sensory substitution. Conscious Cogn 2022; 99:103280. [PMID: 35114632 DOI: 10.1016/j.concog.2022.103280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2021] [Revised: 11/26/2021] [Accepted: 01/12/2022] [Indexed: 01/28/2023]
Abstract
What happens when artificial sensors are coupled with the human senses? Using technology to extend the senses is an old human dream, on which sensory substitution and other augmentation technologies have already delivered. Laser tactile canes, corneal implants and magnetic belts can correct or extend what individuals could otherwise perceive. Here we show why accommodating intelligent sensory augmentation devices not only improves on, but also changes, the way we think about and classify earlier sensory augmentation devices. We review the benefits in terms of signal processing and show why non-linear transformation represents more than a mere improvement over classical linear transformation.
Affiliation(s)
- Louis Longin: Faculty of Philosophy, Philosophy of Science and the Study of Religion, LMU-Munich, Geschwister-Scholl-Platz 1, 80359 Munich, Germany
- Ophelia Deroy: Faculty of Philosophy, Philosophy of Science and the Study of Religion, LMU-Munich, Geschwister-Scholl-Platz 1, 80359 Munich, Germany; Munich Center for Neurosciences-Brain & Mind, Großhaderner Str. 2, 82152 Planegg-Martinsried, Germany; Institute of Philosophy, School of Advanced Study, University of London, London WC1E 7HU, United Kingdom
23
Veugen LCE, van Opstal AJ, van Wanrooij MM. Reaction Time Sensitivity to Spectrotemporal Modulations of Sound. Trends Hear 2022; 26:23312165221127589. [PMID: 36172759 PMCID: PMC9523861 DOI: 10.1177/23312165221127589] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2022] [Revised: 07/18/2022] [Accepted: 09/02/2022] [Indexed: 11/24/2022] Open
Abstract
We tested whether sensitivity to acoustic spectrotemporal modulations can be observed from reaction times for normal-hearing and impaired-hearing conditions. In a manual reaction-time task, normal-hearing listeners had to detect the onset of a ripple (with a density between 0 and 8 cycles/octave and a fixed modulation depth of 50%) that moved up or down the log-frequency axis at constant velocity (between 0 and 64 Hz), in otherwise unmodulated broadband white noise. Spectral and temporal modulations elicited band-pass filtered sensitivity characteristics, with the fastest detection rates around 1 cycle/octave and 32 Hz for normal-hearing conditions. These results closely resemble data from other studies that typically used the modulation-depth threshold as a sensitivity criterion. To simulate hearing impairment, stimuli were processed with a 6-channel cochlear-implant vocoder and a hearing-aid simulation that introduced separate spectral smearing and low-pass filtering. Reaction times were always much slower compared to normal hearing, especially for the highest spectral densities. Binaural performance was predicted well by the benchmark race model of binaural independence, which models statistical facilitation of independent monaural channels. For the impaired-hearing simulations this implied a "best-of-both-worlds" principle in which the listeners relied on the hearing-aid ear to detect spectral modulations and on the cochlear-implant ear for temporal-modulation detection. Although singular-value decomposition indicated that the joint spectrotemporal sensitivity matrix could be largely reconstructed from independent temporal and spectral sensitivity functions, in line with time-spectrum separability, a substantial inseparable spectral-temporal interaction was present in all hearing conditions. These results suggest that the reaction-time task yields a valid and effective objective measure of acoustic spectrotemporal-modulation sensitivity.
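The separability question raised above can be illustrated with a singular-value decomposition: if the sensitivity matrix were perfectly time-spectrum separable it would be rank one, the outer product of a spectral and a temporal sensitivity function. A minimal sketch with a hypothetical matrix (not the authors' data or exact metric):

```python
import numpy as np

# Hypothetical sensitivity matrix: rows = spectral densities (cycles/oct),
# columns = temporal velocities (Hz); entries = 1 / mean reaction time.
S = np.array([
    [1.8, 2.2, 2.6, 2.3, 1.7],
    [2.0, 2.5, 3.0, 2.6, 1.9],
    [1.6, 2.0, 2.4, 2.1, 1.5],
    [1.1, 1.4, 1.6, 1.4, 1.0],
])

u, s, vt = np.linalg.svd(S, full_matrices=False)

alpha = s[0] ** 2 / np.sum(s ** 2)          # variance captured by the first component
S_sep = s[0] * np.outer(u[:, 0], vt[0, :])  # best separable (rank-1) approximation
residual = S - S_sep                        # inseparable spectrotemporal interaction

print(f"separable fraction: {alpha:.3f}")
print(f"max inseparable residual: {np.abs(residual).max():.3f}")
```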
Affiliation(s)
- Lidwien C. E. Veugen: Department of Biophysics, Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, Netherlands
- A. John van Opstal: Department of Biophysics, Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, Netherlands
- Marc M. van Wanrooij: Department of Biophysics, Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, Netherlands
24
Yu H, Wang A, Li Q, Liu Y, Yang J, Takahashi S, Ejima Y, Zhang M, Wu J. Semantically Congruent Bimodal Presentation with Divided-Modality Attention Accelerates Unisensory Working Memory Retrieval. Perception 2021; 50:917-932. [PMID: 34841972 DOI: 10.1177/03010066211052943] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Although previous studies have shown that semantic multisensory integration can be differentially modulated by attention focus, it remains unclear whether attentionally mediated multisensory perceptual facilitation could impact further cognitive performance. Using a delayed matching-to-sample paradigm, the present study investigated the effect of semantically congruent bimodal presentation on subsequent unisensory working memory (WM) performance by manipulating attention focus. The results showed that unisensory WM retrieval was faster in the semantically congruent condition than in the incongruent multisensory encoding condition. However, such a result was only found in the divided-modality attention condition. This result indicates that a robust multisensory representation was constructed during semantically congruent multisensory encoding with divided-modality attention; this representation then accelerated unisensory WM performance, especially auditory WM retrieval. Additionally, an overall faster unisensory WM retrieval was observed under the modality-specific selective attention condition compared with the divided-modality condition, indicating that the division of attention to address two modalities demanded more central executive resources to encode and integrate crossmodal information and to maintain a constructed multisensory representation, leaving few resources for WM retrieval. Additionally, the present finding may support the amodal view that WM has an amodal central storage component that is used to maintain modal-based attention-optimized multisensory representations.
Affiliation(s)
- Hongtao Yu: Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Japan
- Aijun Wang: Department of Psychology, Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, China
- Yoshimichi Ejima: Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Japan
- Ming Zhang: Department of Psychology, Research Center for Psychology and Behavioral Sciences, Soochow University, Suzhou, China; Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Japan
- Jinglong Wu: Research Center for Medical Artificial Intelligence, Shenzhen Institute of Advanced Technology, Chinese Academy of Science, Shenzhen, China; Cognitive Neuroscience Laboratory, Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University, Japan
25
van de Rijt LPH, van Opstal AJ, van Wanrooij MM. Multisensory Integration-Attention Trade-Off in Cochlear-Implanted Deaf Individuals. Front Neurosci 2021; 15:683804. [PMID: 34393707 PMCID: PMC8358073 DOI: 10.3389/fnins.2021.683804] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2021] [Accepted: 06/21/2021] [Indexed: 11/13/2022] Open
Abstract
The cochlear implant (CI) allows profoundly deaf individuals to partially recover hearing. Still, due to the coarse acoustic information provided by the implant, CI users have considerable difficulties in recognizing speech, especially in noisy environments. CI users therefore rely heavily on visual cues to augment speech recognition, more so than normal-hearing individuals. However, it is unknown how attention to one (focused) or both (divided) modalities plays a role in multisensory speech recognition. Here we show that unisensory speech listening and reading were negatively impacted in divided-attention tasks for CI users, but not for normal-hearing individuals. Our psychophysical experiments revealed that, as expected, listening thresholds were consistently better for the normal-hearing listeners, while lipreading thresholds were largely similar for the two groups. Moreover, audiovisual speech recognition for normal-hearing individuals could be described well by probabilistic summation of auditory and visual speech recognition, while CI users were better integrators than expected from statistical facilitation alone. Our results suggest that this benefit in integration comes at a cost. Unisensory speech recognition is degraded for CI users when attention needs to be divided across modalities. We conjecture that CI users exhibit an integration-attention trade-off. They focus solely on a single modality during focused-attention tasks, but need to divide their limited attentional resources in situations with uncertainty about the upcoming stimulus modality. We argue that in order to determine the benefit of a CI for speech recognition, situational factors need to be discounted by presenting speech in realistic or complex audiovisual environments.
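The probabilistic-summation benchmark mentioned above has a simple closed form under independence: the predicted audiovisual recognition probability is one minus the product of the two unisensory miss probabilities, and observed performance above that prediction indicates integration beyond statistical facilitation. A minimal sketch with hypothetical proportions:

```python
def probability_summation(p_auditory: float, p_visual: float) -> float:
    """Predicted audiovisual recognition if the two channels are independent."""
    return 1 - (1 - p_auditory) * (1 - p_visual)

# Hypothetical word-recognition proportions for one listener.
p_a, p_v, p_av_observed = 0.45, 0.30, 0.75

p_av_predicted = probability_summation(p_a, p_v)   # 1 - 0.55 * 0.70 = 0.615
excess_integration = p_av_observed - p_av_predicted

print(f"predicted: {p_av_predicted:.3f}, observed: {p_av_observed:.3f}, "
      f"excess: {excess_integration:+.3f}")
```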
Affiliation(s)
- Luuk P. H. van de Rijt: Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboudumc, Nijmegen, Netherlands
- A. John van Opstal: Department of Biophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Marc M. van Wanrooij: Department of Biophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
26
Llorach G, Kirschner F, Grimm G, Zokoll MA, Wagener KC, Hohmann V. Development and evaluation of video recordings for the OLSA matrix sentence test. Int J Audiol 2021; 61:311-321. [PMID: 34109902 DOI: 10.1080/14992027.2021.1930205] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
OBJECTIVE The aim was to create and validate an audiovisual version of the German matrix sentence test (MST), which uses the existing audio-only speech material. DESIGN Video recordings were made and dubbed with the audio of the existing German MST. The current study evaluates the MST in conditions including audio and visual modalities, speech in quiet and noise, and open and closed-set response formats. SAMPLE One female talker recorded repetitions of the German MST sentences. Twenty-eight young normal-hearing participants completed the evaluation study. RESULTS The audiovisual benefit in quiet was 7.0 dB in sound pressure level (SPL). In noise, the audiovisual benefit was 4.9 dB in signal-to-noise ratio (SNR). Speechreading scores ranged from 0% to 84% speech reception in visual-only sentences (mean = 50%). Audiovisual speech reception thresholds (SRTs) had a larger standard deviation than audio-only SRTs. Audiovisual SRTs improved successively with an increasing number of lists performed. The final video recordings are openly available. CONCLUSIONS The video material yielded results similar to those reported in the literature in terms of gross speech intelligibility, despite the inherent asynchronies of dubbing. Due to ceiling effects, adaptive procedures targeting 80% intelligibility should be used. At least one or two training lists should be performed.
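The reported benefit and the 80%-target recommendation can both be expressed with a logistic psychometric function: the benefit is the SRT difference between conditions, and tracking at 80% rather than 50% keeps the adaptive procedure away from ceiling. In the sketch below the slope and the individual SRT values are illustrative assumptions; only the 4.9 dB difference mirrors the figure reported above.

```python
import numpy as np

def intelligibility(snr_db, srt_db, slope=0.15):
    """Logistic psychometric function; slope = proportion correct per dB at the SRT."""
    return 1.0 / (1.0 + np.exp(-4.0 * slope * (snr_db - srt_db)))

# Hypothetical SRTs (dB SNR at 50% intelligibility) in noise; their difference
# mirrors the 4.9 dB audiovisual benefit reported above.
srt_audio_only, srt_audiovisual = -7.1, -12.0
benefit_db = srt_audio_only - srt_audiovisual

# SNR needed to reach an 80% target rather than 50%, for the assumed slope:
snr_80 = srt_audiovisual + np.log(0.8 / 0.2) / (4.0 * 0.15)

print(f"audiovisual benefit: {benefit_db:.1f} dB SNR")
print(f"80% point sits {snr_80 - srt_audiovisual:.1f} dB above the audiovisual SRT")
```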
Affiliation(s)
- Gerard Llorach: Hörzentrum Oldenburg GmbH, Oldenburg, Germany; Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany; Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Frederike Kirschner: Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany; Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Giso Grimm: Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany; Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Melanie A Zokoll: Hörzentrum Oldenburg GmbH, Oldenburg, Germany; Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Kirsten C Wagener: Hörzentrum Oldenburg GmbH, Oldenburg, Germany; Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany; Hörtech gGmbH, Oldenburg, Germany
- Volker Hohmann: Hörzentrum Oldenburg GmbH, Oldenburg, Germany; Cluster of Excellence Hearing4All, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany; Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
27
Dias JW, McClaskey CM, Harris KC. Audiovisual speech is more than the sum of its parts: Auditory-visual superadditivity compensates for age-related declines in audible and lipread speech intelligibility. Psychol Aging 2021; 36:520-530. [PMID: 34124922 PMCID: PMC8427734 DOI: 10.1037/pag0000613] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/10/2023]
Abstract
Multisensory input can improve perception of ambiguous unisensory information. For example, speech heard in noise can be more accurately identified when listeners see a speaker's articulating face. Importantly, these multisensory effects can be superadditive to listeners' ability to process unisensory speech, such that audiovisual speech identification is better than the sum of auditory-only and visual-only speech identification. Age-related declines in auditory and visual speech perception have been hypothesized to be concomitant with stronger cross-sensory influences on audiovisual speech identification, but little evidence exists to support this. Currently, studies do not account for the multisensory superadditive benefit of auditory-visual input in their metrics of the auditory or visual influence on audiovisual speech perception. Here we treat multisensory superadditivity as independent from unisensory auditory and visual processing. In the current investigation, older and younger adults identified auditory, visual, and audiovisual speech in noisy listening conditions. Performance across these conditions was used to compute conventional metrics of the auditory and visual influence on audiovisual speech identification and a metric of auditory-visual superadditivity. Consistent with past work, auditory and visual speech identification declined with age, audiovisual speech identification was preserved, and no age-related differences in the auditory or visual influence on audiovisual speech identification were observed. However, we found that auditory-visual superadditivity improved with age. The novel findings suggest that multisensory superadditivity is independent of unisensory processing. As auditory and visual speech identification decline with age, compensatory changes in multisensory superadditivity may preserve audiovisual speech identification in older adults. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
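The superadditivity construct described above can be written as a simple comparison of proportion-correct scores: audiovisual identification exceeding the sum of auditory-only and visual-only identification. The sketch below shows one such formulation with hypothetical scores; the abstract does not specify the authors' exact computation, so this is an illustration rather than their metric.

```python
def superadditivity(p_audio: float, p_visual: float, p_audiovisual: float) -> float:
    """Positive values indicate audiovisual identification beyond the sum of the
    unisensory scores (all inputs are proportions correct)."""
    return p_audiovisual - (p_audio + p_visual)

# Hypothetical scores for a younger and an older listener in noise, chosen to
# illustrate the pattern described above (larger superadditivity with age).
younger = superadditivity(p_audio=0.55, p_visual=0.20, p_audiovisual=0.80)  # +0.05
older   = superadditivity(p_audio=0.35, p_visual=0.12, p_audiovisual=0.70)  # +0.23

print(f"younger: {younger:+.2f}, older: {older:+.2f}")
```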
Affiliation(s)
- James W Dias: Department of Otolaryngology-Head and Neck Surgery
28
Lee HJ, Lee JM, Choi JY, Jung J. The Effects of Preoperative Audiovisual Speech Perception on the Audiologic Outcomes of Cochlear Implantation in Patients with Postlingual Deafness. Audiol Neurootol 2020; 26:149-156. [PMID: 33352550 DOI: 10.1159/000509969] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2020] [Accepted: 07/06/2020] [Indexed: 11/19/2022] Open
Abstract
INTRODUCTION Patients with postlingual deafness usually depend on visual information for communication, and their lipreading ability could influence cochlear implantation (CI) outcomes. However, it is unclear whether preoperative visual dependency in postlingual deafness positively or negatively affects auditory rehabilitation after CI. Herein, we investigated the influence of preoperative audiovisual perception on CI outcomes. METHOD In this retrospective case-comparison study, 118 patients with postlingual deafness who underwent unilateral CI were enrolled. Evaluation of speech perception was performed under both audiovisual (AV) and audio-only (AO) conditions before and after CI. Before CI, the speech perception test was performed under hearing aid (HA)-assisted conditions. After CI, the speech perception test was performed under the CI-only condition. Only patients with a preoperative AO speech perception score of 10% or less were included. RESULTS Multivariable regression analysis showed that age, gender, residual hearing, operation side, education level, and HA usage were not correlated with either postoperative AV (pAV) or AO (pAO) speech perception. However, duration of deafness showed a significant negative correlation with both pAO (p = 0.003) and pAV (p = 0.015) speech perception. Notably, the preoperative AV speech perception score was not correlated with pAO speech perception (R2 = 0.00134, p = 0.693) but was positively associated with pAV speech perception (R2 = 0.0731, p = 0.003). CONCLUSION Preoperative dependency on audiovisual information may positively influence pAV speech perception in patients with postlingual deafness.
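A minimal sketch of the kind of multivariable regression described above, relating postoperative AV scores to the preoperative AV score and duration of deafness. The data are simulated (only the cohort size matches the abstract), the predictor names are hypothetical, and the use of statsmodels is an assumption, not the authors' analysis software.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 118  # same cohort size as reported; the data themselves are simulated

df = pd.DataFrame({
    "preop_av": rng.uniform(10, 90, n),      # preoperative AV score (%)
    "deaf_years": rng.uniform(0.5, 30, n),   # duration of deafness (years)
})
# Simulated outcome loosely mirroring the reported pattern: a positive association
# with the preoperative AV score and a negative one with deafness duration.
df["postop_av"] = 40 + 0.25 * df["preop_av"] - 0.8 * df["deaf_years"] + rng.normal(0, 10, n)

X = sm.add_constant(df[["preop_av", "deaf_years"]])
model = sm.OLS(df["postop_av"], X).fit()
print(model.params)    # regression coefficients
print(model.pvalues)   # corresponding p-values
```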
Affiliation(s)
- Hyun Jin Lee: Department of Otolaryngology-Head and Neck Surgery, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Jeon Mi Lee: Department of Otorhinolaryngology, Ilsan Paik Hospital, Inje University College of Medicine, Goyang, Republic of Korea
- Jae Young Choi: Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul, Republic of Korea
- Jinsei Jung: Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul, Republic of Korea
29
Kirkels LAMH, Dorman R, Wezel RJAV. Perceptual Coupling Based on Depth and Motion Cues in Stereovision-Impaired Subjects. Perception 2020; 49:1101-1114. [PMID: 32903161 PMCID: PMC7605051 DOI: 10.1177/0301006620952058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
When an object is partially occluded, the different parts of the object have to be perceptually coupled. Cues that can be used for perceptual coupling are, for instance, depth ordering and visual motion information. In subjects with impaired stereovision, the brain is less able to use stereoscopic depth cues, making them more reliant on other cues. Therefore, our hypothesis is that stereovision-impaired subjects have stronger motion coupling than stereoscopic subjects. We compared perceptual coupling in 8 stereoscopic and 10 stereovision-impaired subjects, using random moving dot patterns that defined an ambiguous rotating cylinder and a coaxially presented nonambiguous half cylinder. Our results show that, whereas stereoscopic subjects exhibit significant coupling in the far plane, stereovision-impaired subjects show no coupling and under our conditions also no stronger motion coupling than stereoscopic subjects.
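The ambiguity of the rotating-cylinder stimulus comes from orthographic projection: the 2D dot motion is identical for the two rotation directions unless depth information (for instance stereoscopic disparity) assigns dots to the front or back surface. A minimal sketch of such a stimulus; dot counts, rotation speed, and the function name are illustrative assumptions, not the authors' stimulus parameters.

```python
import numpy as np

def cylinder_dot_positions(n_dots=200, radius=1.0, omega=1.0, t=0.0, seed=0):
    """2D (x, y) positions of dots on a transparent rotating cylinder under
    orthographic projection (rotation about the vertical axis)."""
    rng = np.random.default_rng(seed)
    theta0 = rng.uniform(0, 2 * np.pi, n_dots)   # angular position on the surface
    y = rng.uniform(-1, 1, n_dots)               # height along the cylinder axis
    x = radius * np.cos(theta0 + omega * t)      # projected horizontal position
    z = radius * np.sin(theta0 + omega * t)      # depth, discarded by the projection
    return np.column_stack([x, y]), z

# The projected image for rotation +omega equals that for -omega with front and
# back surfaces swapped, which is what makes the percept bistable; disparity
# derived from z (available to stereoscopic observers) removes the ambiguity.
xy, depth = cylinder_dot_positions(t=0.5)
print(xy.shape, depth.min(), depth.max())
```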
Affiliation(s)
- Laurens A M H Kirkels: Donders Institute for Brain, Cognition and Behaviour, Department of Biophysics, Radboud University, The Netherlands
- Reinder Dorman: Swammerdam Institute for Life Sciences, University of Amsterdam, The Netherlands
- Richard J A van Wezel: Donders Institute for Brain, Cognition and Behaviour, Department of Biophysics, Radboud University, The Netherlands; TechMed Centre, Department of Biomedical Signals and Systems, University of Twente, The Netherlands
Collapse
|