1. Cusimano M, Hewitt LB, McDermott JH. Listening with generative models. Cognition 2024; 253:105874. PMID: 39216190; DOI: 10.1016/j.cognition.2024.105874.
Abstract
Perception has long been envisioned to use an internal model of the world to explain the causes of sensory signals. However, such accounts have historically not been testable, typically requiring intractable search through the space of possible explanations. Using auditory scenes as a case study, we leveraged contemporary computational tools to infer explanations of sounds in a candidate internal generative model of the auditory world (ecologically inspired audio synthesizers). Model inferences accounted for many classic illusions. Unlike traditional accounts of auditory illusions, the model is applicable to any sound, and exhibited human-like perceptual organization for real-world sound mixtures. The combination of stimulus-computability and interpretable model structure enabled 'rich falsification', revealing additional assumptions about sound generation needed to account for perception. The results show how generative models can account for the perception of both classic illusions and everyday sensory signals, and illustrate the opportunities and challenges involved in incorporating them into theories of perception.
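The inference strategy sketched in this abstract, scoring candidate explanations by how well a generative model reproduces the observed signal, can be illustrated with a minimal toy (ours, not the authors' synthesizer-based model; the tone frequencies, Gaussian-noise likelihood, and enumerated candidate set are invented for the example):

```python
import numpy as np

# Toy analysis-by-synthesis: score candidate "explanations" (sets of tone
# frequencies) by how well the synthesized waveform matches the observation.
fs = 8000.0
t = np.arange(0, 0.1, 1 / fs)
observed = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 660 * t)

def synthesize(freqs):
    return sum(np.sin(2 * np.pi * f * t) for f in freqs)

def log_likelihood(freqs, sigma=0.5):
    # Gaussian observation noise: higher score = better explanation
    residual = observed - synthesize(freqs)
    return -0.5 * np.sum(residual ** 2) / sigma ** 2

candidates = [(440,), (660,), (440, 660), (440, 880)]
print("best explanation:", max(candidates, key=log_likelihood))
```

A model like the one described would search a vastly larger space of synthesizer parameters with more sophisticated inference than enumeration, but the scoring principle is the same.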
Affiliation(s)
- Maddie Cusimano
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States of America.
- Luke B Hewitt
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States of America
- Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, United States of America; McGovern Institute, Massachusetts Institute of Technology, United States of America; Center for Brains Minds and Machines, Massachusetts Institute of Technology, United States of America; Speech and Hearing Bioscience and Technology, Harvard University, United States of America.
2. Lertpoompunya A, Ozmeral EJ, Higgins NC, Eddins DA. Head-orienting behaviors during simultaneous speech detection and localization. Front Psychol 2024; 15:1425972. PMID: 39355293; PMCID: PMC11442202; DOI: 10.3389/fpsyg.2024.1425972.
Abstract
Head movement plays a vital role in auditory processing by contributing to spatial awareness and the ability to identify and locate sound sources. Here we investigate head-orienting behaviors using a dual-task experimental paradigm to measure (a) localization of a speech source and (b) detection of meaningful speech (numbers) within a complex acoustic background. Ten younger adults with normal hearing and 20 older adults with mild-to-severe sensorineural hearing loss were evaluated in the free field under two head-movement conditions, (1) head fixed to the front and (2) head moving to the source location, and two context conditions, (1) audio only or (2) audio plus visual cues. Head-tracking analyses quantified the target location relative to head location, as well as the peak velocity during head movements. Evaluation of head-orienting behaviors revealed that both groups tended to undershoot the auditory target for targets beyond 60° in azimuth. Listeners with hearing loss had higher head-turn errors than the normal-hearing listeners, even when a visual location cue was provided. Digit detection accuracy was better for the normal-hearing than for the hearing-loss group, with a main effect of signal-to-noise ratio (SNR). When performing the dual-task paradigm in the most difficult listening environments, participants consistently demonstrated a wait-and-listen head-movement strategy, characterized by a short pause during which they maintained their head orientation and gathered information before orienting to the target location.
Affiliation(s)
- Angkana Lertpoompunya
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States
- Department of Communication Sciences and Disorders, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Erol J. Ozmeral
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States
- Nathan C. Higgins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States
- David A. Eddins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, FL, United States
- Department of Communication Sciences and Disorders, Faculty of Medicine, Ramathibodi Hospital, Mahidol University, Bangkok, Thailand
- Department of Communication Sciences and Disorders, University of Central Florida, Orlando, FL, United States
3. Gómez-Vicente V, Esquiva G, Lancho C, Benzerdjeb K, Jerez AA, Ausó E. Importance of Visual Support Through Lipreading in the Identification of Words in Spanish Language. Language and Speech 2024. PMID: 39189455; DOI: 10.1177/00238309241270741.
Abstract
We sought to examine the contribution of visual cues, such as lipreading, to the identification of familiar items (words) and unfamiliar items (phonemes), in terms of percent accuracy. For this retrospective study, we presented lists of words and phonemes (spoken in a healthy adult female voice) in auditory (A) and audiovisual (AV) modalities to 65 Spanish normal-hearing male and female listeners classified into four age groups. Our results showed a remarkable benefit of AV information for word and phoneme recognition. Regarding gender, women performed better than men in both A and AV modalities, although the difference was significant for words only, not for phonemes. Concerning age, significant differences were detected in word recognition in the A modality only between the youngest (18-29 years old) and oldest (⩾50 years old) groups. We conclude that visual information enhances word and phoneme recognition and that women are more influenced by visual signals than men in AV speech perception. Overall, age does not appear to be a limiting factor for word recognition, with no significant differences observed in the AV modality.
Affiliation(s)
- Gema Esquiva
- Department of Optics, Pharmacology and Anatomy, University of Alicante, Spain; Alicante Institute for Health and Biomedical Research (ISABIAL), Spain
- Carmen Lancho
- Data Science Laboratory, University Rey Juan Carlos, Spain
- Eva Ausó
- Department of Optics, Pharmacology and Anatomy, University of Alicante, Spain
4. Rotaru I, Geirnaert S, Heintz N, Van de Ryck I, Bertrand A, Francart T. What are we really decoding? Unveiling biases in EEG-based decoding of the spatial focus of auditory attention. J Neural Eng 2024; 21:016017. PMID: 38266281; DOI: 10.1088/1741-2552/ad2214.
Abstract
Objective: Spatial auditory attention decoding (Sp-AAD) refers to the task of identifying the direction of the speaker to whom a person is attending in a multi-talker setting, based on the listener's neural recordings, e.g., electroencephalography (EEG). The goal of this study is to thoroughly investigate potential biases when training such Sp-AAD decoders on EEG data, particularly eye-gaze biases and latent trial-dependent confounds, which may result in Sp-AAD models that decode eye-gaze or trial-specific fingerprints rather than spatial auditory attention. Approach: We designed a two-speaker audiovisual Sp-AAD protocol in which spatial auditory and visual attention were enforced to be either congruent or incongruent, and we recorded EEG data from sixteen participants undergoing several trials recorded at distinct timepoints. We trained a simple linear model for Sp-AAD based on common spatial patterns (CSP) filters in combination with either linear discriminant analysis (LDA) or k-means clustering, and evaluated it both across and within trials. Main results: We found that even a simple linear Sp-AAD model is susceptible to overfitting to confounding signal patterns such as eye-gaze and trial fingerprints (e.g., due to feature shifts across trials), resulting in artificially high decoding accuracies. Furthermore, we found that changes in the EEG signal statistics across trials deteriorate the trial generalization of the classifier, even when the latter is retrained on the test trial with an unsupervised algorithm. Significance: Collectively, our findings confirm that subtle biases and confounds exist that can strongly interfere with the decoding of spatial auditory attention from EEG. More complicated non-linear models based on deep neural networks, which are often used for Sp-AAD, are expected to be even more vulnerable to such biases. Future work should perform experiments and model evaluations that avoid and/or control for such biases in Sp-AAD tasks.
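As an illustration of the pipeline and the confound described above, here is a minimal sketch (not the authors' code; the data shapes, labels, and eight-trial split are placeholders invented for the example) showing how grouping cross-validation folds by recording trial keeps trial-specific fingerprints from inflating decoding accuracy:

```python
import numpy as np
from mne.decoding import CSP
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import Pipeline

# Placeholder data: 80 epochs x 64 channels x 512 samples; labels = attended side.
rng = np.random.default_rng(0)
epochs = rng.standard_normal((80, 64, 512))
labels = rng.integers(0, 2, size=80)
trial_ids = np.repeat(np.arange(8), 10)   # which recording trial each epoch came from

# CSP spatial filtering into log-variance features, then an LDA classifier.
decoder = Pipeline([("csp", CSP(n_components=6)),
                    ("lda", LinearDiscriminantAnalysis())])

# Leave-one-trial-out folds: epochs from one recording trial never appear in both
# train and test sets, so trial fingerprints cannot inflate the accuracy estimate.
scores = cross_val_score(decoder, epochs, labels,
                         groups=trial_ids, cv=GroupKFold(n_splits=8))
print(f"leave-one-trial-out accuracy: {scores.mean():.2f}")   # ~0.5 for random data
```

Evaluating the same model with folds that mix epochs from a single trial across train and test is exactly the setting in which the paper reports artificially high accuracies.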
Affiliation(s)
- Iustina Rotaru
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49 bus 721, B-3000 Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
- Simon Geirnaert
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49 bus 721, B-3000 Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
- Leuven.AI-KU Leuven Institute for AI, Leuven, Belgium
- Nicolas Heintz
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49 bus 721, B-3000 Leuven, Belgium
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
- Leuven.AI-KU Leuven Institute for AI, Leuven, Belgium
- Iris Van de Ryck
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49 bus 721, B-3000 Leuven, Belgium
- Alexander Bertrand
- Department of Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
- Leuven.AI-KU Leuven Institute for AI, Leuven, Belgium
- Tom Francart
- Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49 bus 721, B-3000 Leuven, Belgium
- Leuven.AI-KU Leuven Institute for AI, Leuven, Belgium
5. Zeng B, Yu G, Hasshim N, Hong S. Primacy of mouth over eyes to perceive audiovisual Mandarin lexical tones. J Eye Mov Res 2023; 16(4):4. PMID: 38585238; PMCID: PMC10997307; DOI: 10.16910/jemr.16.4.4.
Abstract
The visual cues to lexical tones are more implicit and much less investigated than those to consonants and vowels, and it remains unclear which facial areas contribute to lexical tone identification. This study investigated Chinese and English speakers' eye movements when they were asked to identify audiovisual Mandarin lexical tones. The Chinese and English speakers were presented with audiovisual clips of Mandarin monosyllables (for instance, /ă/, /à/, /ĭ/, /ì/) and were asked to identify whether the syllables carried a dipping tone (/ă/, /ĭ/) or a falling tone (/à/, /ì/). These audiovisual syllables were presented in clear, noisy and silent (absence of audio signal) conditions. An eye-tracker recorded the participants' eye movements. Results showed that the participants gazed more at the mouth than the eyes. In addition, when acoustic conditions became adverse, both the Chinese and English speakers increased their gaze duration at the mouth rather than at the eyes. The findings suggest that the mouth is the primary area that listeners utilise when perceiving audiovisual lexical tones. The similar eye movements between the Chinese and English speakers imply that the mouth acts as a perceptual cue providing articulatory information, as opposed to social and pragmatic information.
Affiliation(s)
- Biao Zeng
- University of South Wales, Pontypridd, UK
- Shanhu Hong
- Quanzhou Preschool Education College, Quanzhou, China
6. Lewis D, Al-Salim S, McDermott T, Dergan A, McCreery RW. Impact of room acoustics and visual cues on speech perception and talker localization by children with mild bilateral or unilateral hearing loss. Front Pediatr 2023; 11:1252452. PMID: 38078311; PMCID: PMC10703386; DOI: 10.3389/fped.2023.1252452.
Abstract
Introduction: This study evaluated the ability of children (8-12 years) with mild bilateral or unilateral hearing loss (MBHL/UHL), listening unaided, or with normal hearing (NH) to locate and understand talkers in varying auditory/visual acoustic environments. Potential differences across hearing status were examined. Methods: Participants heard sentences presented by female talkers from five surrounding locations in varying acoustic environments. A localization-only task included two conditions (auditory only, visually guided auditory) in three acoustic environments (favorable, typical, poor). Participants were asked to locate each talker. A speech perception task included four conditions (auditory-only, visually guided auditory, audiovisual, and a baseline of auditory-only from 0° azimuth) in a single acoustic environment. Participants were asked to locate talkers, then repeat what was said. Results: In the localization-only task, participants were better able to locate talkers, and looking times were shorter, with visual guidance to talker location. Correct looking was poorest and looking times longest in the poor acoustic environment. There were no significant effects of hearing status/age. In the speech perception task, performance was highest in the audiovisual condition and was better in the visually guided and auditory-only conditions than in the baseline condition. Although audiovisual performance was best overall, children with MBHL or UHL performed more poorly than peers with NH. Better-ear pure-tone averages for children with MBHL had a greater effect on keyword understanding than did poorer-ear pure-tone averages for children with UHL. Conclusion: Although children could locate talkers more easily and quickly with visual information, finding locations alone did not improve speech perception. Speech perception was best in the audiovisual condition; however, poorer performance by children with MBHL or UHL suggests that being able to see talkers did not overcome reduced auditory access. Children with UHL exhibited better speech perception than children with MBHL, supporting the benefit of NH in at least one ear.
Affiliation(s)
- Dawna Lewis
- Listening and Learning Laboratory, Boys Town National Research Hospital, Omaha, NE, United States
- Auditory Perception and Cognition Laboratory, Boys Town National Research Hospital, Omaha, NE, United States
- Sarah Al-Salim
- Clinical Measurement Program, Boys Town National Research Hospital, Omaha, NE, United States
- Tessa McDermott
- Listening and Learning Laboratory, Boys Town National Research Hospital, Omaha, NE, United States
- Andrew Dergan
- Listening and Learning Laboratory, Boys Town National Research Hospital, Omaha, NE, United States
- Ryan W. McCreery
- Auditory Perception and Cognition Laboratory, Boys Town National Research Hospital, Omaha, NE, United States
7. Valzolgher C, Alzaher M, Gaveau V, Coudert A, Marx M, Truy E, Barone P, Farnè A, Pavani F. Capturing Visual Attention With Perturbed Auditory Spatial Cues. Trends Hear 2023; 27:23312165231182289. PMID: 37611181; PMCID: PMC10467228; DOI: 10.1177/23312165231182289.
Abstract
Lateralized sounds can orient visual attention, with benefits for audio-visual processing. Here, we asked to what extent perturbed auditory spatial cues, resulting from cochlear implants (CI) or unilateral hearing loss (uHL), allow this automatic mechanism of information selection from the audio-visual environment. We used a classic paradigm from experimental psychology (capture of visual attention with sounds) to probe the integrity of audio-visual attentional orienting in 60 adults with hearing loss: bilateral CI users (N = 20), unilateral CI users (N = 20), and individuals with uHL (N = 20). For comparison, we also included a group of normal-hearing (NH, N = 20) participants, tested in binaural and monaural listening conditions (i.e., with one ear plugged). All participants also completed a sound localization task to assess spatial hearing skills. Comparable audio-visual orienting was observed in bilateral CI, uHL, and binaural NH participants. By contrast, audio-visual orienting was, on average, absent in unilateral CI users and reduced in NH listeners with one ear plugged. Spatial hearing skills were better in bilateral CI, uHL, and binaural NH participants than in unilateral CI users and monaurally plugged NH listeners. In unilateral CI users, spatial hearing skills correlated with audio-visual orienting abilities. These novel results show that audio-visual attention orienting can be preserved in bilateral CI users and uHL patients to a greater extent than in unilateral CI users. This highlights the importance of assessing the impact of hearing loss beyond auditory difficulties alone, capturing the extent to which it may enable or impede typical interactions with the multisensory environment.
Affiliation(s)
- Chiara Valzolgher
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
- Integrative, Multisensory, Perception, Action and Cognition Team, Lyon Neuroscience Research Center, Lyon, France
- Mariam Alzaher
- Centre de Recherche Cerveau & Cognition, Toulouse, France
- Hospices Civils, Toulouse, France
- Valérie Gaveau
- Integrative, Multisensory, Perception, Action and Cognition Team, Lyon Neuroscience Research Center, Lyon, France
- Mathieu Marx
- Centre de Recherche Cerveau & Cognition, Toulouse, France
- Hospices Civils, Toulouse, France
- Eric Truy
- Integrative, Multisensory, Perception, Action and Cognition Team, Lyon Neuroscience Research Center, Lyon, France
- Hospices Civils de Lyon, Lyon, France
- Pascal Barone
- Centre de Recherche Cerveau & Cognition, Toulouse, France
- Alessandro Farnè
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
- Integrative, Multisensory, Perception, Action and Cognition Team, Lyon Neuroscience Research Center, Lyon, France
- Neuro-immersion, Lyon, France
- Francesco Pavani
- Center for Mind/Brain Sciences - CIMeC, University of Trento, Rovereto, Italy
- Integrative, Multisensory, Perception, Action and Cognition Team, Lyon Neuroscience Research Center, Lyon, France
- Centro Interuniversitario di Ricerca « Cognizione, Linguaggio e Sordità », Rovereto, Italy
8. Hládek Ľ, Seeber BU. Speech Intelligibility in Reverberation is Reduced During Self-Rotation. Trends Hear 2023; 27:23312165231188619. PMID: 37475460; PMCID: PMC10363862; DOI: 10.1177/23312165231188619.
Abstract
Speech intelligibility in cocktail party situations has traditionally been studied with stationary sound sources and stationary participants. Here, speech intelligibility and behavior were investigated during active self-rotation of standing participants in a spatialized speech test. We investigated whether people would rotate to improve speech intelligibility, and whether knowing the target location would be further beneficial. Target sentences randomly appeared at one of four possible locations (0°, ±90°, 180° relative to the participant's initial orientation on each trial), while speech-shaped noise was presented from the front (0°). Participants responded naturally with self-rotating motion. Target sentences were presented either without (Audio-only) or with a picture of an avatar (Audio-Visual). In a baseline (Static) condition, people were standing still without visual location cues. Participants' self-orientations undershot the target location but were close to acoustically optimal. Participants oriented in an acoustically optimal way more often, and speech intelligibility was higher, in the Audio-Visual than in the Audio-only condition for the lateral targets. The intelligibility of individual words in Audio-Visual and Audio-only increased during self-rotation towards the rear target, but was reduced for the lateral targets compared to Static, which could be mostly, but not fully, attributed to changes in spatial unmasking. A speech intelligibility prediction based on a model of static spatial unmasking, applied to the observed self-rotations, overestimated participant performance by 1.4 dB. The results suggest that speech intelligibility is reduced during self-rotation, and that visual cues of location help achieve more optimal self-rotations and better speech intelligibility.
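The spatial-unmasking intuition behind these results, that rotating the head changes the signal-to-noise ratio at the better ear, can be sketched with a crude toy model (ours, not the paper's binaural model; the 8 dB head-shadow ceiling and cosine attenuation law are invented for illustration):

```python
import numpy as np

def ear_level_db(source_az, ear_az, shadow_db=8.0):
    """Attenuation at one ear: none when the source faces the ear, up to
    `shadow_db` when the head lies between source and ear (crude cosine law)."""
    return -shadow_db * max(0.0, -np.cos(np.radians(source_az - ear_az)))

def better_ear_snr(target_az, masker_az, head_az):
    """Larger of the two per-ear SNRs, with ears at +/-90 deg on the head."""
    snrs = []
    for ear_az in (-90.0, 90.0):
        target = ear_level_db(target_az - head_az, ear_az)
        masker = ear_level_db(masker_az - head_az, ear_az)
        snrs.append(target - masker)
    return max(snrs)

# Target at +90 deg, speech-shaped noise from the front, as in the paradigm above.
for head_az in (0, 30, 60, 90):
    print(f"head at {head_az:2d} deg: better-ear SNR "
          f"{better_ear_snr(90, 0, head_az):+.1f} dB")
```

Even this toy shows that rotating toward a lateral target moves the frontal noise into one ear's shadow, which is the acoustic benefit that a visual cue to target location helps listeners exploit.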
Affiliation(s)
- Ľuboš Hládek
- Audio Information Processing, Technical University of Munich, Munich, Germany
- Bernhard U. Seeber
- Audio Information Processing, Technical University of Munich, Munich, Germany
9. Cappelloni MS, Mateo VS, Maddox RK. Performance in an Audiovisual Selective Attention Task Using Speech-Like Stimuli Depends on the Talker Identities, But Not Temporal Coherence. Trends Hear 2023; 27:23312165231207235. PMID: 37847849; PMCID: PMC10586009; DOI: 10.1177/23312165231207235.
Abstract
Audiovisual integration of speech can benefit the listener by not only improving comprehension of what a talker is saying but also helping a listener select a particular talker's voice from a mixture of sounds. Binding, an early integration of auditory and visual streams that helps an observer allocate attention to a combined audiovisual object, is likely involved in processing audiovisual speech. Although temporal coherence of stimulus features across sensory modalities has been implicated as an important cue for non-speech stimuli (Maddox et al., 2015), the specific cues that drive binding in speech are not fully understood due to the challenges of studying binding in natural stimuli. Here we used speech-like artificial stimuli that allowed us to isolate three potential contributors to binding: temporal coherence (are the face and the voice changing synchronously?), articulatory correspondence (do visual faces represent the correct phones?), and talker congruence (do the face and voice come from the same person?). In a trio of experiments, we examined the relative contributions of each of these cues. Normal hearing listeners performed a dual task in which they were instructed to respond to events in a target auditory stream while ignoring events in a distractor auditory stream (auditory discrimination) and detecting flashes in a visual stream (visual detection). We found that viewing the face of a talker who matched the attended voice (i.e., talker congruence) offered a performance benefit. We found no effect of temporal coherence on performance in this task, prompting an important recontextualization of previous findings.
Affiliation(s)
- Madeline S. Cappelloni
- Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Vincent S. Mateo
- Audio and Music Engineering, University of Rochester, Rochester, NY, USA
- Ross K. Maddox
- Biomedical Engineering, University of Rochester, Rochester, NY, USA
- Center for Visual Science, University of Rochester, Rochester, NY, USA
- Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, USA
- Neuroscience, University of Rochester, Rochester, NY, USA
10. Kachlicka M, Laffere A, Dick F, Tierney A. Slow phase-locked modulations support selective attention to sound. Neuroimage 2022; 252:119024. PMID: 35231629; PMCID: PMC9133470; DOI: 10.1016/j.neuroimage.2022.119024.
Abstract
To make sense of complex soundscapes, listeners must select and attend to task-relevant streams while ignoring uninformative sounds. One possible neural mechanism underlying this process is alignment of endogenous oscillations with the temporal structure of the target sound stream. Such a mechanism has been suggested to mediate attentional modulation of neural phase-locking to the rhythms of attended sounds. However, such modulations are compatible with an alternate framework, where attention acts as a filter that enhances exogenously-driven neural auditory responses. Here we attempted to test several predictions arising from the oscillatory account by playing two tone streams varying across conditions in tone duration and presentation rate; participants attended to one stream or listened passively. Attentional modulation of the evoked waveform was roughly sinusoidal and scaled with rate, while the passive response did not. However, there was only limited evidence for continuation of modulations through the silence between sequences. These results suggest that attentionally-driven changes in phase alignment reflect synchronization of slow endogenous activity with the temporal structure of attended stimuli.
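The prediction tested above, that attention adds a roughly sinusoidal phase-locked modulation at the stimulus presentation rate, can be illustrated with a toy spectral check (ours, not the study's analysis; the 4 Hz rate, amplitudes, and noise level are invented):

```python
import numpy as np

fs, rate, dur = 256.0, 4.0, 2.0      # sampling rate (Hz), tone rate (Hz), duration (s)
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(1)

passive = 0.2 * rng.standard_normal(t.size)                # no phase-locked modulation
attended = passive + 0.5 * np.sin(2 * np.pi * rate * t)    # sinusoidal attentional gain

def amplitude_at(x, freq):
    """Amplitude of the spectral component nearest `freq`."""
    spectrum = np.abs(np.fft.rfft(x)) * 2 / x.size
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]

# The attended-minus-passive difference peaks at the stimulus rate (~0.50 here).
print(f"{amplitude_at(attended - passive, rate):.2f} at {rate} Hz")
```

In the study's terms, the oscillatory account further predicts this modulation should scale with presentation rate and persist into the silence after a sequence, which is where the evidence was limited.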
Affiliation(s)
- Magdalena Kachlicka
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
- Fred Dick
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England; Division of Psychology & Language Sciences, UCL, Gower Street, London WC1E 6BT, England
- Adam Tierney
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England.
11. Can visual capture of sound separate auditory streams? Exp Brain Res 2022; 240:813-824. PMID: 35048159; DOI: 10.1007/s00221-021-06281-8.
Abstract
In noisy contexts, sound discrimination improves when the auditory sources are separated in space. This phenomenon, named Spatial Release from Masking (SRM), arises from the interaction between the auditory information reaching the ear and spatial attention resources. To examine the relative contribution of these two factors, we exploited an audio-visual illusion in a hearing-in-noise task to create conditions in which the initial stimulation to the ears is held constant, while the perceived separation between speech and masker is changed illusorily (visual capture of sound). In two experiments, we asked participants to identify a string of five digits pronounced by a female voice, embedded in either energetic (Experiment 1) or informational (Experiment 2) noise, before reporting the perceived location of the heard digits. Critically, the distance between target digits and masking noise was manipulated both physically (from 22.5 to 75.0 degrees) and illusorily, by pairing target sounds with visual stimuli either at the same position (audio-visual congruent) or at different positions (15 degrees offset, leftward or rightward: audio-visual incongruent). The proportion of correctly reported digits increased with the physical separation between target and masker, as expected from SRM. However, despite effective visual capture of sounds, performance was not modulated by illusory changes of the target sound position. Our results are compatible with a limited role of central factors in the SRM phenomenon, at least in our experimental setting. Moreover, they add to the controversial literature on the limited effects of audio-visual capture in auditory stream separation.
12. Eckert MA, Teubner-Rhodes S, Vaden KI, Ahlstrom JB, McClaskey CM, Dubno JR. Unique patterns of hearing loss and cognition in older adults' neural responses to cues for speech recognition difficulty. Brain Struct Funct 2022; 227:203-218. PMID: 34632538; PMCID: PMC9044122; DOI: 10.1007/s00429-021-02398-2.
Abstract
Older adults with hearing loss experience significant difficulties understanding speech in noise, perhaps due in part to limited benefit from supporting executive functions that enable the use of environmental cues signaling changes in listening conditions. Here we examined the degree to which 41 older adults (60.56-86.25 years) exhibited cortical responses to informative listening difficulty cues, which communicated the listening difficulty for each trial, compared to neutral cues that were uninformative of listening difficulty. Word recognition was significantly higher for informative compared to uninformative cues in a +10 dB signal-to-noise ratio (SNR) condition, and response latencies were significantly shorter for informative cues in the +10 dB SNR and the more challenging +2 dB SNR conditions. Informative cues were associated with elevated blood oxygenation level-dependent contrast in visual and parietal cortex. A cue-SNR interaction effect was observed in the cingulo-opercular (CO) network, such that activity only differed between SNR conditions when an informative cue was presented. That is, participants used the informative cues to prepare for changes in listening difficulty from one trial to the next. This cue-SNR interaction effect was driven by older adults with more low-frequency hearing loss and was not observed for those with more high-frequency hearing loss, poorer set-shifting task performance, and lower frontal operculum gray matter volume. These results suggest that proactive strategies for engaging CO adaptive control may be important for older adults with high-frequency hearing loss to optimize speech recognition in changing and challenging listening conditions.
Affiliation(s)
- Mark A Eckert
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC, 29425-5500, USA.
- Kenneth I Vaden
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC, 29425-5500, USA
- Jayne B Ahlstrom
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC, 29425-5500, USA
- Carolyn M McClaskey
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC, 29425-5500, USA
- Judy R Dubno
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC, 29425-5500, USA
13. Visentin C, Valzolgher C, Pellegatti M, Potente P, Pavani F, Prodi N. A comparison of simultaneously-obtained measures of listening effort: pupil dilation, verbal response time and self-rating. Int J Audiol 2021; 61:561-573. PMID: 34634214; DOI: 10.1080/14992027.2021.1921290.
Abstract
OBJECTIVE: The aim of this study was to assess to what extent simultaneously-obtained measures of listening effort (task-evoked pupil dilation, verbal response time [RT], and self-rating) are sensitive to auditory and cognitive manipulations in a speech perception task, and to explore the possible relationship between RT and pupil dilation. DESIGN: A within-group design was adopted. All participants were administered the Matrix Sentence Test in 12 conditions (signal-to-noise ratios [SNR] of -3, -6, -9 dB; attentional resources focussed vs divided; spatial priors present vs absent). STUDY SAMPLE: Twenty-four normal-hearing adults, 20-41 years old (M = 23.5), were recruited for the study. RESULTS: A significant effect of SNR was found for all measures; however, pupil dilation discriminated only partially between the SNRs. Neither of the cognitive manipulations was effective in modulating the measures. No relationship emerged between pupil dilation, RT and self-ratings. CONCLUSIONS: RT, pupil dilation, and self-ratings can be obtained simultaneously when administering speech perception tasks, even though some limitations remain related to the absence of a retention period after the listening phase. The sensitivity of the three measures to changes in the auditory environment differs: RTs and self-ratings proved most sensitive to changes in SNR.
Affiliation(s)
- Chiara Visentin
- Department of Engineering, University of Ferrara, Ferrara, Italy
- Chiara Valzolgher
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Trento, Italy; Centre de Recherche en Neuroscience de Lyon (CRNL), Integrative, Multisensory, Perception, Action and Cognition Team (IMPACT), Lyon, France
- Paola Potente
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Trento, Italy
- Francesco Pavani
- Center for Mind/Brain Sciences (CIMeC), University of Trento, Trento, Italy; Centre de Recherche en Neuroscience de Lyon (CRNL), Integrative, Multisensory, Perception, Action and Cognition Team (IMPACT), Lyon, France; Department of Psychology and Cognitive Sciences (DiPSCo), University of Trento, Trento, Italy
- Nicola Prodi
- Department of Engineering, University of Ferrara, Ferrara, Italy
14. Seol HY, Kang S, Lim J, Hong SH, Moon IJ. Feasibility of Virtual Reality Audiological Testing: Prospective Study. JMIR Serious Games 2021; 9:e26976. PMID: 34463624; PMCID: PMC8441603; DOI: 10.2196/26976.
Abstract
Background: It has been noted in the literature that there is a gap between clinical assessment and real-world performance. Real-world conversations entail visual and audio information, yet no audiological assessment tools include visual information. Virtual reality (VR) technology has been applied to various areas, including audiology, but the use of VR in speech-in-noise perception has not yet been investigated. Objective: The purpose of this study was to investigate the impact of virtual space (VS) on speech performance and its feasibility as a speech test instrument. We hypothesized that individuals' ability to recognize speech would improve when visual cues were provided. Methods: A total of 30 individuals with normal hearing and 25 individuals with hearing loss completed pure-tone audiometry and the Korean version of the Hearing in Noise Test (K-HINT) under three conditions (conventional K-HINT [cK-HINT], VS on PC [VSPC], and VS head-mounted display [VSHMD]) at -10 dB, -5 dB, 0 dB, and +5 dB signal-to-noise ratios (SNRs). Participants listened to target speech and repeated it back to the tester in all conditions. Hearing aid users in the hearing loss group completed testing under unaided and aided conditions. A questionnaire was administered after testing to gather subjective opinions on the headset, the VSHMD condition, and test preference. Results: Provision of visual information had a significant impact on speech performance: the Mann-Whitney U test showed statistical significance (P<.05) between the normal-hearing and hearing-impaired groups under all test conditions. Hearing aid use led to better integration of audio and visual cues, with significant differences between hearing aid and non-hearing aid users at -5 dB (P=.04) and 0 dB (P=.02) SNRs under the cK-HINT condition, and at -10 dB (P=.007) and 0 dB (P=.04) SNRs under the VSPC condition. Participants responded positively to almost all questionnaire items and preferred a test method with visual imagery, but found the headset heavy. Conclusions: These findings are in line with previous literature showing that visual cues benefit communication. This is the first study to include hearing aid users with a more naturalistic stimulus and a relatively simple test environment, suggesting the feasibility of VR audiological testing in clinical practice.
Affiliation(s)
- Hye Yoon Seol
- Medical Research Institute, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea; Hearing Research Laboratory, Samsung Medical Center, Seoul, Republic of Korea
- Soojin Kang
- Medical Research Institute, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea; Hearing Research Laboratory, Samsung Medical Center, Seoul, Republic of Korea
- Jihyun Lim
- Center for Clinical Epidemiology, Samsung Medical Center, Seoul, Republic of Korea
- Sung Hwa Hong
- Hearing Research Laboratory, Samsung Medical Center, Seoul, Republic of Korea; Department of Otolaryngology-Head & Neck Surgery, Samsung Changwon Hospital, Changwon, Republic of Korea
- Il Joon Moon
- Hearing Research Laboratory, Samsung Medical Center, Seoul, Republic of Korea; Department of Otolaryngology-Head & Neck Surgery, Samsung Medical Center, Seoul, Republic of Korea
15. Turri S, Rizvi M, Rabini G, Melonio A, Gennari R, Pavani F. Orienting Auditory Attention through Vision: the Impact of Monaural Listening. Multisens Res 2021; 35:1-28. PMID: 34384046; DOI: 10.1163/22134808-bja10059.
Abstract
The understanding of linguistic messages can be made extremely complex by the simultaneous presence of interfering sounds, especially when they are also linguistic in nature. In two experiments, we tested whether visual cues directing attention to spatial or temporal components of speech in noise can improve its identification. The hearing-in-noise task required identification of a five-digit sequence (target) embedded in a stream of time-reversed speech. Using a custom-built device located in front of the participant, we delivered visual cues to orient attention to the location of target sounds and/or their temporal window. In Exp. 1 (n = 14), we validated this visual-to-auditory cueing method in normal-hearing listeners tested under typical binaural listening conditions. In Exp. 2 (n = 13), we assessed the efficacy of the same visual cues in normal-hearing listeners wearing a monaural ear plug, to study the effects of simulated monaural and conductive hearing loss on visual-to-auditory attention orienting. While Exp. 1 revealed a benefit of both spatial and temporal visual cues for hearing in noise, Exp. 2 showed that only the temporal visual cues remained effective during monaural listening. These findings indicate that when the acoustic experience is altered, visual-to-auditory attention orienting is more robust for temporal than for spatial attributes of the auditory stimuli. They have implications for the relation between spatial and temporal attributes of sound objects, and for the design of devices that orient audiovisual attention in people with hearing loss.
Affiliation(s)
- Silvia Turri
- Centro Interdipartimentale Mente/Cervello - CIMeC, Università di Trento, 38068 Rovereto, Italy; Dipartimento di Psicologia e Scienze Cognitive, Università di Trento, 38068 Rovereto, Italy
- Mehdi Rizvi
- Faculty of Computer Science, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
- Giuseppe Rabini
- Centro Interdipartimentale Mente/Cervello - CIMeC, Università di Trento, 38068 Rovereto, Italy
- Alessandra Melonio
- Faculty of Computer Science, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
- Rosella Gennari
- Faculty of Computer Science, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
- Francesco Pavani
- Centro Interdipartimentale Mente/Cervello - CIMeC, Università di Trento, 38068 Rovereto, Italy; IMPACT, Centre de Recherche en Neurosciences de Lyon (CRNL), 69500 Bron, France
16. Clayton KK, Asokan MM, Watanabe Y, Hancock KE, Polley DB. Behavioral Approaches to Study Top-Down Influences on Active Listening. Front Neurosci 2021; 15:666627. PMID: 34305516; PMCID: PMC8299106; DOI: 10.3389/fnins.2021.666627.
Abstract
The massive network of descending corticofugal projections has long been recognized by anatomists, but their functional contributions to sound processing and auditory-guided behaviors remain a mystery. Most efforts to characterize the auditory corticofugal system have been inductive, with function inferred from a few studies employing a wide range of methods to manipulate various limbs of the descending system in a variety of species and preparations. An alternative approach, which we focus on here, is to first establish auditory-guided behaviors that reflect the contribution of top-down influences on auditory perception. To this end, we postulate that auditory corticofugal systems may contribute to active listening behaviors in which the timing of bottom-up sound cues can be predicted from top-down signals arising from cross-modal cues, temporal integration, or self-initiated movements. Here, we describe a behavioral framework for investigating how auditory perceptual performance is enhanced when subjects can anticipate the timing of upcoming target sounds. Our first paradigm, studied in both human subjects and mice, reports species-specific differences in visually cued expectation of sound onset in a signal-in-noise detection task. A second paradigm performed in mice reveals the benefits of temporal regularity as a perceptual grouping cue when detecting repeating target tones in complex background noise. A final behavioral approach demonstrates significant improvements in frequency discrimination threshold and perceptual sensitivity when auditory targets are presented at a predictable temporal interval following motor self-initiation of the trial. Collectively, these three behavioral approaches identify paradigms to study top-down influences on sound perception that are amenable to head-fixed preparations in genetically tractable animals, where it is possible to monitor and manipulate particular nodes of the descending auditory pathway with unparalleled precision.
Affiliation(s)
- Kameron K. Clayton
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, MA, United States
- Meenakshi M. Asokan
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, MA, United States
- Yurika Watanabe
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, MA, United States
- Kenneth E. Hancock
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, MA, United States
- Department of Otolaryngology – Head and Neck Surgery, Harvard Medical School, Boston, MA, United States
- Daniel B. Polley
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, MA, United States
- Department of Otolaryngology – Head and Neck Surgery, Harvard Medical School, Boston, MA, United States
17. Wang X, Xu L. Speech perception in noise: Masking and unmasking. J Otol 2021; 16:109-119. PMID: 33777124; PMCID: PMC7985001; DOI: 10.1016/j.joto.2020.12.001.
Abstract
Speech perception is essential for daily communication, but background noise or concurrent talkers can make it challenging for listeners to track the target speech (i.e., the cocktail party problem). The present study reviews and compares existing findings on speech perception and unmasking in cocktail party listening environments in English and Mandarin Chinese. The review begins with an introduction and related concepts of auditory masking. The next two sections review factors that release speech perception from masking in English and in Mandarin Chinese, respectively. The last section presents an overall summary of the findings, with comparisons between the two languages, and discusses future research directions arising from differences between the two bodies of literature on this topic.
Affiliation(s)
- Xianhui Wang
- Communication Sciences and Disorders, Ohio University, Athens, OH, 45701, USA
- Li Xu
- Communication Sciences and Disorders, Ohio University, Athens, OH, 45701, USA
18. Salanger M, Lewis D, Vallier T, McDermott T, Dergan A. Applying Virtual Reality to Audiovisual Speech Perception Tasks in Children. Am J Audiol 2020; 29:244-258. PMID: 32250641; DOI: 10.1044/2020_aja-19-00004.
Abstract
Purpose: The primary purpose of this study was to explore the efficacy of using virtual reality (VR) technology in hearing research with children by comparing speech perception abilities in a typical laboratory environment and a simulated VR classroom environment. Method: The final sample included 48 participants (40 children and 8 young adults). The study design utilized a speech perception task in conjunction with a localization demand in auditory-only (AO) and auditory-visual (AV) conditions. Tasks were completed in simulated classroom acoustics, both in a typical laboratory environment and in a virtual classroom environment accessed using an Oculus Rift head-mounted display. Results: Speech perception scores were higher in AV conditions than AO conditions across age groups. In addition, interaction effects of environment (laboratory vs. VR classroom) and visual accessibility (AV vs. AO) indicated that children's performance on the speech perception task in the VR classroom was more similar to their performance in the laboratory environment for AV tasks than for AO tasks. AO tasks showed improved speech perception scores from the laboratory to the VR classroom environment, whereas AV conditions showed little significant change. Conclusion: These results suggest that VR head-mounted displays are a viable research tool for AV tasks with children, increasing flexibility for audiovisual testing in a typical laboratory environment.
Affiliation(s)
- Dawna Lewis
- Listening and Learning Laboratory, Boys Town National Research Hospital, Omaha, NE
- Timothy Vallier
- Listening and Learning Laboratory, Boys Town National Research Hospital, Omaha, NE
- Tessa McDermott
- Listening and Learning Laboratory, Boys Town National Research Hospital, Omaha, NE
- Andrew Dergan
- Listening and Learning Laboratory, Boys Town National Research Hospital, Omaha, NE
19. Wang Y, Zhang J, Zou J, Luo H, Ding N. Prior Knowledge Guides Speech Segregation in Human Auditory Cortex. Cereb Cortex 2020; 29:1561-1571. PMID: 29788144; DOI: 10.1093/cercor/bhy052.
Abstract
Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g., pitch) and top-down prior knowledge about sound streams. In a multi-talker environment, the brain can segregate different speakers in about 100 ms in auditory cortex. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signature of how the brain utilizes prior knowledge to segregate two speech streams from the same speaker, which can hardly be separated based on bottom-up acoustic cues. In a primed condition, the participants know the target speech stream in advance, while in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate that an effect in bilateral superior temporal gyrus and superior temporal sulcus is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation mainly by suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cues.
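Envelope tracking of the kind used here can be sketched as a lagged correlation between the stimulus envelope and a neural channel (a toy illustration, not the study's MEG analysis; the sampling rate, 100 ms latency, and noise level are invented):

```python
import numpy as np
from scipy.signal import hilbert

fs = 200                                   # sampling rate (Hz)
rng = np.random.default_rng(2)
speech = rng.standard_normal(fs * 10)      # stand-in for a speech waveform
envelope = np.abs(hilbert(speech))         # slow amplitude envelope via Hilbert transform

# Toy neural channel that follows the envelope at ~100 ms latency, plus noise.
true_lag = int(0.1 * fs)
neural = np.roll(envelope, true_lag) + rng.standard_normal(envelope.size)

# Scan lags from 0 to 400 ms; tracking strength is the envelope-response correlation.
lags = np.arange(0, int(0.4 * fs))
r = [np.corrcoef(envelope[: -k or None], neural[k:])[0, 1] for k in lags]
print(f"peak tracking at {lags[np.argmax(r)] / fs * 1000:.0f} ms lag")
```

Comparing this tracking strength for the target versus non-target stream, per condition, is the kind of contrast by which priming effects were quantified.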
Affiliation(s)
- Yuanye Wang
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; McGovern Institute for Brain Research, Peking University, Beijing, China; Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Jianfeng Zhang
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Jiajie Zou
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Huan Luo
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; McGovern Institute for Brain Research, Peking University, Beijing, China; Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Nai Ding
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China; Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, Zhejiang, China; State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou, Zhejiang, China; Interdisciplinary Center for Social Sciences, Zhejiang University, Hangzhou, Zhejiang, China
20. AVATAR Assesses Speech Understanding and Multitask Costs in Ecologically Relevant Listening Situations. Ear Hear 2020; 41:521-531. DOI: 10.1097/aud.0000000000000778.
21. Laffere A, Dick F, Tierney A. Effects of auditory selective attention on neural phase: individual differences and short-term training. Neuroimage 2020; 213:116717. PMID: 32165265; DOI: 10.1016/j.neuroimage.2020.116717.
Abstract
How does the brain follow a sound that is mixed with others in a noisy environment? One possible strategy is to allocate attention to task-relevant time intervals. Prior work has linked auditory selective attention to alignment of neural modulations with stimulus temporal structure. However, since this prior research used relatively easy tasks and focused on analysis of main effects of attention across participants, relatively little is known about the neural foundations of individual differences in auditory selective attention. Here we investigated individual differences in auditory selective attention by asking participants to perform a 1-back task on a target auditory stream while ignoring a distractor auditory stream presented 180° out of phase. Neural entrainment to the attended auditory stream was strongly linked to individual differences in task performance. Some variability in performance was accounted for by degree of musical training, suggesting a link between long-term auditory experience and auditory selective attention. To investigate whether short-term improvements in auditory selective attention are possible, we gave participants 2 h of auditory selective attention training and found improvements in both task performance and enhancements of the effects of attention on neural phase angle. Our results suggest that although there exist large individual differences in auditory selective attention and attentional modulation of neural phase angle, this skill improves after a small amount of targeted training.
Affiliation(s)
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
- Fred Dick
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK; Division of Psychology & Language Sciences, UCL, Gower Street, London, WC1E 6BT, UK
- Adam Tierney
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK.
| |
Collapse
|
22
|
Leibold LJ, Buss E. Yes/no and two-interval forced-choice tasks with listener-based vs observer-based responses. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:1588. [PMID: 32237812 PMCID: PMC7067614 DOI: 10.1121/10.0000894] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/07/2019] [Revised: 02/20/2020] [Accepted: 02/24/2020] [Indexed: 06/11/2023]
Abstract
Observer-based procedures are used to assess auditory behavior in infants, often incorporating adaptive tracking algorithms. These procedures are reliable, but effects of modifications made to accommodate infant testing are not fully understood. One modification is that observation intervals are undefined for the listener, introducing signal-temporal uncertainty and increasing the likelihood that listener response bias will influence estimates of performance. The effect of these factors was evaluated by comparing threshold estimates obtained from adults using two tasks: (1) single-interval, yes/no and (2) two-interval, forced-choice. Detection thresholds were estimated adaptively for a 1000-Hz FM tone in quiet and for a word presented in two-talker speech masking. Trials were initiated and judged by the observer (observer-based) or the listener (listener-based). Thus, listening intervals were temporally uncertain in observer-based procedures and temporally defined in listener-based procedures. Thresholds were higher for observer-based relative to corresponding listener-based procedures. The magnitude of this difference was similar across the yes/no and two-interval tasks, and was larger for masked word detection than tone detection in quiet. Listeners adopted a conservative criterion when tested using the observer-based, yes/no procedure, but modeling results suggest that signal-temporal uncertainty accounts for the largest portion of the threshold difference between observer-based and listener-based procedures.
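As a point of reference for the yes/no versus two-interval comparison, the following hedged sketch computes sensitivity (d') for both task types using standard signal detection theory formulas; the hit, false-alarm, and percent-correct values are invented for illustration and are not the study's data.

```python
# Illustrative signal-detection sketch (not the paper's analysis code):
# comparing sensitivity estimates from a yes/no task and a two-interval
# forced-choice (2IFC) task.
from scipy.stats import norm

def dprime_yes_no(hit_rate: float, fa_rate: float) -> float:
    """d' for a single-interval yes/no task: z(H) - z(FA)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

def dprime_2ifc(p_correct: float) -> float:
    """d' for 2IFC: sqrt(2) * z(Pc), assuming an unbiased observer."""
    return 2 ** 0.5 * norm.ppf(p_correct)

# Hypothetical data: a conservative criterion lowers the yes/no hit rate
print(dprime_yes_no(hit_rate=0.80, fa_rate=0.15))  # ~1.88
print(dprime_2ifc(p_correct=0.91))                 # ~1.90
```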
Collapse
Affiliation(s)
- Lori J Leibold
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska 68131, USA
| | - Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
| |
Collapse
|
23
|
Linson A, Parr T, Friston KJ. Active inference, stressors, and psychological trauma: A neuroethological model of (mal)adaptive explore-exploit dynamics in ecological context. Behav Brain Res 2020; 380:112421. [PMID: 31830495 PMCID: PMC6961115 DOI: 10.1016/j.bbr.2019.112421] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2019] [Revised: 12/06/2019] [Accepted: 12/07/2019] [Indexed: 12/28/2022]
Abstract
This paper offers a formal account of emotional inference and stress-related behaviour, using the notion of active inference. We formulate responses to stressful scenarios in terms of Bayesian belief-updating and subsequent policy selection; namely, planning as (active) inference. Using a minimal model of how creatures or subjects account for their sensations (and subsequent action), we deconstruct the sequences of belief updating and behaviour that underwrite stress-related responses - and simulate the aberrant responses of the sort seen in post-traumatic stress disorder (PTSD). Crucially, the model used for belief-updating generates predictions in multiple (exteroceptive, proprioceptive and interoceptive) modalities, to provide an integrated account of evidence accumulation and multimodal integration that has consequences for both motor and autonomic responses. The ensuing phenomenology speaks to many constructs in the ecological and clinical literature on stress, which we unpack with reference to simulated inference processes and accompanying neuronal responses. A key insight afforded by this formal approach rests on the trade-off between the epistemic affordance of certain cues (that resolve uncertainty about states of affairs in the environment) and the consequences of epistemic foraging (that may be in conflict with the instrumental or pragmatic value of 'fleeing' or 'freezing'). Starting from first principles, we show how this trade-off is nuanced by prior (subpersonal) beliefs about the outcomes of behaviour - beliefs that, when held with unduly high precision, can lead to (Bayes optimal) responses that closely resemble PTSD.
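A minimal sketch of the discrete Bayesian belief updating that such active inference schemes build on is given below, assuming a toy two-state world ("safe" vs. "threat") with invented priors and likelihoods; the paper's full model additionally includes policy selection, multiple modalities, and precision, which this sketch omits.

```python
# Toy sketch of discrete Bayesian belief updating, the core operation that
# active inference builds on. States, prior, and likelihood are invented.
import numpy as np

states = ["safe", "threat"]
prior = np.array([0.9, 0.1])            # prior belief over hidden states

# Likelihood P(observation | state); rows index the observation
# (0 = "quiet", 1 = "noise"), columns index the hidden state
likelihood = np.array([[0.8, 0.3],
                       [0.2, 0.7]])

def update(belief, obs):
    """One step of Bayes' rule: posterior proportional to likelihood * prior."""
    posterior = likelihood[obs] * belief
    return posterior / posterior.sum()

belief = prior
for obs in [1, 1, 0]:                   # hear noise, noise, then quiet
    belief = update(belief, obs)
    print(dict(zip(states, belief.round(3))))
```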
Collapse
Affiliation(s)
- Adam Linson
- Faculty of Natural Sciences, University of Stirling, Stirling, UK; Faculty of Arts and Humanities, University of Stirling, Stirling, UK.
| | - Thomas Parr
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, UK
| | - Karl J Friston
- Wellcome Centre for Human Neuroimaging, Institute of Neurology, University College London, London, UK
| |
Collapse
|
24
|
Choi JY, Perrachione TK. Time and information in perceptual adaptation to speech. Cognition 2019; 192:103982. [PMID: 31229740 PMCID: PMC6732236 DOI: 10.1016/j.cognition.2019.05.019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2018] [Revised: 05/11/2019] [Accepted: 05/25/2019] [Indexed: 11/18/2022]
Abstract
Perceptual adaptation to a talker enables listeners to efficiently resolve the many-to-many mapping between variable speech acoustics and abstract linguistic representations. However, models of speech perception have not delved into the variety or the quantity of information necessary for successful adaptation, nor how adaptation unfolds over time. In three experiments using speeded classification of spoken words, we explored how the quantity (duration), quality (phonetic detail), and temporal continuity of talker-specific context contribute to facilitating perceptual adaptation to speech. In single- and mixed-talker conditions, listeners identified phonetically-confusable target words in isolation or preceded by carrier phrases of varying lengths and phonetic content, spoken by the same talker as the target word. Word identification was always slower in mixed-talker conditions than single-talker ones. However, interference from talker variability decreased as the duration of preceding speech increased but was not affected by the amount of preceding talker-specific phonetic information. Furthermore, efficiency gains from adaptation depended on temporal continuity between preceding speech and the target word. These results suggest that perceptual adaptation to speech may be understood via models of auditory streaming, where perceptual continuity of an auditory object (e.g., a talker) facilitates allocation of attentional resources, resulting in more efficient perceptual processing.
Collapse
Affiliation(s)
- Ja Young Choi
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, United States
| | - Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States.
| |
Collapse
|
25
|
Linson A, Friston K. Reframing PTSD for computational psychiatry with the active inference framework. Cogn Neuropsychiatry 2019; 24:347-368. [PMID: 31564212 PMCID: PMC6816477 DOI: 10.1080/13546805.2019.1665994] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/19/2018] [Accepted: 09/04/2019] [Indexed: 11/25/2022]
Abstract
Introduction: Recent advances in research on stress and, respectively, on disorders of perception, learning, and behaviour speak to a promising synthesis of current insights from (i) neurobiology, cognitive neuroscience and psychology of stress and post-traumatic stress disorder (PTSD), and (ii) computational psychiatry approaches to pathophysiology (e.g. of schizophrenia and autism). Methods: Specifically, we apply this synthesis to PTSD. The framework of active inference offers an embodied and embedded lens through which to understand neuronal mechanisms, structures, and processes of cognitive function and dysfunction. In turn, this offers an explanatory model of how healthy mental functioning can go awry due to psychopathological conditions that impair inference about our environment and our bodies. In this context, auditory phenomena-known to be especially relevant to studies of PTSD and schizophrenia-and traditional models of auditory function can be viewed from an evolutionary perspective based on active inference. Results: We assess and contextualise a range of evidence on audition, stress, psychosis, and PTSD, and bring some existing partial models of PTSD into multilevel alignment. Conclusions: The novel perspective on PTSD we present aims to serve as a basis for new experimental designs and therapeutic interventions that integrate fundamentally biological, cognitive, behavioural, and environmental factors.
Collapse
Affiliation(s)
- Adam Linson
- Faculty of Natural Sciences & Faculty of Arts and Humanities, University of Stirling, Stirling, UK
| | - Karl Friston
- Wellcome Centre for Human Neuroimaging, UCL, London, UK
| |
Collapse
|
26
|
Multisensory feature integration in (and out) of the focus of spatial attention. Atten Percept Psychophys 2019; 82:363-376. [DOI: 10.3758/s13414-019-01813-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
27
|
Zobel BH, Wagner A, Sanders LD, Başkent D. Spatial release from informational masking declines with age: Evidence from a detection task in a virtual separation paradigm. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:548. [PMID: 31370625 DOI: 10.1121/1.5118240] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/19/2018] [Accepted: 06/28/2019] [Indexed: 06/10/2023]
Abstract
Declines in spatial release from informational masking may contribute to the speech-processing difficulties that older adults often experience within complex listening environments. The present study sought to answer two fundamental questions: (1) Does spatial release from informational masking decline with age and, if so, (2) does age predict this decline independently of age-typical hearing loss? Younger (18-34 years) and older (60-80 years) adults with age-typical hearing completed a yes/no target-detection task with low-pass filtered noise-vocoded speech designed to reduce non-spatial segregation cues and control for hearing loss. Participants detected a target voice among two-talker masking babble while a virtual spatial separation paradigm [Freyman, Helfer, McCall, and Clifton, J. Acoust. Soc. Am. 106(6), 3578-3588 (1999)] was used to isolate informational masking release. The younger and older adults both exhibited spatial release from informational masking, but masking release was reduced among the older adults. Furthermore, age predicted this decline after controlling for hearing loss, whereas there was no indication that hearing loss itself played a role. These findings provide evidence that declines specific to aging limit spatial release from informational masking under challenging listening conditions.
Collapse
Affiliation(s)
- Benjamin H Zobel
- Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, Massachusetts 01003, USA
| | - Anita Wagner
- Department of Otorhinolaryngology-Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Lisa D Sanders
- Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, Massachusetts 01003, USA
| | - Deniz Başkent
- Department of Otorhinolaryngology-Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| |
Collapse
|
28
|
Feng T, Chen Q, Xiao Z. Age-Related Differences in the Effects of Masker Cuing on Releasing Chinese Speech From Informational Masking. Front Psychol 2018; 9:1922. [PMID: 30356784 PMCID: PMC6189421 DOI: 10.3389/fpsyg.2018.01922] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2018] [Accepted: 09/18/2018] [Indexed: 11/22/2022] Open
Abstract
The aims of the present study were to examine whether familiarity with a masker improves word recognition in speech masking situations and whether there are age-related differences in the effects of masker cuing. Thirty-two older listeners (range = 59–74 years; mean age = 66.41 years) with high-frequency hearing loss and 32 younger normal-hearing listeners (range = 21–28 years; mean age = 23.73 years) participated in this study, all of whom spoke Chinese as their first language. Two experiments were conducted, with 16 younger and 16 older listeners in each experiment. The masking speech, which differed in content from the target speech, was a continuous recording of syntactically correct but semantically meaningless Chinese sentences spoken by two talkers. The masker level was adjusted to produce signal-to-masker ratios of -12, -8, -4, and 0 dB for the younger participants and -8, -4, 0, and 4 dB for the older participants. Under masker-priming conditions, a priming sentence, spoken by the masker talkers, was presented in quiet three times before a target sentence was presented together with a masker sentence 4 s later. In Experiment 1, using same-sentence masker-priming (identical to the masker sentence), the masker-priming improved the identification of the target sentence for both age groups compared to when no priming was provided. However, the amount of masking release was less in the older adults than in the younger adults. In Experiment 2, two kinds of primes were considered: same-sentence masker-priming, and different-sentence masker-priming (different from the masker sentence in content for each keyword). The results of Experiment 2 showed that both kinds of primes improved the identification of the targets for both age groups. However, the release from speech masking in both priming conditions was less in the older adults than in the younger adults, and the release from speech masking in both age groups was greater with same-sentence masker-priming than with different-sentence masker-priming. These results suggest that both the voice and content cues of a masker could be used to release target speech from maskers in noisy listening conditions. Furthermore, there was an age-related decline in masker-priming-induced release from speech masking.
Collapse
Affiliation(s)
- Tianquan Feng
- College of Teacher Education, Nanjing Normal University, Nanjing, China; State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Qingrong Chen
- School of Psychology, Nanjing Normal University, Nanjing, China
| | - Zhongdang Xiao
- State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| |
Collapse
|
29
|
Francis AL, Tigchelaar LJ, Zhang R, Zekveld AA. Effects of Second Language Proficiency and Linguistic Uncertainty on Recognition of Speech in Native and Nonnative Competing Speech. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2018; 61:1815-1830. [PMID: 29971338 DOI: 10.1044/2018_jslhr-h-17-0254] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/03/2017] [Accepted: 03/26/2018] [Indexed: 06/08/2023]
Abstract
PURPOSE The purpose of this study was to investigate the effects of 2nd language proficiency and linguistic uncertainty on performance and listening effort in mixed language contexts. METHOD Thirteen native speakers of Dutch with varying degrees of fluency in English listened to and repeated sentences produced in both Dutch and English and presented in the presence of single-talker competing speech in both Dutch and English. Target and masker language combinations were presented in both blocked and mixed (unpredictable) conditions. In the blocked condition, in each block of trials the target-masker language combination remained constant, and the listeners were informed of both prior to beginning the block. In the mixed condition, target and masker language varied randomly from trial to trial. All listeners participated in all conditions. Performance was assessed in terms of speech reception thresholds, whereas listening effort was quantified in terms of pupil dilation. RESULTS Performance (speech reception thresholds) and listening effort (pupil dilation) were both affected by 2nd language proficiency (English test score) and target and masker language: Performance was better in blocked as compared to mixed conditions, with Dutch as compared to English targets, and with English as compared to Dutch maskers. English proficiency was correlated with listening performance. Listeners also exhibited greater peak pupil dilation in mixed as compared to blocked conditions for trials with Dutch maskers, whereas pupil dilation during preparation for speaking was higher for English targets as compared to Dutch ones in almost all conditions. CONCLUSIONS Both listener's proficiency in a 2nd language and uncertainty about the target language on a given trial play a significant role in how bilingual listeners attend to speech in the presence of competing speech in different languages, but precise effects also depend on which language is serving as target and which as masker.
Collapse
Affiliation(s)
- Alexander L Francis
- Department of Speech, Language & Hearing Sciences, Purdue University, West Lafayette, IN
| | | | - Rongrong Zhang
- Department of Statistics, Purdue University, West Lafayette, IN
| | - Adriana A Zekveld
- VU University Medical Center, Amsterdam, the Netherlands
- Linnaeus Centre, Linköping University, Sweden
| |
Collapse
|
30
|
Choi I. Interactive Sonification Exploring Emergent Behavior Applying Models for Biological Information and Listening. Front Neurosci 2018; 12:197. [PMID: 29755311 PMCID: PMC5934483 DOI: 10.3389/fnins.2018.00197] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2017] [Accepted: 03/12/2018] [Indexed: 11/29/2022] Open
Abstract
Sonification is an open-ended design task in which sound is constructed to inform a listener about data. Understanding application context is critical for shaping design requirements for data translation into sound. Sonification requires methodology to maintain reproducibility when data sources exhibit non-linear properties of self-organization and emergent behavior. This research formalizes interactive sonification in an extensible model to support reproducibility when data exhibits emergent behavior. In the absence of sonification theory, extensibility demonstrates relevant methods across case studies. The interactive sonification framework foregrounds three factors: reproducible system implementation for generating sonification; interactive mechanisms enhancing a listener's multisensory observations; and reproducible data from models that characterize emergent behavior. Supramodal attention research suggests interactive exploration with auditory feedback can generate context for recognizing irregular patterns and transient dynamics. The sonification framework provides circular causality as a signal pathway for modeling a listener interacting with emergent behavior. The extensible sonification model adopts a data acquisition pathway to formalize functional symmetry across three subsystems: Experimental Data Source, Sound Generation, and Guided Exploration. To differentiate time criticality and dimensionality of emerging dynamics, tuning functions are applied between subsystems to maintain scale and symmetry of concurrent processes and temporal dynamics. Tuning functions accommodate sonification design strategies that yield order parameter values to render emerging patterns discoverable as well as rehearsable, to reproduce desired instances for clinical listeners. Case studies are implemented with two computational models, Chua's circuit and Swarm Chemistry social agent simulation, generating data in real-time that exhibits emergent behavior. Heuristic Listening is introduced as an informal model of a listener's clinical attention to data sonification through multisensory interaction in a context of structured inquiry. Three methods are introduced to assess the proposed sonification framework: Listening Scenario classification, data flow Attunement, and Sonification Design Patterns to classify sound control. Case study implementations are assessed against these methods comparing levels of abstraction between experimental data and sound generation. Outcomes demonstrate the framework's performance as a reference model for representing experimental implementations, for identifying common sonification structures that have different experimental implementations, for identifying common functions implemented in different subsystems, and for comparing the impact of affordances across multiple implementations of listening scenarios.
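To illustrate one ingredient of these case studies, the sketch below integrates Chua's circuit with standard dimensionless parameters and applies a simple "tuning function" that maps one state variable onto a pitch-like control range. The mapping and ranges are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch: integrate Chua's circuit (a standard chaotic system) and
# map one state variable to an audio control parameter.
import numpy as np
from scipy.integrate import solve_ivp

alpha, beta, m0, m1 = 15.6, 28.0, -1.143, -0.714   # classic parameters

def chua(t, s):
    x, y, z = s
    fx = m1 * x + 0.5 * (m0 - m1) * (abs(x + 1) - abs(x - 1))
    return [alpha * (y - x - fx), x - y + z, -beta * y]

sol = solve_ivp(chua, (0, 100), [0.7, 0.0, 0.0], dense_output=True)
t = np.linspace(0, 100, 2000)
x = sol.sol(t)[0]

# A simple "tuning function": map x into a MIDI-like pitch range (invented)
pitch = np.interp(x, (x.min(), x.max()), (48, 84))
print(pitch[:5].round(1))
```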
Collapse
Affiliation(s)
- Insook Choi
- Studio for International Media & Technology, MediaCityUK, School of Arts & Media, University of Salford, Manchester, United Kingdom
| |
Collapse
|
31
|
Cueing listeners to attend to a target talker progressively improves word report as the duration of the cue-target interval lengthens to 2,000 ms. Atten Percept Psychophys 2018; 80:1520-1538. [PMID: 29696570 DOI: 10.3758/s13414-018-1531-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker's spatial location or their gender. Participants directed attention to location and gender simultaneously ("objects") at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.
Collapse
|
32
|
Looking Behavior and Audiovisual Speech Understanding in Children With Normal Hearing and Children With Mild Bilateral or Unilateral Hearing Loss. Ear Hear 2017; 39:783-794. [PMID: 29252979 DOI: 10.1097/aud.0000000000000534] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES Visual information from talkers facilitates speech intelligibility for listeners when audibility is challenged by environmental noise and hearing loss. Less is known about how listeners actively process and attend to visual information from different talkers in complex multi-talker environments. This study tracked looking behavior in children with normal hearing (NH), mild bilateral hearing loss (MBHL), and unilateral hearing loss (UHL) in a complex multi-talker environment to examine the extent to which children look at talkers and whether looking patterns relate to performance on a speech-understanding task. It was hypothesized that performance would decrease as perceptual complexity increased and that children with hearing loss would perform more poorly than their peers with NH. Children with MBHL or UHL were expected to demonstrate greater attention to individual talkers during multi-talker exchanges, indicating that they were more likely to attempt to use visual information from talkers to assist in speech understanding in adverse acoustics. It also was of interest to examine whether MBHL, versus UHL, would differentially affect performance and looking behavior. DESIGN Eighteen children with NH, eight children with MBHL, and 10 children with UHL participated (8-12 years). They followed audiovisual instructions for placing objects on a mat under three conditions: a single talker providing instructions via a video monitor, four possible talkers alternately providing instructions on separate monitors in front of the listener, and the same four talkers providing both target and nontarget information. Multi-talker background noise was presented at a 5 dB signal-to-noise ratio during testing. An eye tracker monitored looking behavior while children performed the experimental task. RESULTS Behavioral task performance was higher for children with NH than for either group of children with hearing loss. There were no differences in performance between children with UHL and children with MBHL. Eye-tracker analysis revealed that children with NH looked more at the screens overall than did children with MBHL or UHL, though individual differences were greater in the groups with hearing loss. Listeners in all groups spent a small proportion of time looking at relevant screens as talkers spoke. Although looking was distributed across all screens, there was a bias toward the right side of the display. There was no relationship between overall looking behavior and performance on the task. CONCLUSIONS The present study examined the processing of audiovisual speech in the context of a naturalistic task. Results demonstrated that children distributed their looking to a variety of sources during the task, but that children with NH were more likely to look at screens than were those with MBHL/UHL. However, all groups looked at the relevant talkers as they were speaking only a small proportion of the time. Despite variability in looking behavior, listeners were able to follow the audiovisual instructions and children with NH demonstrated better performance than children with MBHL/UHL. These results suggest that performance on some challenging multi-talker audiovisual tasks is not dependent on visual fixation to relevant talkers for children with NH or with MBHL/UHL.
Collapse
|
33
|
Leibold LJ. Speech Perception in Complex Acoustic Environments: Developmental Effects. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2017; 60:3001-3008. [PMID: 29049600 PMCID: PMC5945069 DOI: 10.1044/2017_jslhr-h-17-0070] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/20/2017] [Accepted: 06/19/2017] [Indexed: 05/06/2023]
Abstract
PURPOSE The ability to hear and understand speech in complex acoustic environments follows a prolonged time course of development. The purpose of this article is to provide a general overview of the literature describing age effects in susceptibility to auditory masking in the context of speech recognition, including a summary of findings related to the maturation of processes thought to facilitate segregation of target from competing speech. METHOD Data from published and ongoing studies are discussed, with a focus on synthesizing results from studies that address age-related changes in the ability to perceive speech in the presence of a small number of competing talkers. CONCLUSIONS This review provides a summary of the current state of knowledge that is valuable for researchers and clinicians. It highlights the importance of considering listener factors, such as age and hearing status, as well as stimulus factors, such as masker type, when interpreting masked speech recognition data. PRESENTATION VIDEO http://cred.pubs.asha.org/article.aspx?articleid=2601620.
Collapse
Affiliation(s)
- Lori J. Leibold
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, NE
| |
Collapse
|
34
|
Peripheral hearing loss reduces the ability of children to direct selective attention during multi-talker listening. Hear Res 2017; 350:160-172. [PMID: 28505526 DOI: 10.1016/j.heares.2017.05.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/31/2016] [Revised: 04/28/2017] [Accepted: 05/08/2017] [Indexed: 11/23/2022]
Abstract
Restoring normal hearing requires knowledge of how peripheral and central auditory processes are affected by hearing loss. Previous research has focussed primarily on peripheral changes following sensorineural hearing loss, whereas consequences for central auditory processing have received less attention. We examined the ability of hearing-impaired children to direct auditory attention to a voice of interest (based on the talker's spatial location or gender) in the presence of a common form of background noise: the voices of competing talkers (i.e. during multi-talker, or "Cocktail Party" listening). We measured brain activity using electro-encephalography (EEG) when children prepared to direct attention to the spatial location or gender of an upcoming target talker who spoke in a mixture of three talkers. Compared to normally-hearing children, hearing-impaired children showed significantly less evidence of preparatory brain activity when required to direct spatial attention. This finding is consistent with the idea that hearing-impaired children have a reduced ability to prepare spatial attention for an upcoming talker. Moreover, preparatory brain activity was not restored when hearing-impaired children listened with their acoustic hearing aids. An implication of these findings is that steps to improve auditory attention alongside acoustic hearing aids may be required to improve the ability of hearing-impaired children to understand speech in the presence of competing talkers.
Collapse
|
35
|
Getzmann S, Wascher E. Visually guided auditory attention in a dynamic “cocktail-party” speech perception task: ERP evidence for age-related differences. Hear Res 2017; 344:98-108. [DOI: 10.1016/j.heares.2016.11.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/07/2016] [Revised: 10/20/2016] [Accepted: 11/03/2016] [Indexed: 10/20/2022]
|
36
|
Kidd G, Colburn HS. Informational Masking in Speech Recognition. SPRINGER HANDBOOK OF AUDITORY RESEARCH 2017. [DOI: 10.1007/978-3-319-51662-2_4] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/04/2022]
|
37
|
Shinn-Cunningham B, Best V, Lee AKC. Auditory Object Formation and Selection. SPRINGER HANDBOOK OF AUDITORY RESEARCH 2017. [DOI: 10.1007/978-3-319-51662-2_2] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/03/2022]
|
38
|
Oberfeld D, Klöckner-Nowotny F. Individual differences in selective attention predict speech identification at a cocktail party. eLife 2016; 5:e16747. [PMID: 27580272 PMCID: PMC5441891 DOI: 10.7554/elife.16747] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2016] [Accepted: 08/08/2016] [Indexed: 11/13/2022] Open
Abstract
Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.
Collapse
Affiliation(s)
- Daniel Oberfeld
- Department of Psychology, Section Experimental Psychology, Johannes Gutenberg-Universität, Mainz, Germany
| | - Felicitas Klöckner-Nowotny
- Department of Psychology, Section Experimental Psychology, Johannes Gutenberg-Universität, Mainz, Germany
| |
Collapse
|
39
|
Kidd G, Mason CR, Swaminathan J, Roverud E, Clayton KK, Best V. Determining the energetic and informational components of speech-on-speech masking. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:132. [PMID: 27475139 PMCID: PMC5392100 DOI: 10.1121/1.4954748] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
Identification of target speech was studied under masked conditions consisting of two or four independent speech maskers. In the reference conditions, the maskers were colocated with the target, the masker talkers were the same sex as the target, and the masker speech was intelligible. The comparison conditions, intended to provide release from masking, included different-sex target and masker talkers, time-reversal of the masker speech, and spatial separation of the maskers from the target. Significant release from masking was found for all comparison conditions. To determine whether these reductions in masking could be attributed to differences in energetic masking, ideal time-frequency segregation (ITFS) processing was applied so that the time-frequency units where the masker energy dominated the target energy were removed. The remaining target-dominated "glimpses" were reassembled as the stimulus. Speech reception thresholds measured using these resynthesized ITFS-processed stimuli were the same for the reference and comparison conditions supporting the conclusion that the amount of energetic masking across conditions was the same. These results indicated that the large release from masking found under all comparison conditions was due primarily to a reduction in informational masking. Furthermore, the large individual differences observed generally were correlated across the three masking release conditions.
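The ITFS manipulation can be sketched as an ideal binary mask: keep only the time-frequency units where target energy dominates masker energy, then resynthesize the surviving "glimpses". The sketch below uses noise stand-ins for the speech waveforms, and the STFT settings and 0-dB local criterion are assumptions, not the study's exact parameters.

```python
# Illustrative sketch of ideal time-frequency segregation (ITFS).
import numpy as np
from scipy.signal import stft, istft

fs = 16000
rng = np.random.default_rng(1)
target = rng.normal(size=fs)           # stand-ins for speech waveforms
masker = rng.normal(size=fs)

f, t, T = stft(target, fs=fs, nperseg=512)
_, _, M = stft(masker, fs=fs, nperseg=512)

# Ideal binary mask: 1 where target energy exceeds masker energy
mask = (np.abs(T) ** 2 > np.abs(M) ** 2).astype(float)

# Apply the mask to the mixture and resynthesize the target "glimpses"
mixture = T + M
_, glimpses = istft(mixture * mask, fs=fs, nperseg=512)
print(glimpses.shape)
```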
Collapse
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Christine R Mason
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Jayaganesh Swaminathan
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Elin Roverud
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Kameron K Clayton
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| | - Virginia Best
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
| |
Collapse
|
40
|
Moradi S, Lidestam B, Rönnberg J. Comparison of Gated Audiovisual Speech Identification in Elderly Hearing Aid Users and Elderly Normal-Hearing Individuals: Effects of Adding Visual Cues to Auditory Speech Stimuli. Trends Hear 2016; 20:2331216516653355. [PMID: 27317667 PMCID: PMC5562342 DOI: 10.1177/2331216516653355] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open
Abstract
The present study compared elderly hearing aid (EHA) users (n = 20) with elderly normal-hearing (ENH) listeners (n = 20) in terms of isolation points (IPs, the shortest time required for correct identification of a speech stimulus) and accuracy of audiovisual gated speech stimuli (consonants, words, and final words in highly and less predictable sentences) presented in silence. In addition, we compared the IPs of audiovisual speech stimuli from the present study with auditory ones extracted from a previous study, to determine the impact of the addition of visual cues. Both participant groups achieved ceiling levels in terms of accuracy in the audiovisual identification of gated speech stimuli; however, the EHA group needed longer IPs for the audiovisual identification of consonants and words. The benefit of adding visual cues to auditory speech stimuli was more evident in the EHA group, as audiovisual presentation significantly shortened the IPs for consonants, words, and final words in less predictable sentences; in the ENH group, audiovisual presentation only shortened the IPs for consonants and words. In conclusion, although the audiovisual benefit was greater for the EHA group, this group had inferior performance compared with the ENH group in terms of IPs when supportive semantic context was lacking. Consequently, EHA users needed the initial part of the audiovisual speech signal to be longer than did their counterparts with normal hearing to reach the same level of accuracy in the absence of a semantic context.
Collapse
Affiliation(s)
- Shahram Moradi
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Sweden
| | - Björn Lidestam
- Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
| | - Jerker Rönnberg
- Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Sweden
| |
Collapse
|
41
|
Holmes E, Kitterick PT, Summerfield AQ. EEG activity evoked in preparation for multi-talker listening by adults and children. Hear Res 2016; 336:83-100. [PMID: 27178442 DOI: 10.1016/j.heares.2016.04.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/13/2016] [Revised: 04/04/2016] [Accepted: 04/28/2016] [Indexed: 12/01/2022]
Abstract
Selective attention is critical for successful speech perception because speech is often encountered in the presence of other sounds, including the voices of competing talkers. Faced with the need to attend selectively, listeners perceive speech more accurately when they know characteristics of upcoming talkers before they begin to speak. However, the neural processes that underlie the preparation of selective attention for voices are not fully understood. The current experiments used electroencephalography (EEG) to investigate the time course of brain activity during preparation for an upcoming talker in young adults aged 18-27 years with normal hearing (Experiments 1 and 2) and in typically-developing children aged 7-13 years (Experiment 3). Participants reported key words spoken by a target talker when an opposite-gender distractor talker spoke simultaneously. The two talkers were presented from different spatial locations (±30° azimuth). Before the talkers began to speak, a visual cue indicated either the location (left/right) or the gender (male/female) of the target talker. Adults evoked preparatory EEG activity that started shortly after (<50 ms) the visual cue was presented and was sustained until the talkers began to speak. The location cue evoked similar preparatory activity in Experiments 1 and 2 with different samples of participants. The gender cue did not evoke preparatory activity when it predicted gender only (Experiment 1) but did evoke preparatory activity when it predicted the identity of a specific talker with greater certainty (Experiment 2). Location cues evoked significant preparatory EEG activity in children but gender cues did not. The results provide converging evidence that listeners evoke consistent preparatory brain activity for selecting a talker by their location (regardless of their gender or identity), but not by their gender alone.
Collapse
Affiliation(s)
- Emma Holmes
- Department of Psychology, University of York, UK.
| | - Padraig T Kitterick
- NIHR Nottingham Hearing Biomedical Research Unit, UK; Division of Clinical Neuroscience, School of Medicine, University of Nottingham, UK
| | - A Quentin Summerfield
- Department of Psychology, University of York, UK; Hull York Medical School, University of York, UK
| |
Collapse
|
42
|
Evans S, McGettigan C, Agnew ZK, Rosen S, Scott SK. Getting the Cocktail Party Started: Masking Effects in Speech Perception. J Cogn Neurosci 2015; 28:483-500. [PMID: 26696297 DOI: 10.1162/jocn_a_00913] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
Spoken conversations typically take place in noisy environments, and different kinds of masking sounds place differing demands on cognitive resources. Previous studies, examining the modulation of neural activity associated with the properties of competing sounds, have shown that additional speech streams engage the superior temporal gyrus. However, the absence of a condition in which target speech was heard without additional masking made it difficult to identify brain networks specific to masking and to ascertain the extent to which competing speech was processed equivalently to target speech. In this study, we scanned young healthy adults with continuous fMRI, while they listened to stories masked by sounds that differed in their similarity to speech. We show that auditory attention and control networks are activated during attentive listening to masked speech in the absence of an overt behavioral task. We demonstrate that competing speech is processed predominantly in the left hemisphere within the same pathway as target speech but is not treated equivalently within that stream and that individuals who perform better in speech in noise tasks activate the left mid-posterior superior temporal gyrus more. Finally, we identify neural responses associated with the onset of sounds in the auditory environment; activity was found within right lateralized frontal regions consistent with a phasic alerting response. Taken together, these results provide a comprehensive account of the neural processes involved in listening in noise.
Collapse
Affiliation(s)
| | | | - Zarinah K Agnew
- University College London; University of California, San Francisco
| | | | | |
Collapse
|
43
|
Roaring lions and chirruping lemurs: How the brain encodes sound objects in space. Neuropsychologia 2015; 75:304-13. [DOI: 10.1016/j.neuropsychologia.2015.06.012] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2014] [Revised: 06/07/2015] [Accepted: 06/10/2015] [Indexed: 01/29/2023]
|
44
|
Koelewijn T, de Kluiver H, Shinn-Cunningham BG, Zekveld AA, Kramer SE. The pupil response reveals increased listening effort when it is difficult to focus attention. Hear Res 2015; 323:81-90. [PMID: 25732724 PMCID: PMC4632994 DOI: 10.1016/j.heares.2015.02.004] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/22/2014] [Revised: 02/05/2015] [Accepted: 02/16/2015] [Indexed: 12/04/2022]
Abstract
Recent studies have shown that prior knowledge about where, when, and who is going to talk improves speech intelligibility. How related attentional processes affect cognitive processing load has not been investigated yet. In the current study, three experiments investigated how the pupil dilation response is affected by prior knowledge of target speech location, target speech onset, and who is going to talk. A total of 56 young adults with normal hearing participated. They had to reproduce a target sentence presented to one ear while ignoring a distracting sentence simultaneously presented to the other ear. The two sentences were independently masked by fluctuating noise. Target location (left or right ear), speech onset, and talker variability were manipulated in separate experiments by keeping these features either fixed during an entire block or randomized over trials. Pupil responses were recorded during listening and performance was scored after recall. The results showed an improvement in performance when the location of the target speech was fixed instead of randomized. Additionally, location uncertainty increased the pupil dilation response, which suggests that prior knowledge of location reduces cognitive load. Interestingly, the observed pupil responses for each condition were consistent with subjective reports of listening effort. We conclude that communicating in a dynamic environment like a cocktail party (where participants in competing conversations move unpredictably) requires substantial listening effort because of the demands placed on attentional processes.
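A common way to reduce such recordings to a listening-effort measure is baseline-corrected peak pupil dilation, sketched below on simulated data; the window boundaries, sampling rate, and units are assumptions for illustration, not the study's analysis parameters.

```python
# Hypothetical pupillometry sketch: baseline-corrected peak pupil dilation.
import numpy as np

fs = 60                                  # eye-tracker sampling rate in Hz
rng = np.random.default_rng(2)
trials = rng.normal(4.0, 0.05, (30, fs * 6))   # 30 trials, 6 s, pupil in mm
trials[:, fs:fs * 4] += 0.2              # simulated dilation while listening

baseline = trials[:, :fs].mean(axis=1, keepdims=True)  # 1-s pre-stimulus
corrected = trials - baseline
peak_dilation = corrected[:, fs:fs * 4].max(axis=1)    # listening window
print(f"mean peak dilation: {peak_dilation.mean():.3f} mm")
```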
Collapse
Affiliation(s)
- Thomas Koelewijn
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands.
| | - Hilde de Kluiver
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
| | - Barbara G Shinn-Cunningham
- Department of Biomedical Engineering, Center for Computational Neuroscience and Neural Technology, Boston University, Boston, USA
| | - Adriana A Zekveld
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands; Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
| | - Sophia E Kramer
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
| |
Collapse
|
45
|
Kim J, Davis C. How visual timing and form information affect speech and non-speech processing. BRAIN AND LANGUAGE 2014; 137:86-90. [PMID: 25190328 DOI: 10.1016/j.bandl.2014.07.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/12/2014] [Revised: 07/13/2014] [Accepted: 07/17/2014] [Indexed: 06/03/2023]
Abstract
Auditory speech processing is facilitated when the talker's face/head movements are seen. This effect is typically explained in terms of visual speech providing form and/or timing information. We determined the effect of both types of information on a speech/non-speech task (non-speech stimuli were spectrally rotated speech). All stimuli were presented paired with the talker's static or moving face. Two types of moving face stimuli were used: full-face versions (both spoken form and timing information available) and modified face versions (only timing information provided by peri-oral motion available). The results showed that the peri-oral timing information facilitated response time for speech and non-speech stimuli compared to a static face. An additional facilitatory effect was found for full-face versions compared to the timing condition; this effect only occurred for speech stimuli. We propose the timing effect was due to cross-modal phase resetting; the form effect to cross-modal priming.
Collapse
Affiliation(s)
- Jeesun Kim
- The MARCS Institute, University of Western Sydney, Australia.
| | - Chris Davis
- The MARCS Institute, University of Western Sydney, Australia.
| |
Collapse
|
46
|
Maddox RK, Pospisil DA, Stecker GC, Lee AKC. Directing eye gaze enhances auditory spatial cue discrimination. Curr Biol 2014; 24:748-52. [PMID: 24631242 DOI: 10.1016/j.cub.2014.02.021] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2013] [Revised: 12/12/2013] [Accepted: 02/11/2014] [Indexed: 11/29/2022]
Abstract
The present study demonstrates, for the first time, a specific enhancement of auditory spatial cue discrimination due to eye gaze. Whereas the region of sharpest visual acuity, called the fovea, can be directed at will by moving one's eyes, auditory spatial information is derived primarily from head-related acoustic cues. Past auditory studies have found better discrimination in front of the head [1-3] but have not manipulated subjects' gaze, thus overlooking potential oculomotor influences. Electrophysiological studies have shown that the inferior colliculus, a critical auditory midbrain nucleus, shows visual and oculomotor responses [4-6] and modulations of auditory activity [7-9], and that auditory neurons in the superior colliculus show shifting receptive fields [10-13]. How the auditory system leverages this crossmodal information at the behavioral level remains unknown. Here we directed subjects' gaze (with an eccentric dot) or auditory attention (with lateralized noise) while they performed an auditory spatial cue discrimination task. We found that directing gaze toward a sound significantly enhances discrimination of both interaural level and time differences, whereas directing auditory spatial attention does not. These results show that oculomotor information variably enhances auditory spatial resolution even when the head remains stationary, revealing a distinct behavioral benefit possibly arising from auditory-oculomotor interactions at an earlier level of processing than previously demonstrated.
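For concreteness, the sketch below computes the two binaural cues whose discrimination was measured: interaural time difference (ITD), via circular cross-correlation, and interaural level difference (ILD), via an RMS level ratio. The synthetic binaural pair and the 20-sample lag are invented for illustration.

```python
# Sketch (invented example, not the study's code) of estimating ITD and ILD
# from a left/right ear signal pair.
import numpy as np

fs = 44100
rng = np.random.default_rng(3)
left = rng.normal(size=fs // 10)

true_itd_samples = 20                    # ~0.45 ms lag (assumed)
right = np.roll(left, true_itd_samples) * 0.7   # delayed and attenuated

# ITD: lag maximizing the circular cross-correlation of the ear signals
lags = np.arange(-50, 51)
xcorr = [np.dot(left, np.roll(right, -k)) for k in lags]
itd_ms = lags[int(np.argmax(xcorr))] / fs * 1000

# ILD: level ratio in decibels
ild_db = 20 * np.log10(np.sqrt(np.mean(left**2)) / np.sqrt(np.mean(right**2)))
print(f"ITD = {itd_ms:.2f} ms, ILD = {ild_db:.1f} dB")
```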
Collapse
Affiliation(s)
- Ross K Maddox
- Institute for Learning and Brain Sciences, University of Washington, 1715 NE Columbia Road, Portage Bay Building, Box 357988, Seattle, WA 98195, USA
| | - Dean A Pospisil
- Institute for Learning and Brain Sciences, University of Washington, 1715 NE Columbia Road, Portage Bay Building, Box 357988, Seattle, WA 98195, USA
| | - G Christopher Stecker
- Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd Street, Eagleson Hall, Box 354875, Seattle, WA 98105, USA; Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, 1215 21st Avenue South, Room 8310, Nashville, TN 37232, USA
| | - Adrian K C Lee
- Institute for Learning and Brain Sciences, University of Washington, 1715 NE Columbia Road, Portage Bay Building, Box 357988, Seattle, WA 98195, USA; Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd Street, Eagleson Hall, Box 354875, Seattle, WA 98105, USA.
| |
Collapse
|
47
|
Kidd G, Mason CR, Best V. The role of syntax in maintaining the integrity of streams of speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:766-77. [PMID: 25234885 PMCID: PMC3986016 DOI: 10.1121/1.4861354] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/19/2013] [Revised: 12/13/2013] [Accepted: 12/23/2013] [Indexed: 05/21/2023]
Abstract
This study examined the ability of listeners to utilize syntactic structure to extract a target stream of speech from among competing sounds. Target talkers were identified by voice or location, which was held constant throughout a test utterance, and paired with correct or incorrect (random word order) target sentence syntax. Both voice and location provided reliable cues for identifying target speech even when other features varied unpredictably. The target sentences were masked either by predominantly energetic maskers (noise bursts) or by predominantly informational maskers (similar speech in random word order). When the maskers were noise bursts, target sentence syntax had relatively minor effects on identification performance. However, when the maskers were other talkers, correct target sentence syntax resulted in significantly better speech identification performance than incorrect syntax. Furthermore, conformance to correct syntax alone was sufficient to accurately identify the target speech. The results were interpreted as supporting the idea that the predictability of the elements comprising streams of speech, as manifested by syntactic structure, is an important factor in binding words together into coherent streams. Furthermore, these findings suggest that predictability is particularly important for maintaining the coherence of an auditory stream over time under conditions high in informational masking.
Collapse
Affiliation(s)
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215
| | - Christine R Mason
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215
| | - Virginia Best
- National Acoustic Laboratories, Macquarie University, New South Wales 2109, Australia
| |
Collapse
|
48
|
Lewis DE, Wannagot S. Effects of Looking Behavior on Listening and Understanding in a Simulated Classroom. JOURNAL OF EDUCATIONAL AUDIOLOGY : OFFICIAL JOURNAL OF THE EDUCATIONAL AUDIOLOGY ASSOCIATION 2014; 20:24-33. [PMID: 26478719 PMCID: PMC4607086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
Audiovisual cues can improve speech perception in adverse acoustical environments when compared to auditory cues alone. In classrooms, where acoustics often are less than ideal, the availability of visual cues has the potential to benefit children during learning activities. The current study evaluated the effects of looking behavior on speech understanding of children (8-11 years) and adults during comprehension and sentence repetition tasks in a simulated classroom environment. For the comprehension task, results revealed an effect of looking behavior (looking required versus looking not required) for older children and adults only. Within the looking-behavior conditions, age effects also were evident. There was no effect of looking behavior for the sentence-repetition task (looking versus no looking) but an age effect also was found. The current findings suggest that looking behavior may impact speech understanding differently depending on the task and the age of the listener. In classrooms, these potential differences should be taken into account when designing learning tasks.
Affiliation(s)
- Shannon Wannagot: Boys Town National Research Hospital, Omaha, NE; University of Connecticut, Storrs, CT

49
Bonino AY, Leibold LJ, Buss E. Effect of signal-temporal uncertainty in children and adults: tone detection in noise or a random-frequency masker. J Acoust Soc Am 2013; 134:4446. [PMID: 25669256] [PMCID: PMC3874056] [DOI: 10.1121/1.4828828] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/07/2012] [Revised: 10/12/2013] [Accepted: 10/18/2013] [Indexed: 06/04/2023]
Abstract
A cue indicating when in time to listen can improve adults' tone-detection thresholds, particularly in conditions that produce substantial informational masking. The purpose of this study was to determine whether 5- to 13-year-old children likewise benefit from a light cue indicating when in time to listen for a masked pure-tone signal. Each listener was tested in one of two continuous maskers: broadband noise (low informational masking) or a random-frequency, two-tone masker (high informational masking). Using a single-interval method of constant stimuli, detection thresholds were measured for two temporal conditions: (1) temporally defined, with the listening interval marked by a light cue, and (2) temporally uncertain, with no light cue. Thresholds estimated from psychometric functions fitted to the data indicated that children and adults benefited to the same degree from the visual cue. Across listeners, the average benefit of a defined listening interval was 1.8 dB in the broadband noise and 8.6 dB in the random-frequency, two-tone masker. Thus, the benefit of knowing when in time to listen was more robust in conditions believed to be dominated by informational masking. An unexpected finding was that children's thresholds were comparable to adults' in the random-frequency, two-tone masker.
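To make the threshold-estimation step concrete, the sketch below fits a logistic psychometric function to proportion-correct data from a single-interval task and computes the cue benefit as the difference between the uncued and cued thresholds, as in the dB benefits reported above. This is a minimal illustration, not the study's analysis code: the signal levels, proportions correct, guessing rate, and function form are all assumptions chosen for the example.

```python
# Hypothetical sketch: estimate a detection threshold from
# method-of-constant-stimuli data by fitting a logistic psychometric
# function, then compute the cue benefit as the threshold difference
# between uncued and cued conditions. All numbers are invented for
# illustration; they are not data from the study.
import numpy as np
from scipy.optimize import curve_fit

def logistic(level_db, threshold_db, slope):
    """Proportion correct in a single-interval task, rising from an
    assumed 0.5 guessing floor to 1.0."""
    return 0.5 + 0.5 / (1.0 + np.exp(-slope * (level_db - threshold_db)))

def fit_threshold(levels_db, prop_correct):
    """Return the level (dB) at the midpoint of the fitted function."""
    (threshold_db, slope), _ = curve_fit(
        logistic, levels_db, prop_correct, p0=[np.mean(levels_db), 0.5]
    )
    return threshold_db

levels = np.array([30.0, 35.0, 40.0, 45.0, 50.0, 55.0])   # signal levels (dB)
cued   = np.array([0.52, 0.60, 0.74, 0.88, 0.97, 0.99])   # defined interval
uncued = np.array([0.50, 0.54, 0.62, 0.75, 0.90, 0.98])   # temporally uncertain

benefit_db = fit_threshold(levels, uncued) - fit_threshold(levels, cued)
print(f"Benefit of the light cue: {benefit_db:.1f} dB")
```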
Affiliation(s)
- Angela Yarnell Bonino: Department of Allied Health Sciences, The University of North Carolina at Chapel Hill, CB 7190, Chapel Hill, North Carolina 27599
- Lori J Leibold: Department of Allied Health Sciences, The University of North Carolina at Chapel Hill, CB 7190, Chapel Hill, North Carolina 27599
- Emily Buss: Department of Otolaryngology/Head and Neck Surgery, The University of North Carolina at Chapel Hill, CB 7070, Chapel Hill, North Carolina 27599

50
Moradi S, Lidestam B, Rönnberg J. Gated audiovisual speech identification in silence vs. noise: effects on time and accuracy. Front Psychol 2013; 4:359. [PMID: 23801980] [PMCID: PMC3685792] [DOI: 10.3389/fpsyg.2013.00359] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 02/28/2013] [Accepted: 05/31/2013] [Indexed: 11/15/2022]
Abstract
This study investigated the degree to which audiovisual presentation (compared with auditory-only presentation) affected isolation points (IPs, the amount of time required for correct identification of speech stimuli in a gating paradigm) in silence and in noise. The study expanded on the findings of Moradi et al. (under revision), using the same stimuli but presenting them audiovisually rather than in an auditory-only manner. The results showed that noise impeded the identification of consonants and words (i.e., delayed IPs and lowered accuracy), but not the identification of final words in sentences. Compared with the previous study by Moradi et al., the provision of visual cues expedited IPs and increased the accuracy of speech-stimulus identification in both silence and noise. The implications of the results are discussed in terms of models of speech understanding.
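As an illustration of how an isolation point is typically scored in a gating paradigm, the sketch below finds the first gate from which a listener's response is correct and remains correct through all longer gates. The gate duration (GATE_MS), the response data, and the exact scoring rule are assumptions made for the example, not materials or code from the study.

```python
# Hypothetical sketch of isolation-point (IP) scoring in a gating
# paradigm: the IP is the duration of the first gate at which the
# listener's response is correct and stays correct at every longer
# gate. Gate duration and responses below are invented.
GATE_MS = 40  # assumed amount of signal added per gate, in ms

def isolation_point_ms(responses, target):
    """responses: the listener's identification after gate 1, 2, ...;
    returns the IP in ms, or None if the item was never isolated."""
    ip = None
    for gate, response in enumerate(responses, start=1):
        if response == target:
            if ip is None:
                ip = gate * GATE_MS  # tentative IP at first correct gate
        else:
            ip = None  # a later error invalidates the tentative IP
    return ip

# Example: correct at gate 3, relapses at gate 4, stable from gate 5.
print(isolation_point_ms(["b", "d", "g", "d", "g", "g", "g"], "g"))  # 200
```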
Affiliation(s)
- Shahram Moradi: Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden