1. Shen D, Ross B, Alain C. Temporal deployment of attention in musicians: Evidence from an attentional blink paradigm. Ann N Y Acad Sci 2023; 1530:110-123. [PMID: 37823710] [DOI: 10.1111/nyas.15069]
Abstract
The generalization of music training to unrelated nonmusical domains is well established and may reflect musicians' superior ability to regulate attention. We investigated the temporal deployment of attention in musicians and nonmusicians using scalp recordings of event-related potentials in an attentional blink (AB) paradigm. Participants listened to rapid sequences of stimuli and identified target and probe sounds. The AB was defined as a deficit in probe identification when the probe closely followed the target. The sequence of stimuli was preceded by a neutral or informative cue about the probe position within the sequence. Musicians outperformed nonmusicians in identifying the target and probe. In both groups, cueing improved target and probe identification and reduced the AB. The informative cue elicited a sustained potential, which was more prominent in musicians than nonmusicians over left temporal areas and was associated with a larger N1 amplitude elicited by the target. The N1 was larger in musicians than nonmusicians, and its amplitude over the left frontocentral cortex of musicians correlated with accuracy. Together, these results reveal musicians' superior ability to regulate attention, allowing them to prepare for incoming stimuli and thereby improving sound object identification. This capacity to manage attentional resources to optimize task performance may generalize to nonmusical activities.
Affiliation(s)
- Dawei Shen
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada
- Bernhard Ross
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada
- Music and Health Science Research Collaboratory, University of Toronto, Toronto, Ontario, Canada
- Claude Alain
- Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, Ontario, Canada
- Institute of Medical Sciences, University of Toronto, Toronto, Ontario, Canada
- Music and Health Science Research Collaboratory, University of Toronto, Toronto, Ontario, Canada
- Department of Psychology, University of Toronto, Toronto, Ontario, Canada
2. Cochlear Implant Facilitates the Use of Talker Sex and Spatial Cues to Segregate Competing Speech in Unilaterally Deaf Listeners. Ear Hear 2023; 44:77-91. [PMID: 35733275] [DOI: 10.1097/aud.0000000000001254]
Abstract
OBJECTIVES Talker sex and spatial cues can facilitate segregation of competing speech. However, the spectrotemporal degradation associated with cochlear implants (CIs) can limit the benefit of talker sex and spatial cues. Acoustic hearing in the nonimplanted ear can improve access to talker sex cues in CI users. However, it is unclear whether the CI can improve segregation of competing speech when maskers are symmetrically placed around the target (i.e., when spatial cues are available), compared with acoustic hearing alone. The aim of this study was to investigate whether a CI can improve segregation of competing speech by individuals with unilateral hearing loss. DESIGN Speech recognition thresholds (SRTs) for competing speech were measured in 16 normal-hearing (NH) adults and 16 unilaterally deaf CI users. All participants were native speakers of Mandarin Chinese. CI users were divided into two groups according to thresholds in the nonimplanted ear: (1) single-sided deafness (SSD), with pure-tone thresholds <25 dB HL at all audiometric frequencies, and (2) asymmetric hearing loss (AHL), with one or more thresholds >25 dB HL. SRTs were measured for target sentences produced by a male talker in the presence of two masker talkers (different male or female talkers). The target sentence was always presented via loudspeaker directly in front of the listener (0°), and the maskers were either colocated with the target (0°) or spatially separated from the target at ±90°. Three segregation cue conditions were tested to measure masking release (MR) relative to the baseline condition: (1) Talker sex, (2) Spatial, and (3) Talker sex + Spatial. For CI users, SRTs were measured with the CI on or off. RESULTS Binaural MR was significantly better for the NH group than for the AHL or SSD groups (P < 0.001 in all cases). For the NH group, mean MR was largest with the Talker sex + Spatial cues (18.8 dB) and smallest with the Talker sex cues (10.7 dB). In contrast, mean MR for the SSD group was largest with the Talker sex + Spatial cues (14.7 dB) and smallest with the Spatial cues (4.8 dB). For the AHL group, mean MR was largest with the Talker sex + Spatial cues (7.8 dB) and smallest with the Talker sex (4.8 dB) and the Spatial cues (4.8 dB). MR was significantly better with the CI on than off for both the AHL (P = 0.014) and SSD groups (P < 0.001). Across all unilaterally deaf CI users, monaural (acoustic ear alone) and binaural MR were significantly correlated with unaided pure-tone average thresholds in the nonimplanted ear for the Talker sex and Talker sex + Spatial conditions (P < 0.001 in both cases) but not for the Spatial condition. CONCLUSION Although the CI benefitted unilaterally deaf listeners' segregation of competing speech, MR was much poorer than that observed in NH listeners. In contrast to previous findings with steady noise maskers, the CI benefit for segregation of competing speech from a talker of a different sex was greater in the SSD group than in the AHL group.
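Since the masking release (MR) values above are simple differences between speech recognition thresholds, a short worked example may help. This is an illustrative sketch only: the baseline SRT is hypothetical, and the condition SRTs were back-computed just to reproduce two of the NH group's reported figures (10.7 and 18.8 dB).
```python
# Worked example of masking release (MR): the improvement in speech recognition
# threshold (SRT, in dB; lower is better) relative to the no-cue baseline.
# All SRT values here are hypothetical illustrations, not data from the study.
srt = {
    "baseline":           2.0,    # colocated maskers, same-sex talkers
    "talker_sex":         -8.7,   # chosen so MR matches the reported 10.7 dB
    "spatial":            -3.5,   # purely hypothetical
    "talker_sex_spatial": -16.8,  # chosen so MR matches the reported 18.8 dB
}
mr = {cond: srt["baseline"] - thr for cond, thr in srt.items() if cond != "baseline"}
for cond, release in sorted(mr.items(), key=lambda kv: kv[1]):
    print(f"{cond:>18s}: {release:5.1f} dB masking release")
```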
3. Pinto D, Kaufman M, Brown A, Zion Golumbic E. An ecological investigation of the capacity to follow simultaneous speech and preferential detection of one’s own name. Cereb Cortex 2022; 33:5361-5374. [PMID: 36331339] [DOI: 10.1093/cercor/bhac424]
Abstract
Many situations require focusing attention on one speaker while monitoring the environment for potentially important information. Some have proposed that dividing attention among two speakers involves behavioral trade-offs, due to limited cognitive resources. However, the severity of these trade-offs, particularly under ecologically valid circumstances, is not well understood. We investigated the capacity to process simultaneous speech using a dual-task paradigm simulating task demands and stimuli encountered in real life. Participants listened to conversational narratives (Narrative Stream) and monitored a stream of announcements (Barista Stream) to detect when their order was called. We measured participants’ performance, neural activity, and skin conductance as they engaged in this dual task. Participants achieved extremely high dual-task accuracy, with no apparent behavioral trade-offs. Moreover, robust neural and physiological responses were observed for target stimuli in the Barista Stream, alongside significant neural speech-tracking of the Narrative Stream. These results suggest that humans have substantial capacity to process simultaneous speech and do not suffer from insufficient processing resources, at least for this highly ecological task combination and level of perceptual load. Results also confirmed the ecological validity of the advantage for detecting one’s own name at the behavioral, neural, and physiological levels, highlighting the contribution of personal relevance when processing simultaneous speech.
Affiliation(s)
- Danna Pinto
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
- Maya Kaufman
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
- Adi Brown
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
- Elana Zion Golumbic
- The Gonda Multidisciplinary Center for Brain Research, Bar Ilan University, Ramat Gan, 5290002, Israel
4. Zobel BH, Freyman RL, Sanders LD. Spatial release from informational masking enhances the early cortical representation of speech sounds. Audit Percept Cogn 2022; 5:211-237. [PMID: 36160272] [PMCID: PMC9494573] [DOI: 10.1080/25742442.2022.2088329]
Abstract
INTRODUCTION Spatial separation between competing speech streams reduces their confusion (informational masking), improving speech processing under challenging listening conditions. The precise stages of auditory processing involved in this benefit are not fully understood. This study used event-related potentials to examine the processing of target speech under conditions of informational masking and its spatial release. METHODS Participants detected noise-vocoded target speech presented with two-talker noise-vocoded masking speech. In separate conditions, the same set of targets were spatially co-located with maskers to produce informational masking and spatially separated from maskers using a perceptual manipulation to release the informational masking. RESULTS An increase in N1 and P2 amplitude, consistent with cortical auditory evoked potentials, and a later sustained positivity (P300) were observed in response to target onsets only under conditions supporting release from informational masking. At target intensities above masking threshold in both spatial conditions, N1 and P2 latencies were shorter when targets and maskers were perceptually separated. DISCUSSION These results indicate that spatial release from informational masking benefits speech representation beginning in the early stages of auditory perception. Additionally, these results suggest that the auditory evoked potential itself may be heavily dependent upon how information is perceptually organized rather than physically organized.
Affiliation(s)
- Benjamin H. Zobel
- Department of Psychological and Brain Sciences, University of Massachusetts Amherst, Amherst, Massachusetts 01003
- Richard L. Freyman
- Department of Communication Disorders, University of Massachusetts Amherst, Amherst, Massachusetts 01003
- Lisa D. Sanders
- Department of Psychological and Brain Sciences, University of Massachusetts Amherst, Amherst, Massachusetts 01003
5. Uhrig S, Perkis A, Möller S, Svensson UP, Behne DM. Effects of Spatial Speech Presentation on Listener Response Strategy for Talker-Identification. Front Neurosci 2022; 15:730744. [PMID: 35153653] [PMCID: PMC8831717] [DOI: 10.3389/fnins.2021.730744]
Abstract
This study investigates effects of spatial auditory cues on human listeners' response strategy for identifying two alternately active talkers (a “turn-taking” listening scenario). Previous research has demonstrated subjective benefits of audio spatialization with regard to speech intelligibility and talker-identification effort. So far, however, the deliberate activation of specific perceptual and cognitive processes by listeners to optimize their task performance has remained largely unexamined. Spoken sentences selected as stimuli were either clean or degraded by background noise or bandpass filtering. Stimuli were presented via three horizontally positioned loudspeakers: in a non-spatial mode, both talkers were presented through a central loudspeaker; in a spatial mode, each talker was presented through the central or a talker-specific lateral loudspeaker. Participants identified talkers via speeded keypresses and afterwards provided subjective ratings (speech quality, speech intelligibility, voice similarity, talker-identification effort). In the spatial mode, presentations at lateral loudspeaker locations entailed quicker behavioral responses, which were nevertheless significantly slower than in a talker-localization task. Under clean speech, response times increased globally in the spatial vs. non-spatial mode (across all locations); these “response time switch costs,” presumably caused by repeated switching of spatial auditory attention between different locations, diminished under degraded speech. No significant effects of spatialization on subjective ratings were found. The results suggest that when listeners could utilize task-relevant auditory cues about talker location, they continued to rely on voice recognition, rather than localization of talker sound sources, as their primary response strategy. The presence of speech degradations may also have led to increased cognitive control, which in turn compensated for the incurred response time switch costs.
Affiliation(s)
- Stefan Uhrig
- Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
- Quality and Usability Lab, Technische Universität Berlin, Berlin, Germany
- Andrew Perkis
- Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
- Sebastian Möller
- Quality and Usability Lab, Technische Universität Berlin, Berlin, Germany
- Speech and Language Technology, German Research Center for Artificial Intelligence, Berlin, Germany
- U. Peter Svensson
- Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
- Dawn M. Behne
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
6. Wächtler M, Kessler J, Walger M, Meister H. Revealing Perceptional and Cognitive Mechanisms in Static and Dynamic Cocktail Party Listening by Means of Error Analyses. Trends Hear 2022; 26:23312165221111676. [PMID: 35849353] [PMCID: PMC9297473] [DOI: 10.1177/23312165221111676]
Abstract
In cocktail party situations, multiple talkers speak simultaneously, which makes listening perceptually and cognitively challenging. Such situations can be either static (fixed target talker) or dynamic, meaning the target talker switches occasionally and in a potentially unpredictable way. To shed light on the perceptional and cognitive mechanisms in static and dynamic cocktail party situations, we conducted an analysis of the error types that occur during a multi-talker speech recognition test. The error analysis distinguished between misunderstood or omitted words (random errors) and target-masker confusions. To investigate the effects of aging and hearing impairment, we compared data from three listener groups: young adults and older adults with and without hearing loss. In the static condition, error rates were generally very low, except for the older hearing-impaired listeners, who showed a notable number of random errors, consistent with the assumption of decreased audibility. In the dynamic condition, errors increased relative to the static condition, especially immediately following a target talker switch; the increases were similar for random and confusion errors. The older hearing-impaired listeners showed greater difficulties than the younger adults in trials not preceded by a switch. These results suggest that the load associated with dynamic cocktail party listening affects the ability to focus attention on the talker of interest and the retrieval of words from short-term memory, as indicated by the increased numbers of confusion and random errors. This was most pronounced in the older hearing-impaired listeners, suggesting an interplay of perceptual and cognitive mechanisms.
Affiliation(s)
- Moritz Wächtler
- Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany
- Josef Kessler
- Department of Neurology, University Hospital of Cologne, Cologne, Germany
- Martin Walger
- Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany
- Clinic of Otorhinolaryngology, Head and Neck Surgery, University of Cologne, Cologne, Germany
- Hartmut Meister
- Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany
7. Holmes E, Parr T, Griffiths TD, Friston KJ. Active inference, selective attention, and the cocktail party problem. Neurosci Biobehav Rev 2021; 131:1288-1304. [PMID: 34687699] [DOI: 10.1016/j.neubiorev.2021.09.038]
Abstract
In this paper, we introduce a new generative model for an active inference account of preparatory and selective attention, in the context of a classic 'cocktail party' paradigm. In this setup, pairs of words are presented simultaneously to the left and right ears, and an instructive spatial cue directs attention to the left or right. We use this generative model to test competing hypotheses about the way that human listeners direct preparatory and selective attention. We show that assigning low precision to words at attended, relative to unattended, locations can explain why a listener reports words from a competing sentence. Under this model, temporal changes in sensory precision were not needed to account for faster reaction times with longer cue-target intervals, but were necessary to explain ramping effects on event-related potentials (ERPs), resembling the contingent negative variation (CNV), during the preparatory interval. These simulations reveal that different processes are likely to underlie the improvement in reaction times and the ramping of ERPs that are associated with spatial cueing.
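To make the precision idea concrete, here is a toy numerical sketch (ours, not the authors' generative model): the posterior over which word was heard weights the sensory log-likelihood by a precision parameter, so low precision at the attended location lets a prior-favored word from the competing sentence win the report. All numbers are invented for illustration.
```python
# Toy illustration of precision-weighted word inference (assumed toy model,
# not the paper's generative model). Requires: numpy.
import numpy as np

def posterior(log_lik, precision, prior):
    """Bayesian update over candidate words with precision-scaled evidence."""
    unnorm = np.exp(precision * log_lik) * prior
    return unnorm / unnorm.sum()

words = ["dog", "bog"]            # target word vs. competing-sentence word
log_lik = np.array([-0.5, -1.5])  # sensory evidence slightly favors the target
prior = np.array([0.4, 0.6])      # context slightly favors the competitor

for prec in (4.0, 0.2):           # high vs. low precision at the attended location
    p = posterior(log_lik, prec, prior)
    print(f"precision={prec}: P(report) = {dict(zip(words, p.round(2)))}")
# With high precision the target wins; with low precision the prior-favored
# competing word is reported, mimicking intrusion errors.
```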
Affiliation(s)
- Emma Holmes
- Department of Speech Hearing and Phonetic Sciences, UCL, London, WC1N 1PF, UK; Wellcome Centre for Human Neuroimaging, UCL, London, WC1N 3AR, UK.
- Thomas Parr
- Wellcome Centre for Human Neuroimaging, UCL, London, WC1N 3AR, UK
- Timothy D Griffiths
- Wellcome Centre for Human Neuroimaging, UCL, London, WC1N 3AR, UK; Biosciences Institute, Newcastle University, Newcastle upon Tyne, NE2 4HH, UK
- Karl J Friston
- Wellcome Centre for Human Neuroimaging, UCL, London, WC1N 3AR, UK
8. Fintor E, Aspöck L, Fels J, Schlittmeier SJ. The role of spatial separation of two talkers' auditory stimuli in the listener's memory of running speech: listening effort in a non-noisy conversational setting. Int J Audiol 2021; 61:371-379. [PMID: 34126838] [DOI: 10.1080/14992027.2021.1922765]
Abstract
OBJECTIVE To investigate the role of the spatial position of conversing talkers, that is, spatially separated or co-located, in the listener's short-term memory of running speech and listening effort. DESIGN In two experiments (between-subjects), participants underwent a dual-task paradigm that included a listening (primary) task in which male and female talkers spoke coherent texts. Talkers were either spatially separated or co-located (within-subjects). The secondary task was visually presented: Experiment I involved a number-judgement task, and Experiment II entailed switching between number- and letter-judgement tasks. STUDY SAMPLE Twenty-four young adults, all students at RWTH Aachen University, who reported normal hearing and normal or corrected-to-normal vision participated in each experiment. RESULTS In both experiments, short-term memory of running speech was similar regardless of whether talkers were spatially separated or co-located. Performance in the secondary tasks, however, differed between these two conditions, indicating that spatially separated talkers imposed reduced listening effort compared to co-located talkers. CONCLUSION The findings indicate that auditory-perceptive information, such as the spatial position of talkers, plays a role in higher-level auditory cognition, that is, short-term memory of running speech, even when listening in quiet.
Affiliation(s)
- Edina Fintor
- Teaching and Research Area Work and Engineering Psychology, RWTH Aachen University, Aachen, Germany
- Lukas Aspöck
- Institute for Hearing Technology and Acoustics, RWTH Aachen University, Aachen, Germany
- Janina Fels
- Institute for Hearing Technology and Acoustics, RWTH Aachen University, Aachen, Germany
- Sabine J Schlittmeier
- Teaching and Research Area Work and Engineering Psychology, RWTH Aachen University, Aachen, Germany
9. Wang X, Xu L. Speech perception in noise: Masking and unmasking. J Otol 2021; 16:109-119. [PMID: 33777124] [PMCID: PMC7985001] [DOI: 10.1016/j.joto.2020.12.001]
Abstract
Speech perception is essential for daily communication. Background noise or concurrent talkers, however, can make it challenging for listeners to track the target speech (i.e., the cocktail party problem). The present study reviews and compares existing findings on speech perception and unmasking in cocktail party listening environments in English and in Mandarin Chinese. The review starts with an introduction, followed by related concepts of auditory masking. The next two sections review factors that release speech perception from masking in English and in Mandarin Chinese, respectively. The last section presents an overall summary of the findings, with comparisons between the two languages. Future research directions are also discussed with respect to differences between the two languages in the literature on the reviewed topic.
Affiliation(s)
- Xianhui Wang
- Communication Sciences and Disorders, Ohio University, Athens, OH, 45701, USA
- Li Xu
- Communication Sciences and Disorders, Ohio University, Athens, OH, 45701, USA
10. Geronazzo M, Vieira LS, Nilsson NC, Udesen J, Serafin S. Superhuman Hearing - Virtual Prototyping of Artificial Hearing: A Case Study on Interactions and Acoustic Beamforming. IEEE Trans Vis Comput Graph 2020; 26:1912-1922. [PMID: 32070968] [DOI: 10.1109/tvcg.2020.2973059]
Abstract
Directivity and gain in microphone array systems for hearing aids or hearable devices allow users to acoustically enhance the information from a source of interest, usually positioned directly in front of them. This feature is called acoustic beamforming. The current study aimed to improve users' interactions with beamforming via a virtual prototyping approach in immersive virtual environments (VEs). Eighteen participants took part in experimental sessions composed of a calibration procedure and a selective auditory attention voice-pairing task. Eight concurrent speakers were placed in an anechoic environment in two virtual reality (VR) scenarios: a purely virtual scenario and a realistic 360° audio-visual recording. Participants were asked to find an individual optimal parameterization for three different virtual beamformers: (i) head-guided, (ii) eye gaze-guided, and (iii) a novel interaction technique called the dual beamformer, where head-guided beamforming is combined with an additional hand-guided beamformer. None of the participants was able to complete the task without a virtual beamformer (i.e., in the normal hearing condition), owing to the high complexity introduced by the experimental design. However, participants were able to correctly pair all speakers using all three proposed interaction metaphors. Providing superhuman hearing abilities in the form of a dual acoustic beamformer guided by head and hand movements resulted in statistically significant improvements in pairing time, suggesting the task relevance of interacting with multiple points of interest.
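For readers unfamiliar with acoustic beamforming, the sketch below shows the classic delay-and-sum idea on a linear microphone array. It is a generic illustration under assumed parameters (array geometry, sample rate), not the DSP used in the paper or in commercial hearing devices.
```python
# Generic delay-and-sum beamformer sketch (illustrative only). Requires: numpy.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def delay_and_sum(signals, mic_positions, steer_deg, fs):
    """Steer a linear array toward steer_deg (0 deg = broadside).

    signals: (n_mics, n_samples); mic_positions: (n_mics,) in meters.
    Integer-sample delays keep the sketch simple; real systems use
    fractional delays (e.g., STFT phase shifts).
    """
    sin_theta = np.sin(np.deg2rad(steer_deg))
    out = np.zeros(signals.shape[1])
    for sig, pos in zip(signals, mic_positions):
        delay = int(round(pos * sin_theta / SPEED_OF_SOUND * fs))
        out += np.roll(sig, -delay)       # undo the assumed arrival delay
    return out / len(mic_positions)       # coherent gain toward steer_deg

# Demo: a 1 kHz tone arriving from 30 deg on a 4-mic array (5 cm spacing).
fs, n = 16000, 1600
mics = np.array([0.0, 0.05, 0.10, 0.15])
tone = np.sin(2 * np.pi * 1000 * np.arange(n) / fs)
arrivals = np.stack([
    np.roll(tone, int(round(p * np.sin(np.deg2rad(30)) / SPEED_OF_SOUND * fs)))
    for p in mics
])
print(np.abs(delay_and_sum(arrivals, mics, 30, fs)).max())   # ~1.0: steered on-source
print(np.abs(delay_and_sum(arrivals, mics, -60, fs)).max())  # smaller: off-source
```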
11. Jaha N, Shen S, Kerlin JR, Shahin AJ. Visual Enhancement of Relevant Speech in a 'Cocktail Party'. Multisens Res 2020; 33:277-294. [PMID: 32508080] [DOI: 10.1163/22134808-20191423]
Abstract
Lip-reading improves intelligibility in noisy acoustical environments. We hypothesized that watching mouth movements benefits speech comprehension in a 'cocktail party' by strengthening the encoding of the neural representations of the visually paired speech stream. In an audiovisual (AV) task, EEG was recorded as participants watched and listened to videos of a speaker uttering a sentence while also hearing a concurrent sentence by a speaker of the opposite gender. A key manipulation was that each audio sentence had a 200-ms segment replaced by white noise. To assess comprehension, subjects were tasked with transcribing the AV-attended sentence on randomly selected trials. In the auditory-only trials, subjects listened to the same sentences and completed the same task while watching a static picture of a speaker of either gender. Subjects directed their listening to the voice of the gender of the speaker in the video. We found that the N1 auditory-evoked potential (AEP) time-locked to white noise onsets was significantly more inhibited for the AV-attended sentences than for those of the auditorily-attended (A-attended) and AV-unattended sentences. N1 inhibition to noise onsets has been shown to index restoration of phonemic representations of degraded speech. These results underscore that attention and congruency in the AV setting help streamline the complex auditory scene, partly by reinforcing the neural representations of the visually attended stream, heightening the perception of continuity and comprehension.
Affiliation(s)
- Niti Jaha
- Center for Mind and Brain, University of California, Davis, CA 95618, USA
- Stanley Shen
- Center for Mind and Brain, University of California, Davis, CA 95618, USA
- Jess R Kerlin
- Center for Mind and Brain, University of California, Davis, CA 95618, USA
- Antoine J Shahin
- Center for Mind and Brain, University of California, Davis, CA 95618, USA
- Department of Cognitive and Information Sciences, University of California, Merced, CA 95343, USA
12. Bologna WJ, Ahlstrom JB, Dubno JR. Contributions of Voice Expectations to Talker Selection in Younger and Older Adults With Normal Hearing. Trends Hear 2020; 24:2331216520915110. [PMID: 32372720] [PMCID: PMC7225833] [DOI: 10.1177/2331216520915110]
Abstract
Focused attention on expected voice features, such as fundamental frequency (F0) and spectral envelope, may facilitate segregation and selection of a target talker in competing-talker backgrounds. Age-related declines in attention may limit these abilities in older adults, resulting in poorer speech understanding in complex environments. To test this hypothesis, younger and older adults with normal hearing listened to sentences with a single competing talker. For most trials, listener attention was directed to the target by a cue phrase that matched the target talker's F0 and spectral envelope. For a small percentage of randomly occurring probe trials, the target's voice unexpectedly differed from the cue phrase in terms of F0 and spectral envelope. Overall, keyword recognition for the target talker was poorer for older adults than younger adults. Keyword recognition was poorer on probe trials than standard trials for both groups, and incorrect responses on probe trials contained keywords from the single-talker masker. No interaction was observed between age group and the decline in keyword recognition on probe trials. Thus, reduced performance by older adults overall could not be attributed to declines in attention to an expected voice. Rather, other cognitive abilities, such as speed of processing and linguistic closure, were predictive of keyword recognition for younger and older adults. Moreover, the effects of age interacted with the sex of the target talker, such that older adults had greater difficulty understanding target keywords from female talkers than from male talkers.
Affiliation(s)
- William J. Bologna
- Department of Otolaryngology—Head and Neck Surgery, Medical University of South Carolina
- Jayne B. Ahlstrom
- Department of Otolaryngology—Head and Neck Surgery, Medical University of South Carolina
- Judy R. Dubno
- Department of Otolaryngology—Head and Neck Surgery, Medical University of South Carolina
13. Zobel BH, Wagner A, Sanders LD, Başkent D. Spatial release from informational masking declines with age: Evidence from a detection task in a virtual separation paradigm. J Acoust Soc Am 2019; 146:548. [PMID: 31370625] [DOI: 10.1121/1.5118240]
Abstract
Declines in spatial release from informational masking may contribute to the speech-processing difficulties that older adults often experience within complex listening environments. The present study sought to answer two fundamental questions: (1) Does spatial release from informational masking decline with age and, if so, (2) does age predict this decline independently of age-typical hearing loss? Younger (18-34 years) and older (60-80 years) adults with age-typical hearing completed a yes/no target-detection task with low-pass filtered noise-vocoded speech designed to reduce non-spatial segregation cues and control for hearing loss. Participants detected a target voice among two-talker masking babble while a virtual spatial separation paradigm [Freyman, Helfer, McCall, and Clifton, J. Acoust. Soc. Am. 106(6), 3578-3588 (1999)] was used to isolate release from informational masking. Both younger and older adults exhibited spatial release from informational masking, but masking release was reduced among the older adults. Furthermore, age predicted this decline after controlling for hearing loss, whereas there was no indication that hearing loss itself played a role. These findings provide evidence that declines specific to aging limit spatial release from informational masking under challenging listening conditions.
Affiliation(s)
- Benjamin H Zobel
- Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Anita Wagner
- Department of Otorhinolaryngology-Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Lisa D Sanders
- Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Deniz Başkent
- Department of Otorhinolaryngology-Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
14. Bramsløw L, Naithani G, Hafez A, Barker T, Pontoppidan NH, Virtanen T. Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm. J Acoust Soc Am 2018; 144:172. [PMID: 30075667] [DOI: 10.1121/1.5045322]
Abstract
Hearing aid users are challenged in listening situations with noise, especially speech-on-speech situations with two or more competing voices. Specifically, the task of attending to and segregating two competing voices is particularly hard for them, unlike for normal-hearing listeners, as shown in a small sub-experiment. In the main experiment, the competing-voices benefit of a deep neural network (DNN) based stream segregation enhancement algorithm was tested on hearing-impaired listeners. A mixture of two voices was separated using a DNN, presented to the two ears as individual streams, and tested for word score. Compared to the unseparated mixture, there was a 13-percentage-point benefit from the separation while attending to both voices. If only one output was selected, as in a traditional target-masker scenario, a larger benefit of 37 percentage points was found. The results agreed well with objective metrics and show that, for hearing-impaired listeners, DNNs have a large potential for improving stream segregation and speech intelligibility in difficult scenarios with two equally important targets, without any prior selection of a primary target stream. An even higher benefit can be obtained if the user can select the preferred target via remote control.
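As a rough illustration of the mask-based DNN separation idea described above, here is a minimal, untrained sketch: a causal (unidirectional) LSTM estimates one time-frequency ratio mask per talker from the mixture spectrogram, and each mask is applied to the mixture STFT before resynthesis. The architecture, FFT/hop sizes, and layer sizes are assumptions for illustration, not the authors' low-latency network, and the training loop is omitted.
```python
# Minimal sketch of DNN-based two-talker mask estimation (assumed architecture;
# untrained demo). Requires: torch.
import torch
import torch.nn as nn

N_FFT, HOP = 512, 128  # short hop keeps algorithmic latency low

class MaskNet(nn.Module):
    """Causal LSTM that predicts one ratio mask per talker from the mixture."""
    def __init__(self, n_bins=N_FFT // 2 + 1, hidden=256, n_talkers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_bins, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_bins * n_talkers)
        self.n_bins, self.n_talkers = n_bins, n_talkers

    def forward(self, mag):                  # mag: (batch, frames, bins)
        h, _ = self.lstm(mag)
        masks = torch.sigmoid(self.out(h))   # bounded [0, 1] ratio masks
        return masks.view(mag.shape[0], mag.shape[1], self.n_talkers, self.n_bins)

def separate(model, mixture):                # mixture: (samples,)
    win = torch.hann_window(N_FFT)
    spec = torch.stft(mixture, N_FFT, HOP, window=win, return_complex=True)
    mag = spec.abs().T.unsqueeze(0)           # (1, frames, bins)
    masks = model(mag)[0]                     # (frames, talkers, bins)
    streams = []
    for t in range(masks.shape[1]):
        masked = spec * masks[:, t, :].T      # apply mask in the STFT domain
        streams.append(torch.istft(masked, N_FFT, HOP, window=win))
    return streams                            # one waveform per talker

model = MaskNet()
left, right = separate(model, torch.randn(16000))  # untrained demo on noise
```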
Affiliation(s)
- Lars Bramsløw
- Eriksholm Research Centre, Oticon A/S, Rørtangvej 20, DK-3070 Snekkersten, Denmark
- Gaurav Naithani
- Tampere University of Technology, Laboratory of Signal Processing, Tampere, P.O. Box 553, FI-33101 Tampere, Finland
- Atefeh Hafez
- Eriksholm Research Centre, Oticon A/S, Rørtangvej 20, DK-3070 Snekkersten, Denmark
- Tom Barker
- Tampere University of Technology, Laboratory of Signal Processing, Tampere, P.O. Box 553, FI-33101 Tampere, Finland
- Niels Henrik Pontoppidan
- Eriksholm Research Centre, Oticon A/S, Rørtangvej 20, DK-3070 Snekkersten, Denmark
- Tuomas Virtanen
- Tampere University of Technology, Laboratory of Signal Processing, Tampere, P.O. Box 553, FI-33101 Tampere, Finland
15. Cueing listeners to attend to a target talker progressively improves word report as the duration of the cue-target interval lengthens to 2,000 ms. Atten Percept Psychophys 2018; 80:1520-1538. [PMID: 29696570] [DOI: 10.3758/s13414-018-1531-x]
Abstract
Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker's spatial location or their gender. Participants directed attention to location and gender simultaneously ("objects") at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.
16. Zhang M, Mary Ying YL, Ihlefeld A. Spatial Release From Informational Masking: Evidence From Functional Near Infrared Spectroscopy. Trends Hear 2018; 22:2331216518817464. [PMID: 30558491] [PMCID: PMC6299332] [DOI: 10.1177/2331216518817464]
Abstract
Informational masking (IM) can greatly reduce speech intelligibility, but the neural mechanisms underlying IM are not understood. Binaural differences between target and masker can improve speech perception. In general, improvement in masked speech intelligibility due to provision of spatial cues is called spatial release from masking. Here, we focused on an aspect of spatial release from masking, specifically, the role of spatial attention. We hypothesized that in a situation with IM background sound (a) attention to speech recruits lateral frontal cortex (LFCx) and (b) LFCx activity varies with direction of spatial attention. Using functional near infrared spectroscopy, we assessed LFCx activity bilaterally in normal-hearing listeners. In Experiment 1, two talkers were simultaneously presented. Listeners either attended to the target talker (speech task) or they listened passively to an unintelligible, scrambled version of the acoustic mixture (control task). Target and masker differed in pitch and interaural time difference (ITD). Relative to the passive control, LFCx activity increased during attentive listening. Experiment 2 measured how LFCx activity varied with ITD, by testing listeners on the speech task in Experiment 1, except that talkers either were spatially separated by ITD or colocated. Results show that directing of auditory attention activates LFCx bilaterally. Moreover, right LFCx is recruited more strongly in the spatially separated as compared with colocated configurations. Findings hint that LFCx function contributes to spatial release from masking in situations with IM.
Affiliation(s)
- Min Zhang
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, USA
- Graduate School of Biomedical Sciences, Rutgers University, Newark, NJ, USA
- Yu-Lan Mary Ying
- Department of Otolaryngology-Head and Neck Surgery, Rutgers New Jersey Medical School, Newark, NJ, USA
- Antje Ihlefeld
- Department of Biomedical Engineering, New Jersey Institute of Technology, Newark, NJ, USA
17. Gaston J, Dickerson K, Hipp D, Gerhardstein P. Change deafness for real spatialized environmental scenes. Cogn Res Princ Implic 2017; 2:29. [PMID: 28680950] [PMCID: PMC5487906] [DOI: 10.1186/s41235-017-0066-3]
Abstract
The everyday auditory environment is complex and dynamic; often, multiple sounds co-occur and compete for a listener’s cognitive resources. ‘Change deafness’, framed as the auditory analog to the well-documented phenomenon of ‘change blindness’, describes the finding that changes presented within complex environments are often missed. The present study examines a number of stimulus factors that may influence change deafness under real-world listening conditions. Specifically, an AX (same-different) discrimination task was used to examine the effects of both spatial separation over a loudspeaker array and the type of change (sound source additions and removals) on discrimination of changes embedded in complex backgrounds. Analyses based on signal detection theory and accuracy indicated that, under most conditions, errors were significantly reduced for spatially distributed relative to non-spatial scenes. A second goal of the present study was to evaluate a possible link between memory for scene contents and change discrimination. Memory was evaluated by presenting a cued recall test following each trial of the discrimination task. Recall was similar across conditions in terms of accuracy, but sensitivity was reduced compared to previous reports. Finally, the present study used a large and representative sample of outdoor, urban, and environmental sounds, presented in unique combinations across nearly 1,000 trials per participant. This enabled exploration of the relationship between change perception and the perceptual similarity between change targets and background scene sounds. These (post hoc) analyses suggest both a categorical and a stimulus-level relationship between scene similarity and the magnitude of change errors.
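For reference, sensitivity in such same-different designs is often summarized with d′ computed from hit and false-alarm rates. The snippet below shows the common yes/no-style approximation with a log-linear correction, using invented trial counts; a strict same-different observer model would give somewhat different values.
```python
# Illustrative d' computation for an AX (same-different) discrimination task,
# using the approximation d' = z(hit rate) - z(false-alarm rate).
# Trial counts below are hypothetical. Requires: scipy.
from scipy.stats import norm

def dprime(hits, misses, fas, crs):
    """d' with a log-linear correction to avoid infinite z-scores at 0 or 1."""
    h = (hits + 0.5) / (hits + misses + 1.0)   # corrected hit rate
    f = (fas + 0.5) / (fas + crs + 1.0)        # corrected false-alarm rate
    return norm.ppf(h) - norm.ppf(f)

# 'Different' trials: change detected (hit) vs. missed;
# 'same' trials: false alarm vs. correct rejection.
print(dprime(hits=78, misses=22, fas=15, crs=85))  # ~1.8 for these counts
```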
Affiliation(s)
- Jeremy Gaston
- Army Research Laboratory, Human Research and Engineering Directorate, Adelphi, MD USA
- Kelly Dickerson
- Army Research Laboratory, Human Research and Engineering Directorate, Adelphi, MD USA
- Daniel Hipp
- Army Research Laboratory, Human Research and Engineering Directorate, Adelphi, MD USA
- Peter Gerhardstein
- Army Research Laboratory, Human Research and Engineering Directorate, Adelphi, MD USA
18. Oberfeld D, Klöckner-Nowotny F. Individual differences in selective attention predict speech identification at a cocktail party. eLife 2016; 5:e16747. [PMID: 27580272] [PMCID: PMC5441891] [DOI: 10.7554/elife.16747]
Abstract
Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a proportion of variance similar to that explained by binaural sensitivity to the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.
Affiliation(s)
- Daniel Oberfeld
- Department of Psychology, Section Experimental Psychology, Johannes Gutenberg-Universität, Mainz, Germany
- Felicitas Klöckner-Nowotny
- Department of Psychology, Section Experimental Psychology, Johannes Gutenberg-Universität, Mainz, Germany
19. Lewald J, Hanenberg C, Getzmann S. Brain correlates of the orientation of auditory spatial attention onto speaker location in a “cocktail-party” situation. Psychophysiology 2016; 53:1484-1495. [DOI: 10.1111/psyp.12692]
Affiliation(s)
- Jörg Lewald
- Department of Cognitive Psychology, Faculty of Psychology, Ruhr University Bochum, Bochum, Germany
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Christina Hanenberg
- Department of Cognitive Psychology, Faculty of Psychology, Ruhr University Bochum, Bochum, Germany
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Stephan Getzmann
- Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
20. Schoenmaker E, Brand T, van de Par S. The multiple contributions of interaural differences to improved speech intelligibility in multitalker scenarios. J Acoust Soc Am 2016; 139:2589. [PMID: 27250153] [DOI: 10.1121/1.4948568]
Abstract
Spatial separation of talkers is known to improve speech intelligibility in a multitalker scenario. A contribution of binaural unmasking, in addition to a better-ear effect, is usually considered to account for this advantage. Binaural unmasking is assumed to result from the spectro-temporally simultaneous presence of target and masker energy with different interaural properties. However, in the case of speech targets and speech interference, the spectro-temporal signal-to-noise ratio (SNR) fluctuates strongly, resulting in audible and localizable glimpses of target speech even at adverse global SNRs. The disparate interaural properties of target and masker may thus lead to improved segregation without requiring simultaneity. This study addresses the binaural contribution to spatial release from masking due to simultaneous disparities in interaural cues between target and interferers. For that purpose, stimuli were designed that lacked simultaneously occurring disparities but yielded a percept of spatially separated speech nearly indistinguishable from that of non-modified stimuli. A phoneme recognition experiment with either three collocated or spatially separated talkers showed a substantial spatial release from masking for the modified stimuli. The results suggest that binaural unmasking made only a minor contribution to spatial release from masking, and that the interaural cues carried by dominant speech components were instead essential.
Affiliation(s)
- Esther Schoenmaker
- Acoustics Group, Cluster of Excellence Hearing4all, Carl von Ossietzky University, 26111 Oldenburg, Germany
- Thomas Brand
- Medizinische Physik, Cluster of Excellence Hearing4all, Carl von Ossietzky University, 26111 Oldenburg, Germany
- Steven van de Par
- Acoustics Group, Cluster of Excellence Hearing4all, Carl von Ossietzky University, 26111 Oldenburg, Germany
21. Getzmann S, Hanenberg C, Lewald J, Falkenstein M, Wascher E. Effects of age on electrophysiological correlates of speech processing in a dynamic "cocktail-party" situation. Front Neurosci 2015; 9:341. [PMID: 26483623] [PMCID: PMC4586946] [DOI: 10.3389/fnins.2015.00341]
Abstract
Successful speech perception in multi-speaker environments depends on auditory scene analysis, comprising auditory object segregation and grouping, and on focusing attention toward the speaker of interest. Changes in speaker settings (e.g., in speaker position) require object re-selection and attention re-focusing. Here, we tested the processing of changes in a realistic multi-speaker scenario in younger and older adults, employing a speech-perception task, and event-related potential (ERP) measures. Sequences of short words (combinations of company names and values) were simultaneously presented via four loudspeakers at different locations, and the participants responded to the value of a target company. Voice and position of the speaker of the target information were kept constant for a variable number of trials and then changed. Relative to the pre-change level, changes caused higher error rates, and more so in older than younger adults. The ERP analysis revealed stronger fronto-central N2 and N400 components in younger adults, suggesting a more effective inhibition of concurrent speech stimuli and enhanced language processing. The difference ERPs (post-change minus pre-change) indicated a change-related N400 and late positive complex (LPC) over parietal areas in both groups. Only the older adults showed an additional frontal LPC, suggesting increased allocation of attentional resources after changes in speaker settings. In sum, changes in speaker settings are critical events for speech perception in multi-speaker environments. Especially older persons show deficits that could be based on less flexible inhibitory control and increased distraction.
Affiliation(s)
- Stephan Getzmann
- Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Christina Hanenberg
- Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Jörg Lewald
- Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Michael Falkenstein
- Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
- Edmund Wascher
- Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Dortmund, Germany
22. Limitations on Monaural and Binaural Temporal Processing in Bilateral Cochlear Implant Listeners. J Assoc Res Otolaryngol 2015; 16:641-652. [PMID: 26105749] [PMCID: PMC4569611] [DOI: 10.1007/s10162-015-0527-7]
Abstract
Monaural rate discrimination and binaural interaural time difference (ITD) discrimination were studied as functions of pulse rate in a group of bilaterally implanted cochlear implant users. Stimuli for the rate discrimination task were pulse trains presented to one electrode, which could be in the apical, middle, or basal part of the array, and in either the left or the right ear. In each two-interval trial, the standard stimulus had a rate of 100, 200, 300, or 500 pulses per second, and the signal stimulus had a rate 35% higher. ITD discrimination between pitch-matched electrode pairs was measured for the same standard rates as in the rate discrimination task and with an ITD of ±500 μs. Sensitivity (d′) on both tasks decreased with increasing rate, as has been reported previously. This study tested the hypothesis that the deterioration in performance at high rates occurs for the two tasks due to a common neural basis, specific to the stimulation at each electrode. Results show that ITD scores for different pairs of electrodes correlated with the lower of the rate discrimination scores for those two electrodes. Statistical analysis, which partialed out overall differences between listeners, electrodes, and rates, supports the hypothesis that monaural and binaural temporal processing limitations are at least partly due to a common mechanism.
23. Mirkovic B, Debener S, Jaeger M, De Vos M. Decoding the attended speech stream with multi-channel EEG: implications for online, daily-life applications. J Neural Eng 2015; 12:046007. [PMID: 26035345] [DOI: 10.1088/1741-2560/12/4/046007]
Abstract
OBJECTIVE Recent studies have provided evidence that temporal-envelope-driven speech decoding from high-density electroencephalography (EEG) and magnetoencephalography recordings can identify the attended speech stream in a multi-speaker scenario. The present work replicated the previous high-density EEG study and investigated the technical requirements for practical attended-speech decoding with EEG. APPROACH Twelve normal-hearing participants attended to one of two simultaneously presented audiobook stories while high-density EEG was recorded. An offline iterative procedure that eliminated the channels contributing least to decoding provided insight into the necessary number of channels and the optimal cross-subject channel configuration. Aiming toward the future goal of near real-time classification with an individually trained decoder, the minimum duration of training data necessary for successful classification was determined using a chronological cross-validation approach. MAIN RESULTS Close replication of the previously reported results confirmed the robustness of the method. Decoder performance remained stable as the montage was reduced from 96 channels down to 25. Furthermore, for less than 15 min of training data, a subject-independent (pre-trained) decoder performed better than an individually trained decoder. SIGNIFICANCE Our study complements previous research and suggests that efficient low-density EEG online decoding is within reach.
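The envelope-tracking decoder referenced here is typically a linear backward (stimulus-reconstruction) model. The sketch below shows that general recipe with ridge regression over time-lagged EEG, classifying the attended stream by which speech envelope correlates best with the reconstruction; shapes, lag count, and regularization are illustrative assumptions rather than the study's exact pipeline.
```python
# Minimal stimulus-reconstruction (backward model) sketch for attended-speech
# decoding from EEG. All parameters are illustrative. Requires: numpy.
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-lagged copies of EEG (time x channels) as regression features."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        X[k:, k * C:(k + 1) * C] = eeg[:T - k]
    return X

def train_decoder(eeg, attended_env, n_lags=32, lam=1e3):
    """Ridge regression from lagged EEG to the attended speech envelope."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ attended_env)

def classify(eeg, env_a, env_b, w, n_lags=32):
    """Attended stream = the envelope best correlated with the reconstruction."""
    rec = lagged(eeg, n_lags) @ w
    r_a = np.corrcoef(rec, env_a)[0, 1]
    r_b = np.corrcoef(rec, env_b)[0, 1]
    return "A" if r_a > r_b else "B"

# Toy demo with a 25-channel montage: channel 0 carries stream A's envelope.
rng = np.random.default_rng(0)
env_a, env_b = np.abs(rng.standard_normal((2, 4000)))
eeg = rng.standard_normal((4000, 25)) * 0.5
eeg[:, 0] += env_a
w = train_decoder(eeg, env_a)
print(classify(eeg, env_a, env_b, w))   # expected: "A"
```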
Affiliation(s)
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstr. 114-118, D-26129 Oldenburg, Germany
- Cluster of Excellence Hearing4all, Carl von Ossietzky University of Oldenburg, Germany
24. Lin G, Carlile S. Costs of switching auditory spatial attention in following conversational turn-taking. Front Neurosci 2015; 9:124. [PMID: 25941466] [PMCID: PMC4403343] [DOI: 10.3389/fnins.2015.00124]
Abstract
Following a multi-talker conversation relies on the ability to rapidly and efficiently shift the focus of spatial attention from one talker to another. The current study investigated the listening costs associated with shifts in spatial attention during conversational turn-taking in 16 normal-hearing listeners using a novel sentence recall task. Three pairs of syntactically fixed but semantically unpredictable matrix sentences, recorded from a single male talker, were presented concurrently through an array of three loudspeakers (directly ahead and ±30° azimuth). Subjects attended to one spatial location, cued by a tone, and followed the target conversation from one sentence to the next using the call-sign at the beginning of each sentence. Subjects were required to report the last three words of each sentence (speech recall task) or answer multiple-choice questions related to the target material (speech comprehension task). The reading span test, attention network test, and trail making test were also administered to assess working memory, attentional control, and executive function. There was a 10.7 ± 1.3% decrease in word recall, a pronounced primacy effect, and a rise in masker confusion errors and word omissions when the target switched location between sentences. Switching costs were independent of the location, direction, and angular size of the spatial shift but did appear to be load dependent and were only significant for complex questions requiring multiple cognitive operations. Reading span scores were positively correlated with total words recalled, and negatively correlated with switching costs and word omissions. Task switching speed (Trail-B time) was also significantly correlated with recall accuracy. Overall, this study highlights (i) the listening costs associated with shifts in spatial attention and (ii) the important role of working memory in maintaining goal-relevant information and extracting meaning from dynamic multi-talker conversations.
Affiliation(s)
- Gaven Lin
- Auditory Neuroscience Laboratory, Department of Physiology, School of Medical Sciences, University of Sydney, Sydney, NSW, Australia
- Simon Carlile
- Auditory Neuroscience Laboratory, Department of Physiology, School of Medical Sciences, University of Sydney, Sydney, NSW, Australia
25
Xia J, Nooraei N, Kalluri S, Edwards B. Spatial release of cognitive load measured in a dual-task paradigm in normal-hearing and hearing-impaired listeners. J Acoust Soc Am 2015; 137:1888-1898. [PMID: 25920841 DOI: 10.1121/1.4916599] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/04/2023]
Abstract
This study investigated whether spatial separation between talkers helps reduce cognitive processing load, and how hearing impairment interacts with the cognitive load of listening in multi-talker environments. A dual-task paradigm was used in which performance on a secondary task (visual tracking) served as a measure of the cognitive load imposed by a speech recognition task. Visual tracking performance was measured under four conditions in which the target and the interferers were distinguished by (1) gender and spatial location, (2) gender only, (3) spatial location only, and (4) neither gender nor spatial location. Results showed that when gender cues were available, a 15° spatial separation between talkers reduced the cognitive load of listening even though it did not provide further improvement in speech recognition (Experiment I). Compared with normal-hearing listeners, hearing-impaired listeners showed large individual variability in the spatial release of cognitive load. Their cognitive load was lower when talkers were spatially separated by 60° than when talkers were of different genders, even though speech recognition was comparable in these two conditions (Experiment II). These results suggest that a measure of cognitive load might provide valuable insight into the benefit of spatial cues in multi-talker environments.
Affiliation(s)
- Jing Xia
- Starkey Hearing Research Center, 2150 Shattuck Avenue, Suite 408, Berkeley, California 94704
- Nazanin Nooraei
- Starkey Hearing Research Center, 2150 Shattuck Avenue, Suite 408, Berkeley, California 94704
- Sridhar Kalluri
- Starkey Hearing Research Center, 2150 Shattuck Avenue, Suite 408, Berkeley, California 94704
- Brent Edwards
- Starkey Hearing Research Center, 2150 Shattuck Avenue, Suite 408, Berkeley, California 94704
26
Koelewijn T, de Kluiver H, Shinn-Cunningham BG, Zekveld AA, Kramer SE. The pupil response reveals increased listening effort when it is difficult to focus attention. Hear Res 2015; 323:81-90. [PMID: 25732724 PMCID: PMC4632994 DOI: 10.1016/j.heares.2015.02.004] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/22/2014] [Revised: 02/05/2015] [Accepted: 02/16/2015] [Indexed: 12/04/2022]
Abstract
Recent studies have shown that prior knowledge about where, when, and who is going to talk improves speech intelligibility. How related attentional processes affect cognitive processing load has not yet been investigated. In the current study, three experiments investigated how the pupil dilation response is affected by prior knowledge of target speech location, target speech onset, and who is going to talk. A total of 56 young adults with normal hearing participated. They had to reproduce a target sentence presented to one ear while ignoring a distracting sentence simultaneously presented to the other ear. The two sentences were independently masked by fluctuating noise. Target location (left or right ear), speech onset, and talker variability were manipulated in separate experiments by keeping these features either fixed during an entire block or randomized over trials. Pupil responses were recorded during listening, and performance was scored after recall. The results showed an improvement in performance when the location of the target speech was fixed instead of randomized. Additionally, location uncertainty increased the pupil dilation response, which suggests that prior knowledge of location reduces cognitive load. Interestingly, the observed pupil responses for each condition were consistent with subjective reports of listening effort. We conclude that communicating in a dynamic environment like a cocktail party (where participants in competing conversations move unpredictably) requires substantial listening effort because of the demands placed on attentional processes.
Affiliation(s)
- Thomas Koelewijn
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
- Hilde de Kluiver
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
- Barbara G Shinn-Cunningham
- Department of Biomedical Engineering, Center for Computational Neuroscience and Neural Technology, Boston University, Boston, USA
- Adriana A Zekveld
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands; Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Sophia E Kramer
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
27
Getzmann S, Lewald J, Falkenstein M. Using auditory pre-information to solve the cocktail-party problem: electrophysiological evidence for age-specific differences. Front Neurosci 2014; 8:413. [PMID: 25540608 PMCID: PMC4261705 DOI: 10.3389/fnins.2014.00413] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2014] [Accepted: 11/24/2014] [Indexed: 11/13/2022] Open
Abstract
Speech understanding in complex and dynamic listening environments requires (a) auditory scene analysis, namely auditory object formation and segregation, and (b) allocation of the attentional focus to the talker of interest. There is evidence that pre-information is actively used to facilitate these two aspects of the so-called "cocktail-party" problem. Here, a simulated multi-talker scenario was combined with electroencephalography to study scene analysis and allocation of attention in young and middle-aged adults. Sequences of short words (combinations of brief company names and stock-price values) from four talkers at different locations were presented simultaneously, and the detection of target names and the discrimination between critical target values were assessed. Immediately before the speech sequences, auditory pre-information was provided via cues that prepared either auditory scene analysis or attentional focusing, or non-specific pre-information was given. While performance was generally better in younger than in older participants, both age groups benefited from auditory pre-information. The analysis of the cue-related event-related potentials revealed age-specific differences in the use of pre-cues: younger adults showed a pronounced N2 component, suggesting early inhibition of concurrent speech stimuli; older adults exhibited a stronger late P3 component, suggesting increased resource allocation to process the pre-information. In sum, the results argue for an age-specific utilization of auditory pre-information to improve listening in complex dynamic auditory environments.
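Component effects such as the N2 and P3 differences reported above are conventionally quantified as mean amplitudes within latency windows of the averaged waveform. A minimal sketch, using an arbitrary stand-in waveform and illustrative windows rather than this paper's actual analysis parameters:

```python
import numpy as np

fs = 500                                    # sampling rate (Hz), assumed
t = np.arange(-0.1, 0.8, 1 / fs)            # epoch time relative to cue onset
erp = np.random.default_rng(1).standard_normal(t.size)  # stand-in cue-locked average

def mean_amplitude(erp, t, window):
    """Mean voltage in a latency window: the usual ERP component measure."""
    lo, hi = window
    return erp[(t >= lo) & (t <= hi)].mean()

# Illustrative windows only; the actual N2/P3 windows are defined in the paper.
n2 = mean_amplitude(erp, t, (0.20, 0.30))   # fronto-central negativity
p3 = mean_amplitude(erp, t, (0.40, 0.60))   # late parietal positivity
print(f"N2 = {n2:.2f} uV, P3 = {p3:.2f} uV")
```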
Affiliation(s)
- Stephan Getzmann
- Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Technical University of Dortmund (IfADo), Dortmund, Germany
- Jörg Lewald
- Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Technical University of Dortmund (IfADo), Dortmund, Germany; Faculty of Psychology, Ruhr-University Bochum, Bochum, Germany
- Michael Falkenstein
- Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Technical University of Dortmund (IfADo), Dortmund, Germany
28
Lewis DE, Manninen CM, Valente DL, Smith NA. Children's understanding of instructions presented in noise and reverberation. Am J Audiol 2014; 23:326-36. [PMID: 25036922 DOI: 10.1044/2014_aja-14-0020] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2014] [Accepted: 06/27/2014] [Indexed: 11/09/2022] Open
Abstract
PURPOSE This study examined children's ability to follow audio-visual instructions presented in noise and reverberation. METHOD Children (8-12 years of age) with normal hearing followed instructions in noise or noise plus reverberation. Performance was compared for a single talker (ST), multiple talkers speaking one at a time (MT), and multiple talkers with competing comments from other talkers (MTC). Working memory was assessed using measures of digit span. RESULTS Performance was better for children in noise than for those in noise plus reverberation. In noise, performance for ST was better than for either MT or MTC, and performance for MT was better than for MTC. In noise plus reverberation, performance for ST and MT was better than for MTC, but there were no differences between ST and MT. Digit span did not account for significant variance in the task. CONCLUSIONS Overall, children performed better in noise than in noise plus reverberation. However, differing patterns across conditions for the 2 environments suggested that the addition of reverberation may have affected performance in a way that was not apparent in noise alone. Continued research is needed to examine the differing effects of noise and reverberation on children's speech understanding.
Affiliation(s)
- Crystal M. Manninen
- Boys Town National Research Hospital, Omaha, NE
- University of Nebraska–Lincoln
29
Kaya EM, Elhilali M. Investigating bottom-up auditory attention. Front Hum Neurosci 2014; 8:327. [PMID: 24904367 PMCID: PMC4034154 DOI: 10.3389/fnhum.2014.00327] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2014] [Accepted: 05/01/2014] [Indexed: 11/22/2022] Open
Abstract
Bottom-up attention is a sensory-driven selection mechanism that directs perception toward a subset of the stimulus that is considered salient, or attention-grabbing. Most studies of bottom-up auditory attention have adapted frameworks similar to visual attention models, whereby local or global "contrast" is a central concept in defining salient elements in a scene. In the current study, we take a more fundamental approach to modeling auditory attention, providing the first examination of the space of auditory saliency spanning pitch, intensity, and timbre, and shedding light on complex interactions among these features. Informed by psychoacoustic results, we develop a computational model of auditory saliency implementing a novel attentional framework, guided by processes hypothesized to take place in the auditory pathway. In particular, the model tests the hypothesis that perception tracks the evolution of sound events in a multidimensional feature space and flags any deviation from background statistics as salient. Predictions from the model corroborate the relationship between bottom-up auditory attention and statistical inference, and argue for a potential role of predictive coding as a mechanism for saliency detection in acoustic scenes.
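The core idea, tracking background statistics in a multidimensional feature space and flagging deviations as salient, can be sketched in a few lines. This is a toy reduction, not the authors' model: the exponential update, the pooling rule, and the three-dimensional feature track are assumptions for illustration.

```python
import numpy as np

def saliency(features, alpha=0.05, eps=1e-6):
    """Flag deviations from running background statistics.

    features: (time x feature) track, e.g. per-frame pitch, intensity,
    and timbre estimates. Each frame is scored by its z-score against an
    exponentially weighted estimate of the background mean/variance,
    then deviations are pooled across feature dimensions.
    """
    mu = features[0].copy()
    var = np.ones_like(mu)
    scores = np.zeros(len(features))
    for i, x in enumerate(features):
        z = (x - mu) / np.sqrt(var + eps)        # deviation from background
        scores[i] = np.linalg.norm(z)            # pool over feature dims
        mu = (1 - alpha) * mu + alpha * x        # slowly absorb new evidence
        var = (1 - alpha) * var + alpha * (x - mu) ** 2
    return scores

# A steady background with one loud, high-pitched event at frame 150.
rng = np.random.default_rng(0)
feats = rng.normal(0.0, 0.1, size=(300, 3))
feats[150] += [3.0, 2.0, 1.0]                    # pitch/intensity/timbre jump
print(int(np.argmax(saliency(feats))))           # -> 150
```

The slow background update is one simple way to make a brief deviation salient while letting a sustained change eventually become the new background.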
Affiliation(s)
- Emine Merve Kaya
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Mounya Elhilali
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, MD, USA
30
The pupil response is sensitive to divided attention during speech processing. Hear Res 2014; 312:114-20. [PMID: 24709275 PMCID: PMC4634867 DOI: 10.1016/j.heares.2014.03.010] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/26/2013] [Revised: 03/12/2014] [Accepted: 03/24/2014] [Indexed: 11/23/2022]
Abstract
Dividing attention over two streams of speech strongly decreases performance compared to focusing on only one. How divided attention affects cognitive processing load, as indexed with pupillometry during speech recognition, has so far not been investigated. In 12 young adults the pupil response was recorded while they focused on either one or both of two sentences that were presented dichotically and masked by fluctuating noise across a range of signal-to-noise ratios. In line with previous studies, performance decreased when processing two target sentences instead of one. Additionally, dividing attention to process two sentences caused larger pupil dilation and later peak pupil latency than processing only one. This suggests an effect of attention on cognitive processing load (indexed by pupil dilation) during speech processing in noise.
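Peak pupil dilation and peak latency, the two dependent measures here, are typically extracted from baseline-corrected traces. A minimal sketch with a synthetic trace; the sampling rate, baseline window, units, and the injected dilation are assumptions, not the study's parameters:

```python
import numpy as np

fs = 60                                      # eye-tracker rate (Hz), assumed
t = np.arange(0, 6, 1 / fs)                  # seconds re: sentence onset
pupil = np.random.default_rng(2).normal(4.0, 0.02, t.size)  # mm, stand-in trace
pupil += 0.3 * np.exp(-(t - 2.5) ** 2)       # injected dilation for the demo

baseline = pupil[t < 1.0].mean()             # assumed baseline window
dilation = pupil - baseline
peak = dilation.max()                        # peak pupil dilation (mm)
latency = t[dilation.argmax()]               # peak latency (s)
print(f"peak dilation {peak:.2f} mm at {latency:.2f} s")
```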
31
Abstract
In a complex auditory scene, a "cocktail party" for example, listeners can disentangle multiple competing sequences of sounds. A recent psychophysical study in our laboratory demonstrated a robust spatial component of stream segregation showing ∼8° acuity. Here, we recorded single- and multiple-neuron responses from the primary auditory cortex of anesthetized cats while presenting interleaved sound sequences that human listeners would experience as segregated streams. Sequences of broadband sounds alternated between pairs of locations. Neurons synchronized preferentially to sounds from one or the other location, thereby segregating competing sound sequences. Neurons favoring one source location or the other tended to aggregate within the cortex, suggestive of modular organization. The spatial acuity of stream segregation was as narrow as ∼10°, markedly sharper than the broad spatial tuning for single sources that is well known in the literature. Spatial sensitivity was sharpest among neurons having high characteristic frequencies. Neural stream segregation was predicted well by a parameter-free model that incorporated single-source spatial sensitivity and a measured forward-suppression term. We found that the forward suppression was not due to post-discharge adaptation in the cortex and, therefore, must have arisen in the subcortical pathway or at the level of thalamocortical synapses. A linear-classifier analysis of single-neuron responses to rhythmic stimuli like those used in our psychophysical study yielded thresholds overlapping those of human listeners. Overall, the results indicate that the ascending auditory system does the work of segregating auditory streams, bringing them to discrete modules in the cortex for selection by top-down processes.
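The linear-classifier analysis mentioned above can be illustrated schematically: given per-trial responses of a single unit to sounds from two source locations, a linear classifier estimates how well the unit segregates the streams. The Poisson rates and classifier choice below are illustrative assumptions only, not the paper's analysis:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)

# Stand-in data: per-trial spike counts of one cortical unit in response
# to sounds from location A vs. location B (A preferred, B suppressed).
counts_A = rng.poisson(8, 200)
counts_B = rng.poisson(4, 200)
X = np.concatenate([counts_A, counts_B])[:, None].astype(float)
y = np.array([0] * 200 + [1] * 200)

# Linear classification of the source from single-neuron responses;
# sweeping source separation would trace out a neural acuity threshold.
clf = LinearDiscriminantAnalysis().fit(X, y)
print(f"classification accuracy: {clf.score(X, y):.2f}")
```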
32
Robinson PW, Pätynen J, Lokki T, Jang HS, Jeon JY, Xiang N. The role of diffusive architectural surfaces on auditory spatial discrimination in performance venues. J Acoust Soc Am 2013; 133:3940-50. [PMID: 23742348 DOI: 10.1121/1.4803846] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]
Abstract
In musical or theatrical performance, some venues allow listeners to localize and segregate individual performers, while others produce a well-blended ensemble sound. The room-acoustic conditions that make this possible and the psychoacoustic effects at work are not fully understood. This research utilizes auralizations from measured and simulated performance venues to investigate spatial discrimination of multiple acoustic sources in rooms. Signals were generated from measurements taken in a small theater, and listeners in the audience area were asked to distinguish pairs of speech sources on stage at various spatial separations. This experiment was repeated with the proscenium splay walls treated to be flat, diffusive, or absorptive. Similar experiments were conducted in a simulated hall, utilizing 11 early reflections with various characteristics and measured late reverberation. The experiments reveal that discriminating the lateral arrangement of two sources is possible at narrower separation angles when reflections come from flat or absorptive rather than diffusive surfaces.
Affiliation(s)
- Philip W Robinson
- Department of Media Technology, Aalto University School of Science, P.O. Box 15500, FI-00076 Aalto, Finland
33
Sussman ES, Steinschneider M. Attention modifies sound level detection in young children. Dev Cogn Neurosci 2011; 1:351-60. [PMID: 21808660 DOI: 10.1016/j.dcn.2011.01.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
Have you ever shouted your child's name from the kitchen while they were watching television in the living room to no avail, so you shout their name again, only louder? Yet, still no response. The current study provides evidence that young children process loudness changes differently than pitch changes when they are engaged in another task such as watching a video. Intensity level changes were physiologically detected only when they were behaviorally relevant, but frequency level changes were physiologically detected without task relevance in younger children. This suggests that changes in pitch rather than changes in volume may be more effective in evoking a response when sounds are unexpected. Further, even though behavioral ability may appear to be similar in younger and older children, attention-based physiologic responses differ from automatic physiologic processes in children. Results indicate that 1) the automatic auditory processes leading to more efficient higher-level skills continue to become refined through childhood; and 2) there are different time courses for the maturation of physiological processes encoding the distinct acoustic attributes of sound pitch and sound intensity. The relevance of these findings to sound perception in real-world environments is discussed.
Affiliation(s)
- Elyse S Sussman
- Department of Neuroscience, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, United States
34
Kong L, Michalka SW, Rosen ML, Sheremata SL, Swisher JD, Shinn-Cunningham BG, Somers DC. Auditory spatial attention representations in the human cerebral cortex. Cereb Cortex 2012. [PMID: 23180753 DOI: 10.1093/cercor/bhs359] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Auditory spatial attention serves important functions in auditory source separation and selection. Although auditory spatial attention mechanisms have been generally investigated, the neural substrates encoding spatial information acted on by attention have not been identified in the human neocortex. We performed functional magnetic resonance imaging experiments to identify cortical regions that support auditory spatial attention and to test 2 hypotheses regarding the coding of auditory spatial attention: 1) auditory spatial attention might recruit the visuospatial maps of the intraparietal sulcus (IPS) to create multimodal spatial attention maps; 2) auditory spatial information might be encoded without explicit cortical maps. We mapped visuotopic IPS regions in individual subjects and measured auditory spatial attention effects within these regions of interest. Contrary to the multimodal map hypothesis, we observed that auditory spatial attentional modulations spared the visuotopic maps of IPS; the parietal regions activated by auditory attention lacked map structure. However, multivoxel pattern analysis revealed that the superior temporal gyrus and the supramarginal gyrus contained significant information about the direction of spatial attention. These findings support the hypothesis that auditory spatial information is coded without a cortical map representation. Our findings suggest that audiospatial and visuospatial attention utilize distinctly different spatial coding schemes.
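Multivoxel pattern analysis of the kind used here asks whether trial-wise voxel patterns in a region predict the attended direction above chance. A generic sketch with synthetic data; the classifier, cross-validation scheme, and dimensions are assumptions, not the paper's exact analysis:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(4)

# Stand-in voxel patterns from one region of interest: trials x voxels,
# labels = direction of auditory spatial attention (left vs. right).
n_trials, n_vox = 80, 120
X = rng.standard_normal((n_trials, n_vox))
y = rng.integers(0, 2, n_trials)
X[y == 1, :10] += 0.5          # weak multivoxel information about direction

# Cross-validated pattern classification; above-chance accuracy implies
# the region carries information about the attended direction even
# without any map-like spatial organization.
acc = cross_val_score(LinearSVC(dual=False), X, y, cv=5).mean()
print(f"decoding accuracy: {acc:.2f}")
```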
35
Ihlefeld A, Litovsky RY. Interaural level differences do not suffice for restoring spatial release from masking in simulated cochlear implant listening. PLoS One 2012; 7:e45296. [PMID: 23028914 PMCID: PMC3447935 DOI: 10.1371/journal.pone.0045296] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2012] [Accepted: 08/21/2012] [Indexed: 11/18/2022] Open
Abstract
Spatial release from masking refers to a benefit for speech understanding that occurs when a target talker and a masker talker are spatially separated: speech intelligibility for the target is then typically higher than when both talkers are at the same location. In cochlear implant listeners, spatial release from masking is much reduced or absent compared with normal-hearing listeners, perhaps because cochlear implant listeners cannot effectively attend to spatial cues. Three experiments examined factors that may interfere with deploying spatial attention to a target talker masked by another talker. To simulate cochlear implant listening, stimuli were vocoded with two distinctive features. First, we used 50-Hz low-pass filtered speech envelopes and noise carriers, strongly reducing the possibility of temporal pitch cues; second, co-modulation was imposed on target and masker utterances to enhance perceptual fusion between the two sources. Stimuli were presented over headphones. Experiments 1 and 2 presented high-fidelity spatial cues with unprocessed and vocoded speech. Experiment 3 maintained faithful long-term average interaural level differences but presented scrambled interaural time differences with vocoded speech. Results show a robust spatial release from masking in Experiments 1 and 2, and a greatly reduced spatial release in Experiment 3. Faithful long-term average interaural level differences were thus insufficient for producing spatial release from masking, suggesting that appropriate interaural time differences are necessary for restoring it, at least in situations where few viable alternative segregation cues are available.
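The vocoding manipulation (50-Hz low-pass speech envelopes imposed on noise carriers) follows the standard noise-vocoder recipe, sketched below. The band edges, filter orders, and envelope extractor are illustrative choices; the paper's exact parameters, and its separate co-modulation step, are not reproduced here.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def noise_vocode(x, fs, edges, env_cutoff=50.0):
    """Noise-carrier vocoder: band-split the input, extract low-pass
    envelopes, reimpose them on noise in the same bands, and sum."""
    rng = np.random.default_rng(0)
    b_env, a_env = butter(4, env_cutoff / (fs / 2), btype="low")
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, x)
        env = filtfilt(b_env, a_env, np.abs(band))   # 50-Hz low-pass envelope
        env = np.clip(env, 0.0, None)                # guard against ringing
        carrier = filtfilt(b, a, rng.standard_normal(x.size))
        out += env * carrier                         # envelope on noise carrier
    return out

fs = 16000
x = np.random.default_rng(1).standard_normal(fs)     # stand-in for speech
edges = np.geomspace(100, 7000, 9)                   # 8 analysis bands, assumed
y = noise_vocode(x, fs, edges)
```

Keeping the envelope cutoff at 50 Hz preserves slow amplitude fluctuations while removing the periodicity cues that would otherwise convey temporal pitch.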
Affiliation(s)
- Antje Ihlefeld
- University of Wisconsin Waisman Center, Madison, Wisconsin, United States of America
- New York University, Center for Neural Science, New York, New York, United States of America
- Ruth Y. Litovsky
- University of Wisconsin Waisman Center, Madison, Wisconsin, United States of America
36
Rothpletz AM, Wightman FL, Kistler DJ. Informational masking and spatial hearing in listeners with and without unilateral hearing loss. J Speech Lang Hear Res 2012; 55:511-531. [PMID: 22215037 PMCID: PMC3320681 DOI: 10.1044/1092-4388(2011/10-0205)] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
PURPOSE This study assessed selective listening for speech in individuals with and without unilateral hearing loss (UHL) and the potential relationship between spatial release from informational masking and localization ability in listeners with UHL. METHOD Twelve adults with UHL and 12 normal-hearing controls completed a series of monaural and binaural speech tasks that were designed to measure informational masking. They also completed a horizontal localization task. RESULTS Monaural performance by participants with UHL was comparable to that of normal-hearing participants. Unlike the normal-hearing participants, the participants with UHL did not exhibit a true spatial release from informational masking. Rather, their performance could be predicted by head shadow effects. Performance among participants with UHL in the localization task was quite variable, with some showing near-normal abilities and others demonstrating no localization ability. CONCLUSION Individuals with UHL did not show deficits in all listening situations but were at a significant disadvantage when listening to speech in environments where normal-hearing listeners benefit from spatial separation between target and masker. This inability to capitalize on spatial cues for selective listening does not appear to be related to localization ability.
37
Maddox RK, Shinn-Cunningham BG. Influence of task-relevant and task-irrelevant feature continuity on selective auditory attention. J Assoc Res Otolaryngol 2011; 13:119-29. [PMID: 22124889 DOI: 10.1007/s10162-011-0299-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2011] [Accepted: 10/19/2011] [Indexed: 11/30/2022] Open
Abstract
Past studies have explored the relative strengths of auditory features in a selective attention task by pitting features against one another and asking listeners to report the words perceived in a given sentence. While these studies show that the continuity of competing features affects streaming, they did not address whether the influence of specific features is modulated by volitionally directed attention. Here, we explored whether the continuity of a task-irrelevant feature affects the ability to selectively report one of two competing speech streams when attention is specifically directed to a different feature. Sequences of simultaneous pairs of spoken digits were presented in which exactly one digit of each pair matched a primer phrase in pitch and exactly one digit of each pair matched the primer location. Within a trial, location and pitch were randomly paired; they either were consistent with each other from digit to digit or were switched (e.g., the sequence from the primer's location changed pitch across digits). In otherwise identical blocks, listeners were instructed to report digits matching the primer either in location or in pitch. Listeners were told to ignore the irrelevant feature, if possible, in order to perform well. Listener responses depended on task instructions, proving that top-down attention alters how a subject performs the task. Performance improved when the separation of the target and masker in the task-relevant feature increased. Importantly, the values of the task-irrelevant feature also influenced performance in some cases. Specifically, when instructed to attend location, listeners performed worse as the separation between target and masker pitch increased, especially when the spatial separation between digits was small. These results indicate that task-relevant and task-irrelevant features are perceptually bound together: continuity of task-irrelevant features influences selective attention in an automatic, obligatory manner, consistent with the idea that auditory attention operates on objects.
Affiliation(s)
- Ross K Maddox
- Hearing Research Center, Biomedical Engineering, Boston, MA 02215, USA
38
Lotto A, Holt L. Psychology of auditory perception. Wiley Interdiscip Rev Cogn Sci 2011; 2:479-489. [PMID: 26302301 DOI: 10.1002/wcs.123] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Audition is often treated as a 'secondary' sensory system behind vision in the study of cognitive science. In this review, we focus on three seemingly simple perceptual tasks to demonstrate the complexity of perceptual-cognitive processing involved in everyday audition. After providing a short overview of the characteristics of sound and their neural encoding, we present a description of the perceptual task of segregating multiple sound events that are mixed together in the signal reaching the ears. Then, we discuss the ability to localize the sound source in the environment. Finally, we provide some data and theory on how listeners categorize complex sounds, such as speech. In particular, we present research on how listeners weigh multiple acoustic cues in making a categorization decision. One conclusion of this review is that it is time for auditory cognitive science to be developed to match what has been done in vision, in order to better understand how humans communicate with speech and music.
Affiliation(s)
- Andrew Lotto
- Department of Speech, Language, and Hearing Sciences, Tucson, AZ, USA
- Lori Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA
39
Spatial selective auditory attention in the presence of reverberant energy: individual differences in normal-hearing listeners. J Assoc Res Otolaryngol 2010; 12:395-405. [PMID: 21128091 DOI: 10.1007/s10162-010-0254-z] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2010] [Accepted: 11/16/2010] [Indexed: 10/18/2022] Open
Abstract
Listeners can selectively attend to a desired target by directing attention to known target source features, such as location or pitch. Reverberation, however, reduces the reliability of the cues that allow a target source to be segregated and selected from a sound mixture. Given this, it is likely that reverberant energy interferes with selective auditory attention. Anecdotal reports suggest that the ability to focus spatial auditory attention degrades even with early aging, yet there is little evidence that middle-aged listeners have behavioral deficits on tasks requiring selective auditory attention. The current study was designed to look for individual differences in selective attention ability and to see if any such differences correlate with age. Normal-hearing adults, ranging in age from 18 to 55 years, were asked to report a stream of digits located directly ahead in a simulated rectangular room. Simultaneous, competing masker digit streams were simulated at locations 15° left and right of center. The level of reverberation was varied to alter task difficulty by interfering with localization cues (increasing localization blur). Overall, performance was best in the anechoic condition and worst in the high-reverberation condition. Listeners nearly always reported a digit from one of the three competing streams, showing that reverberation did not render the digits unintelligible. Importantly, inter-subject differences were extremely large. These differences, however, were not significantly correlated with age, memory span, or hearing status. These results show that listeners with audiometrically normal pure tone thresholds differ in their ability to selectively attend to a desired source, a task important in everyday communication. Further work is necessary to determine if these differences arise from differences in peripheral auditory function or in more central function.
40
Ihlefeld A, Deeks JM, Axon PR, Carlyon RP. Simulations of cochlear-implant speech perception in modulated and unmodulated noise. J Acoust Soc Am 2010; 128:870-880. [PMID: 20707456 DOI: 10.1121/1.3458817] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
Experiment 1 replicated the finding that normal-hearing listeners identify speech better in modulated than in unmodulated noise. This modulated-unmodulated difference ("MUD") has been previously shown to be reduced or absent for cochlear-implant listeners and for normal-hearing listeners presented with noise-vocoded speech. Experiments 2-3 presented normal-hearing listeners with noise-vocoded speech in unmodulated or 16-Hz-square-wave modulated noise, and investigated whether the introduction of simple binaural differences between target and masker could restore the masking release. Stimuli were presented over headphones. When the target and masker were presented to one ear, adding a copy of the masker to the other ear ("diotic configuration") aided performance but did so to a similar degree for modulated and unmodulated maskers, thereby failing to improve the modulation masking release. Presenting an uncorrelated noise to the opposite ear ("dichotic configuration") had no effect, either for modulated or unmodulated maskers, consistent with the improved performance in the diotic configuration being due to interaural decorrelation processing. For noise-vocoded speech, the provision of simple spatial differences did not allow listeners to take greater advantage of the dips present in a modulated masker.
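The masker manipulation, an unmodulated noise versus the same noise gated by a 16-Hz square wave, is straightforward to generate. A sketch under assumed parameters; the RMS-equalization step in particular is an assumption, not necessarily the paper's level calibration:

```python
import numpy as np
from scipy.signal import square

fs = 16000
t = np.arange(0, 2.0, 1 / fs)
noise = np.random.default_rng(5).standard_normal(t.size)

# Unmodulated masker vs. 16-Hz square-wave modulated masker: the
# modulation opens periodic "dips" in which speech might be glimpsed.
unmod = noise
mod = noise * (0.5 * (1 + square(2 * np.pi * 16 * t)))  # on/off gating at 16 Hz

# Equalize RMS so the two maskers differ only in envelope.
mod *= np.sqrt(np.mean(unmod**2) / np.mean(mod**2))
```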
Affiliation(s)
- Antje Ihlefeld
- MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 7EF, United Kingdom
41
Improved segregation of simultaneous talkers differentially affects perceptual and cognitive capacity demands for recognizing speech in competing speech. Atten Percept Psychophys 2010; 72:501-16. [DOI: 10.3758/app.72.2.501] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
42
Hill KT, Miller LM. Auditory attentional control and selection during cocktail party listening. Cereb Cortex 2009; 20:583-90. [PMID: 19574393 DOI: 10.1093/cercor/bhp124] [Citation(s) in RCA: 126] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
In realistic auditory environments, people rely on both attentional control and attentional selection to extract intelligible signals from a cluttered background. We used functional magnetic resonance imaging to examine auditory attention to natural speech under such high processing-load conditions. Participants attended to a single talker in a group of three, identified by the target talker's pitch or spatial location. A catch-trial design allowed us to distinguish activity due to top-down control of attention from activity due to attentional selection of bottom-up information, in both the spatial and spectral (pitch) feature domains. For attentional control, we found a left-dominant fronto-parietal network with a bias toward spatial processing in the dorsal precentral sulcus and superior parietal lobule, and a bias toward pitch in the inferior frontal gyrus. During selection of the talker, attention modulated activity in the left intraparietal sulcus when using talker location and in the bilateral but right-dominant superior temporal sulcus when using talker pitch. We argue that these networks represent the sources and targets of selective attention in rich auditory environments.
Affiliation(s)
- Kevin T Hill
- Center for Mind and Brain, University of California Davis, Davis, CA 95618, USA
43
Abstract
A common complaint among listeners with hearing loss (HL) is that they have difficulty communicating in everyday social settings. This article reviews how normal-hearing listeners cope in such settings, especially how they focus attention on a source of interest. Results of experiments with normal-hearing listeners suggest that the ability to attend selectively depends on the ability to analyze the acoustic scene and to form perceptual auditory objects properly. Unfortunately, sound features important for auditory object formation may not be robustly encoded in the auditory periphery of HL listeners. In turn, impaired auditory object formation may interfere with the ability to filter out competing sound sources. Peripheral degradations are also likely to reduce the salience of higher-order auditory cues such as location, pitch, and timbre, which enable normal-hearing listeners to select a desired sound source out of a sound mixture. Degraded peripheral processing is also likely to increase the time required to form auditory objects and focus selective attention, so that listeners with HL lose the ability to switch attention rapidly (a skill that is particularly important when trying to participate in a lively conversation). Finally, peripheral deficits may interfere with strategies that normal-hearing listeners employ in complex acoustic settings, including the use of memory to fill in bits of the conversation that are missed. Thus, peripheral hearing deficits are likely to cause a number of interrelated problems that challenge the ability of HL listeners to communicate in social settings requiring selective attention.
Affiliation(s)
- Barbara G Shinn-Cunningham
- Hearing Research Center, Departments of Cognitive and Neural Systems and Biomedical Engineering, Boston University, Boston, MA 02421, USA