1. Strivens A, Koch I, Lavric A. Does preparation help to switch auditory attention between simultaneous voices: Effects of switch probability and prevalence of conflict. Atten Percept Psychophys 2024; 86:750-767. PMID: 38212478; PMCID: PMC11062987; DOI: 10.3758/s13414-023-02841-y
Abstract
Switching auditory attention to one of two (or more) simultaneous voices incurs a substantial performance overhead. Whether and when this voice 'switch cost' is reduced when the listener has an opportunity to prepare in silence is not clear: findings on the effect of preparation on the switch cost range from (near) null to substantial. We sought to determine which factors are crucial for encouraging preparation and detecting its effect on the switch cost in a paradigm where participants categorized the number spoken by one of two simultaneous voices; the target voice, which changed unpredictably, was specified by a visual cue depicting the target's gender. First, we manipulated the probability of a voice switch. When 25% of trials were switches, increasing the preparation interval (50/800/1,400 ms) resulted in a substantial (~50%) reduction in switch cost. No reduction was observed when 75% of trials were switches. Second, we examined the relative prevalence of low-conflict, 'congruent' trials (where the numbers spoken by the two voices were mapped onto the same response) and high-conflict, 'incongruent' trials (where the voices afforded different responses). 'Conflict prevalence' had a strong effect on selectivity: the incongruent-congruent difference (the 'congruence effect') was smaller in the 66%-incongruent condition than in the 66%-congruent condition. However, conflict prevalence did not discernibly interact with preparation or with its effect on the switch cost. Thus, conditions where switches of the target voice are relatively rare are especially conducive to preparation, possibly because attention is committed more strongly to (and/or disengaged less rapidly from) the perceptual features of the target voice.
Affiliation(s)
- Amy Strivens
- Institute for Psychology, RWTH Aachen University, Jägerstraße 17-19, 52066 Aachen, Germany
- Iring Koch
- Institute for Psychology, RWTH Aachen University, Jägerstraße 17-19, 52066 Aachen, Germany
- Aureliu Lavric
- Department of Psychology, University of Exeter, Exeter, UK
2. Holmes E, Johnsrude IS. Intelligibility benefit for familiar voices is not accompanied by better discrimination of fundamental frequency or vocal tract length. Hear Res 2023; 429:108704. PMID: 36701896; DOI: 10.1016/j.heares.2023.108704
Abstract
Speech is more intelligible when it is spoken by familiar than by unfamiliar people. If this benefit arises because key voice characteristics, such as the perceptual correlates of fundamental frequency or vocal tract length (VTL), are represented more accurately for familiar voices, listeners should be able to discriminate smaller manipulations of these characteristics for familiar than for unfamiliar voices. We measured participants' (N = 17) thresholds for discriminating pitch (a correlate of fundamental frequency, or glottal pulse rate) and formant spacing (a correlate of VTL; 'VTL-timbre') for voices that were familiar (participants' friends) and unfamiliar (other participants' friends). As expected, familiar voices were more intelligible. However, discrimination thresholds were no smaller for these same familiar voices than for unfamiliar voices. The size of the intelligibility benefit for a familiar over an unfamiliar voice was unrelated to the difference in discrimination thresholds for the same voices. Also, the familiar-voice intelligibility benefit remained just as large following perceptible manipulations of pitch and VTL-timbre. These results are more consistent with cognitive accounts of speech perception than with traditional accounts that predict better discrimination.
Affiliation(s)
- Emma Holmes
- Department of Speech Hearing and Phonetic Sciences, UCL, London WC1N 1PF, UK; Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada
- Ingrid S Johnsrude
- Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada; School of Communication Sciences and Disorders, University of Western Ontario, London, Ontario N6G 1H1, Canada
3. Shehabi AM, Prendergast G, Guest H, Plack CJ. The Effect of Lifetime Noise Exposure and Aging on Speech-Perception-in-Noise Ability and Self-Reported Hearing Symptoms: An Online Study. Front Aging Neurosci 2022; 14:890010. PMID: 35711902; PMCID: PMC9195834; DOI: 10.3389/fnagi.2022.890010
Abstract
Animal research shows that aging and excessive noise exposure damage cochlear outer hair cells, inner hair cells, and the synapses connecting inner hair cells with the auditory nerve. This damage may translate into auditory symptoms such as difficulty understanding speech in noise, tinnitus, and hyperacusis. The current study, using a novel online approach, assessed and quantified the effects of lifetime noise exposure and aging on (i) speech-perception-in-noise (SPiN) thresholds, (ii) self-reported hearing ability, and (iii) the presence of tinnitus. Secondary aims involved documenting the effects of lifetime noise exposure and aging on tinnitus handicap and the severity of hyperacusis. Two hundred and ninety-four adults with no past diagnosis of hearing or memory impairments were recruited online. Participants were assigned to two groups: 217 "young" (age range: 18-35 years, 151 female) and 77 "older" (age range: 50-70 years, 50 female). Participants completed a set of online instruments including an otologic health and demographic questionnaire, a dementia screening tool, forward and backward digit span tests, a noise exposure questionnaire, the Khalfa hyperacusis questionnaire, the short form of the Speech, Spatial, and Qualities of Hearing scale, the Tinnitus Handicap Inventory, a digits-in-noise test, and a Coordinate Response Measure speech-perception test. Analyses controlled for sex and for cognitive function as reflected by digit span. A detailed protocol was pre-registered to guard against "p-hacking" of this extensive dataset. Lifetime noise exposure did not predict SPiN thresholds, self-reported hearing ability, or the presence of tinnitus in either age group. Exploratory analyses showed that worse hyperacusis scores, and a greater prevalence of tinnitus, were significantly associated with high lifetime noise exposure in the young, but not in the older, group. Age was a significant predictor of SPiN thresholds and the presence of tinnitus, but not of self-reported hearing ability, tinnitus handicap, or severity of hyperacusis. Consistent with several lab studies, our online-derived data suggest that older adults with no diagnosis of hearing impairment have poorer SPiN ability and a higher risk of tinnitus than their younger counterparts. Moreover, lifetime noise exposure may increase the risk of tinnitus and the severity of hyperacusis in young adults with no diagnosis of hearing impairment.
Affiliation(s)
- Adnan M. Shehabi
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester, United Kingdom
- Department of Audiology and Speech Therapy, Birzeit University, Birzeit, Palestine
- Garreth Prendergast
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester, United Kingdom
- Hannah Guest
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester, United Kingdom
- Christopher J. Plack
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester, United Kingdom
- Department of Psychology, Lancaster University, Lancaster, United Kingdom
4. Kachlicka M, Laffere A, Dick F, Tierney A. Slow phase-locked modulations support selective attention to sound. Neuroimage 2022; 252:119024. PMID: 35231629; PMCID: PMC9133470; DOI: 10.1016/j.neuroimage.2022.119024
Abstract
To make sense of complex soundscapes, listeners must select and attend to task-relevant streams while ignoring uninformative sounds. One possible neural mechanism underlying this process is alignment of endogenous oscillations with the temporal structure of the target sound stream. Such a mechanism has been suggested to mediate attentional modulation of neural phase-locking to the rhythms of attended sounds. However, such modulations are also compatible with an alternative framework, in which attention acts as a filter that enhances exogenously driven neural auditory responses. Here we tested several predictions arising from the oscillatory account by playing two tone streams that varied across conditions in tone duration and presentation rate; participants attended to one stream or listened passively. Attentional modulation of the evoked waveform was roughly sinusoidal and scaled with presentation rate, whereas the passive response did not scale with rate. However, there was only limited evidence for continuation of the modulations through the silence between sequences. These results suggest that attentionally driven changes in phase alignment reflect synchronization of slow endogenous activity with the temporal structure of attended stimuli.
Affiliation(s)
- Magdalena Kachlicka
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
- Fred Dick
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England; Division of Psychology & Language Sciences, UCL, Gower Street, London WC1E 6BT, England
- Adam Tierney
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, England
5. Uhrig S, Perkis A, Möller S, Svensson UP, Behne DM. Effects of Spatial Speech Presentation on Listener Response Strategy for Talker-Identification. Front Neurosci 2022; 15:730744. PMID: 35153653; PMCID: PMC8831717; DOI: 10.3389/fnins.2021.730744
Abstract
This study investigates effects of spatial auditory cues on human listeners' response strategy for identifying two alternately active talkers (a “turn-taking” listening scenario). Previous research has demonstrated subjective benefits of audio spatialization with regard to speech intelligibility and talker-identification effort. So far, the deliberate activation of specific perceptual and cognitive processes by listeners to optimize their task performance has remained largely unexamined. Spoken sentences selected as stimuli were either clean or degraded by background noise or bandpass filtering. Stimuli were presented via three horizontally positioned loudspeakers: in a non-spatial mode, both talkers were presented through a central loudspeaker; in a spatial mode, each talker was presented through the central or a talker-specific lateral loudspeaker. Participants identified talkers via speeded keypresses and afterwards provided subjective ratings (speech quality, speech intelligibility, voice similarity, talker-identification effort). In the spatial mode, presentations at the lateral loudspeaker locations elicited quicker behavioral responses, which were nevertheless significantly slower than responses in a talker-localization task. Under clean speech, response times globally increased in the spatial vs. non-spatial mode (across all locations); these “response time switch costs,” presumably caused by repeated switching of spatial auditory attention between different locations, diminished under degraded speech. No significant effects of spatialization on subjective ratings were found. The results suggest that when listeners could utilize task-relevant auditory cues about talker location, they continued to rely on voice recognition, rather than localization of talker sound sources, as their primary response strategy. In addition, the presence of speech degradations may have led to increased cognitive control, which in turn compensated for the incurred response time switch costs.
Affiliation(s)
- Stefan Uhrig
- Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
- Quality and Usability Lab, Technische Universität Berlin, Berlin, Germany
- Correspondence: Stefan Uhrig
- Andrew Perkis
- Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
- Sebastian Möller
- Quality and Usability Lab, Technische Universität Berlin, Berlin, Germany
- Speech and Language Technology, German Research Center for Artificial Intelligence, Berlin, Germany
- U. Peter Svensson
- Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
- Dawn M. Behne
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
6. Eckert MA, Teubner-Rhodes S, Vaden KI, Ahlstrom JB, McClaskey CM, Dubno JR. Unique patterns of hearing loss and cognition in older adults' neural responses to cues for speech recognition difficulty. Brain Struct Funct 2022; 227:203-218. PMID: 34632538; PMCID: PMC9044122; DOI: 10.1007/s00429-021-02398-2
Abstract
Older adults with hearing loss experience significant difficulties understanding speech in noise, perhaps due in part to limited benefit from supporting executive functions that enable the use of environmental cues signaling changes in listening conditions. Here we examined the degree to which 41 older adults (60.56-86.25 years) exhibited cortical responses to informative listening difficulty cues that communicated the listening difficulty for each trial compared to neutral cues that were uninformative of listening difficulty. Word recognition was significantly higher for informative compared to uninformative cues in a +10 dB signal-to-noise ratio (SNR) condition, and response latencies were significantly shorter for informative cues in the +10 dB SNR and the more-challenging +2 dB SNR conditions. Informative cues were associated with elevated blood oxygenation level-dependent contrast in visual and parietal cortex. A cue-SNR interaction effect was observed in the cingulo-opercular (CO) network, such that activity only differed between SNR conditions when an informative cue was presented. That is, participants used the informative cues to prepare for changes in listening difficulty from one trial to the next. This cue-SNR interaction effect was driven by older adults with more low-frequency hearing loss and was not observed for those with more high-frequency hearing loss, poorer set-shifting task performance, and lower frontal operculum gray matter volume. These results suggest that proactive strategies for engaging CO adaptive control may be important for older adults with high-frequency hearing loss to optimize speech recognition in changing and challenging listening conditions.
Affiliation(s)
- Mark A. Eckert
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC 29425-5500, USA
- Kenneth I. Vaden
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC 29425-5500, USA
- Jayne B. Ahlstrom
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC 29425-5500, USA
- Carolyn M. McClaskey
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC 29425-5500, USA
- Judy R. Dubno
- Hearing Research Program, Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 55, Charleston, SC 29425-5500, USA
7. Heeren J, Nuesse T, Latzel M, Holube I, Hohmann V, Wagener KC, Schulte M. The Concurrent OLSA Test: A Method for Speech Recognition in Multi-talker Situations at Fixed SNR. Trends Hear 2022; 26:23312165221108257. PMID: 35702051; PMCID: PMC9208053; DOI: 10.1177/23312165221108257
Abstract
A multi-talker paradigm is introduced that uses different attentional processes to adjust speech-recognition scores, with the goal of conducting measurements at high signal-to-noise ratios (SNRs). The basic idea is to simulate a group conversation among three talkers. The talkers alternately speak sentences of the German matrix test OLSA. Each time a sentence begins with the name “Kerstin” (the call sign), the participant is addressed and instructed to repeat the last words of all sentences from that talker, until another talker begins a sentence with “Kerstin”. The alternation of the talkers is implemented with an adjustable overlap time that causes an overlap between the call sign “Kerstin” and the target words to be repeated. Thus, the two tasks of detecting “Kerstin” and repeating target words must be performed at the same time. The paradigm was tested with 22 young normal-hearing (YNH) participants for three overlap times (0.6 s, 0.8 s, and 1.0 s). Results for these overlap times show significant differences, with median target-word recognition scores of 88%, 82%, and 77%, respectively (including call-sign and dual-task effects). A comparison of the dual task with the corresponding single tasks suggests that the observed effects reflect an increased cognitive load.
Affiliation(s)
- Jan Heeren
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany; Cluster of Excellence Hearing4All, Oldenburg, Germany
- Theresa Nuesse
- Cluster of Excellence Hearing4All, Oldenburg, Germany; Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, Germany
- Inga Holube
- Cluster of Excellence Hearing4All, Oldenburg, Germany; Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg, Germany
- Volker Hohmann
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany; Cluster of Excellence Hearing4All, Oldenburg, Germany; Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Oldenburg, Germany
- Kirsten C Wagener
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany; Cluster of Excellence Hearing4All, Oldenburg, Germany
- Michael Schulte
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany; Cluster of Excellence Hearing4All, Oldenburg, Germany
8. Turri S, Rizvi M, Rabini G, Melonio A, Gennari R, Pavani F. Orienting Auditory Attention through Vision: the Impact of Monaural Listening. Multisens Res 2021; 35:1-28. PMID: 34384046; DOI: 10.1163/22134808-bja10059
Abstract
The understanding of linguistic messages can be made extremely complex by the simultaneous presence of interfering sounds, especially when they are also linguistic in nature. In two experiments, we tested whether visual cues directing attention to spatial or temporal components of speech in noise can improve its identification. The hearing-in-noise task required identification of a five-digit sequence (target) embedded in a stream of time-reversed speech. Using a custom-built device located in front of the participant, we delivered visual cues to orient attention to the location of the target sounds and/or to their temporal window. In Exp. 1 (n = 14), we validated this visual-to-auditory cueing method in normal-hearing listeners tested under typical binaural listening conditions. In Exp. 2 (n = 13), we assessed the efficacy of the same visual cues in normal-hearing listeners wearing a monaural ear plug, to study the effects of simulated monaural and conductive hearing loss on visual-to-auditory attention orienting. While Exp. 1 revealed a benefit of both spatial and temporal visual cues for hearing in noise, Exp. 2 showed that only the temporal visual cues remained effective during monaural listening. These findings indicate that when the acoustic experience is altered, visual-to-auditory attention orienting is more robust for temporal than for spatial attributes of the auditory stimuli. These findings have implications for the relation between spatial and temporal attributes of sound objects, and for the design of devices that orient audiovisual attention in people with hearing loss.
Affiliation(s)
- Silvia Turri
- Centro Interdipartimentale Mente/Cervello - CIMeC, Università di Trento, 38068 Rovereto, Italy; Dipartimento di Psicologia e Scienze Cognitive, Università di Trento, 38068 Rovereto, Italy
- Mehdi Rizvi
- Faculty of Computer Science, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
- Giuseppe Rabini
- Centro Interdipartimentale Mente/Cervello - CIMeC, Università di Trento, 38068 Rovereto, Italy
- Alessandra Melonio
- Faculty of Computer Science, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
- Rosella Gennari
- Faculty of Computer Science, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
- Francesco Pavani
- Centro Interdipartimentale Mente/Cervello - CIMeC, Università di Trento, 38068 Rovereto, Italy; IMPACT, Centre de Recherche en Neurosciences de Lyon (CRNL), 69500 Bron, France
9. Wang X, Xu L. Speech perception in noise: Masking and unmasking. J Otol 2021; 16:109-119. PMID: 33777124; PMCID: PMC7985001; DOI: 10.1016/j.joto.2020.12.001
Abstract
Speech perception is essential for daily communication. However, background noise or concurrent talkers can make it challenging for listeners to track the target speech (the 'cocktail party problem'). The present study reviews and compares existing findings on speech perception and unmasking in cocktail-party listening environments in English and in Mandarin Chinese. The review starts with an introduction, followed by related concepts of auditory masking. The next two sections review factors that release speech perception from masking in English and in Mandarin Chinese, respectively. The last section presents an overall summary of the findings, with comparisons between the two languages. Future research directions arising from differences between the two languages in the literature on the reviewed topic are also discussed.
Affiliation(s)
- Xianhui Wang
- Communication Sciences and Disorders, Ohio University, Athens, OH 45701, USA
- Li Xu
- Communication Sciences and Disorders, Ohio University, Athens, OH 45701, USA
10. Carcagno S, Plack CJ. Effects of age on psychophysical measures of auditory temporal processing and speech reception at low and high levels. Hear Res 2020; 400:108117. PMID: 33253994; PMCID: PMC7812372; DOI: 10.1016/j.heares.2020.108117
Abstract
We found little evidence of greater age-related hearing declines at high sound levels. There were age-related temporal-processing declines independent of hearing loss, but no evidence of age-related speech-reception deficits independent of hearing loss.
Age-related cochlear synaptopathy (CS) has been shown to occur in rodents with minimal noise exposure, and has been hypothesized to play a crucial role in age-related hearing declines in humans. It is not known to what extent age-related CS occurs in humans, and how it affects the coding of supra-threshold sounds and speech in noise. Because in rodents CS affects mainly low- and medium-spontaneous rate (L/M-SR) auditory-nerve fibers with rate-level functions covering medium-high levels, it should lead to greater deficits in the processing of sounds at high than at low stimulus levels. In this cross-sectional study the performance of 102 listeners across the age range (34 young, 34 middle-aged, 34 older) was assessed in a set of psychophysical temporal processing and speech reception in noise tests at both low, and high stimulus levels. Mixed-effect multiple regression models were used to estimate the effects of age while partialing out effects of audiometric thresholds, lifetime noise exposure, cognitive abilities (assessed with additional tests), and musical experience. Age was independently associated with performance deficits on several tests. However, only for one out of 13 tests were age effects credibly larger at the high compared to the low stimulus level. Overall these results do not provide much evidence that age-related CS, to the extent to which it may occur in humans according to the rodent model of greater L/M-SR synaptic loss, has substantial effects on psychophysical measures of auditory temporal processing or on speech reception in noise.
Affiliation(s)
- Samuele Carcagno
- Department of Psychology, Lancaster University, Lancaster LA1 4YF, United Kingdom
- Christopher J Plack
- Department of Psychology, Lancaster University, Lancaster LA1 4YF, United Kingdom; Manchester Centre for Audiology and Deafness, University of Manchester, Academic Health Science Centre, Manchester M13 9PL, United Kingdom
11. Sharma NK, Krishnamohan V, Ganapathy S, Gangopadhayay A, Fink L. Acoustic and linguistic features influence talker change detection. J Acoust Soc Am 2020; 148:EL414. PMID: 33261377; DOI: 10.1121/10.0002462
Abstract
A listening test is proposed in which human participants detect talker changes in two natural, multi-talker speech stimulus sets: a familiar language (English) and an unfamiliar language (Chinese). Miss rate, false-alarm rate, and response times (RTs) showed a significant dependence on language familiarity. Linear regression modeling of RTs using diverse acoustic features derived from the stimuli showed recruitment of a pool of acoustic features for the talker change detection task. Further, benchmarking the same task against a state-of-the-art machine diarization system showed that the machine system achieves human parity for the familiar language but not for the unfamiliar language.
Affiliation(s)
- Neeraj Kumar Sharma
- Learning and Extraction of Acoustic Patterns Lab, Indian Institute of Science, Bangalore, India
- Venkat Krishnamohan
- Learning and Extraction of Acoustic Patterns Lab, Indian Institute of Science, Bangalore, India
- Sriram Ganapathy
- Learning and Extraction of Acoustic Patterns Lab, Indian Institute of Science, Bangalore, India
- Ahana Gangopadhayay
- Electrical and Systems Engineering, Washington University in St. Louis, Missouri, USA
- Lauren Fink
- Music Department, Max Planck Institute for Empirical Aesthetics, Frankfurt, Germany
12. Laffere A, Dick F, Holt LL, Tierney A. Attentional modulation of neural entrainment to sound streams in children with and without ADHD. Neuroimage 2020; 224:117396. PMID: 32979522; DOI: 10.1016/j.neuroimage.2020.117396
Abstract
To extract meaningful information from complex auditory scenes like a noisy playground, rock concert, or classroom, children can direct attention to different sound streams. One means of accomplishing this might be to align neural activity with the temporal structure of a target stream, such as a specific talker or melody. However, this may be more difficult for children with ADHD, who can struggle with accurately perceiving and producing temporal intervals. In this EEG study, we found that school-aged children's attention to one of two temporally-interleaved isochronous tone 'melodies' was linked to an increase in phase-locking at the melody's rate, and a shift in neural phase that aligned the neural responses with the attended tone stream. Children's attention task performance and neural phase alignment with the attended melody were linked to performance on temporal production tasks, suggesting that children with more robust control over motor timing were better able to direct attention to the time points associated with the target melody. Finally, we found that although children with ADHD performed less accurately on the tonal attention task than typically developing children, they showed the same degree of attentional modulation of phase locking and neural phase shifts, suggesting that children with ADHD may have difficulty with attentional engagement rather than attentional selection.
Affiliation(s)
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, United Kingdom
- Fred Dick
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, United Kingdom; Division of Psychology & Language Sciences, UCL, Gower Street, London WC1E 6BT, United Kingdom
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, United States
- Adam Tierney
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London WC1E 7HX, United Kingdom
13
|
Static and dynamic cocktail party listening in younger and older adults. Hear Res 2020; 395:108020. [DOI: 10.1016/j.heares.2020.108020]
14
Valzolgher C, Campus C, Rabini G, Gori M, Pavani F. Updating spatial hearing abilities through multisensory and motor cues. Cognition 2020; 204:104409. [PMID: 32717425] [DOI: 10.1016/j.cognition.2020.104409]
Abstract
Spatial hearing relies on a series of mechanisms for associating auditory cues with positions in space. When auditory cues are altered, humans, as well as other animals, can update the way they exploit auditory cues and partially compensate for their spatial hearing difficulties. In two experiments, we simulated monaural listening in hearing adults by temporarily plugging and muffling one ear, to assess the effects of active or passive training conditions. During active training, participants moved an audio-bracelet attached to their wrist, while continuously attending to the position of the sounds it produced. During passive training, participants received identical acoustic stimulation and performed exactly the same task, but the audio-bracelet was moved by the experimenter. Before and after training, we measured adaptation to monaural listening in three auditory tasks: single sound localization, minimum audible angle (MAA), and spatial and temporal bisection. We also performed the tests twice in an untrained group, which completed the same auditory tasks but received no training. Results showed that participants significantly improved in single sound localization across 3 consecutive days, more so in the active than in the passive training group. This reveals that the benefits of kinesthetic cues are additive with respect to those of paying attention to the position of sounds and/or seeing their positions when updating spatial hearing. The observed adaptation did not generalize to other auditory spatial tasks (space bisection and MAA), suggesting that partial updating of sound-space correspondences does not extend to all aspects of spatial hearing.
Affiliation(s)
- Chiara Valzolgher
- Centro Interdipartimentale Mente/Cervello (CIMeC), University of Trento, Italy; IMPACT, Centre de Recherche en Neurosciences Lyon (CRNL), France
- Giuseppe Rabini
- Centro Interdipartimentale Mente/Cervello (CIMeC), University of Trento, Italy
- Monica Gori
- Italian Institute of Technology (IIT), Italy
- Francesco Pavani
- Centro Interdipartimentale Mente/Cervello (CIMeC), University of Trento, Italy; IMPACT, Centre de Recherche en Neurosciences Lyon (CRNL), France; Department of Psychology and Cognitive Science, University of Trento, Italy
15
Couth S, Prendergast G, Guest H, Munro KJ, Moore DR, Plack CJ, Ginsborg J, Dawes P. Investigating the effects of noise exposure on self-report, behavioral and electrophysiological indices of hearing damage in musicians with normal audiometric thresholds. Hear Res 2020; 395:108021. [PMID: 32631495] [DOI: 10.1016/j.heares.2020.108021]
Abstract
Musicians are at risk of hearing loss due to prolonged noise exposure, but they may also be at risk of early sub-clinical hearing damage, such as cochlear synaptopathy. In the current study, we investigated the effects of noise exposure on electrophysiological, behavioral and self-report correlates of hearing damage in young adult (age range = 18-27 years) musicians and non-musicians with normal audiometric thresholds. Early-career musicians (n = 76) and non-musicians (n = 47) completed a test battery including the Noise Exposure Structured Interview, pure-tone audiometry (PTA; 0.25-8 kHz), extended high-frequency (EHF; 12 and 16 kHz) thresholds, otoacoustic emissions (OAEs), auditory brainstem responses (ABRs), speech perception in noise (SPiN), and self-reported tinnitus, hyperacusis and hearing-in-noise difficulties. Total lifetime noise exposure was similar between musicians and non-musicians, the majority of which could be accounted for by recreational activities. Musicians showed significantly greater ABR wave I/V ratios than non-musicians and were also more likely to report experience of, and/or more severe, tinnitus, hyperacusis and hearing-in-noise difficulties, irrespective of noise exposure. A secondary analysis revealed that individuals with the highest levels of noise exposure had reduced outer hair cell function compared to individuals with the lowest levels of noise exposure, as measured by OAEs. OAE level was also related to PTA and EHF thresholds. High levels of noise exposure were also associated with a significant increase in ABR wave V latency, but only for males, and a higher prevalence and severity of hyperacusis. These findings suggest that there may be sub-clinical effects of noise exposure on various hearing metrics even at a relatively young age, but do not support a link between lifetime noise exposure and proxy measures of cochlear synaptopathy such as ABR wave amplitudes and SPiN.
Closely monitoring OAEs, PTA and EHF thresholds when conventional PTA is within the clinically 'normal' range could provide a useful early metric of noise-induced hearing damage. This may be particularly relevant to early-career musicians as they progress through a period of intensive musical training, and thus interventions to protect hearing longevity may be vital.
Affiliation(s)
- Samuel Couth
- Manchester Centre for Audiology and Deafness, University of Manchester, UK
- Hannah Guest
- Manchester Centre for Audiology and Deafness, University of Manchester, UK
- Kevin J Munro
- Manchester Centre for Audiology and Deafness, University of Manchester, UK; Manchester Academic Health Science Centre, Manchester University Hospitals NHS Foundation Trust, UK
- David R Moore
- Manchester Centre for Audiology and Deafness, University of Manchester, UK; Communication Sciences Research Center, Cincinnati Children's Hospital Medical Centre, OH, USA
- Christopher J Plack
- Manchester Centre for Audiology and Deafness, University of Manchester, UK; Department of Psychology, Lancaster University, UK
- Piers Dawes
- Manchester Centre for Audiology and Deafness, University of Manchester, UK; Department of Linguistics, Macquarie University, Sydney, Australia
16
Laffere A, Dick F, Tierney A. Effects of auditory selective attention on neural phase: individual differences and short-term training. Neuroimage 2020; 213:116717. [PMID: 32165265] [DOI: 10.1016/j.neuroimage.2020.116717]
Abstract
How does the brain follow a sound that is mixed with others in a noisy environment? One possible strategy is to allocate attention to task-relevant time intervals. Prior work has linked auditory selective attention to alignment of neural modulations with stimulus temporal structure. However, since this prior research used relatively easy tasks and focused on analysis of main effects of attention across participants, relatively little is known about the neural foundations of individual differences in auditory selective attention. Here we investigated individual differences in auditory selective attention by asking participants to perform a 1-back task on a target auditory stream while ignoring a distractor auditory stream presented 180° out of phase. Neural entrainment to the attended auditory stream was strongly linked to individual differences in task performance. Some variability in performance was accounted for by degree of musical training, suggesting a link between long-term auditory experience and auditory selective attention. To investigate whether short-term improvements in auditory selective attention are possible, we gave participants 2 h of auditory selective attention training and found improvements in task performance as well as enhanced effects of attention on neural phase angle. Our results suggest that although there exist large individual differences in auditory selective attention and attentional modulation of neural phase angle, this skill improves after a small amount of targeted training.
Affiliation(s)
- Aeron Laffere
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
- Fred Dick
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK; Division of Psychology & Language Sciences, UCL, Gower Street, London, WC1E 6BT, UK
- Adam Tierney
- Department of Psychological Sciences, Birkbeck, University of London, Malet Street, London, WC1E 7HX, UK
17
Domingo Y, Holmes E, Macpherson E, Johnsrude IS. Using spatial release from masking to estimate the magnitude of the familiar-voice intelligibility benefit. J Acoust Soc Am 2019; 146:3487. [PMID: 31795686] [DOI: 10.1121/1.5133628]
Abstract
The ability to segregate simultaneous speech streams is crucial for successful communication. Recent studies have demonstrated that participants can report 10%-20% more words spoken by naturally familiar (e.g., friends or spouses) than unfamiliar talkers in two-voice mixtures. This benefit is commensurate with one of the largest benefits to speech intelligibility currently known: that gained by spatially separating two talkers. However, because of differences in the methods of these previous studies, the relative benefits of spatial separation and voice familiarity are unclear. Here, the familiar-voice benefit and spatial release from masking are directly compared, and we examine whether and how these two cues interact with one another. Talkers were recorded while speaking sentences from a published closed-set "matrix" task, and then listeners were presented with three different sentences played simultaneously. Each target sentence was played at 0° azimuth, and two masker sentences were symmetrically separated about the target. On average, participants reported 10%-30% more words correctly when the target sentence was spoken in a familiar than an unfamiliar voice (collapsed over spatial separation conditions); participants gained a similar benefit from a familiar target as from an unfamiliar voice separated from two symmetrical maskers by approximately 15° azimuth.
Affiliation(s)
- Ysabel Domingo
- Brain and Mind Institute, University of Western Ontario, London, Ontario, Canada
- Emma Holmes
- Brain and Mind Institute, University of Western Ontario, London, Ontario, Canada
- Ewan Macpherson
- School of Communication Sciences and Disorders, University of Western Ontario, London, Ontario, Canada
- Ingrid S Johnsrude
- Brain and Mind Institute, University of Western Ontario, London, Ontario, Canada
18
Zobel BH, Wagner A, Sanders LD, Başkent D. Spatial release from informational masking declines with age: Evidence from a detection task in a virtual separation paradigm. J Acoust Soc Am 2019; 146:548. [PMID: 31370625] [DOI: 10.1121/1.5118240]
Abstract
Declines in spatial release from informational masking may contribute to the speech-processing difficulties that older adults often experience within complex listening environments. The present study sought to answer two fundamental questions: (1) Does spatial release from informational masking decline with age and, if so, (2) does age predict this decline independently of age-typical hearing loss? Younger (18-34 years) and older (60-80 years) adults with age-typical hearing completed a yes/no target-detection task with low-pass filtered noise-vocoded speech designed to reduce non-spatial segregation cues and control for hearing loss. Participants detected a target voice among two-talker masking babble while a virtual spatial separation paradigm [Freyman, Helfer, McCall, and Clifton, J. Acoust. Soc. Am. 106(6), 3578-3588 (1999)] was used to isolate informational masking release. Both younger and older adults exhibited spatial release from informational masking, but masking release was reduced among the older adults. Furthermore, age predicted this decline when controlling for hearing loss, while there was no indication that hearing loss played a role. These findings provide evidence that declines specific to aging limit spatial release from informational masking under challenging listening conditions.
Affiliation(s)
- Benjamin H Zobel
- Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Anita Wagner
- Department of Otorhinolaryngology-Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Lisa D Sanders
- Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Deniz Başkent
- Department of Otorhinolaryngology-Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
19
Lin G, Carlile S. The Effects of Switching Non-Spatial Attention During Conversational Turn Taking. Sci Rep 2019; 9:8057. [PMID: 31147609] [PMCID: PMC6542845] [DOI: 10.1038/s41598-019-44560-1]
Abstract
This study examined the effect of a change in target voice on word recall during a multi-talker conversation. Two experiments were conducted using matrix sentences to assess the cost of a single endogenous switch in non-spatial attention. Performance in a yes-no recognition task was significantly worse when a target voice changed compared to when it remained the same after a turn-taking gap. We observed a decrease in target hit rate and sensitivity, and an increase in masker confusion errors following a change in voice. These results highlight the cognitive demands of not only engaging attention on a new talker, but also of disengaging attention from a previous target voice. This shows that exposure to a voice can have a biasing effect on attention that persists well after a turn-taking gap. A second experiment showed that there was no change in switching performance using different talker combinations. This demonstrates that switching costs were consistent and did not depend on the degree of acoustic differences in target voice characteristics.
Affiliation(s)
- Gaven Lin
- School of Medical Sciences and The Bosch Institute, University of Sydney, Sydney, New South Wales, Australia
- Simon Carlile
- School of Medical Sciences and The Bosch Institute, University of Sydney, Sydney, New South Wales, Australia
20
Vaughn CR. Expectations about the source of a speaker's accent affect accent adaptation. J Acoust Soc Am 2019; 145:3218. [PMID: 31153344] [DOI: 10.1121/1.5108831]
Abstract
When encountering speakers whose accents differ from the listener's own, listeners initially show a processing cost, but that cost can be attenuated after short-term exposure. The extent to which processing foreign accents (L2-accents) and within-language accents (L1-accents) is similar is still an open question. This study considers whether listeners' expectations about the source of a speaker's accent (whether the speaker is purported to be an L1 or an L2 speaker) affect intelligibility. Prior work has indirectly manipulated expectations about a speaker's accent through photographs, but the present study primes listeners with a description of the speaker's accent itself. In experiment 1, native English listeners transcribed Spanish-accented English sentences in noise under three different conditions (speaker's accent: monolingual L1 Latinx English, L1-Spanish/L2-English, no information given). Results indicate that, by the end of the experiment, listeners given some information about the accent outperformed listeners given no information, and listeners told the speaker was L1-accented outperformed listeners told to expect L2-accented speech. Findings are interpreted in terms of listeners' expectations about task difficulty, and a follow-up experiment (experiment 2) found that priming listeners to expect that their ability to understand L2-accented speech can improve does in fact improve intelligibility.
Affiliation(s)
- Charlotte R Vaughn
- Department of Linguistics, University of Oregon, 1290 University of Oregon, Eugene, Oregon 97403-1290, USA
21
Sharma NK, Ganesh S, Ganapathy S, Holt LL. Talker change detection: A comparison of human and machine performance. J Acoust Soc Am 2019; 145:131. [PMID: 30710945] [DOI: 10.1121/1.5084044]
Abstract
The automatic analysis of conversational audio remains difficult, in part, due to the presence of multiple talkers speaking in turns, often with significant intonation variations and overlapping speech. The majority of prior work on psychoacoustic speech analysis and system design has focused on single-talker speech or multi-talker speech with overlapping talkers (for example, the cocktail party effect). There has been much less focus on how listeners detect a change in talker or in probing the acoustic features significant in characterizing a talker's voice in conversational speech. This study examines human talker change detection (TCD) in multi-party speech utterances using a behavioral paradigm in which listeners indicate the moment of perceived talker change. Human reaction times in this task can be well-estimated by a model of the acoustic feature distance among speech segments before and after a change in talker, with estimation improving for models incorporating longer durations of speech prior to a talker change. Further, human performance is superior to several online and offline state-of-the-art machine TCD systems.
Affiliation(s)
- Neeraj Kumar Sharma
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
- Shobhana Ganesh
- Department of Electrical Engineering, CV Raman Road, Indian Institute of Science, Bangalore 560012, India
- Sriram Ganapathy
- Department of Electrical Engineering, CV Raman Road, Indian Institute of Science, Bangalore 560012, India
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
22
Kreitewolf J, Mathias SR, Trapeau R, Obleser J, Schönwiesner M. Perceptual grouping in the cocktail party: Contributions of voice-feature continuity. J Acoust Soc Am 2018; 144:2178. [PMID: 30404485] [DOI: 10.1121/1.5058684]
Abstract
Cocktail parties pose a difficult yet solvable problem for the auditory system. Previous work has shown that the cocktail-party problem is considerably easier when all sounds in the target stream are spoken by the same talker (the voice-continuity benefit). The present study investigated the contributions of two of the most salient voice features, glottal-pulse rate (GPR) and vocal-tract length (VTL), to the voice-continuity benefit. Twenty young, normal-hearing listeners participated in two experiments. On each trial, listeners heard concurrent sequences of spoken digits from three different spatial locations and reported the digits coming from a target location. Critically, across conditions, GPR and VTL either remained constant or varied across target digits. Additionally, across experiments, the target location either remained constant (Experiment 1) or varied (Experiment 2) within a trial. In Experiment 1, listeners benefited from continuity in either voice feature, but VTL continuity was more helpful than GPR continuity. In Experiment 2, spatial discontinuity greatly hindered listeners' abilities to exploit continuity in GPR and VTL. The present results suggest that selective attention benefits from continuity in target voice features and that VTL and GPR play different roles in perceptual grouping and stream segregation in the cocktail party.
Affiliation(s)
- Jens Kreitewolf
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
- Samuel R Mathias
- Neurocognition, Neurocomputation and Neurogenetics (n3) Division, Yale University School of Medicine, 40 Temple Street, New Haven, Connecticut 06511, USA
- Régis Trapeau
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
- Jonas Obleser
- Department of Psychology, University of Lübeck, Maria-Goeppert-Straße 9a, D-23562 Lübeck, Germany
- Marc Schönwiesner
- International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
23
Van Engen KJ, McLaughlin DJ. Eyes and ears: Using eye tracking and pupillometry to understand challenges to speech recognition. Hear Res 2018; 369:56-66. [PMID: 29801981] [DOI: 10.1016/j.heares.2018.04.013]
Abstract
Although human speech recognition is often experienced as relatively effortless, a number of common challenges can render the task more difficult. Such challenges may originate in talkers (e.g., unfamiliar accents, varying speech styles), the environment (e.g., noise), or in listeners themselves (e.g., hearing loss, aging, different native language backgrounds). Each of these challenges can reduce the intelligibility of spoken language, but even when intelligibility remains high, they can place greater processing demands on listeners. Noisy conditions, for example, can lead to poorer recall for speech, even when it has been correctly understood. Speech intelligibility measures, memory tasks, and subjective reports of listener difficulty all provide critical information about the effects of such challenges on speech recognition. Eye tracking and pupillometry complement these methods by providing objective physiological measures of online cognitive processing during listening. Eye tracking records the moment-to-moment direction of listeners' visual attention, which is closely time-locked to unfolding speech signals, and pupillometry measures the moment-to-moment size of listeners' pupils, which dilate in response to increased cognitive load. In this paper, we review the uses of these two methods for studying challenges to speech recognition.
24
Cueing listeners to attend to a target talker progressively improves word report as the duration of the cue-target interval lengthens to 2,000 ms. Atten Percept Psychophys 2018; 80:1520-1538. [PMID: 29696570] [DOI: 10.3758/s13414-018-1531-x]
Abstract
Endogenous attention is typically studied by presenting instructive cues in advance of a target stimulus array. For endogenous visual attention, task performance improves as the duration of the cue-target interval increases up to 800 ms. Less is known about how endogenous auditory attention unfolds over time or the mechanisms by which an instructive cue presented in advance of an auditory array improves performance. The current experiment used five cue-target intervals (0, 250, 500, 1,000, and 2,000 ms) to compare four hypotheses for how preparatory attention develops over time in a multi-talker listening task. Young adults were cued to attend to a target talker who spoke in a mixture of three talkers. Visual cues indicated the target talker's spatial location or their gender. Participants directed attention to location and gender simultaneously ("objects") at all cue-target intervals. Participants were consistently faster and more accurate at reporting words spoken by the target talker when the cue-target interval was 2,000 ms than 0 ms. In addition, the latency of correct responses progressively shortened as the duration of the cue-target interval increased from 0 to 2,000 ms. These findings suggest that the mechanisms involved in preparatory auditory attention develop gradually over time, taking at least 2,000 ms to reach optimal configuration, yet providing cumulative improvements in speech intelligibility as the duration of the cue-target interval increases from 0 to 2,000 ms. These results demonstrate an improvement in performance for cue-target intervals longer than those that have been reported previously in the visual or auditory modalities.
25
Guest H, Munro KJ, Prendergast G, Millman RE, Plack CJ. Impaired speech perception in noise with a normal audiogram: No evidence for cochlear synaptopathy and no relation to lifetime noise exposure. Hear Res 2018; 364:142-151. [PMID: 29680183] [PMCID: PMC5993872] [DOI: 10.1016/j.heares.2018.03.008]
Abstract
In rodents, noise exposure can destroy synapses between inner hair cells and auditory nerve fibers (“cochlear synaptopathy”) without causing hair cell loss. Noise-induced cochlear synaptopathy usually leaves cochlear thresholds unaltered, but is associated with long-term reductions in auditory brainstem response (ABR) amplitudes at medium-to-high sound levels. This pathophysiology has been suggested to degrade speech perception in noise (SPiN), perhaps explaining why SPiN ability varies so widely among audiometrically normal humans. The present study is the first to test for evidence of cochlear synaptopathy in humans with significant SPiN impairment. Individuals were recruited on the basis of self-reported SPiN difficulties and normal pure tone audiometric thresholds. Performance on a listening task identified a subset with “verified” SPiN impairment. This group was matched with controls on the basis of age, sex, and audiometric thresholds up to 14 kHz. ABRs and envelope-following responses (EFRs) were recorded at high stimulus levels, yielding both raw amplitude measures and within-subject difference measures. Past exposure to high sound levels was assessed by detailed structured interview. Impaired SPiN was not associated with greater lifetime noise exposure, nor with any electrophysiological measure. It is conceivable that retrospective self-report cannot reliably capture noise exposure, and that ABRs and EFRs offer limited sensitivity to synaptopathy in humans. Nevertheless, the results do not support the notion that noise-induced synaptopathy is a significant etiology of SPiN impairment with normal audiometric thresholds. It may be that synaptopathy alone does not have significant perceptual consequences, or is not widespread in humans with normal audiograms.
Highlights
- Study of adults with impaired speech perception in noise (SPiN) and normal audiograms.
- A subset of those with reported SPiN impairment exhibited measurable SPiN deficits.
- SPiN-impaired participants were matched with controls for age, sex, and audiogram.
- Impaired SPiN was not associated with ABR or EFR measures of cochlear synaptopathy.
- Impaired SPiN was not associated with a detailed measure of lifetime noise exposure.
Affiliation(s)
- Hannah Guest
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, UK
- Kevin J Munro
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, UK
- Garreth Prendergast
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, UK
- Rebecca E Millman
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, UK
- Christopher J Plack
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, UK; Department of Psychology, Lancaster University, UK
26
Benefits to Speech Perception in Noise From the Binaural Integration of Electric and Acoustic Signals in Simulated Unilateral Deafness. Ear Hear 2018; 37:248-59. [PMID: 27116049] [PMCID: PMC4847646] [DOI: 10.1097/aud.0000000000000252]
Abstract
OBJECTIVES This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. DESIGN Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. RESULTS Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. 
When both ears supported a similar level of monaural performance (50%), binaural integration advantages were found regardless of whether a mismatch was simulated or not. When the CI-simulation ear supported a superior level of monaural performance (71%), evidence of binaural integration was absent when a mismatch was simulated using both the Realistic and the Ideal processing strategies. This absence of integration could not be accounted for by ceiling effects or by changes in SNR. CONCLUSIONS If generalizable to unilaterally deaf CI users, the results of the current simulation study would suggest that benefits to speech perception in noise can be obtained by integrating information from an implanted ear and an NH ear. A mismatch in the delivery of spectral information between the ears due to a misalignment in the mapping of frequency to place may disrupt binaural integration in situations where both ears cannot support a similar level of monaural speech understanding. Previous studies that have measured the speech perception of unilaterally deaf individuals after CI but with nonindividualized frequency-to-electrode allocations may therefore have underestimated the potential benefits of providing binaural hearing. However, it remains unclear whether the size and nature of the potential incremental benefits from individualized allocations are sufficient to justify the time and resources required to derive them based on cochlear imaging or pitch-matching tasks.
27
Rowland SC, Hartley DEH, Wiggins IM. Listening in Naturalistic Scenes: What Can Functional Near-Infrared Spectroscopy and Intersubject Correlation Analysis Tell Us About the Underlying Brain Activity? Trends Hear 2018; 22:2331216518804116. [PMID: 30345888 PMCID: PMC6198387 DOI: 10.1177/2331216518804116]
Abstract
Listening to speech in the noisy conditions of everyday life can be effortful, reflecting the increased cognitive workload involved in extracting meaning from a degraded acoustic signal. Studying the underlying neural processes has the potential to provide mechanistic insight into why listening is effortful under certain conditions. In a move toward studying listening effort under ecologically relevant conditions, we used the silent and flexible neuroimaging technique functional near-infrared spectroscopy (fNIRS) to examine brain activity during attentive listening to speech in naturalistic scenes. Thirty normally hearing participants listened to a series of narratives continuously varying in acoustic difficulty while undergoing fNIRS imaging. Participants then listened to another set of closely matched narratives and rated perceived effort and intelligibility for each scene. As expected, self-reported effort generally increased with worsening signal-to-noise ratio. After controlling for better-ear signal-to-noise ratio, perceived effort was greater in scenes that contained competing speech than in those that did not, potentially reflecting an additional cognitive cost of overcoming informational masking. We analyzed the fNIRS data using intersubject correlation, a data-driven approach suitable for analyzing data collected under naturalistic conditions. Significant intersubject correlation was seen in the bilateral auditory cortices and in a range of channels across the prefrontal cortex. The involvement of prefrontal regions is consistent with the notion that higher order cognitive processes are engaged during attentive listening to speech in complex real-world conditions. However, further research is needed to elucidate the relationship between perceived listening effort and activity in these extended cortical networks.
Affiliation(s)
- Stephen C. Rowland
- National Institute for Health Research Nottingham Biomedical Research Centre, UK
- Hearing Sciences, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, UK
- Douglas E. H. Hartley
- National Institute for Health Research Nottingham Biomedical Research Centre, UK
- Hearing Sciences, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, UK
- Medical Research Council Institute of Hearing Research, School of Medicine, University of Nottingham, UK
- Nottingham University Hospitals NHS Trust, Queens Medical Centre, UK
- Ian M. Wiggins
- National Institute for Health Research Nottingham Biomedical Research Centre, UK
- Hearing Sciences, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, UK
- Medical Research Council Institute of Hearing Research, School of Medicine, University of Nottingham, UK
28
Prendergast G, Millman RE, Guest H, Munro KJ, Kluk K, Dewey RS, Hall DA, Heinz MG, Plack CJ. Effects of noise exposure on young adults with normal audiograms II: Behavioral measures. Hear Res 2017; 356:74-86. [PMID: 29126651 PMCID: PMC5714059 DOI: 10.1016/j.heares.2017.10.007]
Abstract
An estimate of lifetime noise exposure was used as the primary predictor of performance on a range of behavioral tasks: frequency and intensity difference limens, amplitude modulation detection, interaural phase discrimination, the digit triplet speech test, the co-ordinate response speech measure, an auditory localization task, a musical consonance task and a subjective report of hearing ability. One hundred and thirty-eight participants (81 females) aged 18-36 years were tested, with a wide range of self-reported noise exposure. All had normal pure-tone audiograms up to 8 kHz. It was predicted that increased lifetime noise exposure, which we assume to be concordant with noise-induced cochlear synaptopathy, would elevate behavioral thresholds, in particular for stimuli with high levels in a high spectral region. However, the results showed little effect of noise exposure on performance. There were a number of weak relations with noise exposure across the test battery, although many of these were in the opposite direction to the predictions, and none were statistically significant after correction for multiple comparisons. There were also no strong correlations between electrophysiological measures of synaptopathy published previously and the behavioral measures reported here. Consistent with our previous electrophysiological results, the present results provide no evidence that noise exposure is related to significant perceptual deficits in young listeners with normal audiometric hearing. It is possible that the effects of noise-induced cochlear synaptopathy are only measurable in humans with extreme noise exposures, and that these effects always co-occur with a loss of audiometric sensitivity.
Affiliation(s)
- Garreth Prendergast
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, M13 9PL, UK.
- Rebecca E Millman
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, M13 9PL, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK
- Hannah Guest
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, M13 9PL, UK
- Kevin J Munro
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, M13 9PL, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK
- Karolina Kluk
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, M13 9PL, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK
- Rebecca S Dewey
- Sir Peter Mansfield Imaging Centre, School of Physics and Astronomy, University of Nottingham Nottingham, NG7 2RD, UK; National Institute for Health Research (NIHR) Nottingham Biomedical Research Centre, Nottingham, NG1 5DU, UK; Otology and Hearing Group, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, UK
- Deborah A Hall
- National Institute for Health Research (NIHR) Nottingham Biomedical Research Centre, Nottingham, NG1 5DU, UK; Otology and Hearing Group, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Nottingham, NG7 2UH, UK
- Michael G Heinz
- Department of Speech, Language, & Hearing Sciences and Biomedical Engineering, Purdue University, West Lafayette, IN, 47907, USA
- Christopher J Plack
- Manchester Centre for Audiology and Deafness, University of Manchester, Manchester Academic Health Science Centre, M13 9PL, UK; NIHR Manchester Biomedical Research Centre, Central Manchester University Hospitals NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, M13 9WL, UK; Department of Psychology, Lancaster University, Lancaster, LA1 4YF, UK
29
Kreitewolf J, Mathias SR, von Kriegstein K. Implicit Talker Training Improves Comprehension of Auditory Speech in Noise. Front Psychol 2017; 8:1584. [PMID: 28959226 PMCID: PMC5603660 DOI: 10.3389/fpsyg.2017.01584]
Abstract
Previous studies have shown that listeners are better able to understand speech when they are familiar with the talker's voice. In most of these studies, talker familiarity was ensured by explicit voice training; that is, listeners learned to identify the familiar talkers. In the real world, however, the characteristics of familiar talkers are learned incidentally, through communication. The present study investigated whether speech comprehension benefits from implicit voice training; that is, through exposure to talkers' voices without listeners explicitly trying to identify them. During four training sessions, listeners heard short sentences containing a single verb (e.g., "he writes"), spoken by one talker. The sentences were mixed with noise, and listeners identified the verb within each sentence while their speech-reception thresholds (SRT) were measured. In a final test session, listeners performed the same task, but this time they heard different sentences spoken by the familiar talker and three unfamiliar talkers. Familiar and unfamiliar talkers were counterbalanced across listeners. Half of the listeners performed a test session in which the four talkers were presented in separate blocks (blocked paradigm). For the other half, talkers varied randomly from trial to trial (interleaved paradigm). The results showed that listeners had lower SRT when the speech was produced by the familiar talker than the unfamiliar talkers. The type of talker presentation (blocked vs. interleaved) had no effect on this familiarity benefit. These findings suggest that listeners implicitly learn talker-specific information during a speech-comprehension task, and exploit this information to improve the comprehension of novel speech material from familiar talkers.
Affiliation(s)
- Jens Kreitewolf
- Department of Psychology, University of Lübeck, Lübeck, Germany; Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Samuel R Mathias
- Department of Psychiatry, Yale University, New Haven, CT, United States
- Katharina von Kriegstein
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; Department of Psychology, Humboldt University of Berlin, Berlin, Germany
30
Koelewijn T, Versfeld NJ, Kramer SE. Effects of attention on the speech reception threshold and pupil response of people with impaired and normal hearing. Hear Res 2017; 354:56-63. [PMID: 28869841 DOI: 10.1016/j.heares.2017.08.006]
Abstract
For people with hearing difficulties, following a conversation in a noisy environment requires substantial cognitive processing, which is often perceived as effortful. Recent studies with normal hearing (NH) listeners showed that the pupil dilation response, a measure of cognitive processing load, is affected by 'attention related' processes. How these processes affect the pupil dilation response for hearing impaired (HI) listeners remains unknown. Therefore, the current study investigated the effect of auditory attention on various pupil response parameters for 15 NH adults (median age 51 yrs.) and 15 adults with mild to moderate sensorineural hearing loss (median age 52 yrs.). Both groups listened to two different sentences presented simultaneously, one to each ear and partially masked by stationary noise. Participants had to repeat either both sentences or only one, for which they had to divide or focus attention, respectively. When repeating one sentence, the target sentence location (left or right) was either randomized or blocked across trials, which in the latter case allowed for a better spatial focus of attention. The speech-to-noise ratio was adjusted to yield about 50% sentences correct for each task and condition. NH participants had lower ('better') speech reception thresholds (SRT) than HI participants. The pupil measures showed no between-group effects, with the exception of a shorter peak latency for HI participants, which indicated a shorter processing time. Both groups showed higher SRTs and a larger pupil dilation response when two sentences were processed instead of one. Additionally, SRTs were higher and dilation responses were larger for both groups when the target location was randomized instead of fixed. We conclude that although HI participants could cope with less noise than the NH group, their ability to focus attention on a single talker, thereby improving SRTs and lowering cognitive processing load, was preserved. 
Shorter peak latencies could indicate that HI listeners adapt their listening strategy by not processing some information, which reduces processing time and thereby listening effort.
Affiliation(s)
- Thomas Koelewijn
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, The Netherlands.
- Niek J Versfeld
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, The Netherlands
- Sophia E Kramer
- Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and Amsterdam Public Health Research Institute, VU University Medical Center, Amsterdam, The Netherlands
31
Semeraro HD, Rowan D, van Besouw RM, Allsopp AA. Development and evaluation of the British English coordinate response measure speech-in-noise test as an occupational hearing assessment tool. Int J Audiol 2017; 56:749-758. [PMID: 28537138 DOI: 10.1080/14992027.2017.1317370]
Abstract
OBJECTIVE The studies described in this article outline the design and development of a British English version of the coordinate response measure (CRM) speech-in-noise (SiN) test. Our interest in the CRM is as a SiN test with high face validity for occupational auditory fitness for duty (AFFD) assessment. DESIGN Study 1 used the method of constant stimuli to measure and adjust the psychometric functions of each target word, producing a speech corpus with equal intelligibility. After ensuring all the target words had similar intelligibility, for Studies 2 and 3, the CRM was presented in an adaptive procedure in stationary speech-spectrum noise to measure speech reception thresholds and evaluate the test-retest reliability of the CRM SiN test. STUDY SAMPLE Studies 1 (n = 20) and 2 (n = 30) were completed by normal-hearing civilians. Study 3 (n = 22) was completed by hearing impaired military personnel. RESULTS The results display good test-retest reliability (95% confidence interval (CI) < 2.1 dB) and concurrent validity when compared to the triple-digit test (r ≤ 0.65), and the CRM is sensitive to hearing impairment. CONCLUSION The British English CRM using stationary speech-spectrum noise is a "ready to use" SiN test, suitable for investigation as an AFFD assessment tool for military personnel.
Affiliation(s)
- Hannah D Semeraro
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
- Daniel Rowan
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
- Rachel M van Besouw
- Institute of Sound and Vibration Research, University of Southampton, Southampton, UK
32
Peripheral hearing loss reduces the ability of children to direct selective attention during multi-talker listening. Hear Res 2017; 350:160-172. [PMID: 28505526 DOI: 10.1016/j.heares.2017.05.005]
Abstract
Restoring normal hearing requires knowledge of how peripheral and central auditory processes are affected by hearing loss. Previous research has focussed primarily on peripheral changes following sensorineural hearing loss, whereas consequences for central auditory processing have received less attention. We examined the ability of hearing-impaired children to direct auditory attention to a voice of interest (based on the talker's spatial location or gender) in the presence of a common form of background noise: the voices of competing talkers (i.e. during multi-talker, or "Cocktail Party" listening). We measured brain activity using electro-encephalography (EEG) when children prepared to direct attention to the spatial location or gender of an upcoming target talker who spoke in a mixture of three talkers. Compared to normally-hearing children, hearing-impaired children showed significantly less evidence of preparatory brain activity when required to direct spatial attention. This finding is consistent with the idea that hearing-impaired children have a reduced ability to prepare spatial attention for an upcoming talker. Moreover, preparatory brain activity was not restored when hearing-impaired children listened with their acoustic hearing aids. An implication of these findings is that steps to improve auditory attention alongside acoustic hearing aids may be required to improve the ability of hearing-impaired children to understand speech in the presence of competing talkers.
33
Getzmann S, Wascher E. Visually guided auditory attention in a dynamic “cocktail-party” speech perception task: ERP evidence for age-related differences. Hear Res 2017; 344:98-108. [DOI: 10.1016/j.heares.2016.11.001]
34
Switching of auditory attention in "cocktail-party" listening: ERP evidence of cueing effects in younger and older adults. Brain Cogn 2016; 111:1-12. [PMID: 27814564 DOI: 10.1016/j.bandc.2016.09.006]
Abstract
Verbal communication in a "cocktail-party situation" is a major challenge for the auditory system. In particular, changes in target speaker usually result in declined speech perception. Here, we investigated whether speech cues indicating a subsequent change in target speaker reduce the costs of switching in younger and older adults. We employed event-related potential (ERP) measures and a speech perception task, in which sequences of short words were simultaneously presented by four speakers. Changes in target speaker were either unpredictable or semantically cued by a word within the target stream. Cued changes resulted in a smaller performance decline than uncued changes in both age groups. The ERP analysis revealed shorter latencies in the change-related N400 and late positive complex (LPC) after cued changes, suggesting an acceleration in context updating and attention switching. Thus, both younger and older listeners used semantic cues to prepare changes in speaker setting.
35
Oberfeld D, Klöckner-Nowotny F. Individual differences in selective attention predict speech identification at a cocktail party. eLife 2016; 5:e16747. [PMID: 27580272 PMCID: PMC5441891 DOI: 10.7554/elife.16747]
Abstract
Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.
Affiliation(s)
- Daniel Oberfeld
- Department of Psychology, Section Experimental Psychology, Johannes Gutenberg-Universität, Mainz, Germany
- Felicitas Klöckner-Nowotny
- Department of Psychology, Section Experimental Psychology, Johannes Gutenberg-Universität, Mainz, Germany
36
Holmes E, Kitterick PT, Summerfield AQ. EEG activity evoked in preparation for multi-talker listening by adults and children. Hear Res 2016; 336:83-100. [PMID: 27178442 DOI: 10.1016/j.heares.2016.04.007]
Abstract
Selective attention is critical for successful speech perception because speech is often encountered in the presence of other sounds, including the voices of competing talkers. Faced with the need to attend selectively, listeners perceive speech more accurately when they know characteristics of upcoming talkers before they begin to speak. However, the neural processes that underlie the preparation of selective attention for voices are not fully understood. The current experiments used electroencephalography (EEG) to investigate the time course of brain activity during preparation for an upcoming talker in young adults aged 18-27 years with normal hearing (Experiments 1 and 2) and in typically-developing children aged 7-13 years (Experiment 3). Participants reported key words spoken by a target talker when an opposite-gender distractor talker spoke simultaneously. The two talkers were presented from different spatial locations (±30° azimuth). Before the talkers began to speak, a visual cue indicated either the location (left/right) or the gender (male/female) of the target talker. Adults evoked preparatory EEG activity that started shortly after (<50 ms) the visual cue was presented and was sustained until the talkers began to speak. The location cue evoked similar preparatory activity in Experiments 1 and 2 with different samples of participants. The gender cue did not evoke preparatory activity when it predicted gender only (Experiment 1) but did evoke preparatory activity when it predicted the identity of a specific talker with greater certainty (Experiment 2). Location cues evoked significant preparatory EEG activity in children but gender cues did not. The results provide converging evidence that listeners evoke consistent preparatory brain activity for selecting a talker by their location (regardless of their gender or identity), but not by their gender alone.
Affiliation(s)
- Emma Holmes
- Department of Psychology, University of York, UK.
- Padraig T Kitterick
- NIHR Nottingham Hearing Biomedical Research Unit, UK; Division of Clinical Neuroscience, School of Medicine, University of Nottingham, UK
- A Quentin Summerfield
- Department of Psychology, University of York, UK; Hull York Medical School, University of York, UK
37
Fielden CA, Kitterick PT. Contralateral acoustic hearing aid use in adult unilateral cochlear implant recipients: Current provision, practice, and clinical experience in the UK. Cochlear Implants Int 2016; 17:132-45. [PMID: 27078521 DOI: 10.1080/14670100.2016.1162382]
Abstract
OBJECTIVES The study surveyed practising cochlear implant (CI) audiologists with the aim of: (1) characterizing UK clinical practice around the management and fitting of a contralateral hearing aid (HA) in adult unilateral CI users ('bimodal aiding'); (2) identifying factors that may limit the provision of bimodal aiding; and (3) ascertaining the views of audiologists on bimodal aiding. METHODS An online survey was distributed to audiologists working at the 20 centres providing implantation services to adults in the UK. RESULTS Responses were received from 19 of the 20 centres. The majority of centres reported evaluating HAs as part of the candidacy assessment for cochlear implantation. However, a majority also indicated that they do not take responsibility for the contralateral HA following implantation, despite identifying few practical limiting factors. Bimodal aiding was viewed as more beneficial than wearing the implant alone, with most respondents actively encouraging bimodal listening where possible. Respondents reported that fitting bimodal devices to take account of each other's settings was potentially more beneficial than independently fit devices, but such sympathetic fitting was not routine practice in any centre. DISCUSSION The results highlight some potential inconsistencies in the provision of bimodal aiding across the UK as reported by practising audiologists. The views of audiologists about what is best practice appear to be at odds with the nature and structure of the services currently offered. CONCLUSION Stronger evidence that bimodal aiding can be beneficial for UK patients would be required in order for service providers to justify the routine provision of bimodal aiding and to inform guidelines to shape routine clinical practice.
Affiliation(s)
- Claire A Fielden
- NIHR Nottingham Hearing Biomedical Research Unit, 113 The Ropewalk, Nottingham NG1 5DU, UK; Nottingham University Hospitals NHS Trust, Queen's Medical Centre, Nottingham NG7 2UH, UK; Midlands Hearing Implant Programme, Nuffield House, Queen Elizabeth Hospital, Birmingham B15 2TH, UK
- Pádraig T Kitterick
- NIHR Nottingham Hearing Biomedical Research Unit, 113 The Ropewalk, Nottingham NG1 5DU, UK; Otology and Hearing Group, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, NG7 2RD, UK
38
Samson F, Johnsrude IS. Effects of a consistent target or masker voice on target speech intelligibility in two- and three-talker mixtures. J Acoust Soc Am 2016; 139:1037-1046. [PMID: 27036241 DOI: 10.1121/1.4942589]
Abstract
When the spatial location or identity of a sound is held constant, it is not masked as effectively by competing sounds. This suggests that experience with a particular voice over time might facilitate perceptual organization in multitalker environments. The current study examines whether listeners benefit from experience with a voice only when it is the target, or also when it is a masker, using diotic presentation and a closed-set task (coordinate response measure). A reliable interaction was observed such that, in two-talker mixtures, consistency of masker or target voice over 3-7 trials significantly benefited target recognition performance, whereas in three-talker mixtures, target, but not masker, consistency was beneficial. Overall, this work suggests that voice consistency improves intelligibility, although somewhat differently when two talkers, compared to three talkers, are present, suggesting that consistent-voice information facilitates intelligibility in at least two different ways. Listeners can use a template-matching strategy to extract a known voice from a mixture when it is the target. However, consistent-voice information facilitates segregation only when two, but not three, talkers are present.
Affiliation(s)
- Fabienne Samson: Department of Psychology, The Brain and Mind Institute, Natural Sciences Center, Room 227, The University of Western Ontario, London, Ontario N6A 5B7, Canada
- Ingrid S Johnsrude: Department of Psychology, The Brain and Mind Institute, Natural Sciences Center, Room 227, The University of Western Ontario, London, Ontario N6A 5B7, Canada
39
Lin G, Carlile S. Costs of switching auditory spatial attention in following conversational turn-taking. Front Neurosci 2015; 9:124. [PMID: 25941466] [PMCID: PMC4403343] [DOI: 10.3389/fnins.2015.00124]
Abstract
Following a multi-talker conversation relies on the ability to rapidly and efficiently shift the focus of spatial attention from one talker to another. The current study investigated the listening costs associated with shifts in spatial attention during conversational turn-taking in 16 normally-hearing listeners using a novel sentence recall task. Three pairs of syntactically fixed but semantically unpredictable matrix sentences, recorded from a single male talker, were presented concurrently through an array of three loudspeakers (directly ahead and ±30° azimuth). Subjects attended to one spatial location, cued by a tone, and followed the target conversation from one sentence to the next using the call-sign at the beginning of each sentence. Subjects were required to report the last three words of each sentence (speech recall task) or answer multiple choice questions related to the target material (speech comprehension task). The reading span test, attention network test, and trail making test were also administered to assess working memory, attentional control, and executive function. There was a 10.7 ± 1.3% decrease in word recall, a pronounced primacy effect, and a rise in masker confusion errors and word omissions when the target switched location between sentences. Switching costs were independent of the location, direction, and angular size of the spatial shift but did appear to be load dependent and only significant for complex questions requiring multiple cognitive operations. Reading span scores were positively correlated with total words recalled, and negatively correlated with switching costs and word omissions. Task switching speed (Trail-B time) was also significantly correlated with recall accuracy. Overall, this study highlights (i) the listening costs associated with shifts in spatial attention and (ii) the important role of working memory in maintaining goal relevant information and extracting meaning from dynamic multi-talker conversations.
Affiliation(s)
- Gaven Lin: Auditory Neuroscience Laboratory, Department of Physiology, School of Medical Sciences, University of Sydney, Sydney, NSW, Australia
- Simon Carlile: Auditory Neuroscience Laboratory, Department of Physiology, School of Medical Sciences, University of Sydney, Sydney, NSW, Australia
40
Carlile S, Corkhill C. Selective spatial attention modulates bottom-up informational masking of speech. Sci Rep 2015; 5:8662. [PMID: 25727100] [PMCID: PMC4345314] [DOI: 10.1038/srep08662]
Abstract
To hear out a conversation against other talkers, listeners must overcome both energetic and informational masking. Although informational masking is largely attributed to top-down processes, it has also been demonstrated with unintelligible speech and amplitude-modulated maskers, suggesting a contribution from bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Spatially separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold, 40% of which was attributable to a release from informational masking. When the across-frequency temporal modulations of the masker talkers were decorrelated, the speech became unintelligible, although its within-frequency modulation characteristics remained identical. Used as a masker in the same paradigm, this signal still produced informational masking that accounted for 37% of the spatial unmasking. Because this masker is unintelligible yet highly differentiable from the target, its effects are unlikely to involve top-down processes. These data provide strong evidence of bottom-up masking driven by speech-like, within-frequency modulations, and show that this presumably low-level process can be modulated by selective spatial attention.
Affiliation(s)
- Simon Carlile: School of Medical Sciences and The Bosch Institute, University of Sydney, Sydney, NSW 2006, Australia
- Caitlin Corkhill: School of Medical Sciences, University of Sydney, Sydney, NSW 2006, Australia
41
|
Koelewijn T, de Kluiver H, Shinn-Cunningham BG, Zekveld AA, Kramer SE. The pupil response reveals increased listening effort when it is difficult to focus attention. Hear Res 2015; 323:81-90. [PMID: 25732724] [PMCID: PMC4632994] [DOI: 10.1016/j.heares.2015.02.004]
Abstract
Recent studies have shown that prior knowledge about where, when, and who is going to talk improves speech intelligibility. How related attentional processes affect cognitive processing load has not been investigated yet. In the current study, three experiments investigated how the pupil dilation response is affected by prior knowledge of target speech location, target speech onset, and who is going to talk. A total of 56 young adults with normal hearing participated. They had to reproduce a target sentence presented to one ear while ignoring a distracting sentence simultaneously presented to the other ear. The two sentences were independently masked by fluctuating noise. Target location (left or right ear), speech onset, and talker variability were manipulated in separate experiments by keeping these features either fixed during an entire block or randomized over trials. Pupil responses were recorded during listening and performance was scored after recall. The results showed an improvement in performance when the location of the target speech was fixed instead of randomized. Additionally, location uncertainty increased the pupil dilation response, which suggests that prior knowledge of location reduces cognitive load. Interestingly, the observed pupil responses for each condition were consistent with subjective reports of listening effort. We conclude that communicating in a dynamic environment like a cocktail party (where participants in competing conversations move unpredictably) requires substantial listening effort because of the demands placed on attentional processes.
Affiliation(s)
- Thomas Koelewijn: Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
- Hilde de Kluiver: Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
- Barbara G Shinn-Cunningham: Department of Biomedical Engineering, Center for Computational Neuroscience and Neural Technology, Boston University, Boston, USA
- Adriana A Zekveld: Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands; Linnaeus Centre HEAD, Department of Behavioral Sciences and Learning, Linköping University, Linköping, Sweden
- Sophia E Kramer: Section Ear & Hearing, Department of Otolaryngology-Head and Neck Surgery and EMGO Institute for Health and Care Research, VU University Medical Center, Amsterdam, The Netherlands
42
|
Getzmann S, Lewald J, Falkenstein M. Using auditory pre-information to solve the cocktail-party problem: electrophysiological evidence for age-specific differences. Front Neurosci 2014; 8:413. [PMID: 25540608] [PMCID: PMC4261705] [DOI: 10.3389/fnins.2014.00413]
Abstract
Speech understanding in complex and dynamic listening environments requires (a) auditory scene analysis, namely auditory object formation and segregation, and (b) allocation of the attentional focus to the talker of interest. There is evidence that pre-information is actively used to facilitate these two aspects of the so-called “cocktail-party” problem. Here, a simulated multi-talker scenario was combined with electroencephalography to study scene analysis and allocation of attention in young and middle-aged adults. Sequences of short words (combinations of brief company names and stock-price values) from four talkers at different locations were simultaneously presented, and the detection of target names and the discrimination between critical target values were assessed. Immediately prior to speech sequences, auditory pre-information was provided via cues that either prepared auditory scene analysis or attentional focusing, or non-specific pre-information was given. While performance was generally better in younger than older participants, both age groups benefited from auditory pre-information. The analysis of the cue-related event-related potentials revealed age-specific differences in the use of pre-cues: Younger adults showed a pronounced N2 component, suggesting early inhibition of concurrent speech stimuli; older adults exhibited a stronger late P3 component, suggesting increased resource allocation to process the pre-information. In sum, the results argue for an age-specific utilization of auditory pre-information to improve listening in complex dynamic auditory environments.
Affiliation(s)
- Stephan Getzmann: Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Technical University of Dortmund (IfADo), Dortmund, Germany
- Jörg Lewald: Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Technical University of Dortmund (IfADo), Dortmund, Germany; Faculty of Psychology, Ruhr-University Bochum, Bochum, Germany
- Michael Falkenstein: Aging Research Group, Leibniz Research Centre for Working Environment and Human Factors, Technical University of Dortmund (IfADo), Dortmund, Germany
43
|
Koch I, Lawo V. The flip side of the auditory spatial selection benefit: larger attentional mixing costs for target selection by ear than by gender in auditory task switching. Exp Psychol 2014; 62:66-74. [PMID: 25384645] [DOI: 10.1027/1618-3169/a000274]
Abstract
In cued auditory task switching, one of two dichotically presented number words, spoken by a female and a male, had to be judged according to its numerical magnitude. One experimental group selected targets by speaker gender and another group by ear of presentation. In mixed-task blocks, the target-defining feature (male/female vs. left/right) was cued prior to each trial, but in pure blocks it remained constant. Compared to selection by gender, selection by ear led to better performance in pure blocks than in mixed blocks, resulting in larger "global" mixing costs for ear-based selection. Selection by ear also led to larger "local" switch costs in mixed blocks, but this finding was partially mediated by differential cue-repetition benefits. Together, the data suggest that requirements of attention shifting diminish the auditory spatial selection benefit.
Affiliation(s)
- Iring Koch: Institute of Psychology, RWTH Aachen University, Aachen, Germany
- Vera Lawo: Institute of Psychology, RWTH Aachen University, Aachen, Germany
44
|
Lawo V, Fels J, Oberem J, Koch I. Intentional attention switching in dichotic listening: Exploring the efficiency of nonspatial and spatial selection. Q J Exp Psychol (Hove) 2014; 67:2010-2024. [DOI: 10.1080/17470218.2014.898079]
Abstract
Using an auditory variant of task switching, we examined the ability to intentionally switch attention in a dichotic-listening task. In our study, participants responded selectively to one of two simultaneously presented auditory number words (spoken by a female and a male, one for each ear) by categorizing its numerical magnitude. The mapping of gender (female vs. male) and ear (left vs. right) was unpredictable. The to-be-attended feature for gender or ear, respectively, was indicated by a visual selection cue prior to auditory stimulus onset. In Experiment 1, explicitly cued switches of the relevant feature dimension (e.g., from gender to ear) and switches of the relevant feature within a dimension (e.g., from male to female) occurred in an unpredictable manner. We found large performance costs when the relevant feature switched, but switches of the relevant feature dimension incurred only small additional costs. The feature-switch costs were larger in ear-relevant than in gender-relevant trials. In Experiment 2, we replicated these findings using a simplified design (i.e., only within-dimension switches with blocked dimensions). In Experiment 3, we examined preparation effects by manipulating the cueing interval and found a preparation benefit only when ear was cued. Together, our data suggest that the larger part of attentional switch costs arises from reconfiguration at the level of relevant auditory features (e.g., left vs. right) rather than feature dimensions (ear vs. gender). Additionally, our findings suggest that ear-based target selection benefits more from preparation time (i.e., time to direct attention to one ear) than gender-based target selection.
Affiliation(s)
- Vera Lawo: Institute of Psychology, RWTH Aachen University, Aachen, Germany
- Janina Fels: Institute of Technical Acoustics, RWTH Aachen University, Aachen, Germany
- Josefa Oberem: Institute of Technical Acoustics, RWTH Aachen University, Aachen, Germany
- Iring Koch: Institute of Psychology, RWTH Aachen University, Aachen, Germany
45
|
Bendixen A. Predictability effects in auditory scene analysis: a review. Front Neurosci 2014; 8:60. [PMID: 24744695] [PMCID: PMC3978260] [DOI: 10.3389/fnins.2014.00060]
Abstract
Many sound sources emit signals in a predictable manner. The idea that predictability can be exploited to support the segregation of one source's signal emissions from the overlapping signals of other sources has been expressed for a long time. Yet experimental evidence for a strong role of predictability within auditory scene analysis (ASA) has been scarce. Recently, there has been an upsurge in experimental and theoretical work on this topic resulting from fundamental changes in our perspective on how the brain extracts predictability from series of sensory events. Based on effortless predictive processing in the auditory system, it becomes more plausible that predictability would be available as a cue for sound source decomposition. In the present contribution, empirical evidence for such a role of predictability in ASA will be reviewed. It will be shown that predictability affects ASA both when it is present in the sound source of interest (perceptual foreground) and when it is present in other sound sources that the listener wishes to ignore (perceptual background). First evidence pointing toward age-related impairments in the latter capacity will be addressed. Moreover, it will be illustrated how effects of predictability can be shown by means of objective listening tests as well as by subjective report procedures, with the latter approach typically exploiting the multi-stable nature of auditory perception. Critical aspects of study design will be delineated to ensure that predictability effects can be unambiguously interpreted. Possible mechanisms for a functional role of predictability within ASA will be discussed, and an analogy with the old-plus-new heuristic for grouping simultaneous acoustic signals will be suggested.
Affiliation(s)
- Alexandra Bendixen: Auditory Psychophysiology Lab, Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
46
|
Kreitewolf J, Gaudrain E, von Kriegstein K. A neural mechanism for recognizing speech spoken by different speakers. Neuroimage 2014; 91:375-385. [PMID: 24434677] [DOI: 10.1016/j.neuroimage.2014.01.005]
Abstract
Understanding speech from different speakers is a sophisticated process, particularly because the same acoustic parameters convey important information about both the speech message and the person speaking. How the human brain accomplishes speech recognition under such conditions is unknown. One view is that speaker information is discarded at early processing stages and not used for understanding the speech message. An alternative view is that speaker information is exploited to improve speech recognition. Consistent with the latter view, previous research identified functional interactions between the left- and the right-hemispheric superior temporal sulcus/gyrus, which process speech- and speaker-specific vocal tract parameters, respectively. Vocal tract parameters are one of the two major acoustic features that determine both speaker identity and speech message (phonemes). Here, using functional magnetic resonance imaging (fMRI), we show that a similar interaction exists for glottal fold parameters between the left and right Heschl's gyri. Glottal fold parameters are the other main acoustic feature that determines speaker identity and speech message (linguistic prosody). The findings suggest that interactions between left- and right-hemispheric areas are specific to the processing of different acoustic features of speech and speaker, and that they represent a general neural mechanism when understanding speech from different speakers.
Affiliation(s)
- Jens Kreitewolf: Max Planck Institute for Human Cognitive and Brain Sciences, Max Planck Research Group Neural Mechanisms of Human Communication, D-04103 Leipzig, Germany
- Etienne Gaudrain: University of Groningen, University Medical Center Groningen, Department of Otorhinolaryngology/Head and Neck Surgery, 9700 RB Groningen, Netherlands; University of Groningen, Graduate School of Medical Sciences, Research School of Behavioural and Cognitive Neurosciences, 9713 GZ Groningen, Netherlands
- Katharina von Kriegstein: Max Planck Institute for Human Cognitive and Brain Sciences, Max Planck Research Group Neural Mechanisms of Human Communication, D-04103 Leipzig, Germany; Humboldt University of Berlin, Psychology Department, D-12489 Berlin, Germany
47
Nielsen JB, Dau T, Neher T. A Danish open-set speech corpus for competing-speech studies. J Acoust Soc Am 2014; 135:407-420. [PMID: 24437781] [DOI: 10.1121/1.4835935]
Abstract
Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed in a setup with a frontal target sentence and two concurrent masker sentences at ±50° azimuth. For a group of 16 normal-hearing listeners and a group of 15 elderly (linearly aided) hearing-impaired listeners, overall SRTs of, respectively, +1.3 dB and +6.3 dB target-to-masker ratio were obtained. The new corpus was found to be very sensitive to inter-individual differences and produced consistent results across test lists. The corpus is publicly available.
Affiliation(s)
- Jens Bo Nielsen: Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Lyngby, Denmark
- Torsten Dau: Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Lyngby, Denmark
- Tobias Neher: Eriksholm Research Centre, Oticon A/S, Rørtangvej 20, DK-3070 Snekkersten, Denmark
48
|
Dawes P, Munro KJ, Kalluri S, Edwards B. Unilateral and bilateral hearing aids, spatial release from masking and auditory acclimatization. J Acoust Soc Am 2013; 134:596-606. [PMID: 23862834] [DOI: 10.1121/1.4807783]
Abstract
Spatial release from masking (SRM) was tested within the first week of fitting and after 12 weeks hearing aid use for unilateral and bilateral adult hearing aid users. A control group of experienced hearing aid users completed testing over a similar time frame. The main research aims were (1) to examine auditory acclimatization effects on SRM performance for unilateral and bilateral hearing aid users, (2) to examine whether hearing aid use, level of hearing loss, age or cognitive ability mediate acclimatization, and (3) to compare and contrast the outcome of unilateral versus bilateral aiding on SRM. Hearing aid users were tested with and without hearing aids, with SRM calculated as the 50% speech recognition threshold advantage when maskers and target are spatially separated at ±90° azimuth to the listener compared to a co-located condition. The conclusions were (1) on average there was no improvement over time in familiar aided listening conditions, (2) there was large test-retest variability which may overshadow small average acclimatization effects; greater improvement was associated with better cognitive ability and younger age, but not associated with hearing aid use, and (3) overall, bilateral aids facilitated better SRM performance than unilateral aids.
Affiliation(s)
- Piers Dawes: School of Psychological Sciences, University of Manchester, M13 9PL Manchester, United Kingdom
49
Woods WS, Kalluri S, Pentony S, Nooraei N. Predicting the effect of hearing loss and audibility on amplified speech reception in a multi-talker listening scenario. J Acoust Soc Am 2013; 133:4268-4278. [PMID: 23742377] [DOI: 10.1121/1.4803859]
Abstract
Auditive and cognitive influences on speech perception in a complex situation were investigated in listeners with normal hearing (NH) and hearing loss (HL). The speech corpus used was the Nonsense-Syllable Response Measure [NSRM; Woods and Kalluri, (2010). International Hearing Aid Research Conference, pp. 40-41], a 12-talker corpus which combines 154 nonsense syllables with 8 different carrier phrases. Listeners heard NSRM sentences in quiet, background noise, and in background noise plus other "jammer" NSRM sentences. All stimuli were linearly amplified. A "proficiency" value, determined from the results in quiet and the quiet-condition speech intelligibility index (SII), was used with the SII in predicting results in the other conditions. Results for nine of ten NH subjects were well-predicted (within the limits of binomial variability) in the noise condition, as were eight of these subjects in the noise-plus-jammers condition. All 16 HL results were well-predicted in the noise condition, as were 9 of the HL in the noise-plus-jammers condition. Hierarchical regression partialling out the effects of age found proficiency in noise-plus-jammers significantly correlated with results of "trail-making" tests, thought to index processing speed and attention-deployment ability, and proficiency in quiet and noise was found significantly correlated with results from a backward digit-span memory test.
Affiliation(s)
- William S Woods: Starkey Hearing Research Center, 2150 Shattuck Avenue, Suite 408, Berkeley, California 94704, USA
50
Kitterick PT, Clarke E, O'Shea C, Seymour J, Summerfield AQ. Target identification using relative level in multi-talker listening. J Acoust Soc Am 2013; 133:2899-2909. [PMID: 23654395] [DOI: 10.1121/1.4799810]
Abstract
Previous studies have suggested that listeners can identify words spoken by a target talker amidst competing talkers if they are distinguished by their spatial location or vocal characteristics. This "direct" identification of individual words is distinct from an "indirect" identification based on an association with other words (call-signs) that uniquely label the target. The present study assessed listeners' ability to use differences in presentation level between a target and overlapping maskers to identify target words. A new sentence was spoken every 800 ms by an unpredictable talker from an unpredictable location. Listeners reported color and number words in a target sentence distinguished by a unique call-sign. When masker levels were fixed, target words could be identified directly based on their relative level. Speech-reception thresholds (SRTs) were low (-12.9 dB) and were raised by 5 dB when direct identification was disrupted by randomizing masker levels. Thus, direct identification is possible using relative level. The underlying psychometric functions were monotonic even when relative level was a reliable cue. In a further experiment, indirect identification was prevented by removing the unique call-sign cue. SRTs did not change provided that other cues were available to identify target words directly. Thus, direct identification is possible without indirect identification.
Affiliation(s)
- Pádraig T Kitterick: Department of Psychology, University of York, York YO10 5DD, United Kingdom