1
Strivens A, Koch I, Lavric A. Does preparation help to switch auditory attention between simultaneous voices: Effects of switch probability and prevalence of conflict. Atten Percept Psychophys 2024; 86:750-767. [PMID: 38212478] [PMCID: PMC11062987] [DOI: 10.3758/s13414-023-02841-y]
Abstract
Switching auditory attention to one of two (or more) simultaneous voices incurs a substantial performance overhead. Whether and when this voice 'switch cost' is reduced when the listener has the opportunity to prepare in silence is not clear; the findings on the effect of preparation on the switch cost range from (near) null to substantial. We sought to determine which factors are crucial for encouraging preparation and detecting its effect on the switch cost in a paradigm where participants categorized the number spoken by one of two simultaneous voices; the target voice, which changed unpredictably, was specified by a visual cue depicting the target's gender. First, we manipulated the probability of a voice switch. When 25% of trials were switches, increasing the preparation interval (50/800/1,400 ms) resulted in a substantial (~50%) reduction in switch cost. No reduction was observed when 75% of trials were switches. Second, we examined the relative prevalence of low-conflict, 'congruent' trials (where the numbers spoken by the two voices were mapped onto the same response) and high-conflict, 'incongruent' trials (where the voices afforded different responses). 'Conflict prevalence' had a strong effect on selectivity: the incongruent-congruent difference ('congruence effect') was smaller in the 66%-incongruent condition than in the 66%-congruent condition. However, conflict prevalence did not discernibly interact with preparation or its effect on the switch cost. Thus, conditions where switches of target voice are relatively rare are especially conducive to preparation, possibly because attention is committed more strongly to (and/or disengaged less rapidly from) the perceptual features of the target voice.
Affiliation(s)
- Amy Strivens
- Institute for Psychology, RWTH Aachen University, Jägerstraße 17-19, 52066, Aachen, Germany.
- Iring Koch
- Institute for Psychology, RWTH Aachen University, Jägerstraße 17-19, 52066, Aachen, Germany
- Aureliu Lavric
- Department of Psychology, University of Exeter, Exeter, UK
2
Byrne AJ, Conroy C, Kidd G. Individual differences in speech-on-speech masking are correlated with cognitive and visual task performance. J Acoust Soc Am 2023; 154:2137-2153. [PMID: 37800988] [PMCID: PMC10631817] [DOI: 10.1121/10.0021301]
Abstract
Individual differences in spatial tuning for masked target speech identification were determined using maskers that varied in type and proximity to the target source. The maskers were chosen to produce three strengths of informational masking (IM): high [same-gender, speech-on-speech (SOS) masking], intermediate (the same masker speech time-reversed), and low (speech-shaped, speech-envelope-modulated noise). Typical for this task, individual differences increased as IM increased, while overall performance decreased. To determine the extent to which auditory performance might generalize to another sensory modality, a comparison visual task was also implemented. Visual search time was measured for identifying a cued object among "clouds" of distractors that were varied symmetrically in proximity to the target. The visual maskers also were chosen to produce three strengths of an analog of IM based on feature similarities between the target and maskers. Significant correlations were found for overall auditory and visual task performance, and both of these measures were correlated with an index of general cognitive reasoning. Overall, the findings provide qualified support for the proposition that the ability of an individual to solve IM-dominated tasks depends on cognitive mechanisms that operate in common across sensory modalities.
Affiliation(s)
- Andrew J Byrne
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, Boston, Massachusetts 02215, USA
- Christopher Conroy
- Department of Biological and Vision Sciences, State University of New York College of Optometry, New York, New York 10036, USA
- Gerald Kidd
- Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, Boston, Massachusetts 02215, USA
3
Layer N, Abdel-Latif KHA, Radecke JO, Müller V, Weglage A, Lang-Roth R, Walger M, Sandmann P. Effects of noise and noise reduction on audiovisual speech perception in cochlear implant users: An ERP study. Clin Neurophysiol 2023; 154:141-156. [PMID: 37611325] [DOI: 10.1016/j.clinph.2023.07.009]
Abstract
OBJECTIVE: Hearing with a cochlear implant (CI) is difficult in noisy environments, but the use of noise reduction algorithms, specifically ForwardFocus, can improve speech intelligibility. The current event-related potentials (ERP) study examined the electrophysiological correlates of this perceptual improvement.
METHODS: Ten bimodal CI users performed a syllable-identification task in auditory and audiovisual conditions, with syllables presented from the front and stationary noise presented from the sides. Brainstorm was used for spatio-temporal evaluation of ERPs.
RESULTS: CI users showed an audiovisual benefit, as reflected by shorter response times and greater activation in temporal and occipital regions at P2 latency. However, in both auditory and audiovisual conditions, background noise hampered speech processing, leading to longer response times and delayed auditory cortex activation at N1 latency. Nevertheless, activating ForwardFocus resulted in shorter response times, reduced listening effort, and enhanced superior frontal cortex activation at P2 latency, particularly in audiovisual conditions.
CONCLUSIONS: ForwardFocus enhances speech intelligibility in audiovisual speech conditions, potentially by allowing the reallocation of attentional resources to relevant auditory speech cues.
SIGNIFICANCE: This study shows for CI users that background noise and ForwardFocus differentially affect spatio-temporal cortical response patterns, in both auditory and audiovisual speech conditions.
Affiliation(s)
- Natalie Layer
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany.
- Jan-Ole Radecke
- Department of Psychiatry and Psychotherapy, University of Lübeck, Germany; Center for Brain, Behaviour and Metabolism (CBBM), University of Lübeck, Germany
- Verena Müller
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany
- Anna Weglage
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany
- Ruth Lang-Roth
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany
- Martin Walger
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany; Jean-Uhrmacher-Institute for Clinical ENT Research, University of Cologne, Germany
- Pascale Sandmann
- University of Cologne, Faculty of Medicine and University Hospital Cologne, Department of Otorhinolaryngology, Head and Neck Surgery, Audiology and Pediatric Audiology, Cochlear Implant Center, Germany; Department of Otolaryngology, Head and Neck Surgery, University of Oldenburg, Oldenburg, Germany
4
Nisha KV, Uppunda AK, Kumar RT. Spatial rehabilitation using virtual auditory space training paradigm in individuals with sensorineural hearing impairment. Front Neurosci 2023; 16:1080398. [PMID: 36733923] [PMCID: PMC9887142] [DOI: 10.3389/fnins.2022.1080398]
Abstract
Purpose: The present study aimed to quantify the effects of spatial training using virtual sources on a battery of spatial acuity measures in listeners with sensorineural hearing impairment (SNHI).
Methods: An intervention-based time-series comparison design involving 82 participants divided into three groups was adopted. Group I (n = 27, SNHI, spatially trained) and group II (n = 25, SNHI, untrained) consisted of SNHI listeners, while group III (n = 30) had listeners with normal hearing (NH). The study was conducted in three phases. In the pre-training phase, all participants underwent a comprehensive assessment of their spatial processing abilities using a battery of tests, including spatial acuity in free-field and closed-field scenarios, tests of binaural processing abilities (interaural time difference [ITD] and interaural level difference [ILD] thresholds), and subjective ratings. While spatial acuity in the free field was assessed using a loudspeaker-based localization test, the closed-field source identification test was performed using virtual stimuli delivered through headphones. The ITD and ILD thresholds were obtained using a MATLAB psychoacoustic toolbox, while participant ratings on the spatial subsection of the Kannada version of the speech, spatial, and qualities of hearing questionnaire were used for the subjective ratings. Group I listeners underwent virtual auditory spatial training (VAST) following the pre-evaluation assessments. All tests were re-administered to the group I listeners halfway through training (mid-training evaluation phase) and after training completion (post-training evaluation phase), whereas group II underwent these tests without any training at the same time intervals.
Results and discussion: Statistical analysis showed a main effect of group in all tests at the pre-training evaluation phase, with post hoc comparisons revealing group equivalency in the spatial performance of both SNHI groups (groups I and II). The effect of VAST in group I was evident in all tests, with the localization test showing the highest predictive power for capturing VAST-related changes in Fisher discriminant analysis (FDA). In contrast, group II demonstrated no changes in spatial acuity across the measurement timelines. FDA revealed increased errors in categorizing NH listeners as SNHI-trained at the post-training evaluation compared to the pre-training evaluation, as the spatial performance of the latter improved with VAST in the post-training phase.
Conclusion: The study demonstrated positive outcomes of spatial training using VAST in listeners with SNHI. The utility of this training program may extend to other clinical populations with spatial auditory processing deficits, such as individuals with auditory neuropathy spectrum disorder, cochlear implant users, and those with central auditory processing disorders.
5
Hládek Ľ, Seeber BU. Speech Intelligibility in Reverberation is Reduced During Self-Rotation. Trends Hear 2023; 27:23312165231188619. [PMID: 37475460] [PMCID: PMC10363862] [DOI: 10.1177/23312165231188619]
Abstract
Speech intelligibility in cocktail party situations has been traditionally studied for stationary sound sources and stationary participants. Here, speech intelligibility and behavior were investigated during active self-rotation of standing participants in a spatialized speech test. We investigated if people would rotate to improve speech intelligibility, and we asked if knowing the target location would be further beneficial. Target sentences randomly appeared at one of four possible locations: 0°, ± 90°, 180° relative to the participant's initial orientation on each trial, while speech-shaped noise was presented from the front (0°). Participants responded naturally with self-rotating motion. Target sentences were presented either without (Audio-only) or with a picture of an avatar (Audio-Visual). In a baseline (Static) condition, people were standing still without visual location cues. Participants' self-orientation undershot the target location and orientations were close to acoustically optimal. Participants oriented more often in an acoustically optimal way, and speech intelligibility was higher in the Audio-Visual than in the Audio-only condition for the lateral targets. The intelligibility of the individual words in Audio-Visual and Audio-only increased during self-rotation towards the rear target, but it was reduced for the lateral targets when compared to Static, which could be mostly, but not fully, attributed to changes in spatial unmasking. Speech intelligibility prediction based on a model of static spatial unmasking considering self-rotations overestimated the participant performance by 1.4 dB. The results suggest that speech intelligibility is reduced during self-rotation, and that visual cues of location help to achieve more optimal self-rotations and better speech intelligibility.
Affiliation(s)
- Ľuboš Hládek
- Audio Information Processing, Technical University of Munich, Munich, Germany
- Bernhard U. Seeber
- Audio Information Processing, Technical University of Munich, Munich, Germany
6
Gulli A, Fontana F, Orzan E, Aruffo A, Muzzi E. Spontaneous head movements support accurate horizontal auditory localization in a virtual visual environment. PLoS One 2022; 17:e0278705. [PMID: 36473012] [PMCID: PMC9725155] [DOI: 10.1371/journal.pone.0278705]
Abstract
This study investigates the relationship between auditory localization accuracy in the horizontal plane and the spontaneous translation and rotation of the head in response to an acoustic stimulus from an invisible sound source. Although a number of studies have suggested that localization ability improves with head movements, most of them measured the perceived source elevation and front-back disambiguation. We investigated the contribution of head movements to auditory localization in the anterior horizontal field in normal hearing subjects. A virtual reality scenario was used to conceal visual cues during the test through a head mounted display. In this condition, we found that an active search of the sound origin using head movements is not strictly necessary, yet sufficient for achieving greater sound source localization accuracy. This result may have important implications in the clinical assessment and training of adults and children affected by hearing and motor impairments.
Affiliation(s)
- Andrea Gulli
- HCI Lab, Department of Mathematics, Computer Science and Physics, University of Udine, Udine, Italy
- Federico Fontana
- HCI Lab, Department of Mathematics, Computer Science and Physics, University of Udine, Udine, Italy
- Eva Orzan
- Otorhinolaryngology and Audiology, Institute for Maternal and Child Health IRCCS “Burlo Garofolo”, Trieste, Italy
- Alessandro Aruffo
- Otorhinolaryngology and Audiology, Institute for Maternal and Child Health IRCCS “Burlo Garofolo”, Trieste, Italy
- Enrico Muzzi
- Otorhinolaryngology and Audiology, Institute for Maternal and Child Health IRCCS “Burlo Garofolo”, Trieste, Italy
7
van Wieringen A, Van Wilderode M, Van Humbeeck N, Krampe R. Coupling of sensorimotor and cognitive functions in middle- and late adulthood. Front Neurosci 2022; 16:1049639. [PMID: 36532286] [PMCID: PMC9752872] [DOI: 10.3389/fnins.2022.1049639]
Abstract
Introduction: The present study explored age effects and the coupling of sensorimotor and cognitive functions in a stratified sample of 96 middle-aged and older adults (age 45-86 years) with no indication of mild cognitive decline. Our sensorimotor tasks emphasized listening in noise and postural control, but we also assessed functional mobility and tactile sensitivity.
Methods: Our cognitive measures comprised processing speed and assessments of core cognitive control processes (executive functions), notably inhibition, task switching, and working memory updating. We explored whether our measures of sensorimotor functioning mediated age differences in cognitive variables and compared their effect to processing speed. Subsequently, we examined whether individuals who had poorer (or better) than median cognitive performance for their age group also performed relatively poorer (or better) on sensorimotor tasks. Moreover, we examined whether the link between cognitive and sensorimotor functions becomes more pronounced in older age groups.
Results: Except for tactile sensitivity, we observed substantial age-related differences in all sensorimotor and cognitive variables from middle age onward. Processing speed and functional mobility were reliable mediators of age effects in task switching and inhibitory control. Regarding the coupling between sensorimotor functioning and cognition, we observed that individuals with poor cognitive control do not necessarily have poor listening-in-noise skills or poor postural control.
Discussion: As most conditions do not show an interdependency between sensorimotor and cognitive performance, other domain-specific factors that were not accounted for must also play a role. These need to be researched to gain a better understanding of how rehabilitation may impact cognitive functioning in aging persons.
Affiliation(s)
- Astrid van Wieringen
- Research Group Experimental Oto-Rhino-Laryngology, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Mira Van Wilderode
- Research Group Experimental Oto-Rhino-Laryngology, Department of Neurosciences, KU Leuven, Leuven, Belgium
- Nathan Van Humbeeck
- Research Group Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
- Ralf Krampe
- Research Group Brain and Cognition, Faculty of Psychology and Educational Sciences, KU Leuven, Leuven, Belgium
8
Bestel J, Legris E, Rembaud F, Mom T, Galvin JJ. Speech understanding in diffuse steady noise in typically hearing and hard of hearing listeners. PLoS One 2022; 17:e0274435. [PMID: 36103551] [PMCID: PMC9473430] [DOI: 10.1371/journal.pone.0274435]
Abstract
Spatial cues can facilitate segregation of target speech from maskers. However, in clinical practice, masked speech understanding is most often evaluated using co-located speech and maskers (i.e., without spatial cues). Many hearing aid centers in France are equipped with five-loudspeaker arrays, allowing masked speech understanding to be measured with spatial cues. It is unclear how hearing status may affect utilization of spatial cues to segregate speech and noise. In this study, speech reception thresholds (SRTs) were measured for target speech in “diffuse noise” (target speech from 1 speaker, noise from the remaining 4 speakers) in 297 adult listeners across 9 Audilab hearing centers. Participants were categorized according to pure-tone-average (PTA) thresholds: typically hearing (TH; ≤20 dB HL), mild hearing loss (Mild; >20 to ≤40 dB HL), moderate hearing loss 1 (Mod-1; >40 to ≤55 dB HL), and moderate hearing loss 2 (Mod-2; >55 to ≤65 dB HL). All participants were tested unaided. SRTs in diffuse noise were significantly correlated with PTA thresholds, age at testing, and word and phoneme recognition scores in quiet. Stepwise linear regression analysis showed that SRTs in diffuse noise were significantly predicted by a combination of PTA thresholds and word recognition scores in quiet. SRTs were also measured in co-located and diffuse noise in 65 additional participants. SRTs were significantly lower in diffuse noise than in co-located noise only for the TH and Mild groups; masking release with diffuse noise (relative to co-located noise) was significant only for the TH group. The results are consistent with previous studies that found that hard of hearing listeners have greater difficulty using spatial cues to segregate competing speech. The data suggest that speech understanding in diffuse noise provides additional insight into the difficulties that hard of hearing individuals experience in complex listening environments.
Affiliation(s)
- Thierry Mom
- Centre Hospitalier Universitaire de Clermont-Ferrand, Clermont-Ferrand, France
- John J. Galvin
- University Hospital Center of Tours, Tours, France
- House Institute Foundation, Los Angeles, CA, United States of America
9
Cho AY, Kidd G. Auditory motion as a cue for source segregation and selection in a "cocktail party" listening environment. J Acoust Soc Am 2022; 152:1684. [PMID: 36182296] [PMCID: PMC9489258] [DOI: 10.1121/10.0013990]
Abstract
Source motion was examined as a cue for segregating concurrent speech or noise sources. In two different headphone-based tasks, motion detection (MD) and speech-on-speech masking (SI), one source among three was designated as the target only by imposing sinusoidal variation in azimuth during the stimulus presentation. For MD, the listener was asked which of the three concurrent sources was in motion during the trial. For SI, the listener was asked to report the words spoken by the moving speech source. MD performance improved as the amplitude of the sinusoidal motion (i.e., displacement in azimuth) increased over the range of values tested (±5° to ±30°) for both modulated noise and speech targets, with better performance found for speech. SI performance also improved as the amplitude of target motion increased. Furthermore, SI performance improved as word position progressed throughout the sentence. Performance on the MD task was correlated with performance on the SI task across individual subjects. For the SI conditions tested here, these findings are consistent with the proposition that listeners first detect the moving target source, then focus attention on the target location as the target sentence unfolds.
Affiliation(s)
- Adrian Y Cho
- Speech and Hearing Bioscience and Technology Program, Harvard University, Cambridge, Massachusetts 02138, USA
- Gerald Kidd
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
10
Motor Influence in Developing Auditory Spatial Cognition in Hemiplegic Children with and without Visual Field Disorder. Children 2022; 9:1055. [PMID: 35884039] [PMCID: PMC9320626] [DOI: 10.3390/children9071055]
Abstract
Spatial representation is a crucial skill for everyday interaction with the environment. Different factors seem to influence spatial perception, such as body movements and vision. However, it is still unknown whether motor impairment affects the building of simple spatial perception. To investigate this point, we tested hemiplegic children with (HV) and without (H) visual field disorders in auditory and visual-spatial localization tasks and a pitch discrimination task. Fifteen hemiplegic children (nine H and six HV) and twenty children with typical development took part in the experiment. The tasks consisted of listening to a sound coming from a series of speakers positioned at the front or back of the subject. In one condition, subjects were asked to discriminate the pitch, while in the other, subjects had to localize the position of the sound. We also replicated the spatial task in a visual modality. Both groups of hemiplegic children performed worse in the auditory spatial localization task compared with the controls, while no difference was found in the pitch discrimination task. For the visual-spatial localization task, only HV children differed from the two other groups. These results suggest that movement is important for the development of auditory spatial representation.
11
Temporal and Directional Cue Effects on the Cocktail Party Problem for Patients With Listening Difficulties Without Clinical Hearing Loss. Ear Hear 2022; 43:1740-1751. [DOI: 10.1097/aud.0000000000001247]
12
Yang T, Kang J. Perception difference for approaching and receding sound sources of a listener in motion in architectural sequential spaces. J Acoust Soc Am 2022; 151:685. [PMID: 35232108] [DOI: 10.1121/10.0009231]
Abstract
This study investigates dynamic auditory perception in large sequential public spaces for listeners in motion with a stationary primary sound source. Virtual soundwalks, involving four music and voice sources and validated against in situ soundwalks, were conducted in an exhibition space. The perceptual differences between approaching and receding sound sources were explored, and three major effects were found. The approaching ('rising') sound received a higher rating in each room, with a greater perceived change in loudness than the receding ('falling') sound, despite equal changes in level (approach effect); the difference was greater for the room connected to the source room. For the receding source, loudness in the room connected to the source room dropped sharply (plummet effect), and this drop was larger for music than for voice. The effect of the background sound impairing the perceptual priority of the rising sound was profound in the receiving rooms. The loudness patterns could not be extended to other perceptual attributes, including reverberation. An increasing symmetry of the overall perception between the different sound source types was observed (convergence effect) for both approaching and receding sound sources. The overall asymmetry of the directional aspects occurring with the noise and voice was not as distinguishable as with music.
Affiliation(s)
- Tingting Yang
- Institute for Environmental Design and Engineering, The Bartlett, University College London, London WC1H 0NN, United Kingdom
- Jian Kang
- Institute for Environmental Design and Engineering, The Bartlett, University College London, London WC1H 0NN, United Kingdom
13
Deep neural network models of sound localization reveal how perception is adapted to real-world environments. Nat Hum Behav 2022; 6:111-133. [PMID: 35087192] [PMCID: PMC8830739] [DOI: 10.1038/s41562-021-01244-z]
Abstract
Mammals localize sounds using information from their two ears. Localization in real-world conditions is challenging, as echoes provide erroneous information, and noises mask parts of target sounds. To better understand real-world localization, we equipped a deep neural network with human ears and trained it to localize sounds in a virtual environment. The resulting model localized accurately in realistic conditions with noise and reverberation. In simulated experiments, the model exhibited many features of human spatial hearing: sensitivity to monaural spectral cues and interaural time and level differences, integration across frequency, biases for sound onsets, and limits on localization of concurrent sources. But when trained in unnatural environments without either reverberation, noise, or natural sounds, these performance characteristics deviated from those of humans. The results show how biological hearing is adapted to the challenges of real-world environments and illustrate how artificial neural networks can reveal the real-world constraints that shape perception.
14
Uhrig S, Perkis A, Möller S, Svensson UP, Behne DM. Effects of Spatial Speech Presentation on Listener Response Strategy for Talker-Identification. Front Neurosci 2022; 15:730744. [PMID: 35153653] [PMCID: PMC8831717] [DOI: 10.3389/fnins.2021.730744]
Abstract
This study investigates effects of spatial auditory cues on human listeners' response strategy for identifying two alternately active talkers (a “turn-taking” listening scenario). Previous research has demonstrated subjective benefits of audio spatialization with regard to speech intelligibility and talker-identification effort. So far, the deliberate activation of specific perceptual and cognitive processes by listeners to optimize their task performance has remained largely unexamined. Spoken sentences selected as stimuli were either clean or degraded due to background noise or bandpass filtering. Stimuli were presented via three horizontally positioned loudspeakers: in a non-spatial mode, both talkers were presented through a central loudspeaker; in a spatial mode, each talker was presented through the central or a talker-specific lateral loudspeaker. Participants identified talkers via speeded keypresses and afterwards provided subjective ratings (speech quality, speech intelligibility, voice similarity, talker-identification effort). In the spatial mode, presentations at lateral loudspeaker locations entailed quicker behavioral responses, which were nevertheless significantly slower than those in a comparable talker-localization task. Under clean speech, response times globally increased in the spatial vs. non-spatial mode (across all locations); these “response time switch costs,” presumably caused by repeated switching of spatial auditory attention between different locations, diminished under degraded speech. No significant effects of spatialization on subjective ratings were found. The results suggest that when listeners could utilize task-relevant auditory cues about talker location, they continued to rely on voice recognition rather than localization of talker sound sources as their primary response strategy. In addition, the presence of speech degradations may have led to increased cognitive control, which in turn compensated for the incurred response time switch costs.
Affiliation(s)
- Stefan Uhrig
- Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
- Quality and Usability Lab, Technische Universität Berlin, Berlin, Germany
- Andrew Perkis
- Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
- Sebastian Möller
- Quality and Usability Lab, Technische Universität Berlin, Berlin, Germany
- Speech and Language Technology, German Research Center for Artificial Intelligence, Berlin, Germany
- U. Peter Svensson
- Department of Electronic Systems, Norwegian University of Science and Technology, Trondheim, Norway
- Dawn M. Behne
- Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
15
Cutting Through the Noise: Noise-Induced Cochlear Synaptopathy and Individual Differences in Speech Understanding Among Listeners With Normal Audiograms. Ear Hear 2022; 43:9-22. [PMID: 34751676] [PMCID: PMC8712363] [DOI: 10.1097/aud.0000000000001147]
Abstract
Following a conversation in a crowded restaurant or at a lively party poses immense perceptual challenges for some individuals with normal hearing thresholds. A number of studies have investigated whether noise-induced cochlear synaptopathy (CS; damage to the synapses between cochlear hair cells and the auditory nerve following noise exposure that does not permanently elevate hearing thresholds) contributes to this difficulty. A few studies have observed correlations between proxies of noise-induced CS and speech perception in difficult listening conditions, but many have found no evidence of a relationship. To understand these mixed results, we reviewed previous studies that have examined noise-induced CS and performance on speech perception tasks in adverse listening conditions in adults with normal or near-normal hearing thresholds. Our review suggests that superficially similar speech perception paradigms used in previous investigations actually placed very different demands on sensory, perceptual, and cognitive processing. Speech perception tests that use low signal-to-noise ratios and maximize the importance of fine sensory details (specifically, by using test stimuli for which lexical, syntactic, and semantic cues do not contribute to performance) are more likely to show a relationship to estimated CS levels. Thus, the current controversy as to whether or not noise-induced CS contributes to individual differences in speech perception under challenging listening conditions may be due in part to the fact that many of the speech perception tasks used in past studies are relatively insensitive to CS-induced deficits.
16
Wächtler M, Kessler J, Walger M, Meister H. Revealing Perceptional and Cognitive Mechanisms in Static and Dynamic Cocktail Party Listening by Means of Error Analyses. Trends Hear 2022; 26:23312165221111676. [PMID: 35849353] [PMCID: PMC9297473] [DOI: 10.1177/23312165221111676]
Abstract
In cocktail party situations, multiple talkers speak simultaneously, which makes listening perceptually and cognitively challenging. Such situations can either be static (fixed target talker) or dynamic, meaning the target talker switches occasionally and in a potentially unpredictable way. To shed light on the perceptual and cognitive mechanisms in static and dynamic cocktail party situations, we conducted an analysis of the error types that occur during a multi-talker speech recognition test. The error analysis distinguished between misunderstood or omitted words (random errors) and target-masker confusions. To investigate the effects of aging and hearing impairment, we compared data from three listener groups, comprising younger adults as well as older adults with and without hearing loss. In the static condition, error rates were generally very low, except for the older hearing-impaired listeners: consistent with the assumption of decreased audibility, they showed a notable number of random errors. In the dynamic condition, errors increased compared to the static condition, especially immediately following a target talker switch. Those increases were similar for random and confusion errors. The older hearing-impaired listeners showed greater difficulties than the younger adults in trials not preceded by a switch. These results suggest that the load associated with dynamic cocktail party listening affects the ability to focus attention on the talker of interest and the retrieval of words from short-term memory, as indicated by the increased number of confusion and random errors. This was most pronounced in the older hearing-impaired listeners, suggesting an interplay of perceptual and cognitive mechanisms.
Affiliation(s)
- Moritz Wächtler
- Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany
- Josef Kessler
- Department of Neurology, University Hospital of Cologne, Cologne, Germany
- Martin Walger
- Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany; Clinic of Otorhinolaryngology, Head and Neck Surgery, University of Cologne, Cologne, Germany
- Hartmut Meister
- Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany
17
Abstract
Identification of speech from a "target" talker was measured in a speech-on-speech masking task with two simultaneous "masker" talkers. The overall level of each talker was either fixed or randomized throughout each stimulus presentation to investigate the effectiveness of level as a cue for segregating competing talkers and attending to the target. Experimental manipulations included varying the level difference between talkers and imposing three types of target level uncertainty: 1) fixed target level across trials, 2) random target level across trials, or 3) random target levels on a word-by-word basis within a trial. When the target level was predictable, performance was better than in corresponding conditions in which the target level was uncertain. Masker confusions were consistent with a high degree of informational masking (IM). Furthermore, evidence was found for "tuning" in level and a level "release" from IM. These findings suggest that conforming to listener expectations about relative level, in addition to cues signaling talker identity, facilitates segregating, and maintaining the focus of attention on, a specific talker in multiple-talker communication situations.
Affiliation(s)
- Andrew J Byrne
- Department of Speech, Language, & Hearing Sciences, Boston University, MA, USA
- Christopher Conroy
- Department of Speech, Language, & Hearing Sciences, Boston University, MA, USA
- Gerald Kidd
- Department of Speech, Language, & Hearing Sciences, Boston University, MA, USA; Department of Otolaryngology, Head-Neck Surgery, Medical University of South Carolina, Charleston, SC, USA
18
Cortical Processing of Binaural Cues as Shown by EEG Responses to Random-Chord Stereograms. J Assoc Res Otolaryngol 2021; 23:75-94. [PMID: 34904205] [PMCID: PMC8783002] [DOI: 10.1007/s10162-021-00820-4]
Abstract
Spatial hearing facilitates the perceptual organization of complex soundscapes into accurate mental representations of sound sources in the environment. Yet, the role of binaural cues in auditory scene analysis (ASA) has received relatively little attention in recent neuroscientific studies employing novel, spectro-temporally complex stimuli. This may be because a stimulation paradigm that provides binaurally derived grouping cues of sufficient spectro-temporal complexity has not yet been established for neuroscientific ASA experiments. Random-chord stereograms (RCS) are a class of auditory stimuli that exploit spectro-temporal variations in the interaural envelope correlation of noise-like sounds with interaurally coherent fine structure; they evoke salient auditory percepts that emerge only under binaural listening. Here, our aim was to assess the usability of the RCS paradigm for indexing binaural processing in the human brain. To this end, we recorded EEG responses to RCS stimuli from 12 normal-hearing subjects. The stimuli consisted of an initial 3-s noise segment with interaurally uncorrelated envelopes, followed by another 3-s segment, where envelope correlation was modulated periodically according to the RCS paradigm. Modulations were applied either across the entire stimulus bandwidth (wideband stimuli) or in temporally shifting frequency bands (ripple stimulus). Event-related potentials and inter-trial phase coherence analyses of the EEG responses showed that the introduction of the 3- or 5-Hz wideband modulations produced a prominent change-onset complex and ongoing synchronized responses to the RCS modulations. In contrast, the ripple stimulus elicited a change-onset response but no response to ongoing RCS modulation. Frequency-domain analyses revealed increased spectral power at the fundamental frequency and the first harmonic of wideband RCS modulations. RCS stimulation yields robust EEG measures of binaurally driven auditory reorganization and has potential to provide a flexible stimulation paradigm suitable for isolating binaural effects in ASA experiments.
19
Kane SG, Dean KM, Buss E. Speech-in-Speech Recognition and Spatially Selective Attention in Children and Adults. J Speech Lang Hear Res 2021; 64:3617-3626. [PMID: 34403280] [PMCID: PMC8642097] [DOI: 10.1044/2021_jslhr-21-00108]
Abstract
Purpose: Knowing the target location can improve adults' speech-in-speech recognition in complex auditory environments, but it is unknown whether young children listen selectively in space. This study evaluated masked word recognition with and without a pretrial cue to location to characterize the influence of listener age and masker type on the benefit of spatial cues.
Method: Participants were children (5-13 years of age) and adults with normal hearing. Testing occurred in a 180° arc of 11 loudspeakers. Targets were spondees produced by a female talker and presented from a randomly selected loudspeaker; that location was either known, based on a pretrial cue, or unknown. Maskers were two sequences comprising spondees or speech-shaped noise bursts, each presented from a random loudspeaker. Speech maskers were produced by one male talker or by three talkers, two male and one female.
Results: Children and adults benefited from the pretrial cue to target location with the three-voice masker, and the magnitude of benefit increased with increasing child age. There was no benefit of location cues in the one-voice or noise-burst maskers. Incorrect responses in the three-voice masker tended to correspond to masker words produced by the female talker, and in the location-known condition, those masker intrusions were more likely near the cued loudspeaker for both age groups.
Conclusions: The increasing benefit of the location cue with increasing child age in the three-voice masker suggests maturation of spatially selective attention, but the error patterns do not support this idea. Differences in performance in the location-unknown condition could play a role in the differential benefit of the location cue.
Affiliation(s)
- Stacey G. Kane
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill
- Kelly M. Dean
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill
20
Turri S, Rizvi M, Rabini G, Melonio A, Gennari R, Pavani F. Orienting Auditory Attention through Vision: the Impact of Monaural Listening. Multisens Res 2021; 35:1-28. [PMID: 34384046] [DOI: 10.1163/22134808-bja10059]
Abstract
The understanding of linguistic messages can be made extremely complex by the simultaneous presence of interfering sounds, especially when they are also linguistic in nature. In two experiments, we tested whether visual cues directing attention to spatial or temporal components of speech in noise could improve its identification. The hearing-in-noise task required identification of a five-digit sequence (target) embedded in a stream of time-reversed speech. Using a custom-built device located in front of the participant, we delivered visual cues to orient attention to the location of the target sounds and/or their temporal window. In Exp. 1 (n = 14), we validated this visual-to-auditory cueing method in normal-hearing listeners tested under typical binaural listening conditions. In Exp. 2 (n = 13), we assessed the efficacy of the same visual cues in normal-hearing listeners wearing a monaural ear plug, to study the effects of simulated monaural and conductive hearing loss on visual-to-auditory attention orienting. While Exp. 1 revealed a benefit of both spatial and temporal visual cues for hearing in noise, Exp. 2 showed that only the temporal visual cues remained effective during monaural listening. These findings indicate that when the acoustic experience is altered, visual-to-auditory attention orienting is more robust for temporal than for spatial attributes of the auditory stimuli. These findings have implications for the relation between spatial and temporal attributes of sound objects, and for the design of devices that orient audiovisual attention in people with hearing loss.
Affiliation(s)
- Silvia Turri
- Centro Interdipartimentale Mente/Cervello - CIMeC, Università di Trento, 38068 Rovereto, Italy; Dipartimento di Psicologia e Scienze Cognitive, Università di Trento, 38068 Rovereto, Italy
- Mehdi Rizvi
- Faculty of Computer Science, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
- Giuseppe Rabini
- Centro Interdipartimentale Mente/Cervello - CIMeC, Università di Trento, 38068 Rovereto, Italy
- Alessandra Melonio
- Faculty of Computer Science, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
- Rosella Gennari
- Faculty of Computer Science, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
- Francesco Pavani
- Centro Interdipartimentale Mente/Cervello - CIMeC, Università di Trento, 38068 Rovereto, Italy; IMPACT, Centre de Recherche en Neurosciences de Lyon (CRNL), 69500 Bron, France
21
Warnecke M, Litovsky RY. Signal envelope and speech intelligibility differentially impact auditory motion perception. Sci Rep 2021; 11:15117. [PMID: 34302032] [PMCID: PMC8302594] [DOI: 10.1038/s41598-021-94662-y]
Abstract
Our acoustic environment contains a plethora of complex sounds that are often in motion. To gauge approaching danger and communicate effectively, listeners need to localize and identify sounds, which includes determining sound motion. This study addresses which acoustic cues impact listeners' ability to determine sound motion. Signal envelope (ENV) cues are implicated in both sound motion tracking and stimulus intelligibility, suggesting that these processes could be competing for sound processing resources. We created auditory chimaeras from speech and noise stimuli and varied the number of frequency bands, effectively manipulating speech intelligibility. Normal-hearing adults were presented with stationary or moving chimaeras and reported perceived sound motion and content. Results show that sensitivity to sound motion is not affected by speech intelligibility, but differs clearly between the original noise and speech stimuli. Further, acoustic chimaeras with speech-like ENVs that had intelligible content induced a strong bias in listeners to report the sounds as stationary. Increasing stimulus intelligibility systematically increased that bias, and removing intelligible content reduced it, suggesting that sound content may be prioritized over sound motion. These findings suggest that sound motion processing in the auditory system can be biased by acoustic parameters related to speech intelligibility.
Affiliation(s)
- Michaela Warnecke
- University of Wisconsin-Madison, Waisman Center, 1500 Highland Ave, Madison, WI, 53705, USA.
- Ruth Y Litovsky
- University of Wisconsin-Madison, Waisman Center, 1500 Highland Ave, Madison, WI, 53705, USA
22
Begau A, Klatt LI, Wascher E, Schneider D, Getzmann S. Do congruent lip movements facilitate speech processing in a dynamic audiovisual multi-talker scenario? An ERP study with older and younger adults. Behav Brain Res 2021; 412:113436. [PMID: 34175355] [DOI: 10.1016/j.bbr.2021.113436]
Abstract
In natural conversations, visible mouth and lip movements play an important role in speech comprehension. There is evidence that visual speech information improves speech comprehension, especially for older adults and under difficult listening conditions. However, the neurocognitive basis is still poorly understood. The present EEG experiment investigated the benefits of audiovisual speech in a dynamic cocktail-party scenario with 22 younger (aged 20-34 years) and 20 older (aged 55-74 years) participants. We presented three simultaneously talking faces with a varying amount of visual speech input (still faces, visually unspecific, and audiovisually congruent). In a two-alternative forced-choice task, participants had to discriminate target words ("yes" or "no") among two distractors (one-digit number words). In half of the experimental blocks, the target was always presented from a central position; in the other half, occasional switches to a lateral position could occur. We investigated behavioral and electrophysiological modulations due to age, location switches, and the content of visual information, analyzing response times and accuracy as well as the P1, N1, P2, and N2 event-related potentials (ERPs) and the contingent negative variation (CNV) in the EEG. We found that audiovisually congruent speech information improved performance and modulated ERP amplitudes in both age groups, suggesting enhanced preparation and integration of the subsequent auditory input. In the older group, larger amplitude measures were found in early phases of processing (P1-N1); here, amplitudes were reduced in response to audiovisually congruent stimuli. In later processing phases (P2-N2), we found decreased amplitude measures in the older group, while an amplitude reduction for audiovisually congruent compared to visually unspecific stimuli was still observable. However, these benefits were only observed as long as no location switches occurred; location switches led to enhanced amplitude measures in later processing phases (P2-N2). To conclude, meaningful visual information in a multi-talker setting, when presented from the expected location, is beneficial for both younger and older adults.
Affiliation(s)
- Alexandra Begau
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany.
- Laura-Isabelle Klatt
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
- Edmund Wascher
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
- Daniel Schneider
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
- Stephan Getzmann
- Leibniz Research Centre for Working Environment and Human Factors, TU Dortmund, Germany
23
Wisniewski MG, Zakrzewski AC, Bell DR, Wheeler M. EEG power spectral dynamics associated with listening in adverse conditions. Psychophysiology 2021; 58:e13877. [PMID: 34161612] [DOI: 10.1111/psyp.13877]
Abstract
Adverse listening conditions increase the demand on cognitive resources needed for speech comprehension. In an exploratory study, we aimed to identify independent power spectral features in the EEG useful for studying the cognitive processes involved in this effortful listening. Listeners performed the coordinate response measure task with a single-talker masker at a 0-dB signal-to-noise ratio. Sounds were left unfiltered or degraded with low-pass filtering. Independent component analysis (ICA) was used to identify independent components (ICs) in the EEG data, the power spectral dynamics of which were then analyzed. Frontal midline theta, left frontal, right frontal, left mu, right mu, left temporal, parietal, left occipital, central occipital, and right occipital clusters of ICs were identified. All IC clusters showed some significant listening-related changes in their power spectrum, including sustained theta enhancements, gamma enhancements, alpha enhancements, alpha suppression, beta enhancements, and mu rhythm suppression. Several of these effects were absent or negligible in traditional channel analyses. Comparison of filtered to unfiltered speech revealed stronger alpha suppression in the parietal and central occipital IC clusters for the filtered speech condition. This not only replicates recent findings showing greater alpha suppression as listening difficulty increases but also suggests that such alpha-band effects can stem from multiple cortical sources. We lay out the advantages of the ICA approach over the more restrictive analyses that have recently been used in the study of listening effort, and we make suggestions for moving toward hypothesis-driven studies of the power spectral features revealed here.
Affiliation(s)
- Matthew G Wisniewski
- Department of Psychological Sciences, Kansas State University, Manhattan, KS, USA
- Destiny R Bell
- Department of Psychological Sciences, Kansas State University, Manhattan, KS, USA
| | - Michelle Wheeler
- Department of Psychological Sciences, Kansas State University, Manhattan, KS, USA
| |
24
Wang J, Chen J, Yang X, Liu L, Wu C, Lu L, Li L, Wu Y. Common Brain Substrates Underlying Auditory Speech Priming and Perceived Spatial Separation. Front Neurosci 2021; 15:664985. [PMID: 34220425 PMCID: PMC8247760 DOI: 10.3389/fnins.2021.664985] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2021] [Accepted: 05/10/2021] [Indexed: 11/22/2022] Open
Abstract
In a “cocktail party” environment, listeners can utilize prior knowledge of the content and voice of the target speech [i.e., auditory speech priming (ASP)] and perceived spatial separation to improve recognition of the target speech among masking speech. Previous studies suggest that these two unmasking cues are not processed independently. However, it is unclear whether the unmasking effects of these two cues are supported by common neural bases. In the current study, we aimed first to confirm that ASP and perceived spatial separation contribute interactively to the improvement of speech recognition in a multitalker condition, and then to investigate whether there are overlapping brain substrates underlying both unmasking effects, by introducing the two cues in a unified paradigm and using functional magnetic resonance imaging. The results showed that neural activations associated with the unmasking effects of ASP and perceived separation partly overlapped in brain areas: the left pars triangularis (TriIFG) and pars orbitalis of the inferior frontal gyrus, left inferior parietal lobule, left supramarginal gyrus, and bilateral putamen, all of which are involved in sensorimotor integration and speech production. The activations of the left TriIFG were correlated with behavioral improvements caused by ASP and perceived separation. Meanwhile, ASP and perceived separation also enhanced the functional connectivity between the left IFG and brain areas related to the suppression of distractive speech signals: the anterior cingulate cortex and the left middle frontal gyrus, respectively. These findings suggest that the motor representation of speech is important for the unmasking effects of both ASP and perceived separation and highlight the critical role of the left IFG in these effects in “cocktail party” environments.
Affiliation(s)
- Junxian Wang, School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Jing Chen, Department of Machine Intelligence, Peking University, Beijing, China; Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
- Xiaodong Yang, School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Lei Liu, School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Chao Wu, School of Nursing, Peking University, Beijing, China
- Lingxi Lu, Center for the Cognitive Science of Language, Beijing Language and Culture University, Beijing, China
- Liang Li, School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China; Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China; Beijing Institute for Brain Disorders, Beijing, China
- Yanhong Wu, School of Psychological and Cognitive Sciences and Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China; Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
25
Keshavarzi M, Varano E, Reichenbach T. Cortical Tracking of a Background Speaker Modulates the Comprehension of a Foreground Speech Signal. J Neurosci 2021; 41:5093-5101. [PMID: 33926996 PMCID: PMC8197648 DOI: 10.1523/jneurosci.3200-20.2021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2020] [Revised: 02/23/2021] [Accepted: 04/12/2021] [Indexed: 11/21/2022] Open
Abstract
Understanding speech in background noise is a difficult task. The tracking of speech rhythms such as the rate of syllables and words by cortical activity has emerged as a key neural mechanism for speech-in-noise comprehension. In particular, recent investigations have used transcranial alternating current stimulation (tACS) with the envelope of a speech signal to influence the cortical speech tracking, demonstrating that this type of stimulation modulates comprehension and therefore providing evidence of a functional role of the cortical tracking in speech processing. Cortical activity has been found to track the rhythms of a background speaker as well, but the functional significance of this neural response remains unclear. Here we use a speech-comprehension task with a target speaker in the presence of a distractor voice to show that tACS with the speech envelope of the target voice as well as tACS with the envelope of the distractor speaker both modulate the comprehension of the target speech. Because the envelope of the distractor speech does not carry information about the target speech stream, the modulation of speech comprehension through tACS with this envelope provides evidence that the cortical tracking of the background speaker affects the comprehension of the foreground speech signal. The phase dependency of the resulting modulation of speech comprehension is, however, opposite to that obtained from tACS with the envelope of the target speech signal. This suggests that the cortical tracking of the ignored speech stream and that of the attended speech stream may compete for neural resources.
SIGNIFICANCE STATEMENT: Loud environments such as busy pubs or restaurants can make conversation difficult. However, they also allow us to eavesdrop into other conversations that occur in the background. In particular, we often notice when somebody else mentions our name, even if we have not been listening to that person. However, the neural mechanisms by which background speech is processed remain poorly understood. Here we use transcranial alternating current stimulation, a technique through which neural activity in the cerebral cortex can be influenced, to show that cortical responses to rhythms in the distractor speech modulate the comprehension of the target speaker. Our results provide evidence that the cortical tracking of background speech rhythms plays a functional role in speech processing.
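A minimal sketch of how the slow speech envelope used to construct such tACS waveforms is commonly derived; the file name and the 8-Hz cutoff are assumptions, not the authors' settings:

```python
# Minimal sketch: deriving the slow speech envelope that envelope-tACS
# waveforms are typically built from. File name and cutoff are illustrative.
import numpy as np
from scipy.io import wavfile
from scipy.signal import hilbert, butter, filtfilt

fs, speech = wavfile.read("target_speech.wav")
speech = speech.astype(float)

# Broadband amplitude envelope via the Hilbert transform
envelope = np.abs(hilbert(speech))

# Keep only slow fluctuations (~syllable rate); 8 Hz is a common choice
b, a = butter(4, 8.0 / (fs / 2), btype="low")
slow_envelope = filtfilt(b, a, envelope)

# slow_envelope could then be scaled to the stimulator's current range and,
# for phase-dependency analyses, shifted in time relative to the audio.
```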
Affiliation(s)
- Mahmoud Keshavarzi, Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
- Enrico Varano, Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
- Tobias Reichenbach, Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
26
Nagels L, Gaudrain E, Vickers D, Hendriks P, Başkent D. School-age children benefit from voice gender cue differences for the perception of speech in competing speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:3328. [PMID: 34241121 DOI: 10.1121/10.0004791] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/24/2020] [Accepted: 04/08/2021] [Indexed: 06/13/2023]
Abstract
Differences in speakers' voice characteristics, such as mean fundamental frequency (F0) and vocal-tract length (VTL), which primarily define speakers' so-called perceived voice gender, facilitate the perception of speech in competing speech. Perceiving speech in competing speech is particularly challenging for children, which may relate to their lower sensitivity to differences in voice characteristics compared with adults. This study investigated the development of the benefit from F0 and VTL differences in school-age children (4-12 years) when separating two competing speakers and comprehending one of them, as well as the relationship between this benefit and the children's corresponding voice discrimination thresholds. Children benefited from differences in F0, VTL, or both cues at all ages tested. This benefit remained proportionally the same across age, although overall accuracy continued to differ from that of adults. Additionally, children's benefit from F0 and VTL differences and their overall accuracy were not related to their discrimination thresholds. Hence, although children's voice discrimination thresholds and their perception of speech in competing speech develop throughout the school-age years, children already show a benefit from voice gender cue differences early on. Factors other than children's discrimination thresholds seem to relate more closely to their developing ability to perceive speech in competing speech.
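Voice manipulations of this kind are usually produced with analysis-resynthesis tools; a minimal sketch using Praat's "Change gender" via parselmouth (the study may have used different software, and the file name and shift values here are illustrative):

```python
# Minimal sketch: F0- and VTL-shifted versions of a voice with Praat's
# "Change gender" command via parselmouth. Values are illustrative only.
import parselmouth
from parselmouth.praat import call

sound = parselmouth.Sound("target_sentence.wav")

# Arguments: pitch floor (Hz), pitch ceiling (Hz), formant shift ratio,
# new pitch median (Hz; 0 = unchanged), pitch range factor, duration factor.
# A formant shift ratio > 1 mimics a shorter vocal tract (smaller VTL).
f0_shifted = call(sound, "Change gender", 75, 600, 1.0, 220, 1.0, 1.0)
vtl_shifted = call(sound, "Change gender", 75, 600, 1.15, 0, 1.0, 1.0)

f0_shifted.save("f0_shifted.wav", "WAV")
vtl_shifted.save("vtl_shifted.wav", "WAV")
```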
Affiliation(s)
- Leanne Nagels, Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen 9712EK, Netherlands
- Etienne Gaudrain, CNRS UMR 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Inserm UMRS 1028, Université Claude Bernard Lyon 1, Université de Lyon, Lyon, France
- Deborah Vickers, Sound Lab, Cambridge Hearing Group, Clinical Neurosciences Department, University of Cambridge, Cambridge CB2 0SZ, United Kingdom
- Petra Hendriks, Center for Language and Cognition Groningen (CLCG), University of Groningen, Groningen 9712EK, Netherlands
- Deniz Başkent, Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen 9713GZ, Netherlands
27
Versfeld NJ, Lie S, Kramer SE, Zekveld AA. Informational masking with speech-on-speech intelligibility: Pupil response and time-course of learning. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:2353. [PMID: 33940918 DOI: 10.1121/10.0003952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/21/2020] [Accepted: 03/09/2021] [Indexed: 06/12/2023]
Abstract
Previous research has shown a learning effect on speech perception in nonstationary maskers. The present study addressed the time course of this learning effect and the role of informational masking. To that end, speech reception thresholds (SRTs) were measured for speech in a stationary noise masker, an interrupted noise masker, or a single-talker masker. The utterance of the single talker was either time-forward (intelligible) or time-reversed (unintelligible), and the sample of the utterance was either frozen (the same utterance at each presentation) or random (a different utterance from the same speaker at each presentation). Simultaneously, the pupil dilation response was measured to assess differences in listening effort between conditions and to track changes in listening effort over time within each condition. The results showed a learning effect for all conditions except the stationary noise condition, that is, an improvement in SRT over time with unchanged pupil responses. There were no significant differences in pupil responses between conditions despite large differences in SRT. Time reversal of the frozen speech affected neither the SRT nor the pupil responses.
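A minimal sketch of a generic 1-up/1-down adaptive procedure converging on the 50%-correct SRT; the simulated listener and step size are illustrative assumptions, not this study's exact tracking rule:

```python
# Minimal sketch: 1-up/1-down staircase estimating the SNR at which ~50% of
# sentences are repeated correctly (the usual SRT definition).
import random

def present_sentence(snr_db: float) -> bool:
    """Placeholder for a real trial; returns True if the listener was correct.
    Simulates a listener whose 50% point sits at -8 dB SNR."""
    p_correct = 1.0 / (1.0 + 10 ** (-(snr_db + 8.0) / 2.0))
    return random.random() < p_correct

snr, step, reversals = 0.0, 2.0, []
direction = None
while len(reversals) < 8:
    correct = present_sentence(snr)
    new_direction = "down" if correct else "up"
    if direction and new_direction != direction:
        reversals.append(snr)  # track SNRs where the staircase turned
    direction = new_direction
    snr += -step if correct else step

srt = sum(reversals[-6:]) / 6  # average the last reversals
print(f"Estimated SRT: {srt:.1f} dB SNR")
```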
Affiliation(s)
- Niek J Versfeld, Amsterdam Universitair Medisch Centrum, Vrije Universiteit Amsterdam, Otolaryngology Head and Neck Surgery, Ear and Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Sisi Lie, Amsterdam Universitair Medisch Centrum, Vrije Universiteit Amsterdam, Otolaryngology Head and Neck Surgery, Ear and Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Sophia E Kramer, Amsterdam Universitair Medisch Centrum, Vrije Universiteit Amsterdam, Otolaryngology Head and Neck Surgery, Ear and Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Adriana A Zekveld, Amsterdam Universitair Medisch Centrum, Vrije Universiteit Amsterdam, Otolaryngology Head and Neck Surgery, Ear and Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
28
Wang X, Xu L. Speech perception in noise: Masking and unmasking. J Otol 2021; 16:109-119. [PMID: 33777124 PMCID: PMC7985001 DOI: 10.1016/j.joto.2020.12.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2020] [Revised: 12/03/2020] [Accepted: 12/06/2020] [Indexed: 11/23/2022] Open
Abstract
Speech perception is essential for daily communication. However, background noise or concurrent talkers can make it challenging for listeners to track the target speech (i.e., the cocktail party problem). The present study reviews and compares existing findings on speech perception and unmasking in cocktail-party listening environments in English and Mandarin Chinese. The review starts with an introduction, followed by related concepts of auditory masking. The next two sections review factors that release speech perception from masking in English and Mandarin Chinese, respectively. The last section presents an overall summary of the findings with comparisons between the two languages, and discusses future research directions with respect to differences between the English and Mandarin literatures on the reviewed topic.
Affiliation(s)
- Xianhui Wang, Communication Sciences and Disorders, Ohio University, Athens, OH, 45701, USA
- Li Xu, Communication Sciences and Disorders, Ohio University, Athens, OH, 45701, USA
29
Brodbeck C, Jiao A, Hong LE, Simon JZ. Neural speech restoration at the cocktail party: Auditory cortex recovers masked speech of both attended and ignored speakers. PLoS Biol 2020; 18:e3000883. [PMID: 33091003 PMCID: PMC7644085 DOI: 10.1371/journal.pbio.3000883] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2020] [Revised: 11/05/2020] [Accepted: 09/14/2020] [Indexed: 01/09/2023] Open
Abstract
Humans are remarkably skilled at listening to one speaker out of an acoustic mixture of several speech sources. Two speakers are easily segregated, even without binaural cues, but the neural mechanisms underlying this ability are not well understood. One possibility is that early cortical processing performs a spectrotemporal decomposition of the acoustic mixture, allowing the attended speech to be reconstructed via optimally weighted recombinations that discount spectrotemporal regions where sources heavily overlap. Using human magnetoencephalography (MEG) responses to a 2-talker mixture, we show evidence for an alternative possibility, in which early, active segregation occurs even for strongly spectrotemporally overlapping regions. Early (approximately 70-millisecond) responses to nonoverlapping spectrotemporal features are seen for both talkers. When competing talkers’ spectrotemporal features mask each other, the individual representations persist, but they occur with an approximately 20-millisecond delay. This suggests that the auditory cortex recovers acoustic features that are masked in the mixture, even if they occurred in the ignored speech. The existence of such noise-robust cortical representations, of features present in attended as well as ignored speech, suggests an active cortical stream segregation process, which could explain a range of behavioral effects of ignored background speech. How do humans focus on one speaker when several are talking? MEG responses to a continuous two-talker mixture suggest that, even though listeners attend only to one of the talkers, their auditory cortex tracks acoustic features from both speakers. This occurs even when those features are locally masked by the other speaker.
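A minimal sketch of temporal response function (TRF) estimation by ridge regression on simulated data; the published analysis used a different estimator (boosting), so this only conveys the general idea of mapping a speech feature to neural responses at multiple lags:

```python
# Minimal sketch: ridge-regression TRF relating a stimulus feature (e.g., an
# onset envelope) to MEG data. All signals here are simulated placeholders.
import numpy as np

def lagged_design(stimulus: np.ndarray, n_lags: int) -> np.ndarray:
    """Design matrix whose columns are the stimulus delayed by 0..n_lags-1 samples."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for lag in range(n_lags):
        X[lag:, lag] = stimulus[: n - lag]
    return X

rng = np.random.default_rng(0)
fs = 100                                  # Hz, illustrative sampling rate
stim = rng.random(60 * fs)                # 60 s of a speech envelope (placeholder)
true_trf = np.exp(-np.arange(40) / 10.0)  # fake 400-ms response
meg = lagged_design(stim, 40) @ true_trf + 0.5 * rng.standard_normal(60 * fs)

X = lagged_design(stim, 40)
lam = 1e2                                 # ridge parameter; cross-validated in practice
trf = np.linalg.solve(X.T @ X + lam * np.eye(40), X.T @ meg)
print(trf[:5])  # early lags; a ~70 ms peak would appear near lag 7 at fs=100
```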
Affiliation(s)
- Christian Brodbeck, Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Alex Jiao, Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- L. Elliot Hong, Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
- Jonathan Z. Simon, Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America; Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America; Department of Biology, University of Maryland, College Park, Maryland, United States of America
30
Jorgensen EJ, Stangl E, Chipara O, Hernandez H, Oleson J, Wu YH. GPS predicts stability of listening environment characteristics in one location over time among older hearing aid users. Int J Audiol 2020; 60:328-340. [PMID: 33074752 DOI: 10.1080/14992027.2020.1831083] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
Objective: Hearing aid technology can allow users to "geo-tag" hearing aid preferences using the Global Positioning System (GPS). This technology assumes that listening environment characteristics that affect hearing aid benefit change little in a location over time. The purpose of this study was to investigate whether certain characteristics (reverberation, signal type, listening activity, noise location, noisiness, talker familiarity, talker location, and visual cues) changed in a location over time. Design: Participants completed GPS-tagged surveys on smartphones to report on characteristics of their listening environments. Coordinates were used to create indices that described how much listening environment characteristics changed in a location over time. Indices computed in one location were compared to indices computed across all locations for each participant. Study sample: 54 adults with hearing loss participated in this study (26 males and 38 females; 30 experienced hearing aid users and 24 new users). Results: A location dependency was observed for all characteristics. Characteristics were significantly different from one another in their stability over time. Conclusions: Listening environment characteristics changed less over time in a given location than in participants' lives generally. The effectiveness of GPS-dependent hearing aid settings likely depends on the accuracy and location definition of the GPS feature.
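A minimal sketch of one way such a stability index could be computed; the data layout, the "noisiness" variable, and the index definition are hypothetical, not the paper's formulas:

```python
# Minimal sketch (hypothetical index): how much a listening-environment
# characteristic varies within one GPS-defined location versus across all of
# a participant's surveys.
import pandas as pd

# Each row: one ecological momentary assessment survey.
surveys = pd.DataFrame({
    "location_id": ["home", "home", "home", "cafe", "cafe", "work"],
    "noisiness":   [2, 3, 2, 5, 6, 4],   # e.g., a 1-7 rating scale
})

# Average variability within a location vs. variability across all surveys
within = surveys.groupby("location_id")["noisiness"].std().mean()
overall = surveys["noisiness"].std()

stability_index = within / overall  # < 1 means more stable within a location
print(f"Stability index: {stability_index:.2f}")
```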
Affiliation(s)
- Erik J Jorgensen, Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, USA
- Elizabeth Stangl, Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, USA
- Octav Chipara, Department of Computer Science, University of Iowa, Iowa City, IA, USA
- Helin Hernandez, Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Jacob Oleson, Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Yu-Hsiang Wu, Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, USA
31
EEG correlates of spatial shifts of attention in a dynamic multi-talker speech perception scenario in younger and older adults. Hear Res 2020; 398:108077. [PMID: 32987238 DOI: 10.1016/j.heares.2020.108077] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/03/2020] [Revised: 08/13/2020] [Accepted: 09/10/2020] [Indexed: 12/23/2022]
Abstract
Speech perception under "cocktail-party" conditions critically depends on the focusing of attention toward the talker of interest. In dynamic auditory scenes, changes in talker settings require rapid shifts of attention, which is especially relevant when the position of a target talker switches from one location to another. Here, we explored electrophysiological correlates of shifts in spatial auditory attention, using a free-field speech perception task in which sequences of short words (a company name, followed by a numeric value, e.g., "Bosch-6") were presented in the participants' left and right horizontal plane. Younger and older participants responded to the value of a pre-defined target company, while ignoring three simultaneously presented pairs of concurrent company names and values from different locations. All four stimulus pairs were spoken by different talkers, alternating from trial to trial. The location of the target company was within either the left or right hemisphere for a variable number of consecutive trials (between 3 and 42 trials) and then changed, switching from the left to the right hemispace or vice versa. Thus, when a switch occurred, the participants had to search for the new position of the target company among the concurrent streams of auditory information and re-focus their attention on the relevant location. As correlates of lateralized spatial auditory attention, the anterior contralateral N2 subcomponent (N2ac) and the posterior alpha power lateralization were analyzed in trials immediately before and after switches of the target location. Both measures were increased after switches, while only the increase in N2ac was related to better speech perception performance (i.e., a reduced post-switch decline in accuracy). While both age groups showed a similar pattern of switch-related attentional modulations, N2ac and alpha lateralization to the task-relevant stimulus (the target company's value) were overall greater in the younger than in the older group. The results suggest that N2ac and alpha lateralization reflect different attentional processes in multi-talker speech perception, the first being primarily associated with auditory search and the focusing of attention, and the second with the in-depth attentional processing of task-relevant information. The second process, in particular, appears prone to age-related cognitive decline.
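A minimal sketch of both lateralization measures on simulated epochs; the channel pairs, time window, and band edges are illustrative assumptions:

```python
# Minimal sketch: N2ac as a contralateral-minus-ipsilateral ERP difference at
# anterior sites, and posterior alpha power lateralization. Simulated data.
import numpy as np

rng = np.random.default_rng(1)
fs, n_trials, n_times = 500, 100, 500                    # fake 1-s epochs
fc5, fc6 = rng.standard_normal((2, n_trials, n_times))   # anterior left/right
po7, po8 = rng.standard_normal((2, n_trials, n_times))   # posterior left/right
target_left = rng.random(n_trials) < 0.5                 # target side per trial

# N2ac: contra minus ipsi (left targets: contra = right hemisphere)
contra = np.where(target_left[:, None], fc6, fc5)
ipsi = np.where(target_left[:, None], fc5, fc6)
n2ac_wave = (contra - ipsi).mean(axis=0)
win = slice(int(0.2 * fs), int(0.4 * fs))                # e.g., 200-400 ms
print("N2ac mean amplitude:", n2ac_wave[win].mean())

# Alpha lateralization: (contra - ipsi) / (contra + ipsi) on 8-12 Hz power
def alpha_power(x):
    spec = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(n_times, 1 / fs)
    band = (freqs >= 8) & (freqs <= 12)
    return (np.abs(spec[..., band]) ** 2).mean(axis=-1)

contra_pow = np.where(target_left, alpha_power(po8), alpha_power(po7))
ipsi_pow = np.where(target_left, alpha_power(po7), alpha_power(po8))
ali = ((contra_pow - ipsi_pow) / (contra_pow + ipsi_pow)).mean()
print("Alpha lateralization index:", ali)
```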
32
Static and dynamic cocktail party listening in younger and older adults. Hear Res 2020; 395:108020. [DOI: 10.1016/j.heares.2020.108020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/05/2019] [Revised: 05/13/2020] [Accepted: 06/11/2020] [Indexed: 11/21/2022]
33
Joint Representation of Spatial and Phonetic Features in the Human Core Auditory Cortex. Cell Rep 2020; 24:2051-2062.e2. [PMID: 30134167 DOI: 10.1016/j.celrep.2018.07.076] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2017] [Revised: 04/09/2018] [Accepted: 07/22/2018] [Indexed: 12/12/2022] Open
Abstract
The human auditory cortex simultaneously processes speech and determines the location of a speaker in space. Neuroimaging studies in humans have implicated core auditory areas in processing the spectrotemporal and the spatial content of sound; however, how these features are represented together is unclear. We recorded directly from human subjects implanted bilaterally with depth electrodes in core auditory areas as they listened to speech from different directions. We found local and joint selectivity to spatial and spectrotemporal speech features, where the spatial and spectrotemporal features are organized independently of each other. This representation enables successful decoding of both spatial and phonetic information. Furthermore, we found that the location of the speaker does not change the spectrotemporal tuning of the electrodes but, rather, modulates their mean response level. Our findings contribute to defining the functional organization of responses in the human auditory cortex, with implications for more accurate neurophysiological models of speech processing.
34
Bologna WJ, Ahlstrom JB, Dubno JR. Contributions of Voice Expectations to Talker Selection in Younger and Older Adults With Normal Hearing. Trends Hear 2020; 24:2331216520915110. [PMID: 32372720 PMCID: PMC7225833 DOI: 10.1177/2331216520915110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/03/2019] [Revised: 03/02/2020] [Accepted: 03/03/2020] [Indexed: 11/17/2022] Open
Abstract
Focused attention on expected voice features, such as fundamental frequency (F0) and spectral envelope, may facilitate segregation and selection of a target talker in competing talker backgrounds. Age-related declines in attention may limit these abilities in older adults, resulting in poorer speech understanding in complex environments. To test this hypothesis, younger and older adults with normal hearing listened to sentences with a single competing talker. For most trials, listener attention was directed to the target by a cue phrase that matched the target talker's F0 and spectral envelope. For a small percentage of randomly occurring probe trials, the target's voice unexpectedly differed from the cue phrase in terms of F0 and spectral envelope. Overall, keyword recognition for the target talker was poorer for older adults than for younger adults. Keyword recognition was poorer on probe trials than on standard trials for both groups, and incorrect responses on probe trials contained keywords from the single-talker masker. No interaction was observed between age group and the decline in keyword recognition on probe trials. Thus, the reduced performance of older adults overall could not be attributed to declines in attention to an expected voice. Rather, other cognitive abilities, such as speed of processing and linguistic closure, were predictive of keyword recognition for younger and older adults. Moreover, the effects of age interacted with the sex of the target talker, such that older adults had greater difficulty understanding target keywords from female talkers than from male talkers.
Affiliation(s)
- William J. Bologna, Department of Otolaryngology—Head and Neck Surgery, Medical University of South Carolina
- Jayne B. Ahlstrom, Department of Otolaryngology—Head and Neck Surgery, Medical University of South Carolina
- Judy R. Dubno, Department of Otolaryngology—Head and Neck Surgery, Medical University of South Carolina
|
35
|
Rennies J, Best V, Roverud E, Kidd G. Energetic and Informational Components of Speech-on-Speech Masking in Binaural Speech Intelligibility and Perceived Listening Effort. Trends Hear 2019; 23:2331216519854597. [PMID: 31172880 PMCID: PMC6557024 DOI: 10.1177/2331216519854597] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Speech perception in complex sound fields can greatly benefit from different unmasking cues to segregate the target from interfering voices. This study investigated the role of three unmasking cues (spatial separation, gender differences, and masker time reversal) on speech intelligibility and perceived listening effort in normal-hearing listeners. Speech intelligibility and categorically scaled listening effort were measured for a female target talker masked by two competing talkers with no unmasking cues or one to three unmasking cues. In addition to natural stimuli, all measurements were also conducted with glimpsed speech—which was created by removing the time–frequency tiles of the speech mixture in which the maskers dominated the mixture—to estimate the relative amounts of informational and energetic masking as well as the effort associated with source segregation. The results showed that all unmasking cues as well as glimpsing improved intelligibility and reduced listening effort and that providing more than one cue was beneficial in overcoming informational masking. The reduction in listening effort due to glimpsing corresponded to increases in signal-to-noise ratio of 8 to 18 dB, indicating that a significant amount of listening effort was devoted to segregating the target from the maskers. Furthermore, the benefit in listening effort for all unmasking cues extended well into the range of positive signal-to-noise ratios at which speech intelligibility was at ceiling, suggesting that listening effort is a useful tool for evaluating speech-on-speech masking conditions at typical conversational levels.
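A minimal sketch of the glimpsing operation (ideal time-frequency segregation) with SciPy's STFT; the signals and parameters are placeholders for real recordings:

```python
# Minimal sketch: "glimpse" a target from a speech mixture by keeping only the
# STFT tiles where the target's energy exceeds the summed maskers' energy,
# then resynthesize. Signals here are noise placeholders.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
rng = np.random.default_rng(2)
target = rng.standard_normal(fs * 2)   # placeholder for the target recording
maskers = rng.standard_normal(fs * 2)  # placeholder for the summed maskers

f, t, T = stft(target, fs, nperseg=512)
_, _, M = stft(maskers, fs, nperseg=512)

mask = np.abs(T) > np.abs(M)           # ideal binary mask: target-dominated tiles
_, _, X = stft(target + maskers, fs, nperseg=512)
_, glimpsed = istft(X * mask, fs, nperseg=512)
# 'glimpsed' retains the mixture only where the target dominates, the kind of
# stimulus used to estimate the energetic-masking component.
```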
Affiliation(s)
- Jan Rennies, Department of Speech, Language and Hearing Sciences, Boston University, MA, USA; Fraunhofer Institute for Digital Media Technology IDMT, Project Group Hearing, Speech and Audio Technology, Oldenburg, Germany; Cluster of Excellence Hearing4all, Carl-von-Ossietzky University, Oldenburg, Germany
- Virginia Best, Department of Speech, Language and Hearing Sciences, Boston University, MA, USA
- Elin Roverud, Department of Speech, Language and Hearing Sciences, Boston University, MA, USA
- Gerald Kidd, Department of Speech, Language and Hearing Sciences, Boston University, MA, USA
|
36
|
Rigato C, Reinfeldt S, Asp F. The effect of an active transcutaneous bone conduction device on spatial release from masking. Int J Audiol 2019; 59:348-359. [PMID: 31873054 DOI: 10.1080/14992027.2019.1705406] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
Abstract
Objective: The aim was to quantify the effect of the experimental active transcutaneous Bone Conduction Implant (BCI) on spatial release from masking (SRM) in subjects with bilateral or unilateral conductive and mixed hearing loss. Design: Measurements were performed in a sound booth with five loudspeakers at 0°, +/-30° and +/-150° azimuth. Target speech was presented frontally, and interfering speech from either the front (co-located) or surrounding (separated) loudspeakers. SRM was calculated as the difference between the separated and the co-located speech recognition threshold (SRT). Study sample: Twelve patients (aged 22-76 years) unilaterally implanted with the BCI were included. Results: A positive SRM, reflecting a benefit of spatially separating interferers from target speech, existed for all subjects in the unaided condition, and for nine subjects (75%) in the aided condition. Aided SRM was lower than unaided SRM in nine of the subjects. There was no difference in SRM between patients with bilateral and unilateral hearing loss. In the aided condition, SRT improved only for patients with bilateral hearing loss. Conclusions: The BCI fitted unilaterally in patients with bilateral or unilateral conductive/mixed hearing loss seems to reduce SRM. However, the data indicate that SRT is improved or maintained for patients with bilateral and unilateral hearing loss, respectively.
Affiliation(s)
- Cristina Rigato, Division of Signal Processing and Biomedical Engineering, Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Sabine Reinfeldt, Division of Signal Processing and Biomedical Engineering, Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Asp, Division of Signal Processing and Biomedical Engineering, Department of Electrical Engineering, Chalmers University of Technology, Gothenburg, Sweden; Division of Ear, Nose and Throat Diseases, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
37
Jain C, Dwarakanath VM, G A. Suprathreshold Processing and Cocktail Party Listening in Younger and Older Adults with Normal Hearing. Ageing International 2019. [DOI: 10.1007/s12126-019-09356-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
38
Li Y, Wang F, Chen Y, Cichocki A, Sejnowski T. The Effects of Audiovisual Inputs on Solving the Cocktail Party Problem in the Human Brain: An fMRI Study. Cereb Cortex 2019; 28:3623-3637. [PMID: 29029039 DOI: 10.1093/cercor/bhx235] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2017] [Indexed: 11/13/2022] Open
Abstract
At cocktail parties, our brains often simultaneously receive visual and auditory information. Although the cocktail party problem has been widely investigated under auditory-only settings, the effects of audiovisual inputs have not. This study explored the effects of audiovisual inputs in a simulated cocktail party. In our fMRI experiment, each congruent audiovisual stimulus was a synthesis of 2 facial movie clips, each of which could be classified into 1 of 2 emotion categories (crying and laughing). Visual-only (faces) and auditory-only stimuli (voices) were created by extracting the visual and auditory contents from the synthesized audiovisual stimuli. Subjects were instructed to selectively attend to 1 of the 2 objects contained in each stimulus and to judge its emotion category in the visual-only, auditory-only, and audiovisual conditions. The neural representations of the emotion features were assessed by calculating decoding accuracy and brain pattern-related reproducibility index based on the fMRI data. We compared the audiovisual condition with the visual-only and auditory-only conditions and found that audiovisual inputs enhanced the neural representations of emotion features of the attended objects instead of the unattended objects. This enhancement might partially explain the benefits of audiovisual inputs for the brain to solve the cocktail party problem.
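A minimal sketch of cross-validated pattern decoding with scikit-learn on simulated voxel data; the shapes, labels, and classifier choice are placeholders, not the authors' pipeline:

```python
# Minimal sketch: cross-validated decoding of the attended emotion category
# (crying vs. laughing) from voxel patterns. Data are simulated placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
X = rng.standard_normal((120, 500))   # 120 trials x 500 voxels (placeholder betas)
y = rng.integers(0, 2, 120)           # 0 = crying, 1 = laughing (attended object)

clf = make_pipeline(StandardScaler(), LinearSVC())
scores = cross_val_score(clf, X, y, cv=10)
print(f"Decoding accuracy: {scores.mean():.2f}")
# Comparing such accuracies across audiovisual, visual-only, and auditory-only
# conditions is the kind of contrast reported above.
```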
Affiliation(s)
- Yuanqing Li, Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, China; Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou, China
- Fangyi Wang, Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, China; Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou, China
- Yongbin Chen, Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, China; Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou, China
- Andrzej Cichocki, Riken Brain Science Institute, Wako-shi, Japan; Skolkovo Institute of Science and Technology (SKOTECH), Moscow, Russia
- Terrence Sejnowski, Neurobiology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA, USA
39
Choi JY, Perrachione TK. Time and information in perceptual adaptation to speech. Cognition 2019; 192:103982. [PMID: 31229740 PMCID: PMC6732236 DOI: 10.1016/j.cognition.2019.05.019] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2018] [Revised: 05/11/2019] [Accepted: 05/25/2019] [Indexed: 11/18/2022]
Abstract
Perceptual adaptation to a talker enables listeners to efficiently resolve the many-to-many mapping between variable speech acoustics and abstract linguistic representations. However, models of speech perception have not delved into the variety or the quantity of information necessary for successful adaptation, nor how adaptation unfolds over time. In three experiments using speeded classification of spoken words, we explored how the quantity (duration), quality (phonetic detail), and temporal continuity of talker-specific context contribute to facilitating perceptual adaptation to speech. In single- and mixed-talker conditions, listeners identified phonetically-confusable target words in isolation or preceded by carrier phrases of varying lengths and phonetic content, spoken by the same talker as the target word. Word identification was always slower in mixed-talker conditions than single-talker ones. However, interference from talker variability decreased as the duration of preceding speech increased but was not affected by the amount of preceding talker-specific phonetic information. Furthermore, efficiency gains from adaptation depended on temporal continuity between preceding speech and the target word. These results suggest that perceptual adaptation to speech may be understood via models of auditory streaming, where perceptual continuity of an auditory object (e.g., a talker) facilitates allocation of attentional resources, resulting in more efficient perceptual processing.
Affiliation(s)
- Ja Young Choi, Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, United States
- Tyler K Perrachione, Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, United States
40
Addleman DA, Jiang YV. Experience-Driven Auditory Attention. Trends Cogn Sci 2019; 23:927-937. [PMID: 31521482 DOI: 10.1016/j.tics.2019.08.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2019] [Revised: 08/19/2019] [Accepted: 08/19/2019] [Indexed: 12/01/2022]
Abstract
In addition to conscious goals and stimulus salience, an observer's prior experience also influences selective attention. Early studies demonstrated experience-driven effects on attention mainly in the visual modality, but increasing evidence shows that experience drives auditory selection as well. We review evidence for a multiple-levels framework of auditory attention, in which experience-driven attention relies on mechanisms that acquire control settings and mechanisms that guide attention towards selected stimuli. Mechanisms of acquisition include cue-target associative learning, reward learning, and sensitivity to prior selection history. Once acquired, implementation of these biases can occur either consciously or unconsciously. Future research should more fully characterize the sources of experience-driven auditory attention and investigate the neural mechanisms used to acquire and implement experience-driven auditory attention.
Affiliation(s)
- Douglas A Addleman, Department of Psychology, University of Minnesota, Minneapolis, MN 55455, USA
- Yuhong V Jiang, Department of Psychology, University of Minnesota, Minneapolis, MN 55455, USA
41
Du Y, Fang W, Qiu H. Development and Validation of a Method to Enhance Auditory Attention During Continuous Speech-Shaped Noise Environment. J Mech Med Biol 2019. [DOI: 10.1142/s0219519419500489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Auditory training (AT) may strengthen auditory skills that help humans not only in on-task auditory perception but also in continuous speech-shaped noise (SSN) environments. AT based on musical material has provided some evidence for an "auditory advantage" in understanding speech in noise (SIN), but such training involves a long training period and a complex procedure. Experimental research was therefore needed to develop a simplified method refined from musical material, named auditory target tracking (ATT) training, and to determine its benefits. We administered two refined AT methods, basic auditory target tracking (BAT) training and enhanced auditory target tracking (EAT) training, to separate groups of adult participants ([Formula: see text]) for 20 training units, and assessed speech perception in noise after training. The EAT group showed better speech perception performance than the other groups, while there were no significant differences between the BAT group and the control group. The EAT training effect was most pronounced with uni-gender SSN at [Formula: see text] dB. The outcomes suggest that EAT training can improve speech perception performance and selective attention in SSN environments. These findings provide a link between music-based training and auditory selective attention in real-world settings, and may extend to specialized vocational training.
Affiliation(s)
- Yihang Du, School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing 100044, P. R. China
- Weining Fang, State Key Lab of Rail Traffic Control & Safety, Beijing Jiaotong University, Beijing 100044, P. R. China
- Hanzhao Qiu, School of Mechanical, Electronic and Control Engineering, Beijing Jiaotong University, Beijing 100044, P. R. China
42
Multisensory feature integration in (and out) of the focus of spatial attention. Atten Percept Psychophys 2019; 82:363-376. [DOI: 10.3758/s13414-019-01813-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
43
Zobel BH, Wagner A, Sanders LD, Başkent D. Spatial release from informational masking declines with age: Evidence from a detection task in a virtual separation paradigm. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 146:548. [PMID: 31370625 DOI: 10.1121/1.5118240] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/19/2018] [Accepted: 06/28/2019] [Indexed: 06/10/2023]
Abstract
Declines in spatial release from informational masking may contribute to the speech-processing difficulties that older adults often experience within complex listening environments. The present study sought to answer two fundamental questions: (1) Does spatial release from informational masking decline with age and, if so, (2) does age predict this decline independently of age-typical hearing loss? Younger (18-34 years) and older (60-80 years) adults with age-typical hearing completed a yes/no target-detection task with low-pass filtered noise-vocoded speech designed to reduce non-spatial segregation cues and control for hearing loss. Participants detected a target voice among two-talker masking babble while a virtual spatial separation paradigm [Freyman, Helfer, McCall, and Clifton, J. Acoust. Soc. Am. 106(6), 3578-3588 (1999)] was used to isolate informational masking release. The younger and older adults both exhibited spatial release from informational masking, but masking release was reduced among the older adults. Furthermore, age predicted this decline controlling for hearing loss, while there was no indication that hearing loss played a role. These findings provide evidence that declines specific to aging limit spatial release from informational masking under challenging listening conditions.
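A minimal sketch of the standard signal-detection computation behind such a yes/no detection task; the trial counts are placeholders:

```python
# Minimal sketch: sensitivity (d') and criterion for a yes/no target-detection
# task, using standard signal-detection formulas. Counts are illustrative.
from scipy.stats import norm

hits, misses = 72, 28                        # target-present trials
false_alarms, correct_rejections = 20, 80    # target-absent trials

# Log-linear correction keeps z-scores finite when rates hit 0 or 1
hit_rate = (hits + 0.5) / (hits + misses + 1)
fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

d_prime = norm.ppf(hit_rate) - norm.ppf(fa_rate)
criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(fa_rate))
print(f"d' = {d_prime:.2f}, c = {criterion:.2f}")
# Spatial release from informational masking can then be expressed as the gain
# in d' (or in threshold) from perceived spatial separation.
```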
Affiliation(s)
- Benjamin H Zobel, Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Anita Wagner, Department of Otorhinolaryngology-Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Lisa D Sanders, Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, Massachusetts 01003, USA
- Deniz Başkent, Department of Otorhinolaryngology-Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
44
Lin G, Carlile S. The Effects of Switching Non-Spatial Attention During Conversational Turn Taking. Sci Rep 2019; 9:8057. [PMID: 31147609 PMCID: PMC6542845 DOI: 10.1038/s41598-019-44560-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2018] [Accepted: 05/17/2019] [Indexed: 11/09/2022] Open
Abstract
This study examined the effect of a change in target voice on word recall during a multi-talker conversation. Two experiments were conducted using matrix sentences to assess the cost of a single endogenous switch in non-spatial attention. Performance in a yes-no recognition task was significantly worse when a target voice changed compared to when it remained the same after a turn-taking gap. We observed a decrease in target hit rate and sensitivity, and an increase in masker confusion errors following a change in voice. These results highlight the cognitive demands of not only engaging attention on a new talker, but also of disengaging attention from a previous target voice. This shows that exposure to a voice can have a biasing effect on attention that persists well after a turn-taking gap. A second experiment showed that there was no change in switching performance using different talker combinations. This demonstrates that switching costs were consistent and did not depend on the degree of acoustic differences in target voice characteristics.
Affiliation(s)
- Gaven Lin, School of Medical Sciences and The Bosch Institute, University of Sydney, Sydney, New South Wales, Australia
- Simon Carlile, School of Medical Sciences and The Bosch Institute, University of Sydney, Sydney, New South Wales, Australia
45
Szalárdy O, Tóth B, Farkas D, György E, Winkler I. Neuronal Correlates of Informational and Energetic Masking in the Human Brain in a Multi-Talker Situation. Front Psychol 2019; 10:786. [PMID: 31024409 PMCID: PMC6465330 DOI: 10.3389/fpsyg.2019.00786] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/20/2018] [Accepted: 03/21/2019] [Indexed: 11/13/2022] Open
Abstract
Human listeners can follow the voice of one speaker while several others are talking at the same time. This process requires segregating the speech streams from each other and continuously directing attention to the target stream. We investigated the functional brain networks underlying this ability. Two speech streams were presented simultaneously to participants, who followed one of them and detected targets within it (target stream). The loudness of the distractor speech stream varied on five levels: moderately softer, slightly softer, equal, slightly louder, or moderately louder than the attended stream. Performance measures showed that the most demanding condition was the one with moderately softer distractors, which indicates that softer distractor speech may receive more covert attention than louder distractors and therefore requires more cognitive resources. EEG-based measurement of functional connectivity between various brain regions revealed frequency-band-specific networks: (1) energetic masking (comparing the louder distractor conditions with the equal-loudness condition) was predominantly associated with stronger connectivity between the frontal and temporal regions in the lower alpha (8–10 Hz) and gamma (30–70 Hz) bands; (2) informational masking (comparing the softer distractor conditions with the equal-loudness condition) was associated with a distributed network between parietal, frontal, and temporal regions in the theta (4–8 Hz) and beta (13–30 Hz) bands. These results suggest distinct cognitive and neural processes for resolving the interference from energetic vs. informational masking.
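A minimal sketch of one common band-specific connectivity measure, the phase-locking value; the channels, band edges, and durations are illustrative assumptions:

```python
# Minimal sketch: phase-locking value (PLV) between two EEG channels in a
# chosen frequency band, one way to quantify band-specific connectivity.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 250
rng = np.random.default_rng(4)
frontal = rng.standard_normal(fs * 30)    # placeholders for real channel data
temporal = rng.standard_normal(fs * 30)

def band_phase(x, lo, hi):
    """Instantaneous phase of x band-passed to [lo, hi] Hz."""
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return np.angle(hilbert(filtfilt(b, a, x)))

# Lower alpha band (8-10 Hz), as in the energetic-masking network above
phase_diff = band_phase(frontal, 8, 10) - band_phase(temporal, 8, 10)
plv = np.abs(np.exp(1j * phase_diff).mean())
print(f"PLV (8-10 Hz): {plv:.2f}")
```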
Affiliation(s)
- Orsolya Szalárdy, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Brigitta Tóth, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Dávid Farkas, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Erika György, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- István Winkler, Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
46
Evaluating the Performance of a Visually Guided Hearing Aid Using a Dynamic Auditory-Visual Word Congruence Task. Ear Hear 2019; 39:756-769. [PMID: 29252977 DOI: 10.1097/aud.0000000000000532] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
OBJECTIVES The "visually guided hearing aid" (VGHA), consisting of a beamforming microphone array steered by eye gaze, is an experimental device being tested for effectiveness in laboratory settings. Previous studies have found that beamforming without visual steering can provide significant benefits (relative to natural binaural listening) for speech identification in spatialized speech or noise maskers when sound sources are fixed in location. The aim of the present study was to evaluate the performance of the VGHA in listening conditions in which target speech could switch locations unpredictably, requiring visual steering of the beamforming. To address this aim, the present study tested an experimental simulation of the VGHA in a newly designed dynamic auditory-visual word congruence task. DESIGN Ten young normal-hearing (NH) and 11 young hearing-impaired (HI) adults participated. On each trial, three simultaneous spoken words were presented from three source positions (-30, 0, and 30 azimuth). An auditory-visual word congruence task was used in which participants indicated whether there was a match between the word printed on a screen at a location corresponding to the target source and the spoken target word presented acoustically from that location. Performance was compared for a natural binaural condition (stimuli presented using impulse responses measured on KEMAR), a simulated VGHA condition (BEAM), and a hybrid condition that combined lowpass-filtered KEMAR and highpass-filtered BEAM information (BEAMAR). In some blocks, the target remained fixed at one location across trials, and in other blocks, the target could transition in location between one trial and the next with a fixed but low probability. RESULTS Large individual variability in performance was observed. There were significant benefits for the hybrid BEAMAR condition relative to the KEMAR condition on average for both NH and HI groups when the targets were fixed. Although not apparent in the averaged data, some individuals showed BEAM benefits relative to KEMAR. Under dynamic conditions, BEAM and BEAMAR performance dropped significantly immediately following a target location transition. However, performance recovered by the second word in the sequence and was sustained until the next transition. CONCLUSIONS When performance was assessed using an auditory-visual word congruence task, the benefits of beamforming reported previously were generally preserved under dynamic conditions in which the target source could move unpredictably from one location to another (i.e., performance recovered rapidly following source transitions) while the observer steered the beamforming via eye gaze, for both young NH and young HI groups.
47
Neural Switch Asymmetry in Feature-Based Auditory Attention Tasks. J Assoc Res Otolaryngol 2019; 20:205-215. [PMID: 30675674 DOI: 10.1007/s10162-018-00713-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2018] [Accepted: 12/28/2018] [Indexed: 10/27/2022] Open
Abstract
Active listening involves dynamically switching attention between competing talkers and is essential to following conversations in everyday environments. Previous investigations in human listeners have examined the neural mechanisms that support switching auditory attention within the acoustic featural cues of pitch and auditory space. Here, we explored the cortical circuitry underlying endogenous switching of auditory attention between pitch and spatial cues necessary to discern target from masker words. Because these tasks are of unequal difficulty, we expected an asymmetry in behavioral switch costs for hard-to-easy versus easy-to-hard switches, mirroring prior evidence from vision-based cognitive task-switching paradigms. We investigated the neural correlates of this behavioral switch asymmetry and associated cognitive control operations in the present auditory paradigm. Behaviorally, we observed no switch-cost asymmetry, i.e., no performance difference for switching from the more difficult attend-pitch to the easier attend-space condition (P→S) versus switching from easy-to-hard (S→P). However, left lateral prefrontal cortex activity, correlated with improved performance, was observed during a silent gap period when listeners switched attention from P→S, relative to switching within pitch cues. No such differential activity was seen for the analogous easy-to-hard switch. We hypothesize that this neural switch asymmetry reflects proactive cognitive control mechanisms that successfully reconfigured neurally-specified task parameters and resolved competition from other such "task sets," thereby obviating the expected behavioral switch-cost asymmetry. The neural switch activity observed was generally consistent with that seen in cognitive paradigms, suggesting that established cognitive models of attention switching may be productively applied to better understand similar processes in audition.
48
Jain C, Dwarakanath VM, G A. Influence of subcortical auditory processing and cognitive measures on cocktail party listening in younger and older adults. Int J Audiol 2019; 58:87-96. [PMID: 30646763 DOI: 10.1080/14992027.2018.1543962] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
Abstract
OBJECTIVE The study aimed to investigate the influence of subcortical auditory processing and cognitive measures on cocktail party listening in younger and older adults with normal hearing sensitivity. DESIGN Tests administered included the quick speech perception in noise test to assess cocktail party listening, the speech auditory brainstem response to assess subcortical auditory processing, and digit span, digit sequencing, and a spatial selective attention test to assess cognitive processing. STUDY SAMPLE A total of 92 participants with normal hearing sensitivity took part in the study, divided into two groups: 52 younger adults (20-40 years) and 40 older adults (60-80 years). RESULTS The older adults performed significantly more poorly than the younger adults on the quick speech perception in noise test and on various cognitive measures. Cognitive measures correlated with speech perception in noise in both younger and older adults. The results also showed significant deterioration in brainstem encoding of speech with ageing, and the fundamental frequency of the speech auditory brainstem response correlated with speech perception in noise. CONCLUSIONS It can be concluded from this study that subcortical auditory processing and cognitive measures play a role in cocktail party listening.
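A minimal sketch of how the fundamental-frequency (F0) component of a speech-evoked auditory brainstem response can be quantified; the response, sampling rate, and band are simulated placeholders:

```python
# Minimal sketch: strength of the F0 component in an averaged speech-evoked
# auditory brainstem response, via the spectrum of the response window.
import numpy as np

fs = 10000
rng = np.random.default_rng(6)
t = np.arange(int(0.2 * fs)) / fs            # 200-ms response window
# Placeholder averaged response: weak 100-Hz following response in noise
abr = 0.05 * np.sin(2 * np.pi * 100 * t) + 0.2 * rng.standard_normal(t.size)

spectrum = np.abs(np.fft.rfft(abr)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)

f0_band = (freqs >= 90) & (freqs <= 110)     # around the stimulus F0 (~100 Hz)
f0_amplitude = spectrum[f0_band].max()
print(f"F0 peak amplitude: {f0_amplitude:.4f} uV")
```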
Affiliation(s)
- Chandni Jain, Department of Audiology, All India Institute of Speech and Hearing, Mysore, India
- Amritha G, Department of Audiology, All India Institute of Speech and Hearing, Mysore, India
49
Kidd G, Mason CR, Best V, Roverud E, Swaminathan J, Jennings T, Clayton K, Steven Colburn H. Determining the energetic and informational components of speech-on-speech masking in listeners with sensorineural hearing loss. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:440. [PMID: 30710924 PMCID: PMC6347574 DOI: 10.1121/1.5087555]
Abstract
The ability to identify the words spoken by one talker masked by two or four competing talkers was tested in young-adult listeners with sensorineural hearing loss (SNHL). In the reference/baseline condition, masking speech was colocated with target speech, target and masker talkers were female, and the masker was intelligible. Three comparison conditions involved replacing the female masker talkers with males, time-reversing the masker speech, and spatially separating the sources. All three variables produced significant release from masking. To emulate energetic masking (EM), stimuli were subjected to ideal time-frequency segregation, retaining only the time-frequency units where target energy exceeded masker energy. Subjects were then tested with these resynthesized "glimpsed stimuli." For either two or four maskers, thresholds varied by only about 3 dB across conditions, suggesting that EM was roughly equal across them. Compared to normal-hearing listeners from an earlier study [Kidd, Mason, Swaminathan, Roverud, Clayton, and Best, J. Acoust. Soc. Am. 140, 132-144 (2016)], SNHL listeners demonstrated greater energetic and informational masking as well as higher glimpsed thresholds. Individual differences were correlated across masking-release conditions, suggesting that listeners could be categorized according to their general ability to solve the task. Overall, both peripheral and central factors appear to contribute to the higher thresholds for SNHL listeners.
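The "glimpsed stimuli" procedure is, in essence, an ideal binary mask: compute time-frequency representations of the target and masker separately, keep only the units where the target dominates, and resynthesize. A minimal sketch of that general technique follows, assuming access to the separate target and masker waveforms before mixing; the window length and the 0-dB local criterion are placeholder choices, not the authors' parameters.

```python
import numpy as np
from scipy.signal import stft, istft

def glimpsed_stimulus(target, masker, fs, nperseg=512, lc_db=0.0):
    """Resynthesize a mixture keeping only time-frequency units where the
    local target-to-masker ratio exceeds `lc_db` (an ideal binary mask)."""
    _, _, T = stft(target, fs=fs, nperseg=nperseg)
    _, _, M = stft(masker, fs=fs, nperseg=nperseg)
    snr_db = 20 * np.log10(np.abs(T) / (np.abs(M) + 1e-12) + 1e-12)
    mask = snr_db > lc_db                      # keep target-dominated units
    # The STFT is linear, so T + M is the mixture's STFT.
    _, glimpsed = istft((T + M) * mask, fs=fs, nperseg=nperseg)
    return glimpsed

# Synthetic demo: a tone "target" glimpsed out of a noise "masker".
fs = 16_000
t = np.arange(0, 1.0, 1 / fs)
target = np.sin(2 * np.pi * 440 * t)
masker = 0.5 * np.random.randn(t.size)
out = glimpsed_stimulus(target, masker, fs)
```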
Affiliation(s)
- Gerald Kidd, Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Christine R Mason, Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Virginia Best, Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Elin Roverud, Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Jayaganesh Swaminathan, Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Todd Jennings, Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Kameron Clayton, Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- H Steven Colburn, Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, USA
50
Villard S, Kidd G. Effects of Acquired Aphasia on the Recognition of Speech Under Energetic and Informational Masking Conditions. Trends Hear 2019; 23:2331216519884480. [PMID: 31694486 PMCID: PMC7000861 DOI: 10.1177/2331216519884480]
Abstract
Persons with aphasia (PWA) often report difficulty understanding spoken language in noisy environments that require listeners to identify and selectively attend to target speech while ignoring competing background sounds or "maskers." This study compared the performance of PWA and age-matched healthy controls (HC) on a masked speech identification task and examined the consequences of different types of masking on performance. Twelve PWA and 12 age-matched HC completed a speech identification task comprising three conditions designed to differentiate between the effects of energetic and informational masking on receptive speech processing. The target and masker speech materials were taken from a closed-set, matrix-style corpus, and a forced-choice word identification task was used. The target and maskers were spatially separated from one another to simulate real-world listening environments and to allow listeners to make use of binaural cues for source segregation. Individualized frequency-specific gain was applied to compensate for the effects of hearing loss. Although both groups showed similar susceptibility to the effects of energetic masking, PWA were more susceptible than age-matched HC to the effects of informational masking. Results indicate that this increased susceptibility cannot be attributed to age, hearing loss, or comprehension deficits, and is therefore a consequence of the acquired cognitive-linguistic impairments associated with aphasia. This finding suggests that aphasia may result in increased difficulty segregating target speech from masker speech, which in turn may have implications for the ability of PWA to comprehend target speech in multitalker environments, such as restaurants, family gatherings, and other everyday situations.
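The individualized frequency-specific gain mentioned here can be illustrated with a linear FIR filter whose magnitude response tracks a set of prescribed per-frequency gains. The sketch below uses invented gain values and a simplified linear scheme; it is not the fitting rule applied in the study.

```python
import numpy as np
from scipy.signal import firwin2, lfilter

def apply_frequency_specific_gain(x, fs, audio_freqs_hz, gains_db, numtaps=501):
    """Filter signal `x` so its magnitude response approximates the
    prescribed per-frequency gains (linear amplification only)."""
    # firwin2 expects normalized frequencies from 0 to 1 (Nyquist), with
    # both endpoints included, and linear (not dB) gains.
    freq = [0.0] + [f / (fs / 2) for f in audio_freqs_hz] + [1.0]
    gain_lin = [10 ** (g / 20) for g in gains_db]
    gain = [gain_lin[0]] + gain_lin + [gain_lin[-1]]
    taps = firwin2(numtaps, freq, gain)  # numtaps must be odd for nonzero Nyquist gain
    return lfilter(taps, 1.0, x)

# Hypothetical prescription: more gain at high frequencies, as is typical
# for a sloping sensorineural loss. Values are illustrative only.
fs = 16_000
freqs = [250, 500, 1000, 2000, 4000, 6000]     # Hz
gains = [0, 3, 6, 12, 18, 20]                  # dB
x = np.random.randn(fs)                        # 1 s of test signal
amplified = apply_frequency_specific_gain(x, fs, freqs, gains)
```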
Affiliation(s)
- Sarah Villard, Department of Speech, Language & Hearing Sciences, Boston University, MA, USA
- Gerald Kidd, Department of Speech, Language & Hearing Sciences, Boston University, MA, USA