1. Visuospatial attention revamps cortical processing of sound amid audiovisual uncertainty. Psychophysiology 2023; 60:e14329. PMID: 37166096. DOI: 10.1111/psyp.14329.
Abstract
Selective attentional biases arising in one sensory modality manifest in others. The effects of visuospatial attention, important in visual object perception, are unclear in the auditory domain during audiovisual (AV) scene processing. We investigate the temporal and spatial factors that underlie such cross-modal transfer neurally. Auditory encoding of random tone pips in AV scenes was addressed via a temporal response function (TRF) model of participants' electroencephalogram (N = 30). The spatially uninformative pips were associated with spatially distributed visual contrast reversals ("flips") through asynchronous, probabilistic AV temporal onset distributions. Participants deployed visuospatial selection on these AV stimuli to perform a task. A late (~300 ms) cross-modal influence over the neural representation of pips was found in the original and a replication study (N = 21). Transfer depended on the selected visual input being (i) presented during or shortly after a related sound, within relatively narrow temporal distributions (<165 ms); and (ii) positioned across limited (1:4) visual foreground-to-background ratios. Neural encoding of auditory input, as a function of visual input, was largest at visual foreground quadrant sectors and lowest at locations opposite to the target. The results indicate that ongoing neural representations of sounds incorporate visuospatial attributes for auditory stream segregation, as cross-modal transfer conveys information that specifies the identity of multisensory signals. A potential mechanism is the enhancement or recalibration of the tuning properties of the auditory populations that represent these signals as objects. The results account for the dynamic evolution of multisensory integration under visual attention, specifying critical latencies at which the relevant cortical networks operate.
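A minimal sketch of the forward TRF approach described above: ridge regression from a tone-pip onset vector to a single EEG channel. The 0-400 ms lag window, the regularization strength, and all names are illustrative assumptions, not details taken from the study.

```python
# Forward temporal-response-function (TRF) estimate via ridge regression:
# maps a sparse stimulus feature (tone-pip onsets) to one EEG channel.
import numpy as np

def estimate_trf(stim, eeg, fs, tmin=0.0, tmax=0.4, lam=1e2):
    """stim, eeg: 1-D arrays of equal length; returns lags (s) and weights."""
    lags = np.arange(int(tmin * fs), int(tmax * fs))
    # Lagged design matrix: X[t, k] = stim[t - lags[k]]
    X = np.zeros((len(stim), len(lags)))
    for k, lag in enumerate(lags):
        X[lag:, k] = stim[: len(stim) - lag]
    # Ridge solution: w = (X'X + lam*I)^-1 X'y
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w

# Usage with synthetic data: sparse pip onsets and a noisy response
fs = 128
rng = np.random.default_rng(0)
stim = (rng.random(fs * 60) < 0.01).astype(float)          # sparse onsets
eeg = np.convolve(stim, np.hanning(32), mode="same") + rng.normal(0, 1, fs * 60)
lags_s, trf = estimate_trf(stim, eeg, fs)
```

The weight vector traces how a pip onset is reflected in the EEG at each lag; a late cross-modal modulation such as the ~300 ms effect reported here would appear as an attention-dependent change in those late-lag weights.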
2. Adaptation in the sensory cortex drives bistable switching during auditory stream segregation. Neurosci Conscious 2023; 2023:niac019. PMID: 36751309. PMCID: PMC9899071. DOI: 10.1093/nc/niac019.
Abstract
Current theories of perception emphasize the roles of neural adaptation, inhibitory competition, and noise as key components that lead to switches in perception. Supporting evidence comes from neurophysiological findings of specific neural signatures in modality-specific and supramodal brain areas that appear to be critical to switches in perception. We used functional magnetic resonance imaging to study brain activity around the time of switches in perception while participants listened to a bistable auditory stream segregation stimulus, which can be heard as one integrated stream of tones or two segregated streams of tones. The auditory thalamus showed more activity around the time of a switch from segregated to integrated than during periods of stable integrated perception. In contrast, the rostral anterior cingulate cortex and the inferior parietal lobule showed more activity around the time of a switch from integrated to segregated than during periods of stable segregated perception, consistent with prior findings of asymmetries in brain activity depending on switch direction. In sound-responsive areas of the auditory cortex, neural activity increased in strength preceding switches in perception and declined in strength over time following them. Such dynamics in the auditory cortex are consistent with the role of adaptation proposed by computational models of visual and auditory bistable switching, whereby the strength of neural activity decreases following a switch in perception, eventually destabilizing the current percept enough to lead to a switch to an alternative percept.
3. The Effect of Sound Localization on Auditory-Only and Audiovisual Speech Recognition in a Simulated Multitalker Environment. Trends Hear 2023; 27:23312165231186040. PMID: 37415497. PMCID: PMC10331332. DOI: 10.1177/23312165231186040.
Abstract
Information regarding sound-source spatial location provides several speech-perception benefits, including auditory spatial cues for perceptual talker separation and localization cues to face the talker to obtain visual speech information. These benefits have typically been examined separately. A real-time processing algorithm for sound-localization degradation (LocDeg) was used to investigate how spatial-hearing benefits interact in a multitalker environment. Normal-hearing adults performed auditory-only and auditory-visual sentence recognition with target speech and maskers presented from loudspeakers at -90°, -36°, 36°, or 90° azimuth. For auditory-visual conditions, one target and three masking talker videos (always spatially separated) were rendered virtually in rectangular windows at these locations on a head-mounted display. Auditory-only conditions presented blank windows at these locations. Auditory target speech (always spatially aligned with the target video) was presented in co-located speech-shaped noise (Experiment 1) or with three co-located or spatially separated auditory interfering talkers corresponding to the masker videos (Experiment 2). In the co-located conditions, the LocDeg algorithm did not affect auditory-only performance but impaired target-orientation accuracy, thereby reducing the auditory-visual benefit. In the multitalker environment, two spatial-hearing benefits were observed: perceptually separating competing speech based on auditory spatial differences and orienting to the target talker to obtain visual speech cues. These two benefits were additive, and both were diminished by the LocDeg algorithm. Although visual cues always improved performance when the target was accurately localized, there was no strong evidence that they provided additional assistance in perceptually separating co-located competing speech. These results highlight the importance of sound localization in everyday communication.
4. Modulating Cortical Instrument Representations During Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2021; 15:635937. PMID: 34630007. PMCID: PMC8498193. DOI: 10.3389/fnins.2021.635937.
Abstract
Numerous neuroimaging studies have demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to the other, irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual entities (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained through selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction for the relevant instrument during a middle-latency window for both the bassoon and cello, and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that, subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas such an enhancement is not observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories on polyphonic music perception.
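Envelope tracking of this kind rests on two ingredients: an acoustic amplitude envelope per instrument, and a score for how well a (separately trained) backward model reconstructs that envelope from the EEG. A minimal sketch of both, with hypothetical helper names that are not the authors' pipeline:

```python
# Amplitude-envelope extraction and reconstruction scoring for
# envelope-tracking analyses. The 8 Hz smoothing cutoff is an assumption.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def envelope(audio, fs, cutoff=8.0):
    env = np.abs(hilbert(audio))            # broadband amplitude envelope
    b, a = butter(3, cutoff / (fs / 2))     # low-pass to the slow modulations
    return filtfilt(b, a, env)

def reconstruction_score(env_true, env_rec):
    # Pearson r between the actual and the EEG-reconstructed envelope
    return np.corrcoef(env_true, env_rec)[0, 1]
```

Attentional modulation is then quantified as the difference in reconstruction score between the attended and unattended instrument.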
5. Evaluation of Auditory Stream Segregation in Musicians and Nonmusicians. Int Arch Otorhinolaryngol 2021; 25:e77-e80. PMID: 33542755. PMCID: PMC7851367. DOI: 10.1055/s-0040-1709116.
Abstract
Introduction
One of the major cues that help in auditory stream segregation is spectral profiling. Musicians are trained to perceive fine structural variations in acoustic stimuli and have enhanced temporal perception and speech perception in noise.
Objective
To analyze the differences in spectral profile thresholds in musicians and nonmusicians.
Methods
The spectral profile analysis threshold was compared between two groups (musicians and nonmusicians) in the age range of 15 to 30 years. The standard stimuli had 5 harmonics, all at the same amplitude (f0 = 330 Hz, mi4). The variable tone had a similar harmonic structure; however, the amplitude of its third harmonic component was higher, producing a different timbre in comparison with the standards. The subject had to identify the odd-timbre tone. Testing was performed at 60 dB HL in a sound-treated room.
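The described stimuli are simple to synthesize. In the sketch below, the size of the third-harmonic increment (6 dB) is an assumed example value, since the abstract does not state it:

```python
# Profile-analysis stimuli: five equal-amplitude harmonics of f0 = 330 Hz
# for the standard; the variable tone raises one harmonic component.
import numpy as np

def complex_tone(f0=330.0, n_harm=5, fs=44100, dur=0.5, inc_harm=None, inc_db=6.0):
    t = np.arange(int(fs * dur)) / fs
    amps = np.ones(n_harm)
    if inc_harm is not None:
        amps[inc_harm - 1] *= 10 ** (inc_db / 20)   # boost one component
    sig = sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t) for k, a in enumerate(amps))
    return sig / np.max(np.abs(sig))

standard = complex_tone()            # flat spectral profile
odd = complex_tone(inc_harm=3)       # raised 3rd harmonic -> timbre change
```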
Results
Profile analysis thresholds were significantly better in musicians than in nonmusicians and improved with increasing duration of music training. Thus, improved auditory processing in musicians could have resulted in better profile analysis thresholds.
Conclusions
Auditory stream segregation was found to be better in musicians than in nonmusicians, and performance improved with an increasing number of years of training. However, further studies on a larger group with more variables are essential to validate these results.
6. Decoding of Envelope vs. Fundamental Frequency During Complex Auditory Stream Segregation. Neurobiol Lang (Camb) 2020; 1:268-287. PMID: 37215227. PMCID: PMC10158587. DOI: 10.1162/nol_a_00013.
Abstract
Hearing-in-noise perception is a challenging task that is critical to human function, but how the brain accomplishes it is not well understood. A candidate mechanism proposes that the neural representation of an attended auditory stream is enhanced relative to background sound via a combination of bottom-up and top-down mechanisms. To date, few studies have compared neural representation and its task-related enhancement across frequency bands that carry different auditory information, such as a sound's amplitude envelope (i.e., syllabic rate or rhythm; 1-9 Hz) and the fundamental frequency of periodic stimuli (i.e., pitch; >40 Hz). Furthermore, hearing-in-noise in the real world is frequently both messier and richer than in the majority of tasks used to study it. In the present study, we use continuous sound excerpts that simultaneously offer predictive, visual, and spatial cues to help listeners separate the target from four acoustically similar, simultaneously presented sound streams. We show that while both lower- and higher-frequency information about the entire sound stream is represented in the brain's response, the to-be-attended sound stream is strongly enhanced only in the slower, lower-frequency sound representations. These results are consistent with the hypothesis that attended sound representations are strengthened progressively at higher-level, later processing stages, and that the interaction of multiple brain systems can aid in this process. Our findings contribute to our understanding of auditory stream separation in difficult, naturalistic listening conditions and demonstrate that pitch and envelope information can be decoded from single-channel EEG data.
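A minimal sketch of the band split implied above, separating a slow envelope band (1-9 Hz) from a pitch-range band (>40 Hz) in single-channel EEG; the filter orders are assumptions:

```python
# Split single-channel EEG into the two representations analyzed:
# a slow envelope band (1-9 Hz) and a pitch-range band (>40 Hz).
from scipy.signal import butter, sosfiltfilt

def split_bands(eeg, fs):
    env_sos = butter(4, [1.0, 9.0], btype="bandpass", fs=fs, output="sos")
    f0_sos = butter(4, 40.0, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(env_sos, eeg), sosfiltfilt(f0_sos, eeg)
```

Decoding models (e.g., the backward reconstruction sketched under entry 4) can then be fit to each band separately to compare attentional enhancement across them.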
7.
Abstract
Objective: To investigate whether British children's performance is equivalent to North American norms on the Listening in Spatialised Noise-Sentences test (LiSN-S). Design: Prospective study comparing the performance of a single group of British children to North American norms on the LiSN-S (North American version). Study sample: The British group was composed of 46 typically developing children, aged 6 years to 11 years 11 months, from a mainstream primary school in London. Results: No significant difference was observed between the British group's performance and the North American norms for the Low-cue, High-cue, Spatial Advantage, and Total Advantage measures. The British group presented a significantly lower performance only for the Talker Advantage measure (z-score: 0.35, 95% confidence interval -0.12 to -0.59). Age was significantly correlated with all unstandardised measures. Conclusion: Our results indicate that, when assessing British children, it would be appropriate to add a corrective factor of 0.35 to the z-score obtained for the Talker Advantage in order to compare it to the North American norms. This strategy would enable the use of the LiSN-S in the UK to assess auditory stream segregation based on spatial cues.
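The proposed correction is a simple additive shift of the standardized score; a minimal sketch (the function name is ours, not part of any LiSN-S software):

```python
# Apply the proposed +0.35 corrective factor to a British child's
# Talker Advantage z-score before comparison with North American norms.
TALKER_ADVANTAGE_CORRECTION = 0.35

def corrected_talker_advantage(z_score: float) -> float:
    return z_score + TALKER_ADVANTAGE_CORRECTION
```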
8. The Music-In-Noise Task (MINT): A Tool for Dissecting Complex Auditory Perception. Front Neurosci 2019; 13:199. PMID: 30930734. PMCID: PMC6427094. DOI: 10.3389/fnins.2019.00199.
Abstract
The ability to segregate target sounds in noisy backgrounds is relevant both to neuroscience and to clinical applications. Recent research suggests that hearing-in-noise (HIN) problems are solved using combinations of sub-skills that are applied according to task demand and information availability. While evidence is accumulating for a musician advantage in HIN, the exact nature of the reported training effect is not fully understood. Existing HIN tests focus on tasks requiring understanding of speech in the presence of competing sound. Because visual, spatial and predictive cues are not systematically considered in these tasks, few tools exist to investigate the most relevant components of cognitive processes involved in stream segregation. We present the Music-In-Noise Task (MINT) as a flexible tool to expand HIN measures beyond speech perception, and for addressing research questions pertaining to the relative contributions of HIN sub-skills, inter-individual differences in their use, and their neural correlates. The MINT uses a match-mismatch trial design: in four conditions (Baseline, Rhythm, Spatial, and Visual) subjects first hear a short instrumental musical excerpt embedded in an informational masker of "multi-music" noise, followed by either a matching or scrambled repetition of the target musical excerpt presented in silence; the four conditions differ according to the presence or absence of additional cues. In a fifth condition (Prediction), subjects hear the excerpt in silence as a target first, which helps to anticipate incoming information when the target is embedded in masking sound. Data from samples of young adults show that the MINT has good reliability and internal consistency, and demonstrate selective benefits of musicianship in the Prediction, Rhythm, and Visual subtasks. We also report a performance benefit of multilingualism that is separable from that of musicianship. Average MINT scores were correlated with scores on a sentence-in-noise perception task, but only accounted for a relatively small percentage of the variance, indicating that the MINT is sensitive to additional factors and can provide a complement and extension of speech-based tests for studying stream segregation. A customizable version of the MINT is made available for use and extension by the scientific community.
9. Assessing Top-Down and Bottom-Up Contributions to Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2018; 12:121. PMID: 29563861. PMCID: PMC5845899. DOI: 10.3389/fnins.2018.00121.
Abstract
Polyphonic music listening well exemplifies the processes typically involved in everyday auditory scene analysis, relying on an interactive interplay between bottom-up and top-down processes. Most studies investigating scene analysis have used elementary auditory scenes; however, real-world scene analysis is far more complex. In particular, music, contrary to most other natural auditory scenes, can be perceived by either integrating or, under attentive control, segregating sound streams, often carried by different instruments. One of the prominent bottom-up cues contributing to multi-instrument music perception is the timbre difference between instruments. In this work, we introduce and validate a novel paradigm designed to investigate, within naturalistic musical auditory scenes, attentive modulation as well as its interaction with bottom-up processes. Two psychophysical experiments are described, employing custom-composed two-voice polyphonic music pieces within a framework implementing a behavioral performance metric to validate listener instructions requiring either integration or segregation of scene elements. In Experiment 1, the listeners' locus of attention was switched between individual instruments or the aggregate (i.e., both instruments together) via a task requiring the detection of temporal modulations (i.e., triplets) incorporated within or across instruments. Subjects responded post-stimulus whether triplets were present in the to-be-attended instrument(s). Experiment 2 introduced a bottom-up manipulation by adding a three-level morphing of instrument timbre distance to the attentional framework. The task was designed to be used within neuroimaging paradigms; Experiment 2 was additionally validated behaviorally in the functional magnetic resonance imaging (fMRI) environment. Experiment 1 subjects (N = 29, non-musicians) completed the task at high levels of accuracy, showing no differences between any experimental conditions. Nineteen listeners also participated in Experiment 2, which showed a main effect of instrument timbre distance, although within-attention-condition timbre-distance contrasts did not demonstrate a timbre effect. Correlation of overall scores with morph-distance effects, computed by subtracting the largest from the smallest timbre-distance scores, showed an influence of general task difficulty on the timbre-distance effect. Comparison of laboratory and fMRI data showed that scanner noise had no adverse effect on task performance. These experimental paradigms enable the study of both bottom-up and top-down contributions to auditory stream segregation and integration within psychophysical and neuroimaging experiments.
10. Recent advances in exploring the neural underpinnings of auditory scene perception. Ann N Y Acad Sci 2017; 1396:39-55. PMID: 28199022. PMCID: PMC5446279. DOI: 10.1111/nyas.13317.
Abstract
Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds, and conventional behavioral techniques, to elucidate how we perceptually segregate auditory objects or streams from each other. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth in the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Following the progress that has been made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.
11. Stimulus Pauses and Perturbations Differentially Delay or Promote the Segregation of Auditory Objects: Psychoacoustics and Modeling. Front Neurosci 2017; 11:198. PMID: 28473747. PMCID: PMC5397483. DOI: 10.3389/fnins.2017.00198.
Abstract
Segregating distinct sound sources is fundamental for auditory perception, as in the cocktail party problem. In a process called the build-up of stream segregation, distinct sound sources that are perceptually integrated initially can be segregated into separate streams after several seconds. Previous research concluded that abrupt changes in the incoming sounds during build-up—for example, a step change in location, loudness or timing—reset the percept to integrated. Following this reset, the multisecond build-up process begins again. Neurophysiological recordings in auditory cortex (A1) show fast (subsecond) adaptation, but unified mechanistic explanations for the bias toward integration, multisecond build-up and resets remain elusive. Combining psychoacoustics and modeling, we show that initial unadapted A1 responses bias integration, that the slowness of build-up arises naturally from competition downstream, and that recovery of adaptation can explain resets. An early bias toward integrated perceptual interpretations arising from primary cortical stages that encode low-level features and feed into competition downstream could also explain similar phenomena in vision. Further, we report a previously overlooked class of perturbations that promote segregation rather than integration. Our results challenge current understanding for perturbation effects on the emergence of sound source segregation, leading to a new hypothesis for differential processing downstream of A1. Transient perturbations can momentarily redirect A1 responses as input to downstream competition units that favor segregation.
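The adaptation-plus-competition account invoked here is commonly formalized as two mutually inhibiting units, one per perceptual interpretation, with slow adaptation of the dominant unit. The sketch below is a generic model of that kind, not the authors' specific implementation, and all parameter values are illustrative assumptions:

```python
# Generic two-unit competition model with slow adaptation and noise:
# unit 0 ~ "integrated" interpretation, unit 1 ~ "segregated".
import numpy as np

def f(x):  # sigmoidal firing-rate function
    return 1.0 / (1.0 + np.exp(-10.0 * (x - 0.2)))

def simulate(T=60.0, dt=1e-3, drive=0.5, beta=1.1, g=0.5, tau_a=4.0, noise=0.02):
    n = int(T / dt)
    u = np.zeros((n, 2))          # unit activities over time
    a = np.zeros(2)               # slow adaptation variables
    rng = np.random.default_rng(1)
    u[0] = [0.6, 0.1]             # start with one interpretation dominant
    for t in range(1, n):
        # input = drive - cross-inhibition - own adaptation
        inp = drive - beta * u[t - 1, ::-1] - g * a
        du = (-u[t - 1] + f(inp)) * dt + noise * np.sqrt(dt) * rng.normal(size=2)
        a += (-a + u[t - 1]) * dt / tau_a   # adaptation tracks activity slowly
        u[t] = np.clip(u[t - 1] + du, 0.0, 1.0)
    return u  # perceptual switches appear as alternations in dominance
```

Dominance alternations emerge because the winning unit's adaptation slowly weakens its own drive until the suppressed unit escapes; recovery of adaptation during pauses is the kind of mechanism the authors propose for resets.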
12. Decision making and ambiguity in auditory stream segregation. Front Neurosci 2015; 9:266. PMID: 26321899. PMCID: PMC4531241. DOI: 10.3389/fnins.2015.00266.
Abstract
Researchers of auditory stream segregation have largely taken a bottom-up view on the link between physical stimulus parameters and the perceptual organization of sequences of ABAB sounds. However, in the majority of studies, researchers have relied on the reported decisions of the subjects regarding which of the predefined percepts (e.g., one stream or two streams) predominated when subjects listened to more or less ambiguous streaming sequences. When searching for neural mechanisms of stream segregation, it should be kept in mind that such decision processes may contribute to brain activation, as also suggested by recent human imaging data. The present study proposes that the uncertainty of a subject in making a decision about the perceptual organization of ambiguous streaming sequences may be reflected in the time required to make an initial decision. To this end, subjects had to decide on their current percept while listening to ABAB auditory streaming sequences. Each sequence had a duration of 30 s and was composed of A and B harmonic tone complexes differing in fundamental frequency (ΔF). Sequences with seven different ΔF were tested. We found that the initial decision time varied non-monotonically with ΔF and that it was significantly correlated with the degree of perceptual ambiguity defined from the proportions of time the subjects reported a one-stream or a two-stream percept subsequent to the first decision. This strong relation of the proposed measures of decision uncertainty and perceptual ambiguity should be taken into account when searching for neural correlates of auditory stream segregation.
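Sequences of this type are straightforward to synthesize; the sketch below builds a ~30 s ABAB sequence of harmonic tone complexes whose fundamentals differ by a chosen ΔF in semitones. Tone duration, ramps, and gaps are assumed example values, not the study's parameters:

```python
# ABAB auditory streaming sequence from harmonic complexes differing in F0.
import numpy as np

def harmonic_complex(f0, fs=44100, dur=0.1, n_harm=10):
    t = np.arange(int(fs * dur)) / fs
    sig = sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harm + 1))
    ramp = np.minimum(1.0, np.minimum(t, t[::-1]) / 0.01)   # 10-ms on/off ramps
    return sig * ramp / n_harm

def abab_sequence(f0_a=400.0, dF_semitones=4, n_cycles=120, fs=44100, gap=0.025):
    f0_b = f0_a * 2 ** (dF_semitones / 12)    # B tone dF semitones above A
    silence = np.zeros(int(fs * gap))
    cycle = np.concatenate([harmonic_complex(f0_a, fs=fs), silence,
                            harmonic_complex(f0_b, fs=fs), silence])
    return np.tile(cycle, n_cycles)           # 120 cycles of 0.25 s ~ 30 s
```

Varying dF_semitones across trials produces the ambiguous-to-unambiguous range over which initial decision times can be measured.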
13. Auditory stream segregation using amplitude modulated bandpass noise. Front Psychol 2015; 6:1151. PMID: 26300831. PMCID: PMC4528102. DOI: 10.3389/fpsyg.2015.01151.
Abstract
The purpose of this study was to investigate the roles of spectral overlap and amplitude modulation (AM) rate in stream segregation for noise signals, as well as to test the build-up effect based on these two cues. Segregation ability was evaluated using an objective paradigm with listeners' attention focused on stream segregation. Stimulus sequences consisted of two interleaved sets of bandpass noise bursts (A and B bursts). The A and B bursts differed in spectrum, AM rate, or both, and the amount of the difference between the two sets of noise bursts was varied. Long and short sequences were studied to investigate the build-up effect for segregation based on spectral and AM-rate differences. Results showed the following: (1) stream segregation ability increased with greater spectral separation; (2) larger AM-rate separations were associated with stronger segregation abilities; (3) spectral separation elicited the build-up effect for the range of spectral differences assessed in the current study; and (4) AM-rate separation interacted with spectral separation, suggesting an additive effect of spectral separation and AM-rate separation on segregation build-up. The findings suggest that, when normal-hearing listeners direct their attention toward segregation, they are able to segregate auditory streams based on reduced spectral contrast cues that vary with the amount of spectral overlap. Further, regardless of the spectral separation, they are able to use AM-rate differences as a secondary, weaker cue. Based on spectral differences, listeners can segregate auditory streams better as the listening duration is prolonged, i.e., sparse spectral cues elicit build-up of segregation; however, AM-rate differences only appear to elicit build-up in combination with spectral difference cues.
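A single A- or B-type burst as described, bandpass noise with sinusoidal amplitude modulation, can be sketched as follows; band edges, burst duration, and AM depth are example assumptions:

```python
# Amplitude-modulated bandpass noise burst for A/B streaming stimuli.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def am_noise_burst(low_hz, high_hz, am_rate, fs=44100, dur=0.1, seed=0):
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=int(fs * dur))
    sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, noise)
    t = np.arange(len(band)) / fs
    am = 0.5 * (1 + np.sin(2 * np.pi * am_rate * t))   # 100% sinusoidal AM
    return band * am

a_burst = am_noise_burst(500, 1000, am_rate=40)    # e.g., A bursts
b_burst = am_noise_burst(2000, 4000, am_rate=80)   # spectrum and AM rate differ
```

Spectral overlap is manipulated by moving the band edges closer together, and AM-rate separation by changing the ratio of the two modulation rates.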
14. Auditory stream segregation using bandpass noises: evidence from event-related potentials. Front Neurosci 2014; 8:277. PMID: 25309306. PMCID: PMC4162371. DOI: 10.3389/fnins.2014.00277.
Abstract
The current study measured neural responses to investigate auditory stream segregation of noise stimuli with or without clear spectral contrast. Sequences of alternating A and B noise bursts were presented to elicit stream segregation in normal-hearing listeners. The successive B bursts in each sequence maintained an equal amount of temporal separation, with manipulations introduced on the last stimulus: the last B burst was delayed for 50% of the sequences and not delayed for the other 50%. The A bursts were jittered in between every two adjacent B bursts. To study the effects of spectral separation on streaming, the A and B bursts were further manipulated by using either bandpass-filtered noises widely spaced in center frequency or broadband noises. Event-related potentials (ERPs) to the last B bursts were analyzed to compare the neural responses to the delay vs. no-delay trials in both passive and attentive listening conditions. In the passive listening condition, a trend for a possible late mismatch negativity (MMN) or late discriminative negativity (LDN) response was observed only when the A and B bursts were spectrally separate, suggesting that spectral separation in the A and B burst sequences could be conducive to stream segregation at the pre-attentive level. In the attentive condition, a P300 response was consistently elicited regardless of whether there was spectral separation between the A and B bursts, indicating the facilitative role of voluntary attention in stream segregation. The results suggest that reliable ERP measures can be used as indirect indicators of auditory stream segregation in conditions of weak spectral contrast. These findings have important implications for cochlear implant (CI) studies: because spectral information available through a CI device or simulation is substantially degraded, CI listeners may require more attention to achieve stream segregation.
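The core ERP contrast reduces to averaging epochs time-locked to the last B burst and taking a difference wave between delay and no-delay trials, in which an MMN/LDN (passive) or P300 (attentive) would be sought. A minimal sketch with assumed array shapes:

```python
# Deviant-minus-standard difference wave from single-channel ERP epochs.
import numpy as np

def difference_wave(epochs_delay, epochs_nodelay):
    """epochs_*: (n_trials, n_samples) arrays time-locked to the last B burst."""
    erp_delay = epochs_delay.mean(axis=0)      # average over delay trials
    erp_nodelay = epochs_nodelay.mean(axis=0)  # average over no-delay trials
    return erp_delay - erp_nodelay             # where MMN/LDN or P300 is sought
```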
15. Do audio-visual motion cues promote segregation of auditory streams? Front Neurosci 2014; 8:64. PMID: 24778604. PMCID: PMC3985028. DOI: 10.3389/fnins.2014.00064.
Abstract
An audio-visual experiment using moving sound sources was designed to investigate whether the analysis of auditory scenes is modulated by synchronous presentation of visual information. Listeners were presented with an alternating sequence of two pure tones delivered by two separate sound sources. In different conditions, the two sound sources were either stationary or moving on random trajectories around the listener. Both the sounds and the movement trajectories were derived from recordings in which two humans were moving with loudspeakers attached to their heads. Visualized movement trajectories modeled by a computer animation were presented together with the sounds. In the main experiment, behavioral reports on sound organization were collected from young healthy volunteers. The proportion and stability of the different sound organizations were compared between the conditions in which the visualized trajectories matched the movement of the sound sources and when the two were independent of each other. The results corroborate earlier findings that separation of sound sources in space promotes segregation. However, no additional effect of auditory movement per se on the perceptual organization of sounds was obtained. Surprisingly, the presentation of movement-congruent visual cues did not strengthen the effects of spatial separation on segregating auditory streams. Our findings are consistent with the view that bistability in the auditory modality can occur independently from other modalities.
16. Predictability effects in auditory scene analysis: a review. Front Neurosci 2014; 8:60. PMID: 24744695. PMCID: PMC3978260. DOI: 10.3389/fnins.2014.00060.
Abstract
Many sound sources emit signals in a predictable manner. The idea that predictability can be exploited to support the segregation of one source's signal emissions from the overlapping signals of other sources has been expressed for a long time. Yet experimental evidence for a strong role of predictability within auditory scene analysis (ASA) has been scarce. Recently, there has been an upsurge in experimental and theoretical work on this topic resulting from fundamental changes in our perspective on how the brain extracts predictability from series of sensory events. Based on effortless predictive processing in the auditory system, it becomes more plausible that predictability would be available as a cue for sound source decomposition. In the present contribution, empirical evidence for such a role of predictability in ASA will be reviewed. It will be shown that predictability affects ASA both when it is present in the sound source of interest (perceptual foreground) and when it is present in other sound sources that the listener wishes to ignore (perceptual background). Initial evidence pointing toward age-related impairments in the latter capacity will be addressed. Moreover, it will be illustrated how effects of predictability can be shown by means of objective listening tests as well as by subjective report procedures, with the latter approach typically exploiting the multi-stable nature of auditory perception. Critical aspects of study design will be delineated to ensure that predictability effects can be unambiguously interpreted. Possible mechanisms for a functional role of predictability within ASA will be discussed, and an analogy with the old-plus-new heuristic for grouping simultaneous acoustic signals will be suggested.
17. Attention effects on auditory scene analysis in children. Neuropsychologia 2009; 47:771-85. PMID: 19124031. PMCID: PMC2643319. DOI: 10.1016/j.neuropsychologia.2008.12.007.
Abstract
Auditory scene analysis begins in infancy, making it possible for the baby to distinguish its mother's voice from other noises in the environment. Despite the importance of this process for human behavior, the question of how perceptual sound organization develops during childhood is not well understood. The current study investigated the role of attention for perceiving sound streams in a group of school-aged children and young adults. We behaviorally determined the frequency separation at which a set of sounds was detected as one integrated or two separated streams and compared these measures with passively and actively obtained electrophysiological indices (mismatch negativity (MMN) and P3b) of the same sounds. In adults, there was a high degree of concordance between passive and active electrophysiological indices of stream segregation that matched with perception. In contrast, there was a large disparity in children. Active electrophysiological indices of streaming were concordant with behavioral measures of perception, whereas passive indices were not. In addition, children required larger frequency separations to perceive two streams compared to adults. Our results suggest that differences in stream segregation between children and adults reflect an under-development of basic auditory processing mechanisms, and indicate a developmental role of attention for shaping physiological responses that optimize processes engaged during passive audition.
18.
Abstract
This study investigates the effects of spectral separation of sounds on the ability of goldfish to acquire independent information about two simultaneous complex sources. Goldfish were conditioned to a complex sound made up of two sets of repeated acoustic pulses: a high-frequency pulse with a spectral envelope centered at 625 Hz, and a low-frequency pulse type centered at 240, 305, 390, or 500 Hz. The pulses were presented with each pulse type alternating with an overall pulse repetition rate of 40 pulses per second (pps), and a 20-pps rate between identical pulses. Two control groups were conditioned to the 625-Hz pulse alone, repeated at 40 and 20 pps, respectively. All groups were tested for generalization to the 625-Hz pulse repeated alone at several rates. If the two pulse types in the complex resulted in independent auditory streams, the animals were expected to generalize to the 625-Hz pulse trains as if they were repeated at 20 pps during conditioning. It was hypothesized that as the center frequency of the low-frequency pulse approached that of the 625-Hz pulse, the alternating trains would be perceived as a single auditory stream with a repetition rate of 40 pps. The group conditioned to alternating 625- and 240-Hz pulses generalized least, with maximum generalization at 20 Hz, suggesting that the animals formed at least one perceptual stream with a repetition rate of 20 pps. The other alternating pulse groups generalized to intermediate degrees. Goldfish can segregate at least one "auditory stream" from a complex mixture of sources. Segregation can be based on spectral envelope and grows more robust with growing spectral separation between the simultaneous sources. Auditory stream segregation and auditory scene analysis are shared among human listeners, European starlings, and goldfish, and may be primitive characteristics of the vertebrate sense of hearing.