1. Thomassen S, Hartung K, Einhäuser W, Bendixen A. Low-high-low or high-low-high? Pattern effects on sequential auditory scene analysis. J Acoust Soc Am 2022; 152:2758. [PMID: 36456271] [DOI: 10.1121/10.0015054]
Abstract
Sequential auditory scene analysis (ASA) is often studied using sequences of two alternating tones, such as ABAB or ABA_, with "_" denoting a silent gap, and "A" and "B" sine tones differing in frequency (nominally low and high). Many studies implicitly assume that the specific arrangement (ABAB vs ABA_, as well as low-high-low vs high-low-high within ABA_) plays a negligible role, such that decisions about the tone pattern can be governed by other considerations. To explicitly test this assumption, a systematic comparison of different tone patterns for two-tone sequences was performed in three different experiments. Participants were asked to report whether they perceived the sequences as originating from a single sound source (integrated) or from two interleaved sources (segregated). Results indicate that core findings of sequential ASA, such as an effect of frequency separation on the proportion of integrated and segregated percepts, are similar across the different patterns during prolonged listening. However, at sequence onset, the integrated percept was more likely to be reported by the participants in ABA_low-high-low than in ABA_high-low-high sequences. This asymmetry is important for models of sequential ASA, since the formation of percepts at onset is an integral part of understanding how auditory interpretations build up.
Affiliation(s)
- Sabine Thomassen: Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
- Kevin Hartung: Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
- Wolfgang Einhäuser: Physics of Cognition Group, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
- Alexandra Bendixen: Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
2. Szalárdy O, Tóth B, Farkas D, Orosz G, Winkler I. Do we parse the background into separate streams in the cocktail party? Front Hum Neurosci 2022; 16:952557. [DOI: 10.3389/fnhum.2022.952557]
Abstract
In the cocktail party situation, people with normal hearing usually follow a single speaker among multiple concurrent ones. However, there is no agreement in the literature as to whether the background is segregated into multiple streams/speakers. The current study varied the number of concurrent speech streams and investigated target detection and memory for the contents of a target stream, as well as the processing of distractors. A male-voiced target stream was presented either alone (single-speech), together with one male-voiced distractor (one-distractor), or together with a male- and a female-voiced distractor (two-distractor). Behavioral measures of target detection and content tracking performance, as well as target- and distractor-detection-related event-related brain potentials (ERPs), were assessed. We found that the N2 amplitude decreased, whereas the P3 amplitude increased, from the single-speech to the concurrent-speech conditions. Importantly, the behavioral effect of distractors differed between the conditions with one vs. two distractor speech streams, and the non-zero voltages in the N2 time window for distractor numerals and in the P3 time window for syntactic violations appearing in the non-target speech stream differed significantly between the one- and two-distractor conditions for the same (male) speaker. These results support the notion that the two background speech streams are segregated, as they show that distractors and syntactic violations appearing in the non-target streams are processed even when two non-target speech streams are delivered together with the target stream.
3. Szalárdy O, Tóth B, Farkas D, Orosz G, Honbolygó F, Winkler I. Linguistic predictability influences auditory stimulus classification within two concurrent speech streams. Psychophysiology 2020; 57:e13547. [DOI: 10.1111/psyp.13547]
Affiliation(s)
- Orsolya Szalárdy: Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary; Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Brigitta Tóth: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Dávid Farkas: Analytics Development, Performance Management and Analytics, Business Development, Integrated Supply Chain Management, Nokia Business Services, Nokia Operations, Nokia, Budapest, Hungary
- Gábor Orosz: Department of Psychology, Stanford University, Stanford, CA, USA
- Ferenc Honbolygó: Brain Imaging Centre, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Institute of Psychology, ELTE Eötvös Loránd University, Budapest, Hungary
- István Winkler: Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
4. Little DF, Snyder JS, Elhilali M. Ensemble modeling of auditory streaming reveals potential sources of bistability across the perceptual hierarchy. PLoS Comput Biol 2020; 16:e1007746. [PMID: 32275706] [PMCID: PMC7185718] [DOI: 10.1371/journal.pcbi.1007746]
Abstract
Perceptual bistability, the spontaneous, irregular fluctuation of perception between two interpretations of a stimulus, occurs when observing a large variety of ambiguous stimulus configurations. This phenomenon has the potential to serve as a tool for, among other things, understanding how function varies across individuals, given the large individual differences that manifest during perceptual bistability. Yet it remains difficult to interpret the functional processes at work without knowing where bistability arises during perception. In this study we explore the hypothesis that bistability originates from multiple sources distributed across the perceptual hierarchy. We develop a hierarchical model of auditory processing composed of three distinct levels: a Peripheral, tonotopic analysis; a Central analysis computing features found more centrally in the auditory system; and an Object analysis, where sounds are segmented into different streams. We model bistable perception within this system by introducing adaptation, inhibition, and noise into one or all of the three levels of the hierarchy. We evaluate a large ensemble of variations of this hierarchical model, in which each model has a different configuration of adaptation, inhibition, and noise. This approach avoids the assumption that a single configuration must be invoked to explain the data. Each model is evaluated based on its ability to replicate two hallmarks of bistability during auditory streaming: the selectivity of bistability to specific stimulus configurations, and the characteristic log-normal pattern of perceptual switches. Consistent with a distributed origin, a broad range of model parameters across this hierarchy leads to a plausible form of perceptual bistability.
Affiliation(s)
- David F. Little: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Joel S. Snyder: Department of Psychology, University of Nevada, Las Vegas, Las Vegas, Nevada, United States of America
- Mounya Elhilali: Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
5. Gandras K, Grimm S, Bendixen A. Electrophysiological Correlates of Speaker Segregation and Foreground-Background Selection in Ambiguous Listening Situations. Neuroscience 2018; 389:19-29. [PMID: 28735101] [DOI: 10.1016/j.neuroscience.2017.07.021]
Abstract
In everyday listening environments, a main task for our auditory system is to follow one out of multiple speakers talking simultaneously. The present study was designed to find electrophysiological indicators of two central processes involved: segregating the speech mixture into distinct speech sequences corresponding to the two speakers, and then attending to one of the speech sequences. We generated multistable speech stimuli that were set up to create ambiguity as to whether one or two speakers were talking. Thereby we were able to investigate three perceptual alternatives (no segregation; segregated with speaker A in the foreground; segregated with speaker B in the foreground) without any confounding stimulus changes. Participants listened to a continuously repeating sequence of syllables, which were uttered alternately by two human speakers, and indicated whether they perceived the sequence as an inseparable mixture or as originating from two separate speakers. In the latter case, they indicated which speaker was in their attentional foreground. Our data show a long-lasting event-related potential (ERP) modulation starting 130 ms after stimulus onset, which can be explained by the perceptual organization of the two speech sequences into an attended foreground and an ignored background stream. Our paradigm extends previous work with pure-tone sequences toward speech stimuli and adds the possibility of obtaining neural correlates of the difficulty of segregating a speech mixture into distinct streams.
Affiliation(s)
- Katharina Gandras: Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg, D-26111 Oldenburg, Germany
- Sabine Grimm: Department of Physics, School of Natural Sciences, Chemnitz University of Technology, D-09126 Chemnitz, Germany
- Alexandra Bendixen: Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg, D-26111 Oldenburg, Germany; Department of Physics, School of Natural Sciences, Chemnitz University of Technology, D-09126 Chemnitz, Germany
6. Neural Decoding of Bistable Sounds Reveals an Effect of Intention on Perceptual Organization. J Neurosci 2018; 38:2844-2853. [PMID: 29440556] [PMCID: PMC5852662] [DOI: 10.1523/jneurosci.3022-17.2018]
Abstract
Auditory signals arrive at the ear as a mixture that the brain must decompose into distinct sources based to a large extent on acoustic properties of the sounds. An important question concerns whether listeners have voluntary control over how many sources they perceive. This has been studied using pure high (H) and low (L) tones presented in the repeating pattern HLH-HLH-, which can form a bistable percept heard either as an integrated whole (HLH-) or as segregated into high (H-H-) and low (-L-) sequences. Although instructing listeners to try to integrate or segregate sounds affects reports of what they hear, this could reflect a response bias rather than a perceptual effect. We had human listeners (15 males, 12 females) continuously report their perception of such sequences and recorded neural activity using MEG. During neutral listening, a classifier trained on patterns of neural activity distinguished between periods of integrated and segregated perception. In other conditions, participants tried to influence their perception by allocating attention either to the whole sequence or to a subset of the sounds. They reported hearing the desired percept for a greater proportion of time than when listening neutrally. Critically, neural activity supported these reports; stimulus-locked brain responses in auditory cortex were more likely to resemble the signature of segregation when participants tried to hear segregation than when attempting to perceive integration. These results indicate that listeners can influence how many sound sources they perceive, as reflected in neural responses that track both the input and its perceptual organization.
SIGNIFICANCE STATEMENT: Can we consciously influence our perception of the external world? We address this question using sound sequences that can be heard either as coming from a single source or as two distinct auditory streams. Listeners reported spontaneous changes in their perception between these two interpretations while we recorded neural activity to identify signatures of such integration and segregation. They also indicated that they could, to some extent, choose between these alternatives. This claim was supported by corresponding changes in responses in auditory cortex. By linking neural and behavioral correlates of perception, we demonstrate that the number of objects that we perceive can depend not only on the physical attributes of our environment, but also on how we intend to experience it.
7. Thomassen S, Bendixen A. Subjective perceptual organization of a complex auditory scene. J Acoust Soc Am 2017; 141:265. [PMID: 28147594] [DOI: 10.1121/1.4973806]
Abstract
Empirical research on the sequential decomposition of an auditory scene primarily relies on interleaved sound mixtures of only two tone sequences (e.g., ABAB…). This oversimplifies the sound decomposition problem by limiting the number of putative perceptual organizations. The current study used a sound mixture composed of three different tones (ABCABC…) that could be perceptually organized in many different ways. Participants listened to these sequences and reported their subjective perception by continuously choosing one out of 12 visually presented perceptual organization alternatives. Different levels of frequency and spatial separation were implemented to check whether participants' perceptual reports would be systematic and plausible. As hypothesized, while perception switched back and forth in each condition between various perceptual alternatives (multistability), spatial as well as frequency separation generally raised the proportion of segregated alternatives and reduced the proportion of integrated alternatives. During segregated percepts, contrary to the hypothesis, many participants had a tendency to perceive two streams in the foreground, rather than reporting alternatives with a clear foreground-background differentiation. Finally, participants perceived the organization with intermediate feature values (e.g., middle tones of the pattern) segregated in the foreground slightly less often than similar alternatives with outer feature values (e.g., higher tones).
Affiliation(s)
- Sabine Thomassen: Auditory Psychophysiology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstrasse 114-118, D-26129 Oldenburg, Germany
- Alexandra Bendixen: Auditory Psychophysiology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstrasse 114-118, D-26129 Oldenburg, Germany
8. Billig AJ, Carlyon RP. Automaticity and primacy of auditory streaming: Concurrent subjective and objective measures. J Exp Psychol Hum Percept Perform 2015; 42:339-353. [PMID: 26414168] [PMCID: PMC4763253] [DOI: 10.1037/xhp0000146]
Abstract
Two experiments used subjective and objective measures to study the automaticity and primacy of auditory streaming. Listeners heard sequences of “ABA–” triplets, where “A” and “B” were tones of different frequencies and “–” was a silent gap. Segregation was more frequently reported, and rhythmically deviant triplets less well detected, for a greater between-tone frequency separation and later in the sequence. In Experiment 1, performing a competing auditory task for the first part of the sequence led to a reduction in subsequent streaming compared to when the tones were attended throughout. This is consistent with focused attention promoting streaming, and/or with attention switches resetting it. However, the proportion of segregated reports increased more rapidly following a switch than at the start of a sequence, indicating that some streaming occurred automatically. Modeling ruled out a simple “covert attention” account of this finding. Experiment 2 required listeners to perform subjective and objective tasks concurrently. It revealed superior performance during integrated compared to segregated reports, beyond that explained by the codependence of the two measures on stimulus parameters. We argue that listeners have limited access to low-level stimulus representations once perceptual organization has occurred, and that subjective and objective streaming measures partly index the same processes.
9. Fabiani M. The embodied brain. Psychophysiology 2014; 52:1-5. [DOI: 10.1111/psyp.12381]
Affiliation(s)
- Monica Fabiani: Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA
10. Schröger E, Bendixen A, Denham SL, Mill RW, Bőhm TM, Winkler I. Predictive Regularity Representations in Violation Detection and Auditory Stream Segregation: From Conceptual to Computational Models. Brain Topogr 2013; 27:565-77. [DOI: 10.1007/s10548-013-0334-6]