1
|
Suzuki M, Pennartz CMA, Aru J. How deep is the brain? The shallow brain hypothesis. Nat Rev Neurosci 2023; 24:778-791. [PMID: 37891398 DOI: 10.1038/s41583-023-00756-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/25/2023] [Indexed: 10/29/2023]
Abstract
Deep learning and predictive coding architectures commonly assume that inference in neural networks is hierarchical. However, largely neglected in deep learning and predictive coding architectures is the neurobiological evidence that all hierarchical cortical areas, higher or lower, project to and receive signals directly from subcortical areas. Given these neuroanatomical facts, today's dominance of cortico-centric, hierarchical architectures in deep learning and predictive coding networks is highly questionable; such architectures are likely to be missing essential computational principles the brain uses. In this Perspective, we present the shallow brain hypothesis: hierarchical cortical processing is integrated with a massively parallel process to which subcortical areas substantially contribute. This shallow architecture exploits the computational capacity of cortical microcircuits and thalamo-cortical loops that are not included in typical hierarchical deep learning and predictive coding networks. We argue that the shallow brain architecture provides several critical benefits over deep hierarchical structures and a more complete depiction of how mammalian brains achieve fast and flexible computational capabilities.
Collapse
Affiliation(s)
- Mototaka Suzuki
- Department of Cognitive and Systems Neuroscience, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands.
| | - Cyriel M A Pennartz
- Department of Cognitive and Systems Neuroscience, Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
| | - Jaan Aru
- Institute of Computer Science, University of Tartu, Tartu, Estonia.
| |
Collapse
|
2
|
Schröger E, Roeber U, Coy N. Markov chains as a proxy for the predictive memory representations underlying mismatch negativity. Front Hum Neurosci 2023; 17:1249413. [PMID: 37771348 PMCID: PMC10525344 DOI: 10.3389/fnhum.2023.1249413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Accepted: 08/22/2023] [Indexed: 09/30/2023] Open
Abstract
Events not conforming to a regularity inherent to a sequence of events elicit prediction error signals of the brain such as the Mismatch Negativity (MMN) and impair behavioral task performance. Events conforming to a regularity lead to attenuation of brain activity such as stimulus-specific adaptation (SSA) and behavioral benefits. Such findings are usually explained by theories stating that the information processing system predicts the forthcoming event of the sequence via detected sequential regularities. A mathematical model that is widely used to describe, to analyze and to generate event sequences are Markov chains: They contain a set of possible events and a set of probabilities for transitions between these events (transition matrix) that allow to predict the next event on the basis of the current event and the transition probabilities. The accuracy of such a prediction depends on the distribution of the transition probabilities. We argue that Markov chains also have useful applications when studying cognitive brain functions. The transition matrix can be regarded as a proxy for generative memory representations that the brain uses to predict the next event. We assume that detected regularities in a sequence of events correspond to (a subset of) the entries in the transition matrix. We apply this idea to the Mismatch Negativity (MMN) research and examine three types of MMN paradigms: classical oddball paradigms emphasizing sound probabilities, between-sound regularity paradigms manipulating transition probabilities between adjacent sounds, and action-sound coupling paradigms in which sounds are associated with actions and their intended effects. We show that the Markovian view on MMN yields theoretically relevant insights into the brain processes underlying MMN and stimulates experimental designs to study the brain's processing of event sequences.
Collapse
Affiliation(s)
- Erich Schröger
- Wilhelm Wundt Institute for Psychology, Leipzig University, Leipzig, Germany
| | - Urte Roeber
- Wilhelm Wundt Institute for Psychology, Leipzig University, Leipzig, Germany
| | - Nina Coy
- Wilhelm Wundt Institute for Psychology, Leipzig University, Leipzig, Germany
- Max Planck School of Cognition, Leipzig, Germany
| |
Collapse
|
3
|
Thomassen S, Hartung K, Einhäuser W, Bendixen A. Low-high-low or high-low-high? Pattern effects on sequential auditory scene analysis. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:2758. [PMID: 36456271 DOI: 10.1121/10.0015054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/20/2022] [Accepted: 10/17/2022] [Indexed: 06/17/2023]
Abstract
Sequential auditory scene analysis (ASA) is often studied using sequences of two alternating tones, such as ABAB or ABA_, with "_" denoting a silent gap, and "A" and "B" sine tones differing in frequency (nominally low and high). Many studies implicitly assume that the specific arrangement (ABAB vs ABA_, as well as low-high-low vs high-low-high within ABA_) plays a negligible role, such that decisions about the tone pattern can be governed by other considerations. To explicitly test this assumption, a systematic comparison of different tone patterns for two-tone sequences was performed in three different experiments. Participants were asked to report whether they perceived the sequences as originating from a single sound source (integrated) or from two interleaved sources (segregated). Results indicate that core findings of sequential ASA, such as an effect of frequency separation on the proportion of integrated and segregated percepts, are similar across the different patterns during prolonged listening. However, at sequence onset, the integrated percept was more likely to be reported by the participants in ABA_low-high-low than in ABA_high-low-high sequences. This asymmetry is important for models of sequential ASA, since the formation of percepts at onset is an integral part of understanding how auditory interpretations build up.
Collapse
Affiliation(s)
- Sabine Thomassen
- Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
| | - Kevin Hartung
- Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
| | - Wolfgang Einhäuser
- Physics of Cognition Group, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
| | - Alexandra Bendixen
- Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
| |
Collapse
|
4
|
Szalárdy O, Tóth B, Farkas D, Orosz G, Winkler I. Do we parse the background into separate streams in the cocktail party? Front Hum Neurosci 2022; 16:952557. [DOI: 10.3389/fnhum.2022.952557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Accepted: 10/06/2022] [Indexed: 11/13/2022] Open
Abstract
In the cocktail party situation, people with normal hearing usually follow a single speaker among multiple concurrent ones. However, there is no agreement in the literature as to whether the background is segregated into multiple streams/speakers. The current study varied the number of concurrent speech streams and investigated target detection and memory for the contents of a target stream as well as the processing of distractors. A male-voiced target stream was either presented alone (single-speech), together with one male-voiced distractor (one-distractor), or a male- and a female-voiced distractor (two-distractor). Behavioral measures of target detection and content tracking performance as well as target- and distractor detection related event-related brain potentials (ERPs) were assessed. We found that the N2 amplitude decreased whereas the P3 amplitude increased from the single-speech to the concurrent speech streams conditions. Importantly, the behavioral effect of distractors differed between the conditions with one vs. two distractor speech streams and the non-zero voltages in the N2 time window for distractor numerals and in the P3 time window for syntactic violations appearing in the non-target speech stream significantly differed between the one- and two-distractor conditions for the same (male) speaker. These results support the notion that the two background speech streams are segregated, as they show that distractors and syntactic violations appearing in the non-target streams are processed even when two speech non-target speech streams are delivered together with the target stream.
Collapse
|
5
|
Widmann A, Schröger E. Intention-based predictive information modulates auditory deviance processing. Front Neurosci 2022; 16:995119. [PMID: 36248631 PMCID: PMC9554204 DOI: 10.3389/fnins.2022.995119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2022] [Accepted: 09/08/2022] [Indexed: 11/26/2022] Open
Abstract
The human brain is highly responsive to (deviant) sounds violating an auditory regularity. Respective brain responses are usually investigated in situations when the sounds were produced by the experimenter. Acknowledging that humans also actively produce sounds, the present event-related potential study tested for differences in the brain responses to deviants that were produced by the listeners by pressing one of two buttons. In one condition, deviants were unpredictable with respect to the button-sound association. In another condition, deviants were predictable with high validity yielding correctly predicted deviants and incorrectly predicted (mispredicted) deviants. Temporal principal component analysis revealed deviant-specific N1 enhancement, mismatch negativity (MMN) and P3a. N1 enhancements were highly similar for each deviant type, indicating that the underlying neural mechanism is not affected by intention-based expectation about the self-produced forthcoming sound. The MMN was abolished for predictable deviants, suggesting that the intention-based prediction for a deviant can overwrite the prediction derived from the auditory regularity (predicting a standard). The P3a was present for each deviant type but was largest for mispredicted deviants. It is argued that the processes underlying P3a not only evaluate the deviant with respect to the fact that it violates an auditory regularity but also with respect to the intended sensorial effect of an action. Overall, our results specify current theories of auditory predictive processing, as they reveal that intention-based predictions exert different effects on different deviance-specific brain responses.
Collapse
Affiliation(s)
- Andreas Widmann
- Wilhelm Wundt Institute for Psychology, Leipzig University, Leipzig, Germany
- Leibniz Institute for Neurobiology, Magdeburg, Germany
- *Correspondence: Andreas Widmann,
| | - Erich Schröger
- Wilhelm Wundt Institute for Psychology, Leipzig University, Leipzig, Germany
- Erich Schröger,
| |
Collapse
|
6
|
Attentional control via synaptic gain mechanisms in auditory streaming. Brain Res 2021; 1778:147720. [PMID: 34785256 DOI: 10.1016/j.brainres.2021.147720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2021] [Revised: 09/13/2021] [Accepted: 11/05/2021] [Indexed: 11/21/2022]
Abstract
Attention is a crucial component in sound source segregation allowing auditory objects of interest to be both singled out and held in focus. Our study utilizes a fundamental paradigm for sound source segregation: a sequence of interleaved tones, A and B, of different frequencies that can be heard as a single integrated stream or segregated into two streams (auditory streaming paradigm). We focus on the irregular alternations between integrated and segregated that occur for long presentations, so-called auditory bistability. Psychaoustic experiments demonstrate how attentional control, a listener's intention to experience integrated or segregated, biases perception in favour of different perceptual interpretations. Our data show that this is achieved by prolonging the dominance times of the attended percept and, to a lesser extent, by curtailing the dominance times of the unattended percept, an effect that remains consistent across a range of values for the difference in frequency between A and B. An existing neuromechanistic model describes the neural dynamics of perceptual competition downstream of primary auditory cortex (A1). The model allows us to propose plausible neural mechanisms for attentional control, as linked to different attentional strategies, in a direct comparison with behavioural data. A mechanism based on a percept-specific input gain best accounts for the effects of attentional control.
Collapse
|
7
|
Cao R, Pastukhov A, Aleshin S, Mattia M, Braun J. Binocular rivalry reveals an out-of-equilibrium neural dynamics suited for decision-making. eLife 2021; 10:61581. [PMID: 34369875 PMCID: PMC8352598 DOI: 10.7554/elife.61581] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2020] [Accepted: 05/24/2021] [Indexed: 12/19/2022] Open
Abstract
In ambiguous or conflicting sensory situations, perception is often ‘multistable’ in that it perpetually changes at irregular intervals, shifting abruptly between distinct alternatives. The interval statistics of these alternations exhibits quasi-universal characteristics, suggesting a general mechanism. Using binocular rivalry, we show that many aspects of this perceptual dynamics are reproduced by a hierarchical model operating out of equilibrium. The constitutive elements of this model idealize the metastability of cortical networks. Independent elements accumulate visual evidence at one level, while groups of coupled elements compete for dominance at another level. As soon as one group dominates perception, feedback inhibition suppresses supporting evidence. Previously unreported features in the serial dependencies of perceptual alternations compellingly corroborate this mechanism. Moreover, the proposed out-of-equilibrium dynamics satisfies normative constraints of continuous decision-making. Thus, multistable perception may reflect decision-making in a volatile world: integrating evidence over space and time, choosing categorically between hypotheses, while concurrently evaluating alternatives.
Collapse
Affiliation(s)
- Robin Cao
- Cognitive Biology, Center for Behavioral Brain Sciences, Magdeburg, Germany.,Gatsby Computational Neuroscience Unit, London, United Kingdom.,Istituto Superiore di Sanità, Rome, Italy
| | | | - Stepan Aleshin
- Cognitive Biology, Center for Behavioral Brain Sciences, Magdeburg, Germany
| | | | - Jochen Braun
- Cognitive Biology, Center for Behavioral Brain Sciences, Magdeburg, Germany
| |
Collapse
|
8
|
Grenzebach J, Wegner TGG, Einhäuser W, Bendixen A. Pupillometry in auditory multistability. PLoS One 2021; 16:e0252370. [PMID: 34086770 PMCID: PMC8177413 DOI: 10.1371/journal.pone.0252370] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2020] [Accepted: 05/15/2021] [Indexed: 11/20/2022] Open
Abstract
In multistability, a constant stimulus induces alternating perceptual interpretations. For many forms of visual multistability, the transition from one interpretation to another ("perceptual switch") is accompanied by a dilation of the pupil. Here we ask whether the same holds for auditory multistability, specifically auditory streaming. Two tones were played in alternation, yielding four distinct interpretations: the tones can be perceived as one integrated percept (single sound source), or as segregated with either tone or both tones in the foreground. We found that the pupil dilates significantly around the time a perceptual switch is reported ("multistable condition"). When participants instead responded to actual stimulus changes that closely mimicked the multistable perceptual experience ("replay condition"), the pupil dilated more around such responses than in multistability. This still held when data were corrected for the pupil response to the stimulus change as such. Hence, active responses to an exogeneous stimulus change trigger a stronger or temporally more confined pupil dilation than responses to an endogenous perceptual switch. In another condition, participants randomly pressed the buttons used for reporting multistability. In Study 1, this "random condition" failed to sufficiently mimic the temporal pattern of multistability. By adapting the instructions, in Study 2 we obtained a response pattern more similar to the multistable condition. In this case, the pupil dilated significantly around the random button presses. Albeit numerically smaller, this pupil response was not significantly different from the multistable condition. While there are several possible explanations-related, e.g., to the decision to respond-this underlines the difficulty to isolate a purely perceptual effect in multistability. Our data extend previous findings from visual to auditory multistability. They highlight methodological challenges in interpreting such data and suggest possible approaches to meet them, including a novel stimulus to simulate the experience of perceptual switches in auditory streaming.
Collapse
Affiliation(s)
- Jan Grenzebach
- Cognitive Systems Lab, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
- Physics of Cognition Group, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
| | - Thomas G. G. Wegner
- Cognitive Systems Lab, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
- Physics of Cognition Group, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
| | - Wolfgang Einhäuser
- Physics of Cognition Group, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
| | - Alexandra Bendixen
- Cognitive Systems Lab, Institute of Physics, Chemnitz University of Technology, Chemnitz, Germany
| |
Collapse
|
9
|
Ferrario A, Rankin J. Auditory streaming emerges from fast excitation and slow delayed inhibition. JOURNAL OF MATHEMATICAL NEUROSCIENCE 2021; 11:8. [PMID: 33939042 PMCID: PMC8093365 DOI: 10.1186/s13408-021-00106-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/04/2020] [Accepted: 04/22/2021] [Indexed: 05/29/2023]
Abstract
In the auditory streaming paradigm, alternating sequences of pure tones can be perceived as a single galloping rhythm (integration) or as two sequences with separated low and high tones (segregation). Although studied for decades, the neural mechanisms underlining this perceptual grouping of sound remains a mystery. With the aim of identifying a plausible minimal neural circuit that captures this phenomenon, we propose a firing rate model with two periodically forced neural populations coupled by fast direct excitation and slow delayed inhibition. By analyzing the model in a non-smooth, slow-fast regime we analytically prove the existence of a rich repertoire of dynamical states and of their parameter dependent transitions. We impose plausible parameter restrictions and link all states with perceptual interpretations. Regions of stimulus parameters occupied by states linked with each percept match those found in behavioural experiments. Our model suggests that slow inhibition masks the perception of subsequent tones during segregation (forward masking), whereas fast excitation enables integration for large pitch differences between the two tones.
Collapse
Affiliation(s)
- Andrea Ferrario
- Department of Mathematics, College of Engineering, Mathematics & Physical Sciences, University of Exeter, Exeter, UK.
| | - James Rankin
- Department of Mathematics, College of Engineering, Mathematics & Physical Sciences, University of Exeter, Exeter, UK
| |
Collapse
|
10
|
Skerritt-Davis B, Elhilali M. Computational framework for investigating predictive processing in auditory perception. J Neurosci Methods 2021; 360:109177. [PMID: 33839191 DOI: 10.1016/j.jneumeth.2021.109177] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2020] [Revised: 03/07/2021] [Accepted: 03/25/2021] [Indexed: 11/24/2022]
Abstract
BACKGROUND The brain tracks sound sources as they evolve in time, collecting contextual information to predict future sensory inputs. Previous work in predictive coding typically focuses on the perception of predictable stimuli, leaving the implementation of these same neural processes in more complex, real-world environments containing randomness and uncertainty up for debate. NEW METHOD To facilitate investigation into the perception of less tightly-controlled listening scenarios, we present a computational model as a tool to ask targeted questions about the underlying predictive processes that connect complex sensory inputs to listener behavior and neural responses. In the modeling framework, observed sound features (e.g. pitch) are tracked sequentially using Bayesian inference. Sufficient statistics are inferred from past observations at multiple time scales and used to make predictions about future observation while tracking the statistical structure of the sensory input. RESULTS Facets of the model are discussed in terms of their application to perceptual research, and examples taken from real-world audio demonstrate the model's flexibility to capture a variety of statistical structures along various perceptual dimensions. COMPARISON WITH EXISTING METHODS Previous models are often targeted toward interpreting a particular experimental paradigm (e.g., oddball paradigm), perceptual dimension (e.g., pitch processing), or task (e.g., speech segregation), thus limiting their ability to generalize to other domains. The presented model is designed as a flexible and practical tool for broad application. CONCLUSION The model is presented as a general framework for generating new hypotheses and guiding investigation into the neural processes underlying predictive coding of complex scenes.
Collapse
Affiliation(s)
| | - Mounya Elhilali
- Johns Hopkins University, 3400 N Charles St, Baltimore, MD, USA.
| |
Collapse
|
11
|
Nguyen QA, Rinzel J, Curtu R. Buildup and bistability in auditory streaming as an evidence accumulation process with saturation. PLoS Comput Biol 2020; 16:e1008152. [PMID: 32853256 PMCID: PMC7480857 DOI: 10.1371/journal.pcbi.1008152] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2020] [Revised: 09/09/2020] [Accepted: 07/15/2020] [Indexed: 12/23/2022] Open
Abstract
A repeating triplet-sequence ABA- of non-overlapping brief tones, A and B, is a valued paradigm for studying auditory stream formation and the cocktail party problem. The stimulus is "heard" either as a galloping pattern (integration) or as two interleaved streams (segregation); the initial percept is typically integration then followed by spontaneous alternations between segregation and integration, each being dominant for a few seconds. The probability of segregation grows over seconds, from near-zero to a steady value, defining the buildup function, BUF. Its stationary level increases with the difference in tone frequencies, DF, and the BUF rises faster. Percept durations have DF-dependent means and are gamma-like distributed. Behavioral and computational studies usually characterize triplet streaming either during alternations or during buildup. Here, our experimental design and modeling encompass both. We propose a pseudo-neuromechanistic model that incorporates spiking activity in primary auditory cortex, A1, as input and resolves perception along two network-layers downstream of A1. Our model is straightforward and intuitive. It describes the noisy accumulation of evidence against the current percept which generates switches when reaching a threshold. Accumulation can saturate either above or below threshold; if below, the switching dynamics resemble noise-induced transitions from an attractor state. Our model accounts quantitatively for three key features of data: the BUFs, mean durations, and normalized dominance duration distributions, at various DF values. It describes perceptual alternations without competition per se, and underscores that treating triplets in the sequence independently and averaging across trials, as implemented in earlier widely cited studies, is inadequate.
Collapse
Affiliation(s)
- Quynh-Anh Nguyen
- Department of Mathematics, The University of Iowa, Iowa City, Iowa, United States of America
| | - John Rinzel
- Center for Neural Science, New York University, New York, New York, United States of America
- Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
| | - Rodica Curtu
- Department of Mathematics, The University of Iowa, Iowa City, Iowa, United States of America
- Iowa Neuroscience Institute, Human Brain Research Laboratory, Iowa City, Iowa, United States of America
- * E-mail:
| |
Collapse
|
12
|
Little DF, Snyder JS, Elhilali M. Ensemble modeling of auditory streaming reveals potential sources of bistability across the perceptual hierarchy. PLoS Comput Biol 2020; 16:e1007746. [PMID: 32275706 PMCID: PMC7185718 DOI: 10.1371/journal.pcbi.1007746] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2019] [Revised: 04/27/2020] [Accepted: 02/25/2020] [Indexed: 11/19/2022] Open
Abstract
Perceptual bistability-the spontaneous, irregular fluctuation of perception between two interpretations of a stimulus-occurs when observing a large variety of ambiguous stimulus configurations. This phenomenon has the potential to serve as a tool for, among other things, understanding how function varies across individuals due to the large individual differences that manifest during perceptual bistability. Yet it remains difficult to interpret the functional processes at work, without knowing where bistability arises during perception. In this study we explore the hypothesis that bistability originates from multiple sources distributed across the perceptual hierarchy. We develop a hierarchical model of auditory processing comprised of three distinct levels: a Peripheral, tonotopic analysis, a Central analysis computing features found more centrally in the auditory system, and an Object analysis, where sounds are segmented into different streams. We model bistable perception within this system by applying adaptation, inhibition and noise into one or all of the three levels of the hierarchy. We evaluate a large ensemble of variations of this hierarchical model, where each model has a different configuration of adaptation, inhibition and noise. This approach avoids the assumption that a single configuration must be invoked to explain the data. Each model is evaluated based on its ability to replicate two hallmarks of bistability during auditory streaming: the selectivity of bistability to specific stimulus configurations, and the characteristic log-normal pattern of perceptual switches. Consistent with a distributed origin, a broad range of model parameters across this hierarchy lead to a plausible form of perceptual bistability.
Collapse
Affiliation(s)
- David F. Little
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas; Las Vegas, Nevada, United States of America
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| |
Collapse
|
13
|
Schröger E, Roeber U. Encoding of deterministic and stochastic auditory rules in the human brain: The mismatch negativity mechanism does not reflect basic probability. Hear Res 2020; 399:107907. [PMID: 32143958 DOI: 10.1016/j.heares.2020.107907] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/30/2019] [Revised: 01/11/2020] [Accepted: 02/02/2020] [Indexed: 10/25/2022]
Abstract
Regularities in a sequence of sounds can be automatically encoded in a predictive model by the auditory system. When a sound deviates from the one predicted by the model, a mismatch negativity (MMN) is elicited, which is taken to reflect a prediction error at a particular level of the model hierarchy. Although there are many studies on deterministic regularities, only a few have investigated the brain's ability to encode non-deterministic regularities. We studied a simple stochastic regularity: two tone pitches (standards, each occurring on 45% of trials); this regularity was occasionally violated by another tone pitch (deviant, occurring on 10% of trials). We found MMN when the deviant's pitch was outside those of the standards, but not when it was between them. Importantly, when we alternated the occurrence of the same two standards, making them deterministic, the deviant elicited MMN, even when its pitch was between those of the standards. Thus, although the MMN system is extremely powerful in establishing even quite complex deterministic regularities, it fails with a simple stochastic regularity. We argue that the MMN system does not know basic probability.
Collapse
Affiliation(s)
- Erich Schröger
- Institute for Psychology, Leipzig University, Neumarkt 9-19, D-04109, Leipzig, Germany.
| | - Urte Roeber
- Institute for Psychology, Leipzig University, Neumarkt 9-19, D-04109, Leipzig, Germany.
| |
Collapse
|
14
|
Neural correlates of perceptual switching while listening to bistable auditory streaming stimuli. Neuroimage 2020; 204:116220. [DOI: 10.1016/j.neuroimage.2019.116220] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2019] [Revised: 08/19/2019] [Accepted: 09/19/2019] [Indexed: 11/15/2022] Open
|
15
|
Auditory streaming and bistability paradigm extended to a dynamic environment. Hear Res 2019; 383:107807. [PMID: 31622836 DOI: 10.1016/j.heares.2019.107807] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/25/2019] [Revised: 09/19/2019] [Accepted: 10/01/2019] [Indexed: 11/23/2022]
Abstract
We explore stream segregation with temporally modulated acoustic features using behavioral experiments and modelling. The auditory streaming paradigm in which alternating high- A and low-frequency tones B appear in a repeating ABA-pattern, has been shown to be perceptually bistable for extended presentations (order of minutes). For a fixed, repeating stimulus, perception spontaneously changes (switches) at random times, every 2-15 s, between an integrated interpretation with a galloping rhythm and segregated streams. Streaming in a natural auditory environment requires segregation of auditory objects with features that evolve over time. With the relatively idealized ABA-triplet paradigm, we explore perceptual switching in a non-static environment by considering slowly and periodically varying stimulus features. Our previously published model captures the dynamics of auditory bistability and predicts here how perceptual switches are entrained, tightly locked to the rising and falling phase of modulation. In psychoacoustic experiments we find that entrainment depends on both the period of modulation and the intrinsic switch characteristics of individual listeners. The extended auditory streaming paradigm with slowly modulated stimulus features presented here will be of significant interest for future imaging and neurophysiology experiments by reducing the need for subjective perceptual reports of ongoing perception.
Collapse
|
16
|
Rankin J, Rinzel J. Computational models of auditory perception from feature extraction to stream segregation and behavior. Curr Opin Neurobiol 2019; 58:46-53. [PMID: 31326723 DOI: 10.1016/j.conb.2019.06.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2019] [Accepted: 06/22/2019] [Indexed: 10/26/2022]
Abstract
Audition is by nature dynamic, from brainstem processing on sub-millisecond time scales, to segregating and tracking sound sources with changing features, to the pleasure of listening to music and the satisfaction of getting the beat. We review recent advances from computational models of sound localization, of auditory stream segregation and of beat perception/generation. A wealth of behavioral, electrophysiological and imaging studies shed light on these processes, typically with synthesized sounds having regular temporal structure. Computational models integrate knowledge from different experimental fields and at different levels of description. We advocate a neuromechanistic modeling approach that incorporates knowledge of the auditory system from various fields, that utilizes plausible neural mechanisms, and that bridges our understanding across disciplines.
Collapse
Affiliation(s)
- James Rankin
- College of Engineering, Mathematics and Physical Sciences, University of Exeter, Harrison Building, North Park Rd, Exeter EX4 4QF, UK.
| | - John Rinzel
- Center for Neural Science, New York University, 4 Washington Place, 10003 New York, NY, United States; Courant Institute of Mathematical Sciences, New York University, 251 Mercer St, 10012 New York, NY, United States
| |
Collapse
|
17
|
Taranu M, Wimmer MC, Ross J, Farkas D, van Ee R, Winkler I, Denham SL. Children's perception of visual and auditory ambiguity and its link to executive functions and creativity. J Exp Child Psychol 2019; 184:123-138. [PMID: 31029832 DOI: 10.1016/j.jecp.2019.03.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2018] [Revised: 03/25/2019] [Accepted: 03/26/2019] [Indexed: 10/26/2022]
Abstract
The phenomenon of perceptual bistability provides insights into aspects of perceptual processing not normally accessible to everyday experience. However, most experiments have been conducted in adults, and it is not clear to what extent key aspects of perceptual switching change through development. The current research examined the ability of 6-, 8-, and 10-year-old children (N = 66) to switch between competing percepts of ambiguous visual and auditory stimuli and links between switching rate, executive functions, and creativity. The numbers of switches participants reported in two visual tasks (ambiguous figure and ambiguous structure from motion) and two auditory tasks (verbal transformation and auditory streaming) were measured in three 60-s blocks. In addition, inhibitory control was measured with a Stroop task, set shifting was measured with a verbal fluency task, and creativity was measured with a divergent thinking task. The numbers of perceptual switches increased in all four tasks from 6 to 10 years of age but differed across tasks in that they were higher in the verbal transformation and ambigous structure-from-motion tasks than in the ambigous figure and auditory streaming tasks for all age groups. Although perceptual switching rates differed across tasks, there were predictive relationships between switching rates in some tasks. However, little evidence for the influence of central processes on perceptual switching was found. Overall, the results support the notion that perceptual switching is largely modality and task specific and that this property is already evident when perceptual switching emerges.
Collapse
Affiliation(s)
- Mihaela Taranu
- Cognition Institute and School of Psychology, University of Plymouth, Plymouth PL4 8AA, UK
| | - Marina C Wimmer
- Cognition Institute and School of Psychology, University of Plymouth, Plymouth PL4 8AA, UK.
| | - Josephine Ross
- Psychology, School of Social Sciences, University of Dundee, Nethergate, Dundee DD1 4HN, Scotland, UK
| | - Dávid Farkas
- Institute of Cognitive Neuroscience and Psychology, Research Centre of Natural Sciences, Hungarian Academy of Sciences, H-1117 Budapest, Hungary
| | - Raymond van Ee
- Biophysics, Donders Institute for Brain, Cognition and Behavior, Radboud University, 6500 GL Nijmegen, The Netherlands; Department of Brain and Cognition, Leuven University, 3000 BE Leuven, Belgium; Department of Brain, Behavior and Cognition, Philips Research, High Tech Campus, 5656 AE Eindhoven, The Netherlands
| | - István Winkler
- Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, H-1111 Budapest, Hungary
| | - Susan L Denham
- Cognition Institute and School of Psychology, University of Plymouth, Plymouth PL4 8AA, UK
| |
Collapse
|
18
|
Abstract
Research in speech perception has explored how knowledge of a language influences phonetic perception. The current study investigated whether such linguistic influences extend to the perceptual (sequential) organization of speech. Listeners heard sinewave analogs of word pairs (e.g., loose seam, which contains a single [s] frication but is perceived as two /s/ phonemes) cycle continuously, which causes the stimulus to split apart into foreground and background percepts. They had to identify the foreground percept when the stimuli were heard as nonspeech and then again when heard as speech. Of interest was how grouping changed across listening condition when [s] was heard as speech or as a hiss. Although the section of the signal that was identified as the foreground differed little across listening condition, a strong bias to perceive [s] as forming the onset of the foreground was observed in the speech condition (Experiment 1). This effect was reduced in Experiment 2 by increasing the stimulus repetition rate. Findings suggest that the sequential organization of speech arises from the interaction of auditory and linguistic processes, with the former constraining the latter.
Collapse
|
19
|
Skerritt-Davis B, Elhilali M. A Model for Statistical Regularity Extraction from Dynamic Sounds. ACTA ACUST UNITED AC 2019; 105:1-4. [PMID: 31929768 PMCID: PMC6953992 DOI: 10.3813/aaa.919279] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
To understand our surroundings, we effortlessly parse our sound environment into sound sources, extracting invariant information-or regularities-over time to build an internal representation of the world around us. Previous experimental work has shown the brain is sensitive to many types of regularities in sound, but theoretical models that capture underlying principles of regularity tracking across diverse sequence structures have been few and far between. Existing efforts often focus on sound patterns rather the stochastic nature of sequences. In the current study, we employ a perceptual model for regularity extraction based on a Bayesian framework that posits the brain collects statistical information over time. We show this model can be used to simulate various results from the literature with stimuli exhibiting a wide range of predictability. This model can provide a useful tool for both interpreting existing experimental results under a unified model and providing predictions for new ones using more complex stimuli.
Collapse
Affiliation(s)
| | - Mounya Elhilali
- Johns Hopkins University, Baltimore, Maryland, United States.
| |
Collapse
|
20
|
Rajasingam SL, Summers RJ, Roberts B. Stream biasing by different induction sequences: Evaluating stream capture as an account of the segregation-promoting effects of constant-frequency inducers. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:3409. [PMID: 30599694 DOI: 10.1121/1.5082300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/26/2018] [Accepted: 11/19/2018] [Indexed: 06/09/2023]
Abstract
Stream segregation for a test sequence comprising high-frequency (H) and low-frequency (L) pure tones, presented in a galloping rhythm, is much greater when preceded by a constant-frequency induction sequence matching one subset than by an inducer configured like the test sequence; this difference persists for several seconds. It has been proposed that constant-frequency inducers promote stream segregation by capturing the matching subset of test-sequence tones into an on-going, pre-established stream. This explanation was evaluated using 2-s induction sequences followed by longer test sequences (12-20 s). Listeners reported the number of streams heard throughout the test sequence. Experiment 1 used LHL- sequences and one or other subset of inducer tones was attenuated (0-24 dB in 6-dB steps, and ∞). Greater attenuation usually caused a progressive increase in segregation, towards that following the constant-frequency inducer. Experiment 2 used HLH- sequences and the L inducer tones were raised or lowered in frequency relative to their test-sequence counterparts (ΔfI = 0, 0.5, 1.0, or 1.5 × ΔfT ). Either change greatly increased segregation. These results are concordant with the notion of attention switching to new sounds but contradict the stream-capture hypothesis, unless a "proto-object" corresponding to the continuing subset is assumed to form during the induction sequence.
Collapse
Affiliation(s)
- Saima L Rajasingam
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
| | - Robert J Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
| | - Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
| |
Collapse
|
21
|
Kondo HM, Pressnitzer D, Shimada Y, Kochiyama T, Kashino M. Inhibition-excitation balance in the parietal cortex modulates volitional control for auditory and visual multistability. Sci Rep 2018; 8:14548. [PMID: 30267021 PMCID: PMC6162284 DOI: 10.1038/s41598-018-32892-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2018] [Accepted: 09/18/2018] [Indexed: 11/25/2022] Open
Abstract
Perceptual organisation must select one interpretation from several alternatives to guide behaviour. Computational models suggest that this could be achieved through an interplay between inhibition and excitation across competing types of neural population coding for each interpretation. Here, to test for such models, we used magnetic resonance spectroscopy to measure non-invasively the concentrations of inhibitory γ-aminobutyric acid (GABA) and excitatory glutamate-glutamine (Glx) in several brain regions. Human participants first performed auditory and visual multistability tasks that produced spontaneous switching between percepts. Then, we observed that longer percept durations during behaviour were associated with higher GABA/Glx ratios in the sensory area coding for each modality. When participants were asked to voluntarily modulate their perception, a common factor across modalities emerged: the GABA/Glx ratio in the posterior parietal cortex tended to be positively correlated with the amount of effective volitional control. Our results provide direct evidence implicating that the balance between neural inhibition and excitation within sensory regions resolves perceptual competition. This powerful computational principle appears to be leveraged by both audition and vision, implemented independently across modalities, but modulated by an integrated control process.
Collapse
Affiliation(s)
- Hirohito M Kondo
- School of Psychology, Chukyo University, Nagoya, Aichi, Japan.
- Human Information Science Laboratory, NTT Communication Science Laboratories, NTT Corporation, Atsugi, Kanagawa, Japan.
| | - Daniel Pressnitzer
- Laboratoire des Systèmes Perceptifs, CNRS UMR 8248, Paris, France
- Département d'Études Cognitive, École Normale Supérieure, Paris, France
| | - Yasuhiro Shimada
- Brain Activity Imaging Center, ATR-Promotions, Seika-cho, Kyoto, Japan
| | - Takanori Kochiyama
- Brain Activity Imaging Center, ATR-Promotions, Seika-cho, Kyoto, Japan
- Department of Cognitive Neuroscience, Advanced Telecommunications Research Institute International, Seika-cho, Kyoto, Japan
| | - Makio Kashino
- Sports Brain Science Project, NTT Communication Science Laboratories, NTT Corporation, Atsugi, Kanagawa, Japan
- School of Engineering, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan
| |
Collapse
|
22
|
Similar but separate systems underlie perceptual bistability in vision and audition. Sci Rep 2018; 8:7106. [PMID: 29740086 PMCID: PMC5940790 DOI: 10.1038/s41598-018-25587-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2017] [Accepted: 04/13/2018] [Indexed: 12/03/2022] Open
Abstract
The dynamics of perceptual bistability, the phenomenon in which perception switches between different interpretations of an unchanging stimulus, are characterised by very similar properties across a wide range of qualitatively different paradigms. This suggests that perceptual switching may be triggered by some common source. However, it is also possible that perceptual switching may arise from a distributed system, whose components vary according to the specifics of the perceptual experiences involved. Here we used a visual and an auditory task to determine whether individuals show cross-modal commonalities in perceptual switching. We found that individual perceptual switching rates were significantly correlated across modalities. We then asked whether perceptual switching arises from some central (modality-) task-independent process or from a more distributed task-specific system. We found that a log-normal distribution best explained the distribution of perceptual phases in both modalities, suggestive of a combined set of independent processes causing perceptual switching. Modality- and/or task-dependent differences in these distributions, and lack of correlation with the modality-independent central factors tested (ego-resiliency, creativity, and executive function), also point towards perceptual switching arising from a distributed system of similar but independent processes.
Collapse
|
23
|
Skerritt-Davis B, Elhilali M. Detecting change in stochastic sound sequences. PLoS Comput Biol 2018; 14:e1006162. [PMID: 29813049 PMCID: PMC5993325 DOI: 10.1371/journal.pcbi.1006162] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2018] [Revised: 06/08/2018] [Accepted: 04/30/2018] [Indexed: 01/18/2023] Open
Abstract
Our ability to parse our acoustic environment relies on the brain's capacity to extract statistical regularities from surrounding sounds. Previous work in regularity extraction has predominantly focused on the brain's sensitivity to predictable patterns in sound sequences. However, natural sound environments are rarely completely predictable, often containing some level of randomness, yet the brain is able to effectively interpret its surroundings by extracting useful information from stochastic sounds. It has been previously shown that the brain is sensitive to the marginal lower-order statistics of sound sequences (i.e., mean and variance). In this work, we investigate the brain's sensitivity to higher-order statistics describing temporal dependencies between sound events through a series of change detection experiments, where listeners are asked to detect changes in randomness in the pitch of tone sequences. Behavioral data indicate listeners collect statistical estimates to process incoming sounds, and a perceptual model based on Bayesian inference shows a capacity in the brain to track higher-order statistics. Further analysis of individual subjects' behavior indicates an important role of perceptual constraints in listeners' ability to track these sensory statistics with high fidelity. In addition, the inference model facilitates analysis of neural electroencephalography (EEG) responses, anchoring the analysis relative to the statistics of each stochastic stimulus. This reveals both a deviance response and a change-related disruption in phase of the stimulus-locked response that follow the higher-order statistics. These results shed light on the brain's ability to process stochastic sound sequences.
Collapse
Affiliation(s)
- Benjamin Skerritt-Davis
- Electrical & Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Mounya Elhilali
- Electrical & Computer Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
| |
Collapse
|
24
|
Neural Decoding of Bistable Sounds Reveals an Effect of Intention on Perceptual Organization. J Neurosci 2018; 38:2844-2853. [PMID: 29440556 PMCID: PMC5852662 DOI: 10.1523/jneurosci.3022-17.2018] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2017] [Revised: 01/21/2018] [Accepted: 02/06/2018] [Indexed: 12/05/2022] Open
Abstract
Auditory signals arrive at the ear as a mixture that the brain must decompose into distinct sources based to a large extent on acoustic properties of the sounds. An important question concerns whether listeners have voluntary control over how many sources they perceive. This has been studied using pure high (H) and low (L) tones presented in the repeating pattern HLH-HLH-, which can form a bistable percept heard either as an integrated whole (HLH-) or as segregated into high (H-H-) and low (-L-) sequences. Although instructing listeners to try to integrate or segregate sounds affects reports of what they hear, this could reflect a response bias rather than a perceptual effect. We had human listeners (15 males, 12 females) continuously report their perception of such sequences and recorded neural activity using MEG. During neutral listening, a classifier trained on patterns of neural activity distinguished between periods of integrated and segregated perception. In other conditions, participants tried to influence their perception by allocating attention either to the whole sequence or to a subset of the sounds. They reported hearing the desired percept for a greater proportion of time than when listening neutrally. Critically, neural activity supported these reports; stimulus-locked brain responses in auditory cortex were more likely to resemble the signature of segregation when participants tried to hear segregation than when attempting to perceive integration. These results indicate that listeners can influence how many sound sources they perceive, as reflected in neural responses that track both the input and its perceptual organization. SIGNIFICANCE STATEMENT Can we consciously influence our perception of the external world? We address this question using sound sequences that can be heard either as coming from a single source or as two distinct auditory streams. Listeners reported spontaneous changes in their perception between these two interpretations while we recorded neural activity to identify signatures of such integration and segregation. They also indicated that they could, to some extent, choose between these alternatives. This claim was supported by corresponding changes in responses in auditory cortex. By linking neural and behavioral correlates of perception, we demonstrate that the number of objects that we perceive can depend not only on the physical attributes of our environment, but also on how we intend to experience it.
Collapse
|
25
|
Denham SL, Winkler I. Predictive coding in auditory perception: challenges and unresolved questions. Eur J Neurosci 2018; 51:1151-1160. [PMID: 29250827 DOI: 10.1111/ejn.13802] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2017] [Revised: 09/03/2017] [Accepted: 11/20/2017] [Indexed: 11/30/2022]
Abstract
Predictive coding is arguably the currently dominant theoretical framework for the study of perception. It has been employed to explain important auditory perceptual phenomena, and it has inspired theoretical, experimental and computational modelling efforts aimed at describing how the auditory system parses the complex sound input into meaningful units (auditory scene analysis). These efforts have uncovered some vital questions, addressing which could help to further specify predictive coding and clarify some of its basic assumptions. The goal of the current review is to motivate these questions and show how unresolved issues in explaining some auditory phenomena lead to general questions of the theoretical framework. We focus on experimental and computational modelling issues related to sequential grouping in auditory scene analysis (auditory pattern detection and bistable perception), as we believe that this is the research topic where predictive coding has the highest potential for advancing our understanding. In addition to specific questions, our analysis led us to identify three more general questions that require further clarification: (1) What exactly is meant by prediction in predictive coding? (2) What governs which generative models make the predictions? and (3) What (if it exists) is the correlate of perceptual experience within the predictive coding framework?
Collapse
Affiliation(s)
- Susan L Denham
- School of Psychology, University of Plymouth, Drake Circus, Plymouth, PL4 8AA, UK
| | - István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
| |
Collapse
|
26
|
Snyder JS, Elhilali M. Recent advances in exploring the neural underpinnings of auditory scene perception. Ann N Y Acad Sci 2017; 1396:39-55. [PMID: 28199022 PMCID: PMC5446279 DOI: 10.1111/nyas.13317] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2016] [Revised: 12/21/2016] [Accepted: 01/08/2017] [Indexed: 11/29/2022]
Abstract
Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds-and conventional behavioral techniques-to elucidate how we perceptually segregate auditory objects or streams from each other. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth in the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Following the progress that has been made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.
Collapse
Affiliation(s)
- Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, Nevada
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland
| |
Collapse
|
27
|
Rankin J, Osborn Popp PJ, Rinzel J. Stimulus Pauses and Perturbations Differentially Delay or Promote the Segregation of Auditory Objects: Psychoacoustics and Modeling. Front Neurosci 2017; 11:198. [PMID: 28473747 PMCID: PMC5397483 DOI: 10.3389/fnins.2017.00198] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2016] [Accepted: 03/23/2017] [Indexed: 11/21/2022] Open
Abstract
Segregating distinct sound sources is fundamental for auditory perception, as in the cocktail party problem. In a process called the build-up of stream segregation, distinct sound sources that are perceptually integrated initially can be segregated into separate streams after several seconds. Previous research concluded that abrupt changes in the incoming sounds during build-up—for example, a step change in location, loudness or timing—reset the percept to integrated. Following this reset, the multisecond build-up process begins again. Neurophysiological recordings in auditory cortex (A1) show fast (subsecond) adaptation, but unified mechanistic explanations for the bias toward integration, multisecond build-up and resets remain elusive. Combining psychoacoustics and modeling, we show that initial unadapted A1 responses bias integration, that the slowness of build-up arises naturally from competition downstream, and that recovery of adaptation can explain resets. An early bias toward integrated perceptual interpretations arising from primary cortical stages that encode low-level features and feed into competition downstream could also explain similar phenomena in vision. Further, we report a previously overlooked class of perturbations that promote segregation rather than integration. Our results challenge current understanding for perturbation effects on the emergence of sound source segregation, leading to a new hypothesis for differential processing downstream of A1. Transient perturbations can momentarily redirect A1 responses as input to downstream competition units that favor segregation.
Collapse
Affiliation(s)
- James Rankin
- Department of Mathematics, University of ExeterExeter, UK.,Center for Neural Science, New York UniversityNew York, NY, USA
| | | | - John Rinzel
- Center for Neural Science, New York UniversityNew York, NY, USA.,Courant Institute of Mathematical SciencesNew York, NY, USA
| |
Collapse
|
28
|
Comparison of perceptual properties of auditory streaming between spectral and amplitude modulation domains. Hear Res 2017; 350:244-250. [PMID: 28323019 DOI: 10.1016/j.heares.2017.03.006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/04/2016] [Revised: 02/20/2017] [Accepted: 03/15/2017] [Indexed: 11/21/2022]
Abstract
The two-tone sequence (ABA_), which comprises two different sounds (A and B) and a silent gap, has been used to investigate how the auditory system organizes sequential sounds depending on various stimulus conditions or brain states. Auditory streaming can be evoked by differences not only in the tone frequency ("spectral cue": ΔFTONE, TONE condition) but also in the amplitude modulation rate ("AM cue": ΔFAM, AM condition). The aim of the present study was to explore the relationship between the perceptual properties of auditory streaming for the TONE and AM conditions. A sequence with a long duration (400 repetitions of ABA_) was used to examine the property of the bistability of streaming. The ratio of feature differences that evoked an equivalent probability of the segregated percept was close to the ratio of the Q-values of the auditory and modulation filters, consistent with a "channeling theory" of auditory streaming. On the other hand, for values of ΔFAM and ΔFTONE evoking equal probabilities of the segregated percept, the number of perceptual switches was larger for the TONE condition than for the AM condition, indicating that the mechanism(s) that determine the bistability of auditory streaming are different between or sensitive to the two domains. Nevertheless, the number of switches for individual listeners was positively correlated between the spectral and AM domains. The results suggest a possibility that the neural substrates for spectral and AM processes share a common switching mechanism but differ in location and/or in the properties of neural activity or the strength of internal noise at each level.
Collapse
|
29
|
Dykstra AR, Cariani PA, Gutschalk A. A roadmap for the study of conscious audition and its neural basis. Philos Trans R Soc Lond B Biol Sci 2017; 372:20160103. [PMID: 28044014 PMCID: PMC5206271 DOI: 10.1098/rstb.2016.0103] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/03/2016] [Indexed: 12/16/2022] Open
Abstract
How and which aspects of neural activity give rise to subjective perceptual experience-i.e. conscious perception-is a fundamental question of neuroscience. To date, the vast majority of work concerning this question has come from vision, raising the issue of generalizability of prominent resulting theories. However, recent work has begun to shed light on the neural processes subserving conscious perception in other modalities, particularly audition. Here, we outline a roadmap for the future study of conscious auditory perception and its neural basis, paying particular attention to how conscious perception emerges (and of which elements or groups of elements) in complex auditory scenes. We begin by discussing the functional role of the auditory system, particularly as it pertains to conscious perception. Next, we ask: what are the phenomena that need to be explained by a theory of conscious auditory perception? After surveying the available literature for candidate neural correlates, we end by considering the implications that such results have for a general theory of conscious perception as well as prominent outstanding questions and what approaches/techniques can best be used to address them.This article is part of the themed issue 'Auditory and visual scene analysis'.
Collapse
Affiliation(s)
- Andrew R Dykstra
- Department of Neurology, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany
| | | | - Alexander Gutschalk
- Department of Neurology, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany
| |
Collapse
|
30
|
Boroujeni FM, Heidari F, Rouzbahani M, Kamali M. Comparison of auditory stream segregation in sighted and early blind individuals. Neurosci Lett 2017; 638:218-221. [PMID: 27986498 DOI: 10.1016/j.neulet.2016.12.022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2016] [Revised: 12/04/2016] [Accepted: 12/12/2016] [Indexed: 12/01/2022]
Abstract
An important characteristic of the auditory system is the capacity to analyze complex sounds and make decisions on the source of the constituent parts of these sounds. Blind individuals compensate for the lack of visual information by an increase input from other sensory modalities, including increased auditory information. The purpose of the current study was to compare the fission boundary (FB) threshold of sighted and early blind individuals through spectral aspects using a psychoacoustic auditory stream segregation (ASS) test. This study was conducted on 16 sighted and 16 early blind adult individuals. The applied stimuli were presented sequentially as the pure tones A and B and as a triplet ABA-ABA pattern at the intensity of 40dBSL. The A tone frequency was selected as the basis at values of 500, 1000, and 2000Hz. The B tone was presented with the difference of a 4-100% above the basis tone frequency. Blind individuals had significantly lower FB thresholds than sighted people. FB was independent of the frequency of the tone A when expressed as the difference in the number of equivalent rectangular bandwidths (ERBs). Early blindness may increase perceptual separation of the acoustic stimuli to form accurate representations of the world.
Collapse
Affiliation(s)
- Fatemeh Moghadasi Boroujeni
- Department of Audiology, School of Rehabilitation Sciences, Iran University of Medical Sciences, Tehran, Iran; Department of Audiology, School of Rehabilitation, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Fatemeh Heidari
- Department of Audiology, School of Rehabilitation Sciences, Iran University of Medical Sciences, Tehran, Iran
| | - Masoumeh Rouzbahani
- Department of Audiology, School of Rehabilitation Sciences, Iran University of Medical Sciences, Tehran, Iran.
| | - Mohammad Kamali
- Department of Basic Sciences in Rehabilitation, School of Rehabilitation Sciences, Iran University of Medical Sciences, Tehran, Iran
| |
Collapse
|
31
|
Thomassen S, Bendixen A. Subjective perceptual organization of a complex auditory scene. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:265. [PMID: 28147594 DOI: 10.1121/1.4973806] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Empirical research on the sequential decomposition of an auditory scene primarily relies on interleaved sound mixtures of only two tone sequences (e.g., ABAB…). This oversimplifies the sound decomposition problem by limiting the number of putative perceptual organizations. The current study used a sound mixture composed of three different tones (ABCABC…) that could be perceptually organized in many different ways. Participants listened to these sequences and reported their subjective perception by continuously choosing one out of 12 visually presented perceptual organization alternatives. Different levels of frequency and spatial separation were implemented to check whether participants' perceptual reports would be systematic and plausible. As hypothesized, while perception switched back and forth in each condition between various perceptual alternatives (multistability), spatial as well as frequency separation generally raised the proportion of segregated and reduced the proportion of integrated alternatives. During segregated percepts, in contrast to the hypothesis, many participants had a tendency to perceive two streams in the foreground, rather than reporting alternatives with a clear foreground-background differentiation. Finally, participants perceived the organization with intermediate feature values (e.g., middle tones of the pattern) segregated in the foreground slightly less often than similar alternatives with outer feature values (e.g., higher tones).
Collapse
Affiliation(s)
- Sabine Thomassen
- Auditory Psychophysiology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstrasse 114-118, D-26129 Oldenburg, Germany
| | - Alexandra Bendixen
- Auditory Psychophysiology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstrasse 114-118, D-26129 Oldenburg, Germany
| |
Collapse
|
32
|
Szabó BT, Denham SL, Winkler I. Computational Models of Auditory Scene Analysis: A Review. Front Neurosci 2016; 10:524. [PMID: 27895552 PMCID: PMC5108797 DOI: 10.3389/fnins.2016.00524] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2016] [Accepted: 10/28/2016] [Indexed: 12/02/2022] Open
Abstract
Auditory scene analysis (ASA) refers to the process (es) of parsing the complex acoustic input into auditory perceptual objects representing either physical sources or temporal sound patterns, such as melodies, which contributed to the sound waves reaching the ears. A number of new computational models accounting for some of the perceptual phenomena of ASA have been published recently. Here we provide a theoretically motivated review of these computational models, aiming to relate their guiding principles to the central issues of the theoretical framework of ASA. Specifically, we ask how they achieve the grouping and separation of sound elements and whether they implement some form of competition between alternative interpretations of the sound input. We consider the extent to which they include predictive processes, as important current theories suggest that perception is inherently predictive, and also how they have been evaluated. We conclude that current computational models of ASA are fragmentary in the sense that rather than providing general competing interpretations of ASA, they focus on assessing the utility of specific processes (or algorithms) for finding the causes of the complex acoustic signal. This leaves open the possibility for integrating complementary aspects of the models into a more comprehensive theory of ASA.
Collapse
Affiliation(s)
- Beáta T Szabó
- Faculty of Information Technology and Bionics, Pázmány Péter Catholic UniversityBudapest, Hungary; Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of SciencesBudapest, Hungary
| | - Susan L Denham
- School of Psychology, University of Plymouth Plymouth, UK
| | - István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences Budapest, Hungary
| |
Collapse
|
33
|
Tóth B, Kocsis Z, Háden GP, Szerafin Á, Shinn-Cunningham BG, Winkler I. EEG signatures accompanying auditory figure-ground segregation. Neuroimage 2016; 141:108-119. [PMID: 27421185 PMCID: PMC5656226 DOI: 10.1016/j.neuroimage.2016.07.028] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2016] [Revised: 07/06/2016] [Accepted: 07/11/2016] [Indexed: 11/16/2022] Open
Abstract
In everyday acoustic scenes, figure-ground segregation typically requires one to group together sound elements over both time and frequency. Electroencephalogram was recorded while listeners detected repeating tonal complexes composed of a random set of pure tones within stimuli consisting of randomly varying tonal elements. The repeating pattern was perceived as a figure over the randomly changing background. It was found that detection performance improved both as the number of pure tones making up each repeated complex (figure coherence) increased, and as the number of repeated complexes (duration) increased - i.e., detection was easier when either the spectral or temporal structure of the figure was enhanced. Figure detection was accompanied by the elicitation of the object related negativity (ORN) and the P400 event-related potentials (ERPs), which have been previously shown to be evoked by the presence of two concurrent sounds. Both ERP components had generators within and outside of auditory cortex. The amplitudes of the ORN and the P400 increased with both figure coherence and figure duration. However, only the P400 amplitude correlated with detection performance. These results suggest that 1) the ORN and P400 reflect processes involved in detecting the emergence of a new auditory object in the presence of other concurrent auditory objects; 2) the ORN corresponds to the likelihood of the presence of two or more concurrent sound objects, whereas the P400 reflects the perceptual recognition of the presence of multiple auditory objects and/or preparation for reporting the detection of a target object.
Collapse
Affiliation(s)
- Brigitta Tóth
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Center for Computational Neuroscience and Neural Technology, Boston University, Boston, USA.
| | - Zsuzsanna Kocsis
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Budapest, Hungary
| | - Gábor P Háden
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
| | - Ágnes Szerafin
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Budapest, Hungary
| | | | - István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary; Department of Cognitive and Neuropsychology, Institute of Psychology, University of Szeged, Szeged, Hungary
| |
Collapse
|
34
|
Farkas D, Denham SL, Bendixen A, Tóth D, Kondo HM, Winkler I. Auditory Multi-Stability: Idiosyncratic Perceptual Switching Patterns, Executive Functions and Personality Traits. PLoS One 2016; 11:e0154810. [PMID: 27135945 PMCID: PMC4852918 DOI: 10.1371/journal.pone.0154810] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2016] [Accepted: 04/19/2016] [Indexed: 02/08/2023] Open
Abstract
Multi-stability refers to the phenomenon of perception stochastically switching between possible interpretations of an unchanging stimulus. Despite considerable variability, individuals show stable idiosyncratic patterns of switching between alternative perceptions in the auditory streaming paradigm. We explored correlates of the individual switching patterns with executive functions, personality traits, and creativity. The main dimensions on which individual switching patterns differed from each other were identified using multidimensional scaling. Individuals with high scores on the dimension explaining the largest portion of the inter-individual variance switched more often between the alternative perceptions than those with low scores. They also perceived the most unusual interpretation more often, and experienced all perceptual alternatives with a shorter delay from stimulus onset. The ego-resiliency personality trait, which reflects a tendency for adaptive flexibility and experience seeking, was significantly positively related to this dimension. Taking these results together we suggest that this dimension may reflect the individual's tendency for exploring the auditory environment. Executive functions were significantly related to some of the variables describing global properties of the switching patterns, such as the average number of switches. Thus individual patterns of perceptual switching in the auditory streaming paradigm are related to some personality traits and executive functions.
Collapse
Affiliation(s)
- Dávid Farkas
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
- Department of Cognitive Science, Faculty of Natural Sciences, Budapest University of Technology and Economics, Budapest, Hungary
- * E-mail:
| | - Susan L. Denham
- Cognition Institute and School of Psychology, University of Plymouth, Plymouth, United Kingdom
| | - Alexandra Bendixen
- School of Natural Sciences, Chemnitz University of Technology, Chemnitz, Germany
| | - Dénes Tóth
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
| | - Hirohito M. Kondo
- Human Information Science Laboratory, NTT Communication Science Laboratories, NTT Corporation, Atsugi, Japan
| | - István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Budapest, Hungary
| |
Collapse
|
35
|
Barniv D, Nelken I. Auditory Streaming as an Online Classification Process with Evidence Accumulation. PLoS One 2015; 10:e0144788. [PMID: 26671774 PMCID: PMC4699212 DOI: 10.1371/journal.pone.0144788] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2015] [Accepted: 11/22/2015] [Indexed: 11/18/2022] Open
Abstract
When human subjects hear a sequence of two alternating pure tones, they often perceive it in one of two ways: as one integrated sequence (a single "stream" consisting of the two tones), or as two segregated sequences, one sequence of low tones perceived separately from another sequence of high tones (two "streams"). Perception of this stimulus is thus bistable. Moreover, subjects report on-going switching between the two percepts: unless the frequency separation is large, initial perception tends to be of integration, followed by toggling between integration and segregation phases. The process of stream formation is loosely named “auditory streaming”. Auditory streaming is believed to be a manifestation of human ability to analyze an auditory scene, i.e. to attribute portions of the incoming sound sequence to distinct sound generating entities. Previous studies suggested that the durations of the successive integration and segregation phases are statistically independent. This independence plays an important role in current models of bistability. Contrary to this, we show here, by analyzing a large set of data, that subsequent phase durations are positively correlated. To account together for bistability and positive correlation between subsequent durations, we suggest that streaming is a consequence of an evidence accumulation process. Evidence for segregation is accumulated during the integration phase and vice versa; a switch to the opposite percept occurs stochastically based on this evidence. During a long phase, a large amount of evidence for the opposite percept is accumulated, resulting in a long subsequent phase. In contrast, a short phase is followed by another short phase. We implement these concepts using a probabilistic model that shows both bistability and correlations similar to those observed experimentally.
Collapse
Affiliation(s)
- Dana Barniv
- Edmond and Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem, Israel
- * E-mail:
| | - Israel Nelken
- Edmond and Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem, Israel
- Department of Neurobiology, Silberman Institute of Life Sciences, Hebrew University, Jerusalem, Israel
| |
Collapse
|
36
|
Rankin J, Sussman E, Rinzel J. Neuromechanistic Model of Auditory Bistability. PLoS Comput Biol 2015; 11:e1004555. [PMID: 26562507 PMCID: PMC4642990 DOI: 10.1371/journal.pcbi.1004555] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2015] [Accepted: 09/12/2015] [Indexed: 12/26/2022] Open
Abstract
Sequences of higher frequency A and lower frequency B tones repeating in an ABA- triplet pattern are widely used to study auditory streaming. One may experience either an integrated percept, a single ABA-ABA- stream, or a segregated percept, separate but simultaneous streams A-A-A-A- and -B---B--. During minutes-long presentations, subjects may report irregular alternations between these interpretations. We combine neuromechanistic modeling and psychoacoustic experiments to study these persistent alternations and to characterize the effects of manipulating stimulus parameters. Unlike many phenomenological models with abstract, percept-specific competition and fixed inputs, our network model comprises neuronal units with sensory feature dependent inputs that mimic the pulsatile-like A1 responses to tones in the ABA- triplets. It embodies a neuronal computation for percept competition thought to occur beyond primary auditory cortex (A1). Mutual inhibition, adaptation and noise are implemented. We include slow NDMA recurrent excitation for local temporal memory that enables linkage across sound gaps from one triplet to the next. Percepts in our model are identified in the firing patterns of the neuronal units. We predict with the model that manipulations of the frequency difference between tones A and B should affect the dominance durations of the stronger percept, the one dominant a larger fraction of time, more than those of the weaker percept—a property that has been previously established and generalized across several visual bistable paradigms. We confirm the qualitative prediction with our psychoacoustic experiments and use the behavioral data to further constrain and improve the model, achieving quantitative agreement between experimental and modeling results. Our work and model provide a platform that can be extended to consider other stimulus conditions, including the effects of context and volition. Humans have an astonishing ability to separate out different sound sources in a busy room: think of how we can hear individual voices in a bustling coffee shop. Rather than voices, we use sound stimuli in the lab: repeating patterns of high and low tones. The tone sequences are ambiguous and can be interpreted in different ways—either grouped into a single stream, or separated out into different streams. When listening for a long time, one’s perception switches every few seconds, a phenomenon called auditory bistability. Based on knowledge of the organization of brain areas involved in separating out different sound sources and how neurons in these areas respond to the ambiguous sequences, we developed a computational model of auditory bistabilty. Our model is less abstract than existing models and shows how groups of neurons may compete in order to dictate what you perceive. We predict how the difference between the two tone sequences affects what you hear over time and we performed an experiment with human listeners to confirm our prediction. The model provides groundwork to further explore the way the brain deals with the busy and often ambiguous world of sound.
Collapse
Affiliation(s)
- James Rankin
- Center for Neural Science, New York University, New York, New York, United States of America
- * E-mail:
| | - Elyse Sussman
- Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Department of Otorhinolaryngology-HNS, Albert Einstein College of Medicine, Bronx, New York, United States of America
| | - John Rinzel
- Center for Neural Science, New York University, New York, New York, United States of America
- Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
| |
Collapse
|
37
|
Winkler I, Schröger E. Auditory perceptual objects as generative models: Setting the stage for communication by sound. BRAIN AND LANGUAGE 2015; 148:1-22. [PMID: 26184883 DOI: 10.1016/j.bandl.2015.05.003] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/20/2014] [Revised: 03/03/2015] [Accepted: 05/03/2015] [Indexed: 06/04/2023]
Abstract
Communication by sounds requires that the communication channels (i.e. speech/speakers and other sound sources) had been established. This allows to separate concurrently active sound sources, to track their identity, to assess the type of message arriving from them, and to decide whether and when to react (e.g., reply to the message). We propose that these functions rely on a common generative model of the auditory environment. This model predicts upcoming sounds on the basis of representations describing temporal/sequential regularities. Predictions help to identify the continuation of the previously discovered sound sources to detect the emergence of new sources as well as changes in the behavior of the known ones. It produces auditory event representations which provide a full sensory description of the sounds, including their relation to the auditory context and the current goals of the organism. Event representations can be consciously perceived and serve as objects in various cognitive operations.
Collapse
Affiliation(s)
- István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Hungary; Institute of Psychology, University of Szeged, Hungary.
| | - Erich Schröger
- Institute for Psychology, University of Leipzig, Germany.
| |
Collapse
|
38
|
Bendixen A, Schwartze M, Kotz SA. Temporal dynamics of contingency extraction from tonal and verbal auditory sequences. BRAIN AND LANGUAGE 2015; 148:64-73. [PMID: 25512177 DOI: 10.1016/j.bandl.2014.11.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/04/2014] [Revised: 11/12/2014] [Accepted: 11/15/2014] [Indexed: 06/04/2023]
Abstract
Consecutive sound events are often to some degree predictive of each other. Here we investigated the brain's capacity to detect contingencies between consecutive sounds by means of electroencephalography (EEG) during passive listening. Contingencies were embedded either within tonal or verbal stimuli. Contingency extraction was measured indirectly via the elicitation of the mismatch negativity (MMN) component of the event-related potential (ERP) by contingency violations. MMN results indicate that structurally identical forms of predictability can be extracted from both tonal and verbal stimuli. We also found similar generators to underlie the processing of contingency violations across stimulus types, as well as similar performance in an active-listening follow-up test. However, the process of passive contingency extraction was considerably slower (twice as many rule exemplars were needed) for verbal than for tonal stimuli These results suggest caution in transferring findings on complex predictive regularity processing obtained with tonal stimuli directly to the speech domain.
Collapse
Affiliation(s)
- Alexandra Bendixen
- Auditory Psychophysiology Lab, Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg, D-26111 Oldenburg, Germany; Institute of Psychology, University of Leipzig, D-04103 Leipzig, Germany.
| | - Michael Schwartze
- School of Psychological Sciences, University of Manchester, M13 9PL Manchester, UK; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, D-04103 Leipzig, Germany.
| | - Sonja A Kotz
- School of Psychological Sciences, University of Manchester, M13 9PL Manchester, UK; Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, D-04103 Leipzig, Germany.
| |
Collapse
|
39
|
Farley BJ, Noreña AJ. Membrane potential dynamics of populations of cortical neurons during auditory streaming. J Neurophysiol 2015; 114:2418-30. [PMID: 26269558 DOI: 10.1152/jn.00545.2015] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2015] [Accepted: 08/12/2015] [Indexed: 11/22/2022] Open
Abstract
How a mixture of acoustic sources is perceptually organized into discrete auditory objects remains unclear. One current hypothesis postulates that perceptual segregation of different sources is related to the spatiotemporal separation of cortical responses induced by each acoustic source or stream. In the present study, the dynamics of subthreshold membrane potential activity were measured across the entire tonotopic axis of the rodent primary auditory cortex during the auditory streaming paradigm using voltage-sensitive dye imaging. Consistent with the proposed hypothesis, we observed enhanced spatiotemporal segregation of cortical responses to alternating tone sequences as their frequency separation or presentation rate was increased, both manipulations known to promote stream segregation. However, across most streaming paradigm conditions tested, a substantial cortical region maintaining a response to both tones coexisted with more peripheral cortical regions responding more selectively to one of them. We propose that these coexisting subthreshold representation types could provide neural substrates to support the flexible switching between the integrated and segregated streaming percepts.
Collapse
Affiliation(s)
| | - Arnaud J Noreña
- Aix Marseille Université, Centre National de la Recherche Scientifique, Marseille, France; and
| |
Collapse
|
40
|
Háden GP, Németh R, Török M, Winkler I. Predictive processing of pitch trends in newborn infants. Brain Res 2015; 1626:14-20. [PMID: 25749483 DOI: 10.1016/j.brainres.2015.02.048] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2014] [Revised: 02/20/2015] [Accepted: 02/21/2015] [Indexed: 11/17/2022]
Abstract
The notion of predictive sound processing suggests that the auditory system prepares for upcoming sounds once it has detected regular features within a sequence. Here we investigated whether predictive processes are operating at birth in the human auditory system. Event-related potentials (ERP) were recorded from healthy newborns to occasional ascending pitch steps occurring in the 2nd or the 5th position within trains of tones with otherwise monotonously descending pitch. If the trains were processed in a predictive manner only deviant pitch steps occurring in the later train position would elicit the discriminative mismatch response (MMR). Deviants delivered in the 5th but not in the 2nd position of the tone trains elicited a significant MMR response. These results suggest that newborns represent pitch trends within sound sequences and they process them in a predictive manner. This article is part of a Special Issue entitled SI: Prediction and Attention.
Collapse
Affiliation(s)
- Gábor P Háden
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary.
| | - Renáta Németh
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary; Department of Cognitive Science, Central European University, Frankel Leó út 30-34, H-1023 Budapest, Hungary.
| | - Miklós Török
- Military Hospital, Department of Obstetrics-Gynaecology and Perinatal Intensive Care Unit, Podmaniczky u. 111, H-1062 Budapest, Hungary.
| | - István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar Tudósok krt. 2, H-1117 Budapest, Hungary; Institute of Psychology, University of Szeged, Egyetem u. 2, H-6722 Szeged, Hungary.
| |
Collapse
|
41
|
Schröger E, Marzecová A, SanMiguel I. Attention and prediction in human audition: a lesson from cognitive psychophysiology. Eur J Neurosci 2015; 41:641-64. [PMID: 25728182 PMCID: PMC4402002 DOI: 10.1111/ejn.12816] [Citation(s) in RCA: 147] [Impact Index Per Article: 16.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2014] [Revised: 11/27/2014] [Accepted: 12/01/2014] [Indexed: 11/30/2022]
Abstract
Attention is a hypothetical mechanism in the service of perception that facilitates the processing of relevant information and inhibits the processing of irrelevant information. Prediction is a hypothetical mechanism in the service of perception that considers prior information when interpreting the sensorial input. Although both (attention and prediction) aid perception, they are rarely considered together. Auditory attention typically yields enhanced brain activity, whereas auditory prediction often results in attenuated brain responses. However, when strongly predicted sounds are omitted, brain responses to silence resemble those elicited by sounds. Studies jointly investigating attention and prediction revealed that these different mechanisms may interact, e.g. attention may magnify the processing differences between predicted and unpredicted sounds. Following the predictive coding theory, we suggest that prediction relates to predictions sent down from predictive models housed in higher levels of the processing hierarchy to lower levels and attention refers to gain modulation of the prediction error signal sent up to the higher level. As predictions encode contents and confidence in the sensory data, and as gain can be modulated by the intention of the listener and by the predictability of the input, various possibilities for interactions between attention and prediction can be unfolded. From this perspective, the traditional distinction between bottom-up/exogenous and top-down/endogenous driven attention can be revisited and the classic concepts of attentional gain and attentional trace can be integrated.
Collapse
Affiliation(s)
- Erich Schröger
- Institute for Psychology, BioCog - Cognitive and Biological Psychology, University of LeipzigNeumarkt 9-19, D-04109, Leipzig, Germany
| | - Anna Marzecová
- Institute for Psychology, BioCog - Cognitive and Biological Psychology, University of LeipzigNeumarkt 9-19, D-04109, Leipzig, Germany
| | - Iria SanMiguel
- Institute for Psychology, BioCog - Cognitive and Biological Psychology, University of LeipzigNeumarkt 9-19, D-04109, Leipzig, Germany
| |
Collapse
|
42
|
Steele SA, Tranchina D, Rinzel J. An alternating renewal process describes the buildup of perceptual segregation. Front Comput Neurosci 2015; 8:166. [PMID: 25620927 PMCID: PMC4286718 DOI: 10.3389/fncom.2014.00166] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2014] [Accepted: 12/02/2014] [Indexed: 12/05/2022] Open
Abstract
For some ambiguous scenes perceptual conflict arises between integration and segregation. Initially, all stimulus features seem integrated. Then abruptly, perhaps after a few seconds, a segregated percept emerges. For example, segregation of acoustic features into streams may require several seconds. In behavioral experiments, when a subject's reports of stream segregation are averaged over repeated trials, one obtains a buildup function, a smooth time course for segregation probability. The buildup function has been said to reflect an underlying mechanism of evidence accumulation or adaptation. During long duration stimuli perception may alternate between integration and segregation. We present a statistical model based on an alternating renewal process (ARP) that generates buildup functions without an accumulative process. In our model, perception alternates during a trial between different groupings, as in perceptual bistability, with random and independent dominance durations sampled from different percept-specific probability distributions. Using this theory, we describe the short-term dynamics of buildup observed on short trials in terms of the long-term statistics of percept durations for the two alternating perceptual organizations. Our statistical-dynamics model describes well the buildup functions and alternations in simulations of pseudo-mechanistic neuronal network models with percept-selective populations competing through mutual inhibition. Even though the competition model can show history dependence through slow adaptation, our statistical switching model, that neglects history, predicts well the buildup function. We propose that accumulation is not a necessary feature to produce buildup. Generally, if alternations between two states exhibit independent durations with stationary statistics then the associated buildup function can be described by the statistical dynamics of an ARP.
Collapse
Affiliation(s)
- Sara A Steele
- Center for Neural Science, New York University New York, NY, USA
| | - Daniel Tranchina
- Courant Institute for Mathematical Sciences, New York University New York, NY, USA ; Department of Biology, New York University New York, NY, USA
| | - John Rinzel
- Center for Neural Science, New York University New York, NY, USA ; Courant Institute for Mathematical Sciences, New York University New York, NY, USA
| |
Collapse
|
43
|
Marti S, Thibault L, Dehaene S. How does the extraction of local and global auditory regularities vary with context? PLoS One 2014; 9:e107227. [PMID: 25197987 PMCID: PMC4157871 DOI: 10.1371/journal.pone.0107227] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2014] [Accepted: 08/11/2014] [Indexed: 11/18/2022] Open
Abstract
How does the human brain extract regularities from its environment? There is evidence that short range or ‘local’ regularities (within seconds) are automatically detected by the brain while long range or ‘global’ regularities (over tens of seconds or more) require conscious awareness. In the present experiment, we asked whether participants' attention was needed to acquire such auditory regularities, to detect their violation or both. We designed a paradigm in which participants listened to predictable sounds. Subjects could be distracted by a visual task at two moments: when they were first exposed to a regularity or when they detected violations of this regularity. MEG recordings revealed that early brain responses (100–130 ms) to violations of short range regularities were unaffected by visual distraction and driven essentially by local transitional probabilities. Based on global workspace theory and prior results, we expected that visual distraction would eliminate the long range global effect, but unexpectedly, we found the contrary, i.e. late brain responses (300–600 ms) to violations of long range regularities on audio-visual trials but not on auditory only trials. Further analyses showed that, in fact, visual distraction was incomplete and that auditory and visual stimuli interfered in both directions. Our results show that conscious, attentive subjects can learn the long range dependencies present in auditory stimuli even while performing a visual task on synchronous visual stimuli. Furthermore, they acquire a complex regularity and end up making different predictions for the very same stimulus depending on the context (i.e. absence or presence of visual stimuli). These results suggest that while short-range regularity detection is driven by local transitional probabilities between stimuli, the human brain detects and stores long-range regularities in a highly flexible, context dependent manner.
Collapse
Affiliation(s)
- Sébastien Marti
- INSERM, U992, Cognitive Neuroimaging Unit, Gif/Yvette, France
- CEA, DSV/I2BM, NeuroSpin Center, Gif/Yvette, France
- * E-mail:
| | - Louis Thibault
- Laboratoire Psychologie de la Perception, UMR 8242, Université Paris Descartes, Paris, France
| | - Stanislas Dehaene
- INSERM, U992, Cognitive Neuroimaging Unit, Gif/Yvette, France
- CEA, DSV/I2BM, NeuroSpin Center, Gif/Yvette, France
- Collège de France, Paris, France
| |
Collapse
|
44
|
Bendixen A. Predictability effects in auditory scene analysis: a review. Front Neurosci 2014; 8:60. [PMID: 24744695 PMCID: PMC3978260 DOI: 10.3389/fnins.2014.00060] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2014] [Accepted: 03/14/2014] [Indexed: 12/02/2022] Open
Abstract
Many sound sources emit signals in a predictable manner. The idea that predictability can be exploited to support the segregation of one source's signal emissions from the overlapping signals of other sources has been expressed for a long time. Yet experimental evidence for a strong role of predictability within auditory scene analysis (ASA) has been scarce. Recently, there has been an upsurge in experimental and theoretical work on this topic resulting from fundamental changes in our perspective on how the brain extracts predictability from series of sensory events. Based on effortless predictive processing in the auditory system, it becomes more plausible that predictability would be available as a cue for sound source decomposition. In the present contribution, empirical evidence for such a role of predictability in ASA will be reviewed. It will be shown that predictability affects ASA both when it is present in the sound source of interest (perceptual foreground) and when it is present in other sound sources that the listener wishes to ignore (perceptual background). First evidence pointing toward age-related impairments in the latter capacity will be addressed. Moreover, it will be illustrated how effects of predictability can be shown by means of objective listening tests as well as by subjective report procedures, with the latter approach typically exploiting the multi-stable nature of auditory perception. Critical aspects of study design will be delineated to ensure that predictability effects can be unambiguously interpreted. Possible mechanisms for a functional role of predictability within ASA will be discussed, and an analogy with the old-plus-new heuristic for grouping simultaneous acoustic signals will be suggested.
Collapse
Affiliation(s)
- Alexandra Bendixen
- Auditory Psychophysiology Lab, Department of Psychology, Cluster of Excellence "Hearing4all," European Medical School, Carl von Ossietzky University of Oldenburg Oldenburg, Germany
| |
Collapse
|
45
|
Szalárdy O, Bendixen A, Böhm TM, Davies LA, Denham SL, Winkler I. The effects of rhythm and melody on auditory stream segregation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 135:1392-1405. [PMID: 24606277 DOI: 10.1121/1.4865196] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
While many studies have assessed the efficacy of similarity-based cues for auditory stream segregation, much less is known about whether and how the larger-scale structure of sound sequences support stream formation and the choice of sound organization. Two experiments investigated the effects of musical melody and rhythm on the segregation of two interleaved tone sequences. The two sets of tones fully overlapped in pitch range but differed from each other in interaural time and intensity. Unbeknownst to the listener, separately, each of the interleaved sequences was created from the notes of a different song. In different experimental conditions, the notes and/or their timing could either follow those of the songs or they could be scrambled or, in case of timing, set to be isochronous. Listeners were asked to continuously report whether they heard a single coherent sequence (integrated) or two concurrent streams (segregated). Although temporal overlap between tones from the two streams proved to be the strongest cue for stream segregation, significant effects of tonality and familiarity with the songs were also observed. These results suggest that the regular temporal patterns are utilized as cues in auditory stream segregation and that long-term memory is involved in this process.
Collapse
Affiliation(s)
- Orsolya Szalárdy
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, P.O. Box 286, H-1519 Budapest, Hungary
| | - Alexandra Bendixen
- Auditory Psychophysiology Lab, Department of Psychology, Cluster of Excellence "Hearing4all," European Medical School, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstrasse 114-118, D-26129 Oldenburg, Germany
| | - Tamás M Böhm
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, P.O. Box 286, H-1519 Budapest, Hungary
| | - Lucy A Davies
- Cognition Institute and School of Psychology, University of Plymouth, Drake Circus, Plymouth PL4 8AA, United Kingdom
| | - Susan L Denham
- Cognition Institute and School of Psychology, University of Plymouth, Drake Circus, Plymouth PL4 8AA, United Kingdom
| | - István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, P.O. Box 286, H-1519 Budapest, Hungary
| |
Collapse
|
46
|
Denham S, Bõhm TM, Bendixen A, Szalárdy O, Kocsis Z, Mill R, Winkler I. Stable individual characteristics in the perception of multiple embedded patterns in multistable auditory stimuli. Front Neurosci 2014; 8:25. [PMID: 24616656 PMCID: PMC3937586 DOI: 10.3389/fnins.2014.00025] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2013] [Accepted: 01/27/2014] [Indexed: 11/25/2022] Open
Abstract
The ability of the auditory system to parse complex scenes into component objects in order to extract information from the environment is very robust, yet the processing principles underlying this ability are still not well understood. This study was designed to investigate the proposal that the auditory system constructs multiple interpretations of the acoustic scene in parallel, based on the finding that when listening to a long repetitive sequence listeners report switching between different perceptual organizations. Using the “ABA-” auditory streaming paradigm we trained listeners until they could reliably recognize all possible embedded patterns of length four which could in principle be extracted from the sequence, and in a series of test sessions investigated their spontaneous reports of those patterns. With the training allowing them to identify and mark a wider variety of possible patterns, participants spontaneously reported many more patterns than the ones traditionally assumed (Integrated vs. Segregated). Despite receiving consistent training and despite the apparent randomness of perceptual switching, we found individual switching patterns were idiosyncratic; i.e., the perceptual switching patterns of each participant were more similar to their own switching patterns in different sessions than to those of other participants. These individual differences were found to be preserved even between test sessions held a year after the initial experiment. Our results support the idea that the auditory system attempts to extract an exhaustive set of embedded patterns which can be used to generate expectations of future events and which by competing for dominance give rise to (changing) perceptual awareness, with the characteristics of pattern discovery and perceptual competition having a strong idiosyncratic component. Perceptual multistability thus provides a means for characterizing both general mechanisms and individual differences in human perception.
Collapse
Affiliation(s)
- Susan Denham
- Cognition Institute, University of Plymouth Plymouth, UK ; School of Psychology, University of Plymouth Plymouth, UK
| | - Tamás M Bõhm
- Research Centre for Natural Sciences, Institute of Cognitive Neuroscience and Psychology, Hungarian Academy of Sciences Budapest, Hungary ; Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics Budapest, Hungary
| | - Alexandra Bendixen
- Auditory Psychophysiology Lab, Department of Psychology, Cluster of Excellence "Hearing4all", European Medical School, Carl von Ossietzky University of Oldenburg Oldenburg, Germany
| | - Orsolya Szalárdy
- Research Centre for Natural Sciences, Institute of Cognitive Neuroscience and Psychology, Hungarian Academy of Sciences Budapest, Hungary ; Department of Cognitive Science, Budapest University of Technology and Economics Budapest, Hungary
| | - Zsuzsanna Kocsis
- Research Centre for Natural Sciences, Institute of Cognitive Neuroscience and Psychology, Hungarian Academy of Sciences Budapest, Hungary ; Department of Cognitive Science, Budapest University of Technology and Economics Budapest, Hungary
| | - Robert Mill
- Cognition Institute, University of Plymouth Plymouth, UK
| | - István Winkler
- Research Centre for Natural Sciences, Institute of Cognitive Neuroscience and Psychology, Hungarian Academy of Sciences Budapest, Hungary ; Institute of Psychology, University of Szeged Szeged, Hungary
| |
Collapse
|
47
|
Bee MA. Treefrogs as animal models for research on auditory scene analysis and the cocktail party problem. Int J Psychophysiol 2014; 95:216-37. [PMID: 24424243 DOI: 10.1016/j.ijpsycho.2014.01.004] [Citation(s) in RCA: 57] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2013] [Revised: 11/10/2013] [Accepted: 01/01/2014] [Indexed: 01/18/2023]
Abstract
The perceptual analysis of acoustic scenes involves binding together sounds from the same source and separating them from other sounds in the environment. In large social groups, listeners experience increased difficulty performing these tasks due to high noise levels and interference from the concurrent signals of multiple individuals. While a substantial body of literature on these issues pertains to human hearing and speech communication, few studies have investigated how nonhuman animals may be evolutionarily adapted to solve biologically analogous communication problems. Here, I review recent and ongoing work aimed at testing hypotheses about perceptual mechanisms that enable treefrogs in the genus Hyla to communicate vocally in noisy, multi-source social environments. After briefly introducing the genus and the methods used to study hearing in frogs, I outline several functional constraints on communication posed by the acoustic environment of breeding "choruses". Then, I review studies of sound source perception aimed at uncovering how treefrog listeners may be adapted to cope with these constraints. Specifically, this review covers research on the acoustic cues used in sequential and simultaneous auditory grouping, spatial release from masking, and dip listening. Throughout the paper, I attempt to illustrate how broad-scale, comparative studies of carefully considered animal models may ultimately reveal an evolutionary diversity of underlying mechanisms for solving cocktail-party-like problems in communication.
Collapse
Affiliation(s)
- Mark A Bee
- Department of Ecology, Evolution and Behavior, University of Minnesota, 100 Ecology, 1987 Upper Buford Circle, St. Paul, MN 55108, USA.
| |
Collapse
|
48
|
Schröger E, Bendixen A, Denham SL, Mill RW, Bőhm TM, Winkler I. Predictive Regularity Representations in Violation Detection and Auditory Stream Segregation: From Conceptual to Computational Models. Brain Topogr 2013; 27:565-77. [DOI: 10.1007/s10548-013-0334-6] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2013] [Accepted: 11/13/2013] [Indexed: 11/24/2022]
|
49
|
Gutschalk A, Dykstra AR. Functional imaging of auditory scene analysis. Hear Res 2013; 307:98-110. [PMID: 23968821 DOI: 10.1016/j.heares.2013.08.003] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/21/2013] [Revised: 07/26/2013] [Accepted: 08/08/2013] [Indexed: 11/16/2022]
Abstract
Our auditory system is constantly faced with the task of decomposing the complex mixture of sound arriving at the ears into perceptually independent streams constituting accurate representations of individual sound sources. This decomposition, termed auditory scene analysis, is critical for both survival and communication, and is thought to underlie both speech and music perception. The neural underpinnings of auditory scene analysis have been studied utilizing invasive experiments with animal models as well as non-invasive (MEG, EEG, and fMRI) and invasive (intracranial EEG) studies conducted with human listeners. The present article reviews human neurophysiological research investigating the neural basis of auditory scene analysis, with emphasis on two classical paradigms termed streaming and informational masking. Other paradigms - such as the continuity illusion, mistuned harmonics, and multi-speaker environments - are briefly addressed thereafter. We conclude by discussing the emerging evidence for the role of auditory cortex in remapping incoming acoustic signals into a perceptual representation of auditory streams, which are then available for selective attention and further conscious processing. This article is part of a Special Issue entitled Human Auditory Neuroimaging.
Collapse
Affiliation(s)
- Alexander Gutschalk
- Department of Neurology, Ruprecht-Karls-University Heidelberg, Heidelberg, Germany.
| | | |
Collapse
|