1. Laback B, Tabuchi H, Kohlrausch A. Evidence for proactive and retroactive temporal pattern analysis in simultaneous masking. J Acoust Soc Am 2024; 155:3742-3759. PMID: 38856312. DOI: 10.1121/10.0026240.
Abstract
Amplitude modulation (AM) of a masker reduces its masking of a simultaneously presented unmodulated pure-tone target, an effect that likely involves dip listening. This study tested the idea that dip-listening efficiency may depend on stimulus context, i.e., the match in AM peakedness (AMP) between the masker and a precursor or postcursor stimulus, consistent with a form of temporal pattern analysis. Masked thresholds were measured in normal-hearing listeners using Schroeder-phase harmonic complexes as maskers and precursors or postcursors. Experiment 1 showed threshold elevation (i.e., interference) when a flat cursor preceded or followed a peaked masker, suggesting proactive and retroactive temporal pattern analysis. Threshold decline (facilitation) was observed when the masker AMP was matched to the precursor, irrespective of stimulus AMP, suggesting only proactive processing. Subsequent experiments showed that both interference and facilitation (1) remained robust when a temporal gap was inserted between masker and cursor, (2) disappeared when an F0 difference was introduced between masker and precursor, and (3) decreased when the presentation level was reduced. These results suggest an important role of envelope regularity in dip listening, especially when masker and cursor are F0-matched and therefore form one perceptual stream. The reported effects appear to represent a time-domain variant of comodulation masking release.
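The Schroeder-phase construction referenced here is standard enough to sketch: component phases follow phi_n = c*pi*n*(n+1)/N, where c = +/-1 yields a flat-envelope (low-AMP) complex and c = 0 (cosine phase) a peaked one. A minimal Python sketch under those assumptions; the F0, harmonic count, and duration are illustrative, not the study's values.

```python
import numpy as np

def schroeder_complex(f0, n_harmonics, duration, fs, c=1.0):
    """Harmonic complex with Schroeder phases phi_n = c*pi*n*(n+1)/N.

    c = +1 or -1 gives the classic flat-envelope Schroeder complex
    (low AM peakedness); c = 0 gives cosine phase, whose components
    align in time and produce a highly peaked envelope.
    """
    t = np.arange(int(duration * fs)) / fs
    x = np.zeros_like(t)
    for n in range(1, n_harmonics + 1):
        phase = c * np.pi * n * (n + 1) / n_harmonics
        x += np.cos(2 * np.pi * n * f0 * t + phase)
    return x / np.max(np.abs(x))

fs = 44100
flat_masker = schroeder_complex(100, 40, 0.3, fs, c=1.0)    # flat envelope
peaked_masker = schroeder_complex(100, 40, 0.3, fs, c=0.0)  # peaked envelope
```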
Affiliation(s)
- Bernhard Laback
- Austrian Academy of Sciences, Acoustics Research Institute, Wohllebengasse 12-14, 1040 Vienna, Austria
- Hisaaki Tabuchi
- Department of Psychology, University of Innsbruck, Universitätsstraße 15, 6020 Innsbruck, Austria
- Armin Kohlrausch
- Industrial Engineering & Innovation Sciences, Technische Universiteit Eindhoven, P.O. Box 513, 5600 MB Eindhoven, Netherlands
2. Feather J, Leclerc G, Mądry A, McDermott JH. Model metamers reveal divergent invariances between biological and artificial neural networks. Nat Neurosci 2023; 26:2017-2034. PMID: 37845543. PMCID: PMC10620097. DOI: 10.1038/s41593-023-01442-0.
Abstract
Deep neural network models of sensory systems are often proposed to learn representational transformations with invariances like those in the brain. To reveal these invariances, we generated 'model metamers', stimuli whose activations within a model stage are matched to those of a natural stimulus. Metamers for state-of-the-art supervised and unsupervised neural network models of vision and audition were often completely unrecognizable to humans when generated from late model stages, suggesting differences between model and human invariances. Targeted model changes improved human recognizability of model metamers but did not eliminate the overall human-model discrepancy. The human recognizability of a model's metamers was well predicted by their recognizability by other models, suggesting that models contain idiosyncratic invariances in addition to those required by the task. Metamer recognizability dissociated from both traditional brain-based benchmarks and adversarial vulnerability, revealing a distinct failure mode of existing sensory models and providing a complementary benchmark for model assessment.
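The core of metamer generation is a small optimization loop: record a reference stimulus's activations at the chosen model stage, then gradient-descend a noise input until its activations match, leaving the stimulus itself unconstrained. A toy numpy sketch with a hypothetical random two-layer ReLU network (the paper used deep audio and vision models; only the optimization idea is shown):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed "model": two random ReLU stages
W1 = rng.normal(size=(64, 128)) / np.sqrt(128)
W2 = rng.normal(size=(16, 64)) / np.sqrt(64)

def activations(x):
    h1 = np.maximum(W1 @ x, 0)
    h2 = np.maximum(W2 @ h1, 0)  # "late" stage
    return h1, h2

x_ref = rng.normal(size=128)    # reference ("natural") stimulus
_, target = activations(x_ref)  # its late-stage activations

x = rng.normal(size=128)        # metamer: start from noise
for _ in range(5000):
    h1, h2 = activations(x)
    err = h2 - target             # gradient of the activation-matching
    g1 = W2.T @ (err * (h2 > 0))  # loss, backpropagated by hand
    gx = W1.T @ (g1 * (h1 > 0))   # through the two ReLU stages
    x -= 1e-2 * gx

print("activation mismatch:", np.linalg.norm(activations(x)[1] - target))
print("stimulus difference:", np.linalg.norm(x - x_ref))  # can remain large
```

The second print is the point: matched late-stage activations do not force the metamer to resemble the reference, which is exactly the degree of freedom the paper probes with human recognition judgments.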
Affiliation(s)
- Jenelle Feather
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Computational Neuroscience, Flatiron Institute, Cambridge, MA, USA.
- Guillaume Leclerc
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Aleksander Mądry
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
- McGovern Institute, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.
- Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA, USA.
3. Ruesseler M, Weber LA, Marshall TR, O'Reilly J, Hunt LT. Quantifying decision-making in dynamic, continuously evolving environments. eLife 2023; 12:e82823. PMID: 37883173. PMCID: PMC10602589. DOI: 10.7554/elife.82823.
Abstract
During perceptual decision-making tasks, centroparietal electroencephalographic (EEG) potentials report an evidence accumulation-to-bound process that is time locked to trial onset. However, decisions in real-world environments are rarely confined to discrete trials; they instead unfold continuously, with accumulation of time-varying evidence being recency-weighted towards its immediate past. The neural mechanisms supporting recency-weighted continuous decision-making remain unclear. Here, we use a novel continuous task design to study how the centroparietal positivity (CPP) adapts to different environments that place different constraints on evidence accumulation. We show that adaptations in evidence weighting to these different environments are reflected in changes in the CPP. The CPP becomes more sensitive to fluctuations in sensory evidence when large shifts in evidence are less frequent, and the potential is primarily sensitive to fluctuations in decision-relevant (not decision-irrelevant) sensory input. A complementary triphasic component over occipito-parietal cortex encodes the sum of recently accumulated sensory evidence, and its magnitude covaries with parameters describing how different individuals integrate sensory evidence over time. A computational model based on leaky evidence accumulation suggests that these findings can be accounted for by a shift in decision threshold between different environments, which is also reflected in the magnitude of pre-decision EEG activity. Our findings reveal how adaptations in EEG responses reflect flexibility in evidence accumulation to the statistics of dynamic sensory environments.
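The model class at the heart of this account is easy to state: a leaky accumulator integrates the incoming evidence stream, with the leak producing the recency weighting, and a decision fires at a threshold crossing. A minimal sketch with illustrative (not fitted) parameters; the paper's environment effects are mimicked here only as a shift in the threshold:

```python
import numpy as np

def leaky_accumulator(evidence, leak=0.05, threshold=3.0):
    """Leaky evidence accumulation: x_t = (1 - leak) * x_{t-1} + e_t.

    The leak makes integration recency-weighted; the first crossing of
    +/-threshold gives the decision time and choice.
    """
    x = 0.0
    for t, e in enumerate(evidence):
        x = (1 - leak) * x + e
        if abs(x) >= threshold:
            return t, np.sign(x)
    return None, 0.0  # no commitment within the stream

rng = np.random.default_rng(1)
stream = 0.1 + rng.normal(0.0, 1.0, size=2000)  # weak drift buried in noise
t_low, _ = leaky_accumulator(stream, threshold=3.0)
t_high, _ = leaky_accumulator(stream, threshold=6.0)
print(f"decision at t={t_low} (low bound) vs t={t_high} (higher bound)")
```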
Affiliation(s)
- Maria Ruesseler
- Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, Oxford Centre for Human Brain Activity (OHBA), Warneford Hospital, University of Oxford, Oxford, United Kingdom
- Lilian Aline Weber
- Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, Oxford Centre for Human Brain Activity (OHBA), Warneford Hospital, University of Oxford, Oxford, United Kingdom
- Department of Experimental Psychology, University of Oxford, Anna Watts Building, Radcliffe Observatory Quarter, Oxford, United Kingdom
- Tom Rhys Marshall
- Department of Experimental Psychology, University of Oxford, Anna Watts Building, Radcliffe Observatory Quarter, Oxford, United Kingdom
- Centre for Human Brain Health, University of Birmingham, Birmingham, United Kingdom
- Jill O'Reilly
- Department of Experimental Psychology, University of Oxford, Anna Watts Building, Radcliffe Observatory Quarter, Oxford, United Kingdom
- Laurence Tudor Hunt
- Wellcome Centre for Integrative Neuroimaging, Department of Psychiatry, Oxford Centre for Human Brain Activity (OHBA), Warneford Hospital, University of Oxford, Oxford, United Kingdom
- Department of Experimental Psychology, University of Oxford, Anna Watts Building, Radcliffe Observatory Quarter, Oxford, United Kingdom
4. Berto M, Ricciardi E, Pietrini P, Weisz N, Bottari D. Distinguishing Fine Structure and Summary Representation of Sound Textures from Neural Activity. eNeuro 2023; 10:ENEURO.0026-23.2023. PMID: 37775312. PMCID: PMC10576259. DOI: 10.1523/eneuro.0026-23.2023.
Abstract
The auditory system relies on both local and summary representations; acoustic local features exceeding system constraints are compacted into a set of summary statistics. Such compression is pivotal for sound-object recognition. Here, we assessed whether computations subtending local and statistical representations of sounds could be distinguished at the neural level. A computational auditory model was employed to extract auditory statistics from natural sound textures (i.e., fire, rain) and to generate synthetic exemplars where local and statistical properties were controlled. Twenty-four human participants were passively exposed to auditory streams while the electroencephalography (EEG) was recorded. Each stream could consist of short, medium, or long sounds to vary the amount of acoustic information. Short and long sounds were expected to engage local or summary statistics representations, respectively. Data revealed a clear dissociation. Compared with summary-based ones, auditory-evoked responses based on local information were selectively greater in magnitude in short sounds. Opposite patterns emerged for longer sounds. Neural oscillations revealed that local features and summary statistics rely on neural activity occurring at different temporal scales, faster (beta) or slower (theta-alpha). These dissociations emerged automatically without explicit engagement in a discrimination task. Overall, this study demonstrates that the auditory system developed distinct coding mechanisms to discriminate changes in the acoustic environment based on fine structure and summary representations.
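The "summary statistics" contrasted here with local features are, in the texture literature, time averages of cochlear envelope measurements. A drastically simplified sketch in that spirit, using Butterworth bands in place of a cochlear filterbank and only marginal moments (the synthesis model behind such stimuli also uses correlations and modulation statistics):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from scipy.stats import skew, kurtosis

def envelope_statistics(x, fs, band_edges=(125, 250, 500, 1000, 2000, 4000)):
    """Time-averaged envelope statistics per frequency band:
    mean, coefficient of variation, skewness, kurtosis."""
    stats = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))  # band envelope
        stats.append([env.mean(), env.std() / env.mean(),
                      skew(env), kurtosis(env)])
    return np.array(stats)  # shape: (n_bands, 4)

fs = 16000
texture = np.random.default_rng(2).normal(size=fs)  # 1 s stand-in signal
print(envelope_statistics(texture, fs))
```

Longer excerpts make these averages more stable, which is one way to think about why longer sounds favor summary representations.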
Affiliation(s)
- Martina Berto
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100, Italy
- Emiliano Ricciardi
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100, Italy
- Pietro Pietrini
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100, Italy
- Nathan Weisz
- Department of Psychology and Centre for Cognitive Neuroscience, Paris-Lodron University of Salzburg, Salzburg, 5020, Austria
- Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, 5020, Austria
- Davide Bottari
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, Lucca, 55100, Italy
5. Chen C, de Hoz L. The perceptual categorization of multidimensional stimuli is hierarchically organized. iScience 2023; 26:106941. PMID: 37378341. PMCID: PMC10291468. DOI: 10.1016/j.isci.2023.106941.
Abstract
As we interact with our surroundings, we encounter the same or similar objects from different perspectives and are compelled to generalize. For example, despite their variety, we recognize dog barks as a distinct sound class. While we have some understanding of generalization along a single stimulus dimension (frequency, color), natural stimuli are identifiable by a combination of dimensions, and measuring the interaction between dimensions is essential to understanding perception. Using a two-dimensional discrimination task for mice with frequency- or amplitude-modulated sounds, we tested untrained generalization across pairs of auditory dimensions in an automated behavioral paradigm. We uncovered a perceptual hierarchy over the tested dimensions that was dominated by the sound's spectral composition. Stimuli are thus not perceived as a whole but as a combination of their features, each of which weighs differently in the identification of the stimulus according to an established hierarchy, possibly paralleling their differential shaping of neuronal tuning.
Affiliation(s)
- Chi Chen
- Department of Neurogenetics, Max Planck Institute for Experimental Medicine, Göttingen, Germany
- International Max Planck Research School for Neurosciences, Göttingen, Germany
- Göttingen Graduate School of Neurosciences and Molecular Biosciences, Göttingen, Germany
- Neuroscience Research Center, Charité Medical University, Berlin, Germany
- Livia de Hoz
- Department of Neurogenetics, Max Planck Institute for Experimental Medicine, Göttingen, Germany
- Neuroscience Research Center, Charité Medical University, Berlin, Germany
- Bernstein Center for Computational Neuroscience, Berlin, Germany
6. McAlpine D, de Hoz L. Listening loops and the adapting auditory brain. Front Neurosci 2023; 17:1081295. PMID: 37008228. PMCID: PMC10060829. DOI: 10.3389/fnins.2023.1081295.
Abstract
Analysing complex auditory scenes depends in part on learning the long-term statistical structure of sounds comprising those scenes. One way in which the listening brain achieves this is by analysing the statistical structure of acoustic environments over multiple time courses and separating background from foreground sounds. A critical component of this statistical learning in the auditory brain is the interplay between feedforward and feedback pathways—“listening loops”—connecting the inner ear to higher cortical regions and back. These loops are likely important in setting and adjusting the different cadences over which learned listening occurs through adaptive processes that tailor neural responses to sound environments that unfold over seconds, days, development, and the life-course. Here, we posit that exploring listening loops at different scales of investigation—from in vivo recording to human assessment—their role in detecting different timescales of regularity, and the consequences this has for background detection, will reveal the fundamental processes that transform hearing into the essential task of listening.
Affiliation(s)
- David McAlpine
- Department of Linguistics, Macquarie University, Sydney, NSW, Australia
- Livia de Hoz
- Neuroscience Research Center, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Bernstein Center for Computational Neuroscience, Berlin, Germany
7. Maruyama H, Okada K, Motoyoshi I. A two-stage spectral model for sound texture perception: Synthesis and psychophysics. Iperception 2023; 14:20416695231157349. PMID: 36845027. PMCID: PMC9950610. DOI: 10.1177/20416695231157349.
Abstract
The natural environment is filled with a variety of auditory events such as wind blowing, water flowing, and fire crackling. It has been suggested that the perception of such textural sounds is based on the statistics of the natural auditory events. Inspired by a recent spectral model for visual texture perception, we propose a model that describes perceived sound texture using only the linear spectrum and the energy spectrum. We tested the validity of the model using synthetic noise sounds that preserve the two-stage amplitude spectra of the original sound. A psychophysical experiment showed that our synthetic noises were perceived as similar to the original sounds for 120 real-world auditory events. The performance was comparable to that of synthetic sounds produced by McDermott and Simoncelli's model, which considers various classes of auditory statistics. The results support the notion that the perception of natural sound textures is predictable from the two-stage spectral signals.
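A sketch of the two-stage idea: stage 1 is the linear amplitude spectrum of the waveform, stage 2 the amplitude spectrum of its energy envelope. Reading "energy spectrum" as the spectrum of the Hilbert envelope is this sketch's assumption; the paper's exact front end may differ:

```python
import numpy as np
from scipy.signal import hilbert

def two_stage_spectra(x, fs):
    """Stage 1: amplitude spectrum of the waveform.
    Stage 2: amplitude spectrum of the (demeaned) Hilbert envelope."""
    linear = np.abs(np.fft.rfft(x))
    env = np.abs(hilbert(x))
    energy = np.abs(np.fft.rfft(env - env.mean()))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    return freqs, linear, energy

fs = 16000
x = np.random.default_rng(3).normal(size=2 * fs)  # 2 s of noise
freqs, linear, energy = two_stage_spectra(x, fs)
```

Synthesis would then impose both spectra of an original recording on a noise seed; only the analysis half is sketched here.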
Affiliation(s)
- Isamu Motoyoshi
- Department of Life Sciences, The University of Tokyo, Japan
8. Lawlor J, Zagala A, Jamali S, Boubenec Y. Pupillary dynamics reflect the impact of temporal expectation on detection strategy. iScience 2023; 26:106000. PMID: 36798438. PMCID: PMC9926307. DOI: 10.1016/j.isci.2023.106000.
Abstract
Perceptual decision-making in everyday life is informed by experience. In particular, temporal expectation can ease the detection of relevant events in noisy sensory streams. Here, we investigated whether humans can extract hidden temporal cues from the occurrences of probabilistic targets and utilize them to inform target detection in a complex acoustic stream. To understand which neural mechanisms implement the influence of temporal expectation on decision-making, we used pupillometry as a proxy for underlying neuromodulatory activity. We found that participants' detection strategy was influenced by the hidden temporal context and correlated with sound-evoked pupil dilation. A model of urgency fitted on false alarms predicted detection reaction time. Altogether, these findings suggest that temporal expectation informs decision-making and could be implemented through neuromodulatory-mediated urgency signals.
Affiliation(s)
- Jennifer Lawlor
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD, USA
- Agnès Zagala
- International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, Canada
- Sara Jamali
- Institut Pasteur, INSERM, Institut de l’Audition, Paris, France
- Yves Boubenec
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
9. Lorenzi C, Apoux F, Grinfeder E, Krause B, Miller-Viacava N, Sueur J. Human Auditory Ecology: Extending Hearing Research to the Perception of Natural Soundscapes by Humans in Rapidly Changing Environments. Trends Hear 2023; 27:23312165231212032. PMID: 37981813. PMCID: PMC10658775. DOI: 10.1177/23312165231212032.
Abstract
Research in hearing sciences has provided extensive knowledge about how the human auditory system processes speech and assists communication. In contrast, little is known about how this system processes "natural soundscapes," that is, the complex arrangements of biological and geophysical sounds shaped by sound propagation through non-anthropogenic habitats [Grinfeder et al. (2022). Frontiers in Ecology and Evolution. 10: 894232]. This is surprising given that, for many species, the capacity to process natural soundscapes determines survival and reproduction through the ability to represent and monitor the immediate environment. Here we propose a framework to encourage research programmes in the field of "human auditory ecology," focusing on the study of human auditory perception of ecological processes at work in natural habitats. Based on large acoustic databases with high ecological validity, these programmes should investigate the extent to which this presumably ancestral monitoring function of the human auditory system is adapted to specific information conveyed by natural soundscapes, whether it operates throughout the life span, and whether it emerges through individual learning or cultural transmission. Beyond fundamental knowledge of human hearing, these programmes should yield a better understanding of how normal-hearing and hearing-impaired listeners monitor rural and urban green and blue spaces and benefit from them, and whether rehabilitation devices (hearing aids and cochlear implants) restore natural soundscape perception and emotional responses back to normal. Importantly, they should also reveal whether and how humans hear the rapid changes in the environment brought about by human activity.
Affiliation(s)
- Christian Lorenzi
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d’Etudes Cognitives, Ecole Normale Supérieure, Université Paris Sciences et Lettres (PSL), Paris, France
- Frédéric Apoux
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d’Etudes Cognitives, Ecole Normale Supérieure, Université Paris Sciences et Lettres (PSL), Paris, France
- Elie Grinfeder
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d’Etudes Cognitives, Ecole Normale Supérieure, Université Paris Sciences et Lettres (PSL), Paris, France
- Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
- Nicole Miller-Viacava
- Laboratoire des Systèmes Perceptifs, UMR CNRS 8248, Département d’Etudes Cognitives, Ecole Normale Supérieure, Université Paris Sciences et Lettres (PSL), Paris, France
- Jérôme Sueur
- Institut de Systématique, Évolution, Biodiversité (ISYEB), Muséum national d’Histoire naturelle, CNRS, Sorbonne Université, EPHE, Université des Antilles, Paris, France
10.
Abstract
Hearing in noise is a core problem in audition, and a challenge for hearing-impaired listeners, yet the underlying mechanisms are poorly understood. We explored whether harmonic frequency relations, a signature property of many communication sounds, aid hearing in noise for normal hearing listeners. We measured detection thresholds in noise for tones and speech synthesized to have harmonic or inharmonic spectra. Harmonic signals were consistently easier to detect than otherwise identical inharmonic signals. Harmonicity also improved discrimination of sounds in noise. The largest benefits were observed for two-note up-down "pitch" discrimination and melodic contour discrimination, both of which could be performed equally well with harmonic and inharmonic tones in quiet, but which showed large harmonic advantages in noise. The results show that harmonicity facilitates hearing in noise, plausibly by providing a noise-robust pitch cue that aids detection and discrimination.
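The harmonic/inharmonic contrast in such studies is typically built by jittering partial frequencies away from exact integer multiples of the F0. A sketch of that manipulation; the jitter scheme (uniform, up to half an F0 per partial) is a common choice, not necessarily this study's:

```python
import numpy as np

def complex_tone(f0, n_partials, duration, fs, jitter=0.0, rng=None):
    """Complex tone; jitter=0 gives exact harmonics, jitter>0 displaces
    each partial by up to +/- jitter * f0, making the tone inharmonic
    while leaving its spectral region roughly unchanged."""
    rng = rng or np.random.default_rng()
    t = np.arange(int(duration * fs)) / fs
    x = np.zeros_like(t)
    for n in range(1, n_partials + 1):
        f = (n + jitter * rng.uniform(-1, 1)) * f0
        x += np.sin(2 * np.pi * f * t)
    return x / np.max(np.abs(x))

fs = 44100
harmonic = complex_tone(200, 10, 0.5, fs, jitter=0.0)
inharmonic = complex_tone(200, 10, 0.5, fs, jitter=0.5)
```

Embedding both in matched noise and comparing detection thresholds is then the experimental logic the abstract describes.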
11. Deep neural network models of sound localization reveal how perception is adapted to real-world environments. Nat Hum Behav 2022; 6:111-133. PMID: 35087192. PMCID: PMC8830739. DOI: 10.1038/s41562-021-01244-z.
Abstract
Mammals localize sounds using information from their two ears. Localization in real-world conditions is challenging, as echoes provide erroneous information, and noises mask parts of target sounds. To better understand real-world localization we equipped a deep neural network with human ears and trained it to localize sounds in a virtual environment. The resulting model localized accurately in realistic conditions with noise and reverberation. In simulated experiments, the model exhibited many features of human spatial hearing: sensitivity to monaural spectral cues and interaural time and level differences, integration across frequency, biases for sound onsets, and limits on localization of concurrent sources. But when trained in unnatural environments without either reverberation, noise, or natural sounds, these performance characteristics deviated from those of humans. The results show how biological hearing is adapted to the challenges of real-world environments and illustrate how artificial neural networks can reveal the real-world constraints that shape perception.
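The binaural cues listed in the abstract are concrete quantities. A sketch of how the two classic ones are measured from a binaural signal pair: ITD from the peak of the interaural cross-correlation within a physiological lag range, ILD from the level ratio in dB (the model itself learns such cues implicitly from cochlear input):

```python
import numpy as np

def itd_ild(left, right, fs, max_lag_ms=1.0):
    """Interaural time difference from the cross-correlation peak
    (within +/- max_lag_ms) and interaural level difference in dB.
    Positive ITD: the left signal lags, i.e. source toward the right."""
    max_lag = int(max_lag_ms * 1e-3 * fs)
    full = np.correlate(left, right, mode="full")
    mid = len(right) - 1  # zero-lag index
    window = full[mid - max_lag: mid + max_lag + 1]
    itd = (np.argmax(window) - max_lag) / fs
    ild = 20 * np.log10(np.std(left) / np.std(right))
    return itd, ild

# Check with a synthetic 0.5-ms delay on the left channel
fs = 44100
src = np.random.default_rng(4).normal(size=fs // 10)
d = int(0.0005 * fs)
left = np.concatenate([np.zeros(d), src])
right = np.concatenate([src, np.zeros(d)])
print(itd_ild(left, right, fs))  # ITD ~ +0.0005 s, ILD ~ 0 dB
```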
12. Scheuregger O, Hjortkjær J, Dau T. Identification and Discrimination of Sound Textures in Hearing-Impaired and Older Listeners. Trends Hear 2021; 25:23312165211065608. PMID: 34939472. PMCID: PMC8721370. DOI: 10.1177/23312165211065608.
Abstract
Sound textures are a broad class of sounds defined by their homogeneous temporal structure. It has been suggested that sound texture perception is mediated by time-averaged summary statistics measured from early stages of the auditory system. The ability of young normal-hearing (NH) listeners to identify synthetic sound textures increases as the statistics of the synthetic texture approach those of its real-world counterpart. In sound texture discrimination, young NH listeners utilize the fine temporal stimulus information for short-duration stimuli, whereas they switch to a time-averaged statistical representation as the stimulus' duration increases. The present study investigated how younger and older listeners with a sensorineural hearing impairment perform in the corresponding texture identification and discrimination tasks in which the stimuli were amplified to compensate for the individual listeners' loss of audibility. In both hearing impaired (HI) listeners and NH controls, sound texture identification performance increased as the number of statistics imposed during the synthesis stage increased, but hearing impairment was accompanied by a significant reduction in overall identification accuracy. Sound texture discrimination performance was measured across listener groups categorized by age and hearing loss. Sound texture discrimination performance was unaffected by hearing loss at all excerpt durations. The older listeners' sound texture and exemplar discrimination performance decreased for signals of short excerpt duration, with older HI listeners performing better than older NH listeners. The results suggest that the time-averaged statistic representations of sound textures provide listeners with cues which are robust to the effects of age and sensorineural hearing loss.
Affiliation(s)
- Oliver Scheuregger
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Jens Hjortkjær
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Kettegård Allé 30, DK-2650 Hvidovre, Denmark
- Torsten Dau
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
13. Saddler MR, Gonzalez R, McDermott JH. Deep neural network models reveal interplay of peripheral coding and stimulus statistics in pitch perception. Nat Commun 2021; 12:7278. PMID: 34907158. PMCID: PMC8671597. DOI: 10.1038/s41467-021-27366-6.
Abstract
Perception is thought to be shaped by the environments for which organisms are optimized. These influences are difficult to test in biological organisms but may be revealed by machine perceptual systems optimized under different conditions. We investigated environmental and physiological influences on pitch perception, whose properties are commonly linked to peripheral neural coding limits. We first trained artificial neural networks to estimate fundamental frequency from biologically faithful cochlear representations of natural sounds. The best-performing networks replicated many characteristics of human pitch judgments. To probe the origins of these characteristics, we then optimized networks given altered cochleae or sound statistics. Human-like behavior emerged only when cochleae had high temporal fidelity and when models were optimized for naturalistic sounds. The results suggest pitch perception is critically shaped by the constraints of natural environments in addition to those of the cochlea, illustrating the use of artificial neural networks to reveal underpinnings of behavior.
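As a point of reference for what the trained networks accomplish, the classical signal-processing baseline for F0 estimation is the autocorrelation peak. A minimal sketch (the paper's models instead operate on simulated auditory-nerve representations):

```python
import numpy as np

def estimate_f0(x, fs, fmin=50.0, fmax=500.0):
    """F0 from the largest autocorrelation peak in the lag range
    corresponding to [fmin, fmax]."""
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / best_lag

fs = 16000
t = np.arange(fs) / fs
tone = sum(np.sin(2 * np.pi * n * 220 * t) for n in range(1, 9))
print(estimate_f0(tone, fs))  # ~220 Hz
```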
Affiliation(s)
- Mark R Saddler
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, MIT, Cambridge, MA, USA.
- Ray Gonzalez
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA
- Center for Brains, Minds and Machines, MIT, Cambridge, MA, USA
- Josh H McDermott
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, USA.
- Center for Brains, Minds and Machines, MIT, Cambridge, MA, USA.
- Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, MA, USA.
14. Berto M, Ricciardi E, Pietrini P, Bottari D. Interactions between auditory statistics processing and visual experience emerge only in late development. iScience 2021; 24:103383. PMID: 34816108. PMCID: PMC8593607. DOI: 10.1016/j.isci.2021.103383.
Abstract
The auditory system relies on local and global representations to discriminate sounds. This study investigated whether vision influences the development and functioning of these fundamental sound computations. We employed a computational approach to control statistical properties embedded in sounds and tested samples of sighted controls (SC) and congenitally (CB) and late-onset (LB) blind individuals in two experiments. In experiment 1, performance relied on local features analysis; in experiment 2, performance benefited from computing global representations. In both experiments, SC and CB performance remarkably overlapped. Conversely, LB performed systematically worse than the other groups when relying on local features, with no alterations on global representations. Results suggest that the auditory computations tested here develop independently from vision. The efficiency of local auditory processing can be hampered if sight is lost later in life, supporting the existence of an audiovisual interplay for the processing of auditory details, which emerges only in late development.
Highlights:
- Computational and deprivation models can be combined to assess sensory plasticity
- Basic auditory computations develop independently from early visual input
- Late-onset sight loss can hamper the efficiency of local auditory processing
Affiliation(s)
- Martina Berto
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, 55100 Lucca, Italy
- Emiliano Ricciardi
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, 55100 Lucca, Italy
- Pietro Pietrini
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, 55100 Lucca, Italy
- Davide Bottari
- Molecular Mind Lab, IMT School for Advanced Studies Lucca, 55100 Lucca, Italy
15.
Abstract
Perception adapts to the properties of prior stimulation, as illustrated by phenomena such as visual color constancy or speech context effects. In the auditory domain, little is known about such adaptive processes when it comes to the attribute of auditory brightness. Here, we report an experiment that tests whether listeners adapt to spectral colorations imposed on naturalistic music and speech excerpts. Our results indicate consistent contrastive adaptation of auditory brightness judgments on a trial-by-trial basis. The pattern of results suggests that these effects tend to grow with an increase in the duration of the adaptor context but level off after around 8 trials of 2 s duration. A simple model of the response criterion yields a correlation of r = .97 with the measured data and corroborates the notion that brightness perception adapts on timescales that fall in the range of auditory short-term memory. Effects turn out to be similar for spectral filtering based on linear spectral filter slopes and filtering based on a measured transfer function from a commercially available hearing device. Overall, our findings demonstrate the adaptivity of auditory brightness perception under realistic acoustical conditions.
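One simple form such a criterion model can take (this specific form is an illustrative guess, not the paper's fitted model): brightness is proxied by the stimulus spectral centroid, the criterion tracks an exponential moving average of recent centroids, and each trial is judged against that running criterion, yielding contrastive context effects:

```python
import numpy as np

def brightness_judgments(centroids, alpha=0.2):
    """Trial-by-trial contrastive adaptation of a response criterion.

    The criterion is an exponential moving average of recent spectral
    centroids (Hz, a rough proxy for brightness); each stimulus is
    judged 'bright' if it exceeds the current criterion."""
    criterion = centroids[0]
    responses = []
    for s in centroids:
        responses.append(s > criterion)                  # contrastive judgment
        criterion = (1 - alpha) * criterion + alpha * s  # criterion adapts
    return responses

# The same 1000-Hz centroid flips with context: bright after a dull
# (low-centroid) context, dull after a bright one.
print(brightness_judgments([400, 450, 500, 1000])[-1])     # True
print(brightness_judgments([1600, 1550, 1500, 1000])[-1])  # False
```

Here alpha sets the model's effective memory (roughly 1/alpha trials), the counterpart of the build-up over ~8 adaptor trials that the abstract reports.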
Affiliation(s)
- Kai Siedenburg
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany.
- Feline Malin Barg
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Henning Schepker
- Department of Medical Physics and Acoustics, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Starkey Hearing, Eden Prairie, MN, USA
16. Ziemba CM, Simoncelli EP. Opposing effects of selectivity and invariance in peripheral vision. Nat Commun 2021; 12:4597. PMID: 34321483. PMCID: PMC8319169. DOI: 10.1038/s41467-021-24880-5.
Abstract
Sensory processing necessitates discarding some information in service of preserving and reformatting more behaviorally relevant information. Sensory neurons seem to achieve this by responding selectively to particular combinations of features in their inputs, while averaging over or ignoring irrelevant combinations. Here, we expose the perceptual implications of this tradeoff between selectivity and invariance, using stimuli and tasks that explicitly reveal their opposing effects on discrimination performance. We generate texture stimuli with statistics derived from natural photographs, and ask observers to perform two different tasks: Discrimination between images drawn from families with different statistics, and discrimination between image samples with identical statistics. For both tasks, the performance of an ideal observer improves with stimulus size. In contrast, humans become better at family discrimination but worse at sample discrimination. We demonstrate through simulations that these behaviors arise naturally in an observer model that relies on a common set of physiologically plausible local statistical measurements for both tasks.
Affiliation(s)
- Corey M Ziemba
- Center for Perceptual Systems, The University of Texas at Austin, Austin, TX, USA.
- Center for Neural Science, New York University, New York, NY, USA.
- Eero P Simoncelli
- Center for Neural Science, New York University, New York, NY, USA
- Flatiron Institute, Simons Foundation, New York, NY, USA
17. Causal inference in environmental sound recognition. Cognition 2021; 214:104627. PMID: 34044231. DOI: 10.1016/j.cognition.2021.104627.
Abstract
Sound is caused by physical events in the world. Do humans infer these causes when recognizing sound sources? We tested whether the recognition of common environmental sounds depends on the inference of a basic physical variable - the source intensity (i.e., the power that produces a sound). A source's intensity can be inferred from the intensity it produces at the ear and its distance, which is normally conveyed by reverberation. Listeners could thus use intensity at the ear and reverberation to constrain recognition by inferring the underlying source intensity. Alternatively, listeners might separate these acoustic cues from their representation of a sound's identity in the interest of invariant recognition. We compared these two hypotheses by measuring recognition accuracy for sounds with typically low or high source intensity (e.g., pepper grinders vs. trucks) that were presented across a range of intensities at the ear or with reverberation cues to distance. The recognition of low-intensity sources (e.g., pepper grinders) was impaired by high presentation intensities or reverberation that conveyed distance, either of which imply high source intensity. Neither effect occurred for high-intensity sources. The results suggest that listeners implicitly use the intensity at the ear along with distance cues to infer a source's power and constrain its identity. The recognition of real-world sounds thus appears to depend upon the inference of their physical generative parameters, even generative parameters whose cues might otherwise be separated from the representation of a sound's identity.
18. Contributions of natural signal statistics to spectral context effects in consonant categorization. Atten Percept Psychophys 2021; 83:2694-2708. PMID: 33987821. DOI: 10.3758/s13414-021-02310-4.
Abstract
Speech perception, like all perception, takes place in context. Recognition of a given speech sound is influenced by the acoustic properties of surrounding sounds. When the spectral composition of earlier (context) sounds (e.g., a sentence with more energy at lower third formant [F3] frequencies) differs from that of a later (target) sound (e.g., consonant with intermediate F3 onset frequency), the auditory system magnifies this difference, biasing target categorization (e.g., towards higher-F3-onset /d/). Historically, these studies used filters to force context stimuli to possess certain spectral compositions. Recently, these effects were produced using unfiltered context sounds that already possessed the desired spectral compositions (Stilp & Assgari, 2019, Attention, Perception, & Psychophysics, 81, 2037-2052). Here, this natural signal statistics approach is extended to consonant categorization (/g/-/d/). Context sentences were either unfiltered (already possessing the desired spectral composition) or filtered (to imbue specific spectral characteristics). Long-term spectral characteristics of unfiltered contexts were poor predictors of shifts in consonant categorization, but short-term characteristics (last 475 ms) were excellent predictors. This diverges from vowel data, where long-term and shorter-term intervals (last 1,000 ms) were equally strong predictors. Thus, time scale plays a critical role in how listeners attune to signal statistics in the acoustic environment.
19. Zhang L, Schlaghecken F, Harte J, Roberts KL. The Influence of the Type of Background Noise on Perceptual Learning of Speech in Noise. Front Neurosci 2021; 15:646137. PMID: 34012384. PMCID: PMC8126633. DOI: 10.3389/fnins.2021.646137.
Abstract
Objectives: Auditory perceptual learning studies tend to focus on the nature of the target stimuli. However, features of the background noise can also have a significant impact on the amount of benefit that participants obtain from training. This study explores whether perceptual learning of speech in background babble noise generalizes to other, real-life environmental background noises (car and rain), and if the benefits are sustained over time.
Design: Normal-hearing native English speakers were randomly assigned to a training (n = 12) or control group (n = 12). Both groups completed a pre- and post-test session in which they identified Bamford-Kowal-Bench (BKB) target words in babble, car, or rain noise. The training group completed speech-in-babble noise training on three consecutive days between the pre- and post-tests. A follow-up session was conducted between 8 and 18 weeks after the post-test session (training group: n = 9; control group: n = 7).
Results: Participants who received training had significantly higher post-test word identification accuracy than control participants for all three types of noise, although benefits were greatest for the babble noise condition and weaker for the car- and rain-noise conditions. Both training and control groups maintained their pre- to post-test improvement over a period of several weeks for speech in babble noise, but returned to pre-test accuracy for speech in car and rain noise.
Conclusion: The findings show that training benefits can show some generalization from speech-in-babble noise to speech in other types of environmental noise. Both groups sustained their learning over a period of several weeks for speech-in-babble noise. As the control group received equal exposure to all three noise types, the sustained learning with babble noise, but not other noises, implies that a structural feature of babble noise was conducive to the sustained improvement. These findings emphasize the importance of considering the background noise as well as the target stimuli in auditory perceptual learning studies.
Affiliation(s)
- Liping Zhang
- Department of Otolaryngology-Head and Neck Surgery, Shandong Provincial ENT Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Warwick Manufacturing Group, University of Warwick, Coventry, United Kingdom
- James Harte
- Warwick Manufacturing Group, University of Warwick, Coventry, United Kingdom
- Interacoustics Research Unit, Technical University of Denmark, Lyngby, Denmark
- Katherine L Roberts
- Department of Psychology, University of Warwick, Coventry, United Kingdom
- Department of Psychology, Nottingham Trent University, Nottingham, United Kingdom
20. Compression and amplification algorithms in hearing aids impair the selectivity of neural responses to speech. Nat Biomed Eng 2021; 6:717-730. PMID: 33941898. PMCID: PMC7612903. DOI: 10.1038/s41551-021-00707-y.
Abstract
In quiet environments, hearing aids improve the perception of low-intensity sounds. However, for high-intensity sounds in background noise, the aids often fail to provide a benefit to the wearer. Here, by using large-scale single-neuron recordings from hearing-impaired gerbils — an established animal model of human hearing — we show that hearing aids restore the sensitivity of neural responses to speech, but not their selectivity. Rather than reflecting a deficit in supra-threshold auditory processing, the low selectivity is a consequence of hearing-aid compression (which decreases the spectral and temporal contrasts of incoming sound) and of amplification (which distorts neural responses, regardless of whether hearing is impaired). Processing strategies that avoid the trade-off between neural sensitivity and selectivity should improve the performance of hearing aids.
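The compression at issue is wide dynamic range compression (WDRC): linear gain below a kneepoint, compressive gain above it. A minimal sketch of the static input-output rule, with illustrative parameters, showing why it shrinks contrasts: at a 3:1 ratio, a 30-dB spread of input levels above the kneepoint is squeezed into 10 dB of output:

```python
import numpy as np

def wdrc_output(level_db, knee_db=45.0, ratio=3.0, gain_db=20.0):
    """Static WDRC input-output function: linear gain below the
    kneepoint, slope 1/ratio above it."""
    level_db = np.asarray(level_db, dtype=float)
    return np.where(level_db <= knee_db,
                    level_db + gain_db,
                    knee_db + gain_db + (level_db - knee_db) / ratio)

for lvl in (30, 50, 70, 90):
    print(f"{lvl} dB in -> {float(wdrc_output(lvl)):.1f} dB out")
# 30 -> 50.0, 50 -> 66.7, 70 -> 73.3, 90 -> 80.0: the 40-dB input range
# from 50 to 90 dB is compressed to 13.3 dB, flattening contrasts.
```

Real aids apply this per frequency channel with attack/release dynamics, which is where the spectro-temporal contrast loss the paper measures comes from.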
21. Herrera-Esposito D, Coen-Cagli R, Gomez-Sena L. Flexible contextual modulation of naturalistic texture perception in peripheral vision. J Vis 2021; 21:1. PMID: 33393962. PMCID: PMC7794279. DOI: 10.1167/jov.21.1.1.
Abstract
Peripheral vision comprises most of our visual field, and is essential in guiding visual behavior. Its characteristic capabilities and limitations, which distinguish it from foveal vision, have been explained by the most influential theory of peripheral vision as the product of representing the visual input using summary statistics. Despite its success, this account may provide a limited understanding of peripheral vision, because it neglects processes of perceptual grouping and segmentation. To test this hypothesis, we studied how contextual modulation, namely the modulation of the perception of a stimulus by its surrounds, interacts with segmentation in human peripheral vision. We used naturalistic textures, which are directly related to summary-statistics representations. We show that segmentation cues affect contextual modulation, and that this is not captured by our implementation of the summary-statistics model. We then characterize the effects of different texture statistics on contextual modulation, providing guidance for extending the model, as well as for probing neural mechanisms of peripheral vision.
Affiliation(s)
- Daniel Herrera-Esposito
- Laboratorio de Neurociencias, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
- Ruben Coen-Cagli
- Department of Systems and Computational Biology and Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Leonel Gomez-Sena
- Laboratorio de Neurociencias, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
22. McPherson MJ, McDermott JH. Time-dependent discrimination advantages for harmonic sounds suggest efficient coding for memory. Proc Natl Acad Sci U S A 2020; 117:32169-32180. PMID: 33262275. PMCID: PMC7749397. DOI: 10.1073/pnas.2008956117.
Abstract
Perceptual systems have finite memory resources and must store incoming signals in compressed formats. To explore whether representations of a sound's pitch might derive from this need for compression, we compared discrimination of harmonic and inharmonic sounds across delays. In contrast to inharmonic spectra, harmonic spectra can be summarized, and thus compressed, using their fundamental frequency (f0). Participants heard two sounds and judged which was higher. Despite being comparable for sounds presented back-to-back, discrimination was better for harmonic than inharmonic stimuli when sounds were separated in time, implicating memory representations unique to harmonic sounds. Patterns of individual differences (correlations between thresholds in different conditions) indicated that listeners use different representations depending on the time delay between sounds, directly comparing the spectra of temporally adjacent sounds, but transitioning to comparing f0s across delays. The need to store sound in memory appears to determine reliance on f0-based pitch and may explain its importance in music, in which listeners must extract relationships between notes separated in time.
Affiliation(s)
- Malinda J McPherson
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139;
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Boston, MA 02115
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139;
- Program in Speech and Hearing Bioscience and Technology, Harvard University, Boston, MA 02115
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
23. Vacher J, Davila A, Kohn A, Coen-Cagli R. Texture Interpolation for Probing Visual Perception. Advances in Neural Information Processing Systems 2020; 33:22146-22157. PMID: 36420050. PMCID: PMC9681139.
Abstract
Texture synthesis models are important tools for understanding visual processing. In particular, statistical approaches based on neurally relevant features have been instrumental in understanding aspects of visual perception and of neural coding. New deep learning-based approaches further improve the quality of synthetic textures. Yet, it is still unclear why deep texture synthesis performs so well, and applications of this new framework to probe visual perception are scarce. Here, we show that distributions of deep convolutional neural network (CNN) activations of a texture are well described by elliptical distributions and therefore, following optimal transport theory, constraining their mean and covariance is sufficient to generate new texture samples. Then, we propose the natural geodesics (i.e. the shortest path between two points) arising with the optimal transport metric to interpolate between arbitrary textures. Compared to other CNN-based approaches, our interpolation method appears to match more closely the geometry of texture perception, and our mathematical framework is better suited to study its statistical nature. We apply our method by measuring the perceptual scale associated to the interpolation parameter in human observers, and the neural sensitivity of different areas of visual cortex in macaque monkeys.
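For Gaussian (more generally elliptical) feature distributions, the optimal-transport map is linear and the geodesic has a closed form: means interpolate linearly and covariances follow the Bures-Wasserstein path. A sketch of that interpolation step, the mathematical core the texture-interpolation method builds on (the paper applies it to CNN activation statistics):

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_geodesic(mu0, S0, mu1, S1, t):
    """Point at time t in [0, 1] on the Wasserstein-2 geodesic between
    Gaussians N(mu0, S0) and N(mu1, S1)."""
    r0 = np.real(sqrtm(S0))
    r0_inv = np.linalg.inv(r0)
    T = r0_inv @ np.real(sqrtm(r0 @ S1 @ r0)) @ r0_inv  # OT map S0 -> S1
    A = (1 - t) * np.eye(len(S0)) + t * T
    return (1 - t) * mu0 + t * mu1, A @ S0 @ A.T

# Halfway between two 2-D feature distributions
mu_t, S_t = w2_geodesic(np.zeros(2), np.diag([1.0, 0.2]),
                        np.ones(2), np.diag([0.2, 1.0]), t=0.5)
print(mu_t)
print(S_t)
```

Sampling from N(mu_t, S_t) and inverting the feature representation would then give the interpolated texture.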
Affiliation(s)
- Jonathan Vacher
- Albert Einstein College of Medicine, Dept. of Systems and Comp. Biology, 10461 Bronx, NY, USA
- Aida Davila
- Albert Einstein College of Medicine, Dominick P. Purpura Dept. of Neuroscience, 10461 Bronx, NY, USA
- Adam Kohn
- Albert Einstein College of Medicine, Dept. of Systems and Comp. Biology, and Dominick P. Purpura Dept. of Neuroscience, 10461 Bronx, NY, USA
- Ruben Coen-Cagli
- Albert Einstein College of Medicine, Dept. of Systems and Comp. Biology, and Dominick P. Purpura Dept. of Neuroscience, 10461 Bronx, NY, USA
24. Thoret E, Varnet L, Boubenec Y, Férriere R, Le Tourneau FM, Krause B, Lorenzi C. Characterizing amplitude and frequency modulation cues in natural soundscapes: A pilot study on four habitats of a biosphere reserve. J Acoust Soc Am 2020; 147:3260. PMID: 32486802. DOI: 10.1121/10.0001174.
Abstract
Natural soundscapes correspond to the acoustical patterns produced by biological and geophysical sound sources at different spatial and temporal scales for a given habitat. This pilot study aims to characterize the temporal-modulation information available to humans when perceiving variations in soundscapes within and across natural habitats. This is addressed by processing soundscapes from a previous study [Krause, Gage, and Joo. (2011). Landscape Ecol. 26, 1247] via models of human auditory processing extracting modulation at the output of cochlear filters. The soundscapes represent combinations of elevation, animal, and vegetation diversity in four habitats of the biosphere reserve in the Sequoia National Park (Sierra Nevada, USA). Bayesian statistical analysis and support vector machine classifiers indicate that: (i) amplitude-modulation (AM) and frequency-modulation (FM) spectra distinguish the soundscapes associated with each habitat; and (ii) for each habitat, diurnal and seasonal variations are associated with salient changes in AM and FM cues at rates between about 1 and 100 Hz in the low (<0.5 kHz) and high (>1-3 kHz) audio-frequency range. Support vector machine classifications further indicate that soundscape variations can be classified accurately based on these perceptually inspired representations.
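The analysis pipeline sketched below mirrors the abstract's recipe in miniature: compute an amplitude-modulation spectrum per recording (power spectrum of the Hilbert envelope over roughly 1-100 Hz rates) and feed it to a support vector machine. The "soundscapes" here are synthetic AM-noise stand-ins, since the study's recordings are not reproduced:

```python
import numpy as np
from scipy.signal import hilbert, welch
from sklearn.svm import SVC

fs = 8000
rng = np.random.default_rng(5)

def am_spectrum(x, n_bins=32):
    """AM spectrum: Welch power spectrum of the Hilbert envelope,
    resampled onto a log-spaced 1-100 Hz rate axis."""
    env = np.abs(hilbert(x))
    f, p = welch(env - env.mean(), fs=fs, nperseg=fs)
    keep = (f >= 1) & (f <= 100)
    return np.interp(np.geomspace(1, 100, n_bins), f[keep], p[keep])

def fake_soundscape(am_rate, seconds=2):
    """Noise carrier with sinusoidal AM -- a stand-in for a habitat."""
    t = np.arange(seconds * fs) / fs
    return rng.normal(size=t.size) * (1 + 0.8 * np.sin(2 * np.pi * am_rate * t))

# Two "habitats" differing in dominant AM rate (4 Hz vs 40 Hz)
X = np.array([am_spectrum(fake_soundscape(r)) for r in [4] * 20 + [40] * 20])
y = np.array([0] * 20 + [1] * 20)
clf = SVC(kernel="linear").fit(X[::2], y[::2])  # train on half the set
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```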
Affiliation(s)
- Etienne Thoret
- Laboratoire des systèmes perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, École normale supérieure, Université Paris Sciences et Lettres, 29 rue d'Ulm Paris, 75005, France
- Léo Varnet
- Laboratoire des systèmes perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, École normale supérieure, Université Paris Sciences et Lettres, 29 rue d'Ulm Paris, 75005, France
- Yves Boubenec
- Laboratoire des systèmes perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, École normale supérieure, Université Paris Sciences et Lettres, 29 rue d'Ulm Paris, 75005, France
- Régis Férriere
- Institut de Biologie de l'Ecole Normale Supérieure (IBENS), Université Paris Sciences et Lettres, CNRS, INSERM Paris, 75005, France
- François-Michel Le Tourneau
- International Center for Interdisciplinary Global Environmental Studies (iGLOBES), UMI 3157 CNRS, École normale supérieure, Université Paris Sciences et Lettres, University of Arizona, Tucson, Arizona 85721, USA
- Bernie Krause
- Wild Sanctuary, P.O. Box 536, Glen Ellen, California 95442, USA
- Christian Lorenzi
- Laboratoire des systèmes perceptifs, UMR CNRS 8248, Département d'Etudes Cognitives, École normale supérieure, Université Paris Sciences et Lettres, 29 rue d'Ulm Paris, 75005, France
25. Młynarski W, McDermott JH. Ecological origins of perceptual grouping principles in the auditory system. Proc Natl Acad Sci U S A 2019; 116:25355-25364. PMID: 31754035. PMCID: PMC6911196. DOI: 10.1073/pnas.1903887116.
Abstract
Events and objects in the world must be inferred from sensory signals to support behavior. Because sensory measurements are temporally and spatially local, the estimation of an object or event can be viewed as the grouping of these measurements into representations of their common causes. Perceptual grouping is believed to reflect internalized regularities of the natural environment, yet grouping cues have traditionally been identified using informal observation and investigated using artificial stimuli. The relationship of grouping to natural signal statistics has thus remained unclear, and additional or alternative cues remain possible. Here, we develop a general methodology for relating grouping to natural sensory signals and apply it to derive auditory grouping cues from natural sounds. We first learned local spectrotemporal features from natural sounds and measured their co-occurrence statistics. We then learned a small set of stimulus properties that could predict the measured feature co-occurrences. The resulting cues included established grouping cues, such as harmonic frequency relationships and temporal coincidence, but also revealed previously unappreciated grouping principles. Human perceptual grouping was predicted by natural feature co-occurrence, with humans relying on the derived grouping cues in proportion to their informativity about co-occurrence in natural sounds. The results suggest that auditory grouping is adapted to natural stimulus statistics, show how these statistics can reveal previously unappreciated grouping phenomena, and provide a framework for studying grouping in natural signals.
Collapse
Affiliation(s)
- Wiktor Młynarski
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139;
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139;
- Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA 02139
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA 02139
- Program in Speech and Hearing Biosciences and Technology, Harvard University, Boston, MA 02115
| |
Collapse
|
26
|
McWalter R, McDermott JH. Illusory sound texture reveals multi-second statistical completion in auditory scene analysis. Nat Commun 2019; 10:5096. [PMID: 31704913 PMCID: PMC6841952 DOI: 10.1038/s41467-019-12893-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2019] [Accepted: 10/03/2019] [Indexed: 12/27/2022] Open
Abstract
Sound sources in the world are experienced as stable even when intermittently obscured, implying perceptual completion mechanisms that “fill in” missing sensory information. We demonstrate a filling-in phenomenon in which the brain extrapolates the statistics of background sounds (textures) over periods of several seconds when they are interrupted by another sound, producing vivid percepts of illusory texture. The effect differs from previously described completion effects in that (1) the extrapolated sound must be defined statistically, given the stochastic nature of texture, and (2) the effect lasts much longer, enabling introspection and facilitating assessment of the underlying representation. Illusory texture biases subsequent estimates of texture statistics indistinguishably from actual texture, suggesting that it is represented similarly to actual texture. The illusion appears to reflect an inference about whether the background is likely to continue during concurrent sounds, providing a stable statistical representation of the ongoing environment despite unstable sensory evidence. Auditory textures are sounds defined by a particular statistical distribution, e.g., those produced by rain or a swarm of insects. Here, the authors describe a striking perceptual illusion in which sound textures are heard to continue even though they have in fact been replaced by white noise.
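The completion effect is framed in terms of texture statistics, i.e., time-averaged summary measures of subband envelopes. A minimal sketch of such a statistic set, assuming a small Butterworth filter bank and three moments per band (a simplification of the full texture models used in this literature):

```python
# Sketch of a "summary statistic" texture representation: time-averaged
# moments of subband envelopes. Band edges and statistics are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from scipy.stats import skew

def texture_stats(x, fs, edges=(200, 400, 800, 1600, 3200)):
    stats = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfiltfilt(sos, x)))    # subband envelope
        stats += [env.mean(), env.std() / env.mean(), skew(env)]
    return np.array(stats)

fs = 16000
rng = np.random.default_rng(2)
rain_like = rng.standard_normal(fs * 4)               # stand-in for a texture
print(texture_stats(rain_like, fs).round(3))
```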
Collapse
|
27
|
Sadeghi M, Zhai X, Stevenson IH, Escabí MA. A neural ensemble correlation code for sound category identification. PLoS Biol 2019; 17:e3000449. [PMID: 31574079 PMCID: PMC6788721 DOI: 10.1371/journal.pbio.3000449] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2019] [Revised: 10/11/2019] [Accepted: 09/03/2019] [Indexed: 12/25/2022] Open
Abstract
Humans and other animals effortlessly identify natural sounds and categorize them into behaviorally relevant categories. Yet, the acoustic features and neural transformations that enable sound recognition and the formation of perceptual categories are largely unknown. Here, using multichannel neural recordings in the auditory midbrain of unanesthetized female rabbits, we first demonstrate that neural ensemble activity in the auditory midbrain displays highly structured correlations that vary with distinct natural sound stimuli. These stimulus-driven correlations can be used to accurately identify individual sounds using single-response trials, even when the sounds do not differ in their spectral content. Combining neural recordings and an auditory model, we then show how correlations between frequency-organized auditory channels can contribute to discrimination of not just individual sounds but sound categories. For both the model and neural data, spectral and temporal correlations achieved similar categorization performance and appear to contribute equally. Moreover, both the neural and model classifiers achieve their best task performance when they accumulate evidence over a time frame of approximately 1-2 seconds, mirroring human perceptual trends. These results together suggest that time-frequency correlations in sounds may be reflected in the correlations between auditory midbrain ensembles and that these correlations may play an important role in the identification and categorization of natural sounds.
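A toy Python sketch of the correlation-code readout described here: each single-trial ensemble response is summarized by its between-channel correlations, and a sound is identified by the closest correlation template. The generative model for the "neural" responses is a random placeholder, not recorded data.

```python
# Sketch: identify sounds from between-channel correlation patterns.
import numpy as np

rng = np.random.default_rng(3)
N_CH, N_T = 16, 2000                                  # channels x time bins

def spectral_correlations(resp):
    """Upper triangle of the channel-by-channel correlation matrix."""
    c = np.corrcoef(resp)
    return c[np.triu_indices_from(c, k=1)]

def fake_trial(sound_id):
    """Single-trial response with a sound-specific correlation structure."""
    mix = np.random.default_rng(sound_id).standard_normal((N_CH, N_CH))
    return mix @ rng.standard_normal((N_CH, N_T))     # trial-to-trial variability

# one correlation template per sound, averaged over 20 training trials
templates = {
    s: np.mean([spectral_correlations(fake_trial(s)) for _ in range(20)], axis=0)
    for s in range(5)
}

# identify a held-out single trial by its nearest correlation template
probe = spectral_correlations(fake_trial(2))
pred = max(templates, key=lambda s: float(np.corrcoef(probe, templates[s])[0, 1]))
print("true sound: 2, predicted:", pred)
```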
Collapse
Affiliation(s)
- Mina Sadeghi
- Department of Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, United States of America
| | - Xiu Zhai
- Department of Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Department of Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
| | - Ian H. Stevenson
- Department of Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
| | - Monty A. Escabí
- Department of Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Department of Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
| |
Collapse
|
28
|
Kell AJE, McDermott JH. Invariance to background noise as a signature of non-primary auditory cortex. Nat Commun 2019; 10:3958. [PMID: 31477711 PMCID: PMC6718388 DOI: 10.1038/s41467-019-11710-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2018] [Accepted: 07/30/2019] [Indexed: 12/22/2022] Open
Abstract
Despite well-established anatomical differences between primary and non-primary auditory cortex, the associated representational transformations have remained elusive. Here we show that primary and non-primary auditory cortex are differentiated by their invariance to real-world background noise. We measured fMRI responses to natural sounds presented in isolation and in real-world noise, quantifying invariance as the correlation between the two responses for individual voxels. Non-primary areas were substantially more noise-invariant than primary areas. This primary/non-primary difference occurred for both speech and non-speech sounds and was unaffected by a concurrent demanding visual task, suggesting that the observed invariance is not specific to speech processing and is robust to inattention. The difference was most pronounced for real-world background noise: both primary and non-primary areas were relatively robust to simple types of synthetic noise. Our results suggest a general representational transformation between auditory cortical stages, illustrating a representational consequence of hierarchical organization in the auditory system.
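The invariance metric itself is simple to state in code. A sketch, with random matrices standing in for measured voxel responses to the same sounds in quiet and in noise:

```python
# Sketch of the invariance metric: per-voxel correlation, across sounds,
# between responses in quiet and in noise. Data are random placeholders.
import numpy as np

rng = np.random.default_rng(4)
n_sounds, n_voxels = 30, 1000
quiet = rng.standard_normal((n_sounds, n_voxels))
in_noise = 0.8 * quiet + 0.6 * rng.standard_normal((n_sounds, n_voxels))

def noise_invariance(a, b):
    """Pearson correlation across sounds, computed separately for each voxel."""
    a = (a - a.mean(0)) / a.std(0)
    b = (b - b.mean(0)) / b.std(0)
    return (a * b).mean(0)

inv = noise_invariance(quiet, in_noise)
print("median voxel invariance:", round(float(np.median(inv)), 3))
```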
Collapse
Affiliation(s)
- Alexander J E Kell
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, 02139, USA.
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, 02139, USA.
- Zuckerman Institute of Mind, Brain, and Behavior, Columbia University, New York, NY, 10027, USA.
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, 02139, USA.
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, 02139, USA.
- Program in Speech and Hearing Biosciences and Technology, Harvard University, Boston, MA, USA.
| |
Collapse
|
29
|
Zeng H, Chen L. Robust Temporal Averaging of Time Intervals Between Action and Sensation. Front Psychol 2019; 10:511. [PMID: 30941074 PMCID: PMC6433714 DOI: 10.3389/fpsyg.2019.00511] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2018] [Accepted: 02/20/2019] [Indexed: 11/13/2022] Open
Abstract
Perception of the time interval between one’s own action (a finger tap) and the associated sensory feedback (a visual flash or an auditory beep) is critical for precise and flexible control of action and for behavioral decisions. Previous studies have examined temporal averaging for multiple time intervals and its role in perceptual organization and crossmodal integration. In the present study, we extended temporal averaging from sensory stimuli to the coupling of action and its sensory feedback. We investigated whether and how temporal averaging could be achieved over the multiple intervals in a sequence of action-feedback events, and hence affect subsequent timing behavior. In the unimodal task, participants voluntarily tapped their index finger at a constant pace while receiving auditory feedback (beeps) whose intervals, and the variance of those intervals, changed across sequences. In the crossmodal task, each tap in a given sequence was accompanied at random by either a visual flash or an auditory beep as sensory feedback. When the sequence was over, participants produced a subsequent tap paired with either an auditory or a visual stimulus, the two enclosing a probe interval. In both tasks, participants made a two-alternative forced choice (2AFC) indicating whether the probe interval was shorter or longer than the mean interval between taps and their associated sensory events in the preceding sequence. In both scenarios, participants’ judgments of the probe interval suggested that they had internalized the mean interval associated with specific bindings of action and sensation, demonstrating a robust temporal-averaging process for intervals between action and sensation.
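The implied observer model can be sketched directly: internalize the mean of the action-feedback intervals in the sequence, then compare each probe to that mean with some decision noise. The interval range and noise level below are illustrative assumptions, not fitted parameters.

```python
# Sketch of a temporal-averaging observer for the 2AFC judgment.
import numpy as np

rng = np.random.default_rng(5)

def judge(sequence_intervals, probe, noise_sd=0.03):
    """Return 'longer' or 'shorter': probe vs. internalized mean interval (s)."""
    internal_mean = np.mean(sequence_intervals) + rng.normal(0, noise_sd)
    return "longer" if probe > internal_mean else "shorter"

seq = rng.uniform(0.4, 0.8, size=12)                  # tap-to-feedback intervals
probes = np.linspace(0.3, 0.9, 13)
p_longer = [np.mean([judge(seq, p) == "longer" for _ in range(200)]) for p in probes]
for p, pl in zip(probes, p_longer):
    print(f"probe {p:.2f}s -> P('longer') = {pl:.2f}")  # psychometric function
```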
Collapse
|
31
|
Norman-Haignere SV, McDermott JH. Neural responses to natural and model-matched stimuli reveal distinct computations in primary and nonprimary auditory cortex. PLoS Biol 2018; 16:e2005127. [PMID: 30507943 PMCID: PMC6292651 DOI: 10.1371/journal.pbio.2005127] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2017] [Revised: 12/13/2018] [Accepted: 11/08/2018] [Indexed: 11/19/2022] Open
Abstract
A central goal of sensory neuroscience is to construct models that can explain neural responses to natural stimuli. As a consequence, sensory models are often tested by comparing neural responses to natural stimuli with model responses to those stimuli. One challenge is that distinct model features are often correlated across natural stimuli, and thus model features can predict neural responses even if they do not in fact drive them. Here, we propose a simple alternative for testing a sensory model: we synthesize a stimulus that yields the same model response as each of a set of natural stimuli, and test whether the natural and "model-matched" stimuli elicit the same neural responses. We used this approach to test whether a common model of auditory cortex, in which spectrogram-like peripheral input is processed by linear spectrotemporal filters, can explain fMRI responses in humans to natural sounds. Prior studies have shown that this model has good predictive power throughout auditory cortex, but this finding could reflect feature correlations in natural stimuli. We observed that fMRI responses to natural and model-matched stimuli were nearly equivalent in primary auditory cortex (PAC) but that nonprimary regions, including those selective for music or speech, showed highly divergent responses to the two sound sets. This dissociation between primary and nonprimary regions was less clear from model predictions due to the influence of feature correlations across natural stimuli. Our results provide a signature of hierarchical organization in human auditory cortex, and suggest that nonprimary regions compute higher-order stimulus properties that are not well captured by traditional models. Our methodology enables stronger tests of sensory models and could be broadly applied in other domains.
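For a purely linear model, the model-matching logic reduces to linear algebra: keep the component of the stimulus the model "sees" (its row-space projection) and randomize the component to which it is blind. The sketch below covers only that simplified linear case; matching a model with nonlinear stages generally requires iterative synthesis, and the filter bank and dimensions here are arbitrary placeholders.

```python
# Sketch of model matching for a linear model: a new stimulus with identical
# model responses but randomized null-space content.
import numpy as np

rng = np.random.default_rng(6)
n_dim, n_feat = 512, 64
W = rng.standard_normal((n_feat, n_dim))              # linear model (filters)
x_nat = rng.standard_normal(n_dim)                    # "natural" stimulus

# decompose: component seen by the model + component the model is blind to
W_pinv = np.linalg.pinv(W)
x_seen = W_pinv @ (W @ x_nat)                         # row-space projection
null_noise = rng.standard_normal(n_dim)
x_matched = x_seen + (null_noise - W_pinv @ (W @ null_noise))

# responses agree to numerical precision, although the stimuli differ
print("response difference:", float(np.abs(W @ x_nat - W @ x_matched).max()))
```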
Collapse
Affiliation(s)
- Sam V. Norman-Haignere
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Zuckerman Institute of Mind, Brain and Behavior, Columbia University, New York, New York, United States of America
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, ENS, PSL University, CNRS, Paris, France
| | - Josh H. McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
- Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, Massachusetts, United States of America
- McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
| |
Collapse
|
32
|
Abstract
Human listeners appear to represent the textures of sounds through a process of automatic time averaging that operates beyond voluntary control. This process distils likely background sounds into their summary statistics, a computationally efficient way of dealing with complex auditory scenes.
Collapse
Affiliation(s)
- David McAlpine
- Department of Linguistics, and The Australian Hearing Hub, Macquarie University, 16 University Avenue, Sydney, NSW, 2109, Australia.
| |
Collapse
|