1. Clonan AC, Zhai X, Stevenson IH, Escabí MA. Interference of mid-level sound statistics underlie human speech recognition sensitivity in natural noise. bioRxiv 2024:2024.02.13.579526. [PMID: 38405870] [PMCID: PMC10888804] [DOI: 10.1101/2024.02.13.579526]
Abstract
Recognizing speech in noise, such as in a busy restaurant, is an essential cognitive skill, and the task difficulty varies across environments and noise levels. Although there is growing evidence that the auditory system relies on statistical representations for perceiving [1-5] and coding [4,6-9] natural sounds, it is less clear how statistical cues and neural representations contribute to segregating speech in natural auditory scenes. We demonstrate that human listeners rely on mid-level statistics to segregate and recognize speech in environmental noise. Using natural backgrounds and variants with perturbed spectro-temporal statistics, we show that speech recognition accuracy at a fixed noise level varies extensively across natural backgrounds (0% to 100%). Furthermore, for each background the unique interference created by summary statistics can mask or unmask speech, thus hindering or improving speech recognition. To identify the neural coding strategy and statistical cues that influence accuracy, we developed generalized perceptual regression, a framework that links summary statistics from a neural model to word recognition accuracy. Whereas a peripheral cochlear model accounts for only 60% of the perceptual variance, summary statistics from a mid-level auditory midbrain model accurately predict single-trial sensory judgments, accounting for more than 90% of the perceptual variance. Furthermore, perceptual weights from the regression framework identify which statistics and tuned neural filters are influential and how they impact recognition. Thus, perception of speech in natural backgrounds relies on a mid-level auditory representation in which multiple summary statistics interfere with speech, impacting recognition beneficially or detrimentally across natural background sounds.
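A minimal sketch of the perceptual-regression idea described in the abstract (not the authors' implementation): summary statistics of each background are mapped to single-trial word recognition outcomes with a logistic model, and the fitted coefficients play the role of perceptual weights. All sizes, variable names, and the simulated data are hypothetical.

```python
# Hypothetical sketch of "perceptual regression": predict single-trial
# word recognition (correct/incorrect) from summary statistics of the
# background noise, using ordinary logistic regression as a stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n_trials, n_stats = 500, 12                # trials and summary statistics (hypothetical sizes)
X = rng.normal(size=(n_trials, n_stats))   # stand-in for model-derived summary statistics
true_w = rng.normal(size=n_stats)          # unknown "perceptual weights"
p_correct = 1 / (1 + np.exp(-X @ true_w))  # probability of recognizing the word
y = rng.random(n_trials) < p_correct       # simulated single-trial judgments

model = LogisticRegression(max_iter=1000).fit(X, y)
print("recovered perceptual weights:", model.coef_.round(2))
print("mean predicted accuracy:", model.predict_proba(X)[:, 1].mean().round(2))
```

The sign and magnitude of each fitted weight indicate whether the corresponding statistic masks or unmasks speech in this toy setup, mirroring the interpretation given in the abstract.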
Affiliation(s)
- Alex C Clonan
- Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269
- Biomedical Engineering, University of Connecticut, Storrs, CT 06269
- Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269
- Xiu Zhai
- Biomedical Engineering, Wentworth Institute of Technology, Boston, MA 02115
- Ian H Stevenson
- Biomedical Engineering, University of Connecticut, Storrs, CT 06269
- Psychological Sciences, University of Connecticut, Storrs, CT 06269
- Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269
- Monty A Escabí
- Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269
- Psychological Sciences, University of Connecticut, Storrs, CT 06269
- Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, CT 06269
2. He F, Stevenson IH, Escabí MA. Two stages of bandwidth scaling drives efficient neural coding of natural sounds. PLoS Comput Biol 2023;19:e1010862. [PMID: 36787338] [PMCID: PMC9970106] [DOI: 10.1371/journal.pcbi.1010862]
Abstract
Theories of efficient coding propose that the auditory system is optimized for the statistical structure of natural sounds, yet the transformations underlying optimal acoustic representations are not well understood. Using a database of natural sounds, including human speech, and a physiologically inspired auditory model, we explore the consequences of peripheral (cochlear) and mid-level (auditory midbrain) filter tuning transformations on the representation of natural sound spectra and modulation statistics. Whereas Fourier-based sound decompositions have constant time-frequency resolution at all frequencies, cochlear and auditory midbrain filter bandwidths increase in proportion to the filter center frequency. This form of bandwidth scaling produces a systematic decrease in spectral resolution and increase in temporal resolution with increasing frequency. Here we demonstrate that cochlear bandwidth scaling produces a frequency-dependent gain that counteracts the tendency of natural sound power to decrease with frequency, resulting in a whitened output representation. Similarly, bandwidth scaling in mid-level auditory filters further enhances the representation of natural sounds by producing a whitened modulation power spectrum (MPS) with higher modulation entropy than both the cochlear outputs and the conventional Fourier MPS. These findings suggest that the tuning characteristics of the peripheral and mid-level auditory system together produce a whitened output representation in three dimensions (frequency, temporal modulation, and spectral modulation) that reduces redundancies and allows for a more efficient use of neural resources. This hierarchical multi-stage tuning strategy is thus likely optimized to extract available information and may underlie perceptual sensitivity to natural sounds.
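A toy calculation illustrating the whitening argument (my sketch, not the paper's code): natural sound power falls roughly as 1/f, while a constant-Q filter bank (bandwidth proportional to center frequency) integrates power over bands that widen with frequency, so per-band output power comes out roughly flat. The Q value and frequency grid are arbitrary choices.

```python
# Toy illustration of whitening by bandwidth scaling (constant-Q filters).
# Natural sound power ~ 1/f; a filter centered at fc with bandwidth fc/Q
# integrates power over a band whose width grows with fc, flattening output.
import numpy as np

f = np.linspace(20, 20000, 200000)   # frequency axis (Hz)
df = f[1] - f[0]
power = 1.0 / f                      # idealized 1/f natural-sound spectrum

def band_power(fc, q=4.0):
    """Total power in a constant-Q band centered at fc (bandwidth = fc/q)."""
    bw = fc / q
    in_band = (f >= fc - bw / 2) & (f <= fc + bw / 2)
    return power[in_band].sum() * df

for fc in [100, 400, 1600, 6400]:
    print(f"fc = {fc:5d} Hz  band power = {band_power(fc):.4f}")
# Band power is ~constant across center frequencies: a whitened representation.
```

Analytically, the integral of 1/f over a constant-Q band depends only on Q, not on the center frequency, which is why the printed values match.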
Affiliation(s)
- Fengrong He
- Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Ian H. Stevenson
- Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- The Connecticut Institute for Brain and Cognitive Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Monty A. Escabí
- Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- The Connecticut Institute for Brain and Cognitive Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, United States of America
3. Spiking network optimized for word recognition in noise predicts auditory system hierarchy. PLoS Comput Biol 2020;16:e1007558. [PMID: 32559204] [PMCID: PMC7329140] [DOI: 10.1371/journal.pcbi.1007558]
Abstract
The auditory neural code is resilient to acoustic variability and capable of recognizing sounds amongst competing sound sources, yet the transformations that enable these noise-robust abilities are largely unknown. We report that a hierarchical spiking neural network (HSNN) optimized to maximize word recognition accuracy in noise and across multiple talkers predicts the organizational hierarchy of the ascending auditory pathway. Comparisons with data from auditory nerve, midbrain, thalamus, and cortex reveal that the optimal HSNN predicts several transformations of the ascending auditory pathway, including a sequential loss of temporal resolution and synchronization ability along with increasing sparseness and selectivity. The optimal organizational scheme enhances performance by selectively filtering out noise and fast temporal cues, such as voicing periodicity, that are not directly relevant to the word recognition task. An identical network arranged to enable high information transfer fails to predict auditory pathway organization and has substantially poorer performance. Furthermore, conventional single-layer linear and nonlinear receptive field networks that capture the overall feature extraction of the HSNN fail to achieve similar performance. The findings suggest that the auditory pathway hierarchy and its sequential nonlinear feature extraction computations enhance relevant cues while removing non-informative sources of noise, thus enhancing the representation of sounds in noise-impoverished conditions.

The brain's ability to recognize sounds in the presence of competing sounds or background noise is essential for everyday hearing tasks. How the brain accomplishes noise resiliency, however, is poorly understood. Using neural recordings from the ascending auditory pathway and an auditory spiking network model trained for sound recognition in noise, we explore the computational strategies that enable noise robustness. Our results suggest that the hierarchical feature organization of the ascending auditory pathway and the resulting computations are critical for sound recognition in the presence of noise.
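A highly simplified sketch of one transformation the abstract attributes to the optimized network, the sequential loss of temporal resolution across stages: each stage is modeled here as a first-order low-pass filter with a progressively longer time constant, so a fast cue (a stand-in for voicing periodicity) is attenuated while a slow envelope cue survives. This is not the HSNN itself, and the time constants are illustrative, not fitted values.

```python
# Illustrative sketch: sequential loss of temporal resolution across
# auditory stages, modeled as a cascade of low-pass (leaky) integrators
# with increasing time constants (values are illustrative only).
import numpy as np

fs = 10000                                # sample rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
slow = np.sin(2 * np.pi * 4 * t)          # slow envelope cue (~word rate)
fast = np.sin(2 * np.pi * 150 * t)        # fast cue (~voicing periodicity)
x = slow + fast

def lowpass(sig, tau):
    """First-order low-pass filter (leaky integrator), time constant tau in s."""
    alpha = (1 / fs) / (tau + 1 / fs)
    out = np.empty_like(sig)
    acc = 0.0
    for i, v in enumerate(sig):
        acc += alpha * (v - acc)
        out[i] = acc
    return out

taus = [0.001, 0.005, 0.020, 0.050]       # nerve -> midbrain -> thalamus -> cortex (hypothetical)
for stage, tau in enumerate(taus, 1):
    x = lowpass(x, tau)
    p_fast = abs(np.exp(-2j * np.pi * 150 * t) @ x)  # Fourier amplitude at 150 Hz
    p_slow = abs(np.exp(-2j * np.pi * 4 * t) @ x)    # Fourier amplitude at 4 Hz
    print(f"stage {stage}: fast/slow amplitude ratio = {p_fast / p_slow:.3f}")
```

The printed ratio shrinks stage by stage, mimicking the pathway's progressive suppression of fast temporal cues that are irrelevant to word identity.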
4. Cai H, Dent ML. Best sensitivity of temporal modulation transfer functions in laboratory mice matches the amplitude modulation embedded in vocalizations. J Acoust Soc Am 2020;147:337. [PMID: 32006990] [PMCID: PMC7043865] [DOI: 10.1121/10.0000583]
Abstract
The perception of spectrotemporal changes is crucial for distinguishing between acoustic signals, including vocalizations. Temporal modulation transfer functions (TMTFs) have been measured in many species and reveal that the discrimination of amplitude modulation suffers at rapid modulation frequencies. TMTFs were measured in six CBA/CaJ mice in an operant conditioning procedure in which mice were trained to discriminate an 800 ms amplitude-modulated white noise target from a continuous noise background. The TMTFs of mice show a bandpass characteristic, with an upper cutoff frequency of around 567 Hz. Within the measured modulation frequencies, ranging from 5 Hz to 1280 Hz, the mice showed best sensitivity to amplitude modulation at around 160 Hz. To look for a possible parallel evolution between sound perception and production, we also analyzed the amplitude modulation components embedded in natural ultrasonic vocalizations (USVs) emitted by this strain. We found that the cutoff frequency of amplitude modulation in most individual USVs falls near the most sensitive range obtained from the psychoacoustic experiments. Further analyses of the durations and modulation frequency ranges of the USVs indicated that the broader the frequency range of amplitude modulation in a natural USV, the shorter its duration.
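A short sketch of the stimulus class used in TMTF experiments of this kind: sinusoidally amplitude-modulated white noise at a chosen modulation frequency and depth. The duration matches the 800 ms target described above; the other parameter values are arbitrary examples, not the study's settings.

```python
# Sketch of a TMTF-style stimulus: sinusoidally amplitude-modulated
# white noise (parameters are arbitrary examples, not the study's values).
import numpy as np

fs = 44100                  # sample rate (Hz)
dur = 0.8                   # 800 ms target, as in the task
fm = 160.0                  # modulation frequency (Hz); best mouse sensitivity ~160 Hz
m = 1.0                     # modulation depth (0 = unmodulated, 1 = full)

t = np.arange(0, dur, 1 / fs)
carrier = np.random.default_rng(1).normal(size=t.size)      # white-noise carrier
am_noise = carrier * (1 + m * np.sin(2 * np.pi * fm * t))   # modulated target
# The continuous background is just the unmodulated carrier; as fm rises
# toward the cutoff (~567 Hz in mice), the modulation becomes undetectable,
# which is exactly what the TMTF quantifies.
```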
Affiliation(s)
- Huaizhen Cai
- Department of Psychology, University at Buffalo-SUNY, Buffalo, New York 14260, USA
- Micheal L Dent
- Department of Psychology, University at Buffalo-SUNY, Buffalo, New York 14260, USA
5. Sadeghi M, Zhai X, Stevenson IH, Escabí MA. A neural ensemble correlation code for sound category identification. PLoS Biol 2019;17:e3000449. [PMID: 31574079] [PMCID: PMC6788721] [DOI: 10.1371/journal.pbio.3000449]
Abstract
Humans and other animals effortlessly identify natural sounds and categorize them into behaviorally relevant categories. Yet the acoustic features and neural transformations that enable sound recognition and the formation of perceptual categories are largely unknown. Here, using multichannel neural recordings in the auditory midbrain of unanesthetized female rabbits, we first demonstrate that neural ensemble activity in the auditory midbrain displays highly structured correlations that vary with distinct natural sound stimuli. These stimulus-driven correlations can be used to accurately identify individual sounds from single response trials, even when the sounds do not differ in their spectral content. Combining neural recordings and an auditory model, we then show how correlations between frequency-organized auditory channels can contribute to the discrimination of not just individual sounds but also sound categories. For both the model and the neural data, spectral and temporal correlations achieve similar categorization performance and appear to contribute equally. Moreover, both the neural and model classifiers achieve their best task performance when they accumulate evidence over a time frame of approximately 1-2 seconds, mirroring human perceptual trends. Together, these results suggest that time-frequency correlations in sounds may be reflected in the correlations between auditory midbrain ensembles and that these correlations may play an important role in the identification and categorization of natural sounds.
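A bare-bones sketch of the correlation-code idea (my stand-in, not the paper's classifier): represent each response by its between-channel correlation matrix and classify a single trial by its distance to per-category template matrices. The channel counts, the ~2 s evidence window, and all simulated data are stand-ins.

```python
# Minimal sketch of a correlation-based sound classifier: represent each
# response by its channel-by-channel correlation matrix and classify by
# distance to per-category templates. All data are random stand-ins.
import numpy as np

rng = np.random.default_rng(2)
n_ch, n_time = 16, 2000          # frequency channels x time bins (~2 s of evidence)

def corr_features(resp):
    """Upper triangle of the between-channel correlation matrix."""
    c = np.corrcoef(resp)
    return c[np.triu_indices(n_ch, k=1)]

def make_trial(mix):
    """Simulate channels sharing a category-specific common signal."""
    common = rng.normal(size=n_time)
    return mix * common + rng.normal(size=(n_ch, n_time))

# Category A: strongly shared correlations; category B: weak.
train_a = [corr_features(make_trial(1.0)) for _ in range(20)]
train_b = [corr_features(make_trial(0.3)) for _ in range(20)]
template_a, template_b = np.mean(train_a, axis=0), np.mean(train_b, axis=0)

test = corr_features(make_trial(1.0))    # a single held-out "category A" trial
d_a = np.linalg.norm(test - template_a)
d_b = np.linalg.norm(test - template_b)
print("classified as:", "A" if d_a < d_b else "B")
```

Note that the classifier never sees the spectra themselves, only correlation structure, which parallels the abstract's point that sounds can be identified even when their spectral content is matched.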
Affiliation(s)
- Mina Sadeghi
- Department of Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Xiu Zhai
- Department of Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Department of Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Ian H. Stevenson
- Department of Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
- Monty A. Escabí
- Department of Electrical and Computer Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Department of Biomedical Engineering, University of Connecticut, Storrs, Connecticut, United States of America
- Department of Psychological Sciences, University of Connecticut, Storrs, Connecticut, United States of America
6. Luo J, Macias S, Ness TV, Einevoll GT, Zhang K, Moss CF. Neural timing of stimulus events with microsecond precision. PLoS Biol 2018;16:e2006422. [PMID: 30365484] [PMCID: PMC6221347] [DOI: 10.1371/journal.pbio.2006422]
Abstract
Temporal analysis of sound is fundamental to auditory processing throughout the animal kingdom. Echolocating bats are powerful models for investigating the underlying mechanisms of auditory temporal processing, as they show microsecond precision in discriminating the timing of acoustic events. However, the neural basis for microsecond auditory discrimination in bats has eluded researchers for decades. Combining extracellular recordings in the midbrain inferior colliculus (IC) and mathematical modeling, we show that microsecond precision in registering stimulus events emerges from synchronous neural firing, revealed through low variability in the latency of stimulus-evoked extracellular field potentials (EFPs, 200–600 Hz). The temporal precision of the EFP increases with the number of neurons firing in synchrony. Moreover, there is a functional relationship between the temporal precision of the EFP and the spectrotemporal features of the echolocation calls. In addition, the EFP can measure the time difference of simulated echolocation call–echo pairs with microsecond precision. We propose that synchronous firing of populations of neurons operates in diverse species to support temporal analysis for auditory localization and complex sound processing.

We routinely rely on a stopwatch to precisely measure the time it takes for an athlete to reach the finish line. Without the assistance of such a timing device, our measurement of elapsed time becomes imprecise. By contrast, some animals, such as echolocating bats, naturally perform timing tasks with remarkable precision. Behavioral research has shown that echolocating bats can estimate the elapsed time between sonar cries and echo returns with a precision in the range of microseconds. However, the neural basis for such microsecond precision has remained a puzzle to scientists. Combining extracellular recordings in the bat's inferior colliculus (IC), a midbrain nucleus of the auditory pathway, and mathematical modeling, we show that microsecond precision in registering stimulus events emerges from synchronous neural firing. Our recordings revealed low variability in the latency of stimulus-evoked extracellular field potentials (EFPs), which, according to our mathematical modeling, was determined by the number of firing neurons and their synchrony. Moreover, the acoustic features of echolocation calls, such as signal duration and bandwidth, which the bat dynamically modulates during prey capture, also modulate the precision of EFPs. These findings have broad implications for understanding the temporal analysis of acoustic signals in a wide range of auditory behaviors across the animal kingdom.
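A Monte Carlo sketch of the modeling claim that EFP timing precision grows with the number of synchronously firing neurons: averaging N jittered spike times reduces the standard deviation of the population timing estimate roughly as 1/sqrt(N). The per-neuron jitter value is illustrative, not a measured quantity from the paper.

```python
# Monte Carlo sketch: timing precision of a population event improves
# with the number of synchronously firing neurons (~1/sqrt(N)).
# The per-neuron jitter value is illustrative, not measured.
import numpy as np

rng = np.random.default_rng(3)
jitter_us = 100.0                 # per-neuron spike-time jitter (microseconds)
n_repeats = 5000                  # simulated stimulus presentations

for n_neurons in [1, 10, 100, 1000]:
    # Each presentation: N neurons fire at t = 0 plus independent jitter;
    # the "EFP latency" is taken as the mean firing time of the population.
    spike_times = rng.normal(0.0, jitter_us, size=(n_repeats, n_neurons))
    efp_latency = spike_times.mean(axis=1)
    print(f"N = {n_neurons:4d}: latency SD = {efp_latency.std():7.2f} us")
# With ~100 synchronized neurons, 100-us single-neuron jitter yields
# ~10-us population precision, approaching the microsecond range as N grows.
```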
Affiliation(s)
- Jinhong Luo
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, Maryland, United States of America
- Silvio Macias
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, Maryland, United States of America
- Torbjørn V. Ness
- Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway
- Gaute T. Einevoll
- Faculty of Science and Technology, Norwegian University of Life Sciences, Ås, Norway
- Department of Physics, University of Oslo, Oslo, Norway
- Kechen Zhang
- Department of Biomedical Engineering, Johns Hopkins School of Medicine, Baltimore, Maryland, United States of America
- Cynthia F. Moss
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, Maryland, United States of America