1.
Mammalian octopus cells are direction selective to frequency sweeps by excitatory synaptic sequence detection. Proc Natl Acad Sci U S A 2022; 119:e2203748119. [PMID: 36279465] [PMCID: PMC9636937] [DOI: 10.1073/pnas.2203748119]
Abstract
Octopus cells are remarkable projection neurons of the mammalian cochlear nucleus, with extremely fast membranes and wide-frequency tuning. They are considered prime examples of coincidence detectors but are poorly characterized in vivo. We discover that octopus cells are selective to frequency sweep direction, a feature that is absent in their auditory nerve inputs. In vivo intracellular recordings reveal that direction selectivity does not derive from across-frequency coincidence detection but hinges on the amplitudes and activation sequence of auditory nerve inputs tuned to clusters of hot spot frequencies. A simple biophysical octopus cell model excited with real nerve spike trains recreates direction selectivity through interaction of intrinsic membrane conductances with the activation sequence of clustered excitatory inputs. We conclude that octopus cells are sequence detectors, sensitive to temporal patterns across cochlear frequency channels. The detection of sequences rather than coincidences is a much simpler but powerful operation to extract temporal information.
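The sequence-detection principle in this abstract can be illustrated with a toy leaky-membrane sketch (a hypothetical minimal model, not the paper's biophysical octopus-cell model): the same three excitatory inputs produce a larger peak depolarization when their activation sequence brings the EPSPs close together in time.

```python
import numpy as np

def peak_depolarization(arrival_times, tau=200e-6, dt=10e-6, dur=5e-3):
    """Peak voltage of a passive leaky membrane receiving brief, equal
    EPSPs at the given arrival times (seconds); each EPSP decays with
    membrane time constant tau. With a very fast membrane (tau on the
    order of 200 microseconds), inputs summate only when they arrive
    in rapid succession."""
    t = np.arange(0.0, dur, dt)
    v = np.zeros_like(t)
    for t0 in arrival_times:
        v += np.exp(-(t - t0) / tau) * (t >= t0)
    return float(v.max())

# Hypothetical scenario: three excitatory inputs tuned low to high.
# The preferred sweep direction activates them so their EPSPs arrive
# nearly together; the opposite direction spreads the arrivals out.
preferred = peak_depolarization([1.0e-3, 1.1e-3, 1.2e-3])  # ~100 us apart
null = peak_depolarization([1.0e-3, 1.5e-3, 2.0e-3])       # ~500 us apart
# preferred exceeds null, so a fixed spike threshold between the two
# peaks yields direction selectivity from input timing alone
```

The point of the sketch is that no across-frequency coincidence window is needed: with a fast enough membrane, reordering the same inputs in time is sufficient to change the peak response.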
2.
Oberle HM, Ford AN, Dileepkumar D, Czarny J, Apostolides PF. Synaptic mechanisms of top-down control in the non-lemniscal inferior colliculus. eLife 2022; 10:e72730. [PMID: 34989674] [PMCID: PMC8735864] [DOI: 10.7554/elife.72730]
Abstract
Corticofugal projections to evolutionarily ancient, subcortical structures are ubiquitous across mammalian sensory systems. These 'descending' pathways enable the neocortex to control ascending sensory representations in a predictive or feedback manner, but the underlying cellular mechanisms are poorly understood. Here, we combine optogenetic approaches with in vivo and in vitro patch-clamp electrophysiology to study the projection from mouse auditory cortex to the inferior colliculus (IC), a major descending auditory pathway that controls IC neuron feature selectivity, plasticity, and auditory perceptual learning. Although individual auditory cortico-collicular synapses were generally weak, IC neurons often integrated inputs from multiple corticofugal axons that generated reliable, tonic depolarizations even during prolonged presynaptic activity. Latency measurements in vivo showed that descending signals reach the IC within 30 ms of sound onset, which in IC neurons corresponded to the peak of synaptic depolarizations evoked by short sounds. Activating ascending and descending pathways at latencies expected in vivo caused an NMDA receptor-dependent, supralinear summation of excitatory postsynaptic potentials, indicating that descending signals can nonlinearly amplify IC neurons' moment-to-moment acoustic responses. Our results shed light upon the synaptic bases of descending sensory control and imply that heterosynaptic cooperativity contributes to the auditory cortico-collicular pathway's role in plasticity and perceptual learning.
Affiliation(s)
- Hannah M Oberle
- Kresge Hearing Research Institute & Department of Otolaryngology, University of Michigan, Ann Arbor, United States
- Neuroscience Graduate Program, University of Michigan, Ann Arbor, United States
- Alexander N Ford
- Kresge Hearing Research Institute & Department of Otolaryngology, University of Michigan, Ann Arbor, United States
- Deepak Dileepkumar
- Kresge Hearing Research Institute & Department of Otolaryngology, University of Michigan, Ann Arbor, United States
- Jordyn Czarny
- Kresge Hearing Research Institute & Department of Otolaryngology, University of Michigan, Ann Arbor, United States
- Pierre F Apostolides
- Kresge Hearing Research Institute & Department of Otolaryngology, University of Michigan, Ann Arbor, United States
- Molecular and Integrative Physiology, University of Michigan Medical School, Ann Arbor, United States
3.
de Cheveigné A. Harmonic Cancellation-A Fundamental of Auditory Scene Analysis. Trends Hear 2021; 25:23312165211041422. [PMID: 34698574] [PMCID: PMC8552394] [DOI: 10.1177/23312165211041422]
Abstract
This paper reviews the hypothesis of harmonic cancellation, according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of auditory scene analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that do or do not support it. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic rather than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results with the aim of understanding the inconsistencies and reaching a reliable conclusion for, or against, harmonic cancellation within the auditory system.
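The core time-domain operation behind harmonic cancellation can be sketched as a delay-and-subtract comb filter. This is a minimal illustration assuming the interferer's period is close to an integer number of samples; it is not an implementation from the paper.

```python
import numpy as np

def harmonic_cancel(x, fs, f0):
    """Delay-and-subtract comb filter y[n] = x[n] - x[n - T], with T one
    period of the interferer's fundamental f0 (Hz). Components at exact
    harmonics of f0 are nulled (assumes fs/f0 is near an integer)."""
    T = int(round(fs / f0))
    y = np.zeros_like(x)
    y[T:] = x[T:] - x[:-T]
    return y

fs = 16000
t = np.arange(fs) / fs
interferer = np.sin(2*np.pi*200*t) + 0.5*np.sin(2*np.pi*400*t)  # harmonic, f0 = 200 Hz
target = 0.3*np.sin(2*np.pi*330*t)                              # inharmonic target
residual = harmonic_cancel(interferer + target, fs, 200.0)
# beyond the first period, the harmonic interferer cancels exactly
# while the inharmonic target survives (spectrally reshaped)
```

The filter's nulls fall at every harmonic of f0, which is why a harmonic interferer can be removed as a whole while an inharmonic target is merely reshaped; this is the property the hypothesis attributes to brainstem time-domain processing.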
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des systèmes perceptifs, CNRS, Paris, France
- Département d'études cognitives, École normale supérieure, PSL University, Paris, France
- UCL Ear Institute, London, UK
4.
Robust Rate-Place Coding of Resolved Components in Harmonic and Inharmonic Complex Tones in Auditory Midbrain. J Neurosci 2020; 40:2080-2093. [PMID: 31996454] [DOI: 10.1523/jneurosci.2337-19.2020]
Abstract
Harmonic complex tones (HCTs) commonly occurring in speech and music evoke a strong pitch at their fundamental frequency (F0), especially when they contain harmonics individually resolved by the cochlea. When all frequency components of an HCT are shifted by the same amount, the pitch of the resulting inharmonic tone (IHCT) can also shift, although the envelope repetition rate is unchanged. A rate-place code, whereby resolved harmonics are represented by local maxima in firing rates along the tonotopic axis, has been characterized in the auditory nerve and primary auditory cortex, but little is known about intermediate processing stages. We recorded single-neuron responses to HCT and IHCT with varying F0 and sound level in the inferior colliculus (IC) of unanesthetized rabbits of both sexes. Many neurons showed peaks in firing rate when a low-numbered harmonic aligned with the neuron's characteristic frequency, demonstrating "rate-place" coding. The IC rate-place code was most prevalent for F0 > 800 Hz, was only moderately dependent on sound level over a 40 dB range, and was not sensitive to stimulus harmonicity. A spectral receptive-field model incorporating broadband inhibition better predicted the neural responses than a purely excitatory model, suggesting an enhancement of the rate-place representation by inhibition. Some IC neurons showed facilitation in response to HCT relative to pure tones, similar to cortical "harmonic template neurons" (Feng and Wang, 2017), but to a lesser degree. Our findings shed light on the transformation of rate-place coding of resolved harmonics along the auditory pathway.

SIGNIFICANCE STATEMENT: Harmonic complex tones are ubiquitous in speech and music and produce strong pitch percepts when they contain frequency components that are individually resolved by the cochlea. Here, we characterize a "rate-place" code for resolved harmonics in the auditory midbrain that is more robust across sound levels than the peripheral rate-place code and insensitive to the harmonic relationships among frequency components. We use a computational model to show that inhibition may play an important role in shaping the rate-place code. Our study fills a major gap in understanding the transformations in neural representations of resolved harmonics along the auditory pathway.
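As a minimal illustration of rate-place coding (a hypothetical Gaussian tuning-curve sketch, not the study's excitatory-inhibitory receptive-field model), a firing-rate profile along a tonotopic axis shows peaks where low-numbered harmonics align with characteristic frequency:

```python
import numpy as np

def rate_place_profile(cfs, f0, bw_factor=0.1):
    """Toy rate-place profile: each model neuron with characteristic
    frequency cf responds via a Gaussian tuning curve to the nearest
    harmonic of f0; bandwidth grows with cf, so only low-numbered
    harmonics remain resolved as distinct rate peaks."""
    cfs = np.asarray(cfs, dtype=float)
    nearest = np.round(cfs / f0) * f0        # nearest harmonic to each CF
    bw = bw_factor * cfs                     # tuning bandwidth scales with CF
    return np.exp(-0.5 * ((cfs - nearest) / bw) ** 2)

cfs = np.linspace(500.0, 4000.0, 200)        # tonotopic axis, Hz
profile = rate_place_profile(cfs, f0=1000.0)
# rate maxima fall at CFs near 1000, 2000, 3000, and 4000 Hz, with
# valleys in between marking those harmonics as resolved
```

Because the model bandwidth grows with CF while harmonic spacing stays fixed at f0, the peak-to-valley contrast shrinks at higher harmonic numbers, mirroring the transition from resolved to unresolved harmonics.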
5.
Su Y, Delgutte B. Pitch of harmonic complex tones: rate and temporal coding of envelope repetition rate in inferior colliculus of unanesthetized rabbits. J Neurophysiol 2019; 122:2468-2485. [PMID: 31664871] [DOI: 10.1152/jn.00512.2019]
Abstract
Harmonic complex tones (HCTs) found in speech, music, and animal vocalizations evoke strong pitch percepts at their fundamental frequencies. The strongest pitches are produced by HCTs that contain harmonics resolved by cochlear frequency analysis, but HCTs containing solely unresolved harmonics also evoke a weaker pitch at their envelope repetition rate (ERR). In the auditory periphery, neurons phase lock to the stimulus envelope, but this temporal representation of ERR degrades and gives way to rate codes along the ascending auditory pathway. To assess the role of the inferior colliculus (IC) in such transformations, we recorded IC neuron responses to HCT and sinusoidally modulated broadband noise (SAMN) with varying ERR from unanesthetized rabbits. Different interharmonic phase relationships of HCT were used to manipulate the temporal envelope without changing the power spectrum. Many IC neurons demonstrated band-pass rate tuning to ERR between 60 and 1,600 Hz for HCT and between 40 and 500 Hz for SAMN. The tuning was not related to the pure-tone best frequency of neurons but was dependent on the shape of the stimulus envelope, indicating a temporal rather than spectral origin. A phenomenological model suggests that the tuning may arise from peripheral temporal response patterns via synaptic inhibition. We also characterized temporal coding of ERR. Some IC neurons could phase lock to the stimulus envelope up to 900 Hz for either HCT or SAMN, but phase locking was weaker with SAMN. Together, the rate code and the temporal code represent a wide range of ERR, providing strong cues for the pitch of unresolved harmonics.

NEW & NOTEWORTHY: Envelope repetition rate (ERR) provides crucial cues for pitch perception of frequency components that are not individually resolved by the cochlea, but the neural representation of ERR for stimuli containing many harmonics is poorly characterized. Here we show that the pitch of stimuli with unresolved harmonics is represented by both a rate code and a temporal code for ERR in auditory midbrain neurons and propose possible underlying neural mechanisms with a computational model.
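Envelope phase locking of the kind quantified in such studies is conventionally measured with vector strength (Goldberg and Brown's metric); a minimal sketch with synthetic spike trains, where the 200 Hz rate and the spike times are illustrative assumptions:

```python
import numpy as np

def vector_strength(spike_times, freq):
    """Vector strength of phase locking to `freq` (Hz): each spike
    becomes a unit vector at its stimulus phase; the mean vector's
    length is 1 for perfect locking and near 0 for none."""
    phases = 2.0 * np.pi * freq * np.asarray(spike_times, dtype=float)
    return float(np.abs(np.mean(np.exp(1j * phases))))

err_hz = 200.0                               # assumed envelope repetition rate
locked = np.arange(100) / err_hz             # one spike per cycle, fixed phase
rng = np.random.default_rng(0)
unlocked = np.sort(rng.uniform(0.0, 0.5, 100))  # Poisson-like control train
# the locked train gives a vector strength near 1; the random train
# gives a small value, on the order of 1/sqrt(n) for n spikes
```

A band-pass rate code and this temporal metric are complementary: rate tuning survives even where spikes no longer follow the envelope cycle by cycle.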
Affiliation(s)
- Yaqing Su
- Eaton-Peabody Labs, Massachusetts Eye and Ear, Boston, Massachusetts; Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Bertrand Delgutte
- Eaton-Peabody Labs, Massachusetts Eye and Ear, Boston, Massachusetts; Department of Otolaryngology, Harvard Medical School, Boston, Massachusetts
6.
Peng F, Innes-Brown H, McKay CM, Fallon JB, Zhou Y, Wang X, Hu N, Hou W. Temporal Coding of Voice Pitch Contours in Mandarin Tones. Front Neural Circuits 2018; 12:55. [PMID: 30087597] [PMCID: PMC6066958] [DOI: 10.3389/fncir.2018.00055]
Abstract
Accurate perception of time-variant pitch is important for speech recognition, particularly for tonal languages such as Mandarin, in which different lexical tones convey different semantic information. Previous studies reported that the auditory nerve and cochlear nucleus can encode different pitches through phase-locked neural activity. However, little is known about how the inferior colliculus (IC) encodes the time-variant periodicity pitch of natural speech. In this study, the Mandarin syllable /ba/ pronounced with four lexical tones (flat, rising, falling-then-rising, and falling) was used as the stimulus set. Local field potentials (LFPs) and single-neuron activity were simultaneously recorded from 90 sites within the contralateral IC of six urethane-anesthetized and decerebrate guinea pigs in response to the four stimuli. Analysis of the temporal information of the LFPs showed that 93% of the LFPs exhibited robust encoding of periodicity pitch. Pitch strength of LFPs, derived from the autocorrelogram, was significantly (p < 0.001) stronger for rising tones than for flat and falling tones, and also increased significantly (p < 0.05) with characteristic frequency (CF). On the other hand, only 47% (42 of 90) of single-neuron responses were significantly synchronized to the fundamental frequency of the stimulus, suggesting that the temporal spiking patterns of single IC neurons encode the time-variant periodicity pitch of speech less robustly. The difference between the number of LFPs and single neurons that encode the time-variant F0 voice pitch supports the notion of a transition at the level of the IC from direct temporal coding in the spike trains of individual neurons to other forms of neural representation.
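The autocorrelogram-derived pitch strength mentioned above can be approximated by the normalized autocorrelation at the lag of one F0 period; this is a simplified sketch with synthetic signals, not the study's analysis pipeline:

```python
import numpy as np

def pitch_strength(sig, fs, f0):
    """Simplified pitch strength: normalized autocorrelation of `sig`
    at a lag of one f0 period; near 1 for a signal strongly periodic
    at f0, near 0 otherwise."""
    lag = int(round(fs / f0))
    s = np.asarray(sig, dtype=float) - np.mean(sig)
    a, b = s[:-lag], s[lag:]
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

fs = 10000
t = np.arange(2 * fs) / fs                   # 2 s of signal
periodic = np.sin(2*np.pi*125*t)             # strongly periodic, F0 = 125 Hz
rng = np.random.default_rng(0)
noise = rng.standard_normal(t.size)          # aperiodic control
# pitch_strength(periodic, fs, 125.0) is close to 1, while the noise
# control yields a value near 0
```

For a time-variant F0 contour, as in the Mandarin-tone stimuli, the same measure would be applied in short windows with the lag tracking the instantaneous period.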
Affiliation(s)
- Fei Peng
- Key Laboratory of Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing, China
- Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Hamish Innes-Brown
- Bionics Institute, East Melbourne, VIC, Australia
- Department of Medical Bionics, University of Melbourne, Melbourne, VIC, Australia
- Colette M. McKay
- Bionics Institute, East Melbourne, VIC, Australia
- Department of Medical Bionics, University of Melbourne, Melbourne, VIC, Australia
- James B. Fallon
- Bionics Institute, East Melbourne, VIC, Australia
- Department of Medical Bionics, University of Melbourne, Melbourne, VIC, Australia
- Department of Otolaryngology, University of Melbourne, Melbourne, VIC, Australia
- Yi Zhou
- Chongqing Key Laboratory of Neurobiology, Department of Neurobiology, Third Military Medical University, Chongqing, China
- Xing Wang
- Key Laboratory of Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing, China
- Chongqing Medical Electronics Engineering Technology Research Center, Chongqing University, Chongqing, China
- Ning Hu
- Key Laboratory of Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing, China
- Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Wensheng Hou
- Key Laboratory of Biorheological Science and Technology of Ministry of Education, Bioengineering College of Chongqing University, Chongqing, China
- Collaborative Innovation Center for Brain Science, Chongqing University, Chongqing, China
- Chongqing Medical Electronics Engineering Technology Research Center, Chongqing University, Chongqing, China
7.
Neural representations of concurrent sounds with overlapping spectra in rat inferior colliculus: Comparisons between temporal-fine structure and envelope. Hear Res 2017; 353:87-96. [PMID: 28655419] [DOI: 10.1016/j.heares.2017.06.005]
Abstract
Perceptual segregation of multiple sounds, which overlap in both time and spectrum, into individual auditory streams is critical for hearing in natural environments. Cues such as interaural time disparities (ITDs) play an important role in segregation, especially when sounds are separated in space. In this study, we investigated the neural representation of two uncorrelated narrowband noises sharing an identical spectrum in the rat inferior colliculus (IC) using frequency-following-response (FFR) recordings, while the ITD of each noise stimulus was manipulated. The recorded FFRs exhibited two distinct components: the fast-varying temporal-fine-structure component (FFR-TFS) and the slow-varying envelope component (FFR-ENV). When a single narrowband noise was presented alone, the FFR-TFS, but not the FFR-ENV, was sensitive to ITDs. When two narrowband noises were presented simultaneously, the FFR-TFS took advantage of the ITD disparity associated with perceived spatial separation between the two concurrent sounds, and displayed better linear synchronization to the sound with an ipsilateral-leading ITD. However, no effects of ITDs were found on the FFR-ENV. These results suggest that the FFR-TFS and FFR-ENV reflect two distinct types of signal processing in the auditory brainstem and contribute differentially to sound segregation based on spatial cues, with the FFR-TFS more critical to spatial release from masking.
8.
Snyder JS, Elhilali M. Recent advances in exploring the neural underpinnings of auditory scene perception. Ann N Y Acad Sci 2017; 1396:39-55. [PMID: 28199022] [PMCID: PMC5446279] [DOI: 10.1111/nyas.13317]
Abstract
Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds, and on conventional behavioral techniques, to elucidate how we perceptually segregate auditory objects or streams from each other. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth in the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Following the progress that has been made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.
Affiliation(s)
- Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, Nevada
- Mounya Elhilali
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland
9.
Sayles M, Stasiak A, Winter IM. Reverberation impairs brainstem temporal representations of voiced vowel sounds: challenging "periodicity-tagged" segregation of competing speech in rooms. Front Syst Neurosci 2015; 8:248. [PMID: 25628545] [PMCID: PMC4290552] [DOI: 10.3389/fnsys.2014.00248]
Abstract
The auditory system typically processes information from concurrently active sound sources (e.g., two voices speaking at once), in the presence of multiple delayed, attenuated and distorted sound-wave reflections (reverberation). Brainstem circuits help segregate these complex acoustic mixtures into “auditory objects.” Psychophysical studies demonstrate a strong interaction between reverberation and fundamental-frequency (F0) modulation, leading to impaired segregation of competing vowels when segregation is on the basis of F0 differences. Neurophysiological studies of complex-sound segregation have concentrated on sounds with steady F0s, in anechoic environments. However, F0 modulation and reverberation are quasi-ubiquitous. We examine the ability of 129 single units in the ventral cochlear nucleus (VCN) of the anesthetized guinea pig to segregate the concurrent synthetic vowel sounds /a/ and /i/, based on temporal discharge patterns under closed-field conditions. We address the effects of added real-room reverberation, F0 modulation, and the interaction of these two factors, on brainstem neural segregation of voiced speech sounds. A firing-rate representation of single-vowels' spectral envelopes is robust to the combination of F0 modulation and reverberation: local firing-rate maxima and minima across the tonotopic array code vowel-formant structure. However, single-vowel F0-related periodicity information in shuffled inter-spike interval distributions is significantly degraded in the combined presence of reverberation and F0 modulation. Hence, segregation of double-vowels' spectral energy into two streams (corresponding to the two vowels), on the basis of temporal discharge patterns, is impaired by reverberation; specifically when F0 is modulated. All unit types (primary-like, chopper, onset) are similarly affected. 
These results offer neurophysiological insights into the perceptual organization of complex acoustic scenes under realistically challenging listening conditions.
Affiliation(s)
- Mark Sayles
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Arkadiusz Stasiak
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
- Ian M Winter
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
10.
Nakamoto KT, Shackleton TM, Magezi DA, Palmer AR. A function for binaural integration in auditory grouping and segregation in the inferior colliculus. J Neurophysiol 2014; 113:1819-30. [PMID: 25540219] [DOI: 10.1152/jn.00472.2014]
Abstract
Responses of neurons to binaural, harmonic complex stimuli in urethane-anesthetized guinea pig inferior colliculus (IC) are reported. To assess the binaural integration of harmonicity cues for sound segregation and grouping, responses were measured to harmonic complexes with different fundamental frequencies presented to each ear. Simultaneously gated harmonic stimuli with fundamental frequencies of 125 Hz and 145 Hz were presented to the left and right ears, respectively, and recordings made from 96 neurons with characteristic frequencies >2 kHz in the central nucleus of the IC. Of these units, 70 responded continuously throughout the stimulus and were excited by the stimulus at the contralateral ear. The stimulus at the ipsilateral ear excited (EE: 14%; 10/70), inhibited (EI: 33%; 23/70), or had no significant effect (EO: 53%; 37/70), defined by the effect on firing rate. The neurons phase locked to the temporal envelope at each ear to varying degrees depending on signal level. Many of the cells (predominantly EO) were dominated by the response to the contralateral stimulus. Another group (predominantly EI) synchronized to the contralateral stimulus and were suppressed by the ipsilateral stimulus in a phasic manner. A third group synchronized to the stimuli at both ears (predominantly EE). Finally, a group only responded when the waveform peaks from each ear coincided. We conclude that these groups of neurons represent different "streams" of information but exhibit modifications of the response rather than encoding a feature of the stimulus, like pitch.
Affiliation(s)
- Kyle T Nakamoto
- Medical Research Council Institute of Hearing Research, University Park, Nottingham, United Kingdom; Department of Anatomy and Neurobiology, Northeast Ohio Medical University, Rootstown, Ohio
- Trevor M Shackleton
- Medical Research Council Institute of Hearing Research, University Park, Nottingham, United Kingdom
- David A Magezi
- Laboratory for Cognitive and Neurological Sciences, Neurology Unit, Department of Medicine, Faculty of Science, University of Fribourg, Fribourg, Switzerland
- Alan R Palmer
- Medical Research Council Institute of Hearing Research, University Park, Nottingham, United Kingdom
11.
Du Y, Kong L, Wang Q, Wu X, Li L. Auditory frequency-following response: a neurophysiological measure for studying the "cocktail-party problem". Neurosci Biobehav Rev 2011; 35:2046-57. [PMID: 21645541] [DOI: 10.1016/j.neubiorev.2011.05.008]
Abstract
How do we recognize what one person is saying when others are speaking at the same time? This "cocktail-party problem," posed by Cherry (1953), has puzzled the scientific community for half a century. It will not be solved without neurophysiological investigations that satisfy four essential requirements: (1) critical speech characteristics related to speech intelligibility are recorded; (2) neural responses to different speech sources are differentiated; (3) neural correlates of bottom-up binaural unmasking of responses to target speech are measurable; and (4) neural correlates of attentional top-down unmasking of target speech are measurable. Before speech signals reach the cerebral cortex, some critical acoustic features are represented in subcortical structures by frequency-following responses (FFRs), which are sustained evoked potentials based on precisely phase-locked responses of neuron populations to low-to-middle-frequency periodic acoustic stimuli. This review summarizes previous studies on FFRs relevant to each of the four requirements and suggests that FFRs are useful for studying the "cocktail-party problem".
Affiliation(s)
- Yi Du
- Department of Psychology, Speech and Hearing Research Center, Key Laboratory on Machine Perception (Ministry of Education), Peking University, Beijing, China
12.
Neural correlates of auditory scene analysis based on inharmonicity in monkey primary auditory cortex. J Neurosci 2010; 30:12480-94. [PMID: 20844143] [DOI: 10.1523/jneurosci.1780-10.2010]
Abstract
Segregation of concurrent sounds in complex acoustic environments is a fundamental feature of auditory scene analysis. A powerful cue used by the auditory system to segregate concurrent sounds, such as speakers' voices at a cocktail party, is inharmonicity. This can be demonstrated when a component of a harmonic complex tone is perceived as a separate tone "popping out" from the complex as a whole when it is sufficiently mistuned from its harmonic value. The neural bases of perceptual "pop out" of mistuned harmonics are unclear. We recorded multiunit activity from primary auditory cortex (A1) of behaving monkeys elicited by harmonic complex tones that were either "in tune" or that contained a mistuned third harmonic set at the best frequency of the neural populations. Responses to mistuned sounds were enhanced relative to responses to "in-tune" sounds, thus correlating with the enhanced perceptual salience of the mistuned component. Consistent with human psychophysics of "pop out," response enhancements increased with the degree of mistuning, were maximal for neural populations tuned to the frequency of the mistuned component, and were not observed under comparable stimulus conditions that do not elicit perceptual "pop out." Mistuning was also associated with changes in neuronal temporal response patterns phase locked to "beats" in the stimuli. Intracortical auditory evoked potentials paralleled noninvasive neurophysiological correlates of perceptual "pop out" in humans, further augmenting the translational relevance of the results. Findings suggest two complementary neural mechanisms for "pop out," based on the detection of local differences in activation level or coherence of temporal response patterns across A1.
13.
Nakamoto KT, Shackleton TM, Palmer AR. Responses in the inferior colliculus of the guinea pig to concurrent harmonic series and the effect of inactivation of descending controls. J Neurophysiol 2010; 103:2050-61. [PMID: 20147418] [DOI: 10.1152/jn.00451.2009]
Abstract
One of the fundamental questions of auditory research is how sounds are segregated because, in natural environments, multiple sounds tend to occur at the same time. Concurrent sounds, such as two talkers, physically add together and arrive at the ear as a single input sound wave. The auditory system easily segregates this input into a coherent perception of each of the multiple sources. A common feature of speech and communication calls is their harmonic structure and in this report we used two harmonic complexes to study the role of the corticofugal pathway in the processing of concurrent sounds. We demonstrate that, in the inferior colliculus (IC) of the anesthetized guinea pig, deactivation of the auditory cortex altered the temporal and/or the spike response to the concurrent, monaural harmonic complexes. More specifically, deactivating the auditory cortex altered the representation of the relative level of the complexes. This suggests that the auditory cortex modulates the representation of the level of two harmonic complexes in the IC. Since sound level is a cue used in the segregation of auditory input, the corticofugal pathway may play a role in this segregation.
Collapse
Affiliation(s)
- Kyle T Nakamoto
- College of Medicine, Northeastern Ohio Universities, 4209 State Rt. 44, P.O. Box 95, Rootstown, OH 44272-0095, USA.
14
Snyder JS, Carter OL, Hannon EE, Alain C. Adaptation reveals multiple levels of representation in auditory stream segregation. J Exp Psychol Hum Percept Perform 2009; 35:1232-44. [PMID: 19653761 DOI: 10.1037/a0012741] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
When presented with alternating low and high tones, listeners are more likely to perceive 2 separate streams of tones ("streaming") than a single coherent stream when the frequency separation (Δf) between tones is greater and the number of tone presentations is greater ("buildup"). However, the same large-Δf sequence reduces streaming for subsequent patterns presented after a gap of up to several seconds. Buildup occurs at a level of neural representation with sharp frequency tuning. The authors used adaptation to demonstrate that the contextual effect of prior Δf arose from a representation with broad frequency tuning, unlike buildup. Separate adaptation did not occur in a representation of Δf independent of frequency range, suggesting that any frequency-shift detectors undergoing adaptation are also frequency specific. A separate effect of prior perception was observed, dissociating stimulus-related (i.e., Δf) and perception-related (i.e., 1 stream vs. 2 streams) adaptation. Viewing a visual analogue to auditory streaming had no effect on subsequent perception of streaming, suggesting adaptation in auditory-specific brain circuits. These results, along with previous findings on buildup, suggest that processing in at least 3 levels of auditory neural representation underlies segregation and formation of auditory streams.
Affiliation(s)
- Joel S Snyder
- Department of Psychology, University of Nevada, 4505 South Maryland Parkway, Box 455030, Las Vegas, NV 89154-5030, USA
15
Pitch, harmonicity and concurrent sound segregation: psychoacoustical and neurophysiological findings. Hear Res 2009; 266:36-51. [PMID: 19788920 DOI: 10.1016/j.heares.2009.09.012] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/08/2009] [Revised: 09/23/2009] [Accepted: 09/24/2009] [Indexed: 11/18/2022]
Abstract
Harmonic complex tones are a particularly important class of sounds found in both speech and music. Although these sounds contain multiple frequency components, they are usually perceived as a coherent whole, with a pitch corresponding to the fundamental frequency (F0). However, when two or more harmonic sounds occur concurrently, e.g., at a cocktail party or in a symphony, the auditory system must separate harmonics and assign them to their respective F0s so that a coherent and veridical representation of the different sound sources is formed. Here we review both psychophysical and neurophysiological (single-unit and evoked-potential) findings, which provide some insight into how, and how well, the auditory system accomplishes this task. A survey of computational models designed to estimate multiple F0s and segregate concurrent sources is followed by a review of the empirical literature on the perception and neural coding of concurrent harmonic sounds, including vowels, as well as findings obtained using single complex tones with mistuned harmonics.
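The computational models surveyed in this review typically build on periodicity analysis. A minimal, hypothetical sketch of the single-F0 core of such models, using the autocorrelation peak within a plausible pitch range (function name, sampling rate, and search range are illustrative assumptions, not taken from the review):

```python
import numpy as np

def estimate_f0_autocorr(signal, fs, fmin=50.0, fmax=400.0):
    """Estimate a single F0 (Hz) from the dominant autocorrelation peak.

    fmin/fmax bound the candidate period; multiple-F0 models extend this
    idea by iteratively estimating and cancelling each periodicity.
    """
    # Autocorrelation; keep non-negative lags only.
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lag_min = int(fs / fmax)   # shortest candidate period (samples)
    lag_max = int(fs / fmin)   # longest candidate period (samples)
    best_lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
    return fs / best_lag

# Usage: a 200 Hz harmonic complex (harmonics 1-5, equal amplitude).
fs = 16000
t = np.arange(0, 0.1, 1 / fs)
tone = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 6))
f0_hat = estimate_f0_autocorr(tone, fs)   # close to 200 Hz
```

For concurrent sources, harmonic-sieve or iterative estimate-and-cancel schemes repeat this periodicity search after removing the energy accounted for by the first F0.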
16
Shackleton TM, Liu LF, Palmer AR. Responses to diotic, dichotic, and alternating phase harmonic stimuli in the inferior colliculus of guinea pigs. J Assoc Res Otolaryngol 2009; 10:76-90. [PMID: 19089495 PMCID: PMC2644390 DOI: 10.1007/s10162-008-0149-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2008] [Accepted: 11/13/2008] [Indexed: 11/25/2022] Open
Abstract
Humans perceive a harmonic series as a single auditory object with a pitch equivalent to the fundamental frequency (F0) of the series. When harmonics are presented to alternate ears, the repetition rate of the waveform at each ear doubles. If the harmonics are resolved, then the pitch perceived is still equivalent to F0, suggesting the stimulus is binaurally integrated before pitch is processed. However, unresolved harmonics give rise to the doubling of pitch which would be expected from monaural processing (Bernstein and Oxenham, J. Acoust. Soc. Am., 113:3323-3334, 2003). We used similar stimuli to record responses of multi-unit clusters in the central nucleus of the inferior colliculus (IC) of anesthetized guinea pigs (urethane supplemented by fentanyl/fluanisone) to determine the nature of the representation of harmonic stimuli and to what extent there was binaural integration. We examined both the temporal and rate-tuning of IC clusters and found no evidence for binaural integration. Stimuli comprised all harmonics below 10 kHz with fundamental frequencies (F0) from 50 to 400 Hz in half-octave steps. In diotic conditions, all the harmonics were presented to both ears. In dichotic conditions, odd harmonics were presented to one ear and even harmonics to the other. Neural characteristic frequencies (CF, n = 85) were from 0.2 to 14.7 kHz; 29 had CFs below 1 kHz. The majority of clusters responded predominantly to the contralateral ear, with the dominance of the contralateral ear increasing with CF. With diotic stimuli, over half of the clusters (58%) had peaked firing rate vs. F0 functions. The most common peak F0 was 141 Hz. Almost all (98%) clusters phase locked diotically to an F0 of 50 Hz, and approximately 40% of clusters still phase locked significantly (Rayleigh coefficient >13.8) at the highest F0 tested (400 Hz). These results are consistent with the previous reports of responses to amplitude-modulated stimuli. Clusters phase locked significantly at a frequency equal to F0 for contralateral and diotic stimuli but at 2F0 for dichotic stimuli. We interpret these data as responses following the envelope periodicity in monaural channels rather than as a binaurally integrated representation.
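The Rayleigh criterion (>13.8) quoted above is the standard test on vector strength, computed as 2nVS², where n is the spike count and VS is the length of the mean resultant of spike phases. A minimal sketch with hypothetical spike trains (the data and the 100 Hz test frequency are illustrative, not from the study):

```python
import numpy as np

def vector_strength(spike_times, f0):
    """Vector strength (0..1): concentration of spike phases at f0 (Hz)."""
    phases = 2 * np.pi * f0 * np.asarray(spike_times)
    return np.abs(np.mean(np.exp(1j * phases)))

def rayleigh_statistic(spike_times, f0):
    """Rayleigh statistic 2*n*VS^2; values above ~13.8 indicate
    significant phase locking (roughly p < 0.001)."""
    vs = vector_strength(spike_times, f0)
    return 2 * len(spike_times) * vs ** 2

# Illustrative spike trains (hypothetical data):
locked = np.arange(50) * 0.01                          # one spike per 100 Hz cycle
unlocked = np.linspace(0.0, 0.01, 50, endpoint=False)  # phases spread uniformly
r_locked = rayleigh_statistic(locked, 100.0)       # far above 13.8
r_unlocked = rayleigh_statistic(unlocked, 100.0)   # near zero
```

Testing at 2F0 instead of F0, as the abstract does for dichotic stimuli, just means passing twice the fundamental as `f0`.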
Affiliation(s)
- Trevor M Shackleton
- MRC Institute of Hearing Research, University Park, Nottingham, NG7 2RD, UK.
17
Snyder RL, Bonham BH, Sinex DG. Acute changes in frequency responses of inferior colliculus central nucleus (ICC) neurons following progressively enlarged restricted spiral ganglion lesions. Hear Res 2008; 246:59-78. [PMID: 18938235 PMCID: PMC2630712 DOI: 10.1016/j.heares.2008.09.010] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 06/23/2008] [Revised: 09/24/2008] [Accepted: 09/24/2008] [Indexed: 11/30/2022]
Abstract
Immediate effects of sequential and progressively enlarged spiral ganglion (SG) lesions were recorded from cochleas and inferior colliculi. Small SG-lesions produced modest elevations in cochlear tone-evoked compound action potential (CAP) thresholds across narrow frequency ranges; progressively enlarged lesions produced progressively higher CAP-threshold elevations across progressively wider frequency ranges. No comparable changes in distortion product otoacoustic emission (DPOAE) amplitudes were observed, consistent with silencing of auditory nerve sectors without affecting organ of Corti function. Frequency response areas (FRAs) of inferior colliculus (IC) neurons were recorded before and immediately after SG-lesions using multi-site silicon arrays fixed in place with recording sites arrayed along the IC frequency gradient. Individual post-lesion FRAs exhibited progressively elevated response thresholds and diminished response amplitudes at lesion frequencies, whereas responses at non-lesion frequencies were either unchanged or enhanced. Characteristic frequencies were shifted and silent areas were introduced within these FRAs. Sequentially larger lesions produced sequentially larger shifts in CF and/or enlarged silent areas within affected FRAs, producing immediate changes in IC frequency organization. These results contrast with those from the auditory nerve, extend previous reports of experience-induced plasticity in the auditory CNS, and support results indicating afferent convergence onto ICC neurons across broad frequency bands.
Affiliation(s)
- Russell L Snyder
- Department of Otolaryngology, University of California, San Francisco, CA 94143-0526, United States.
18
Sinex DG. Responses of cochlear nucleus neurons to harmonic and mistuned complex tones. Hear Res 2008; 238:39-48. [PMID: 18078726 PMCID: PMC2323903 DOI: 10.1016/j.heares.2007.11.001] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 10/26/2007] [Accepted: 11/01/2007] [Indexed: 11/29/2022]
Abstract
Preliminary measurements of the representation in the cochlear nucleus (CN) of harmonic tones, harmonic tones with mistuned components, and double harmonic tones are reported. These data indicate that, unlike auditory nerve fibers and inferior colliculus (IC) neurons, neurons in the CN may exhibit one of several qualitatively different response patterns when stimulated with mistuned tones. Primary-like neurons synchronized their discharges to 2-3 individual stimulus components, much like auditory nerve fibers do. Chopper neurons tended to respond with the periodicity of envelopes produced by interactions between adjacent stimulus components but exhibited little or no response synchronized to individual stimulus components. A small proportion of CN neurons exhibited complex slowly-modulated discharge patterns similar to those that are commonly observed in the IC. The patterns obtained from CN neurons with different pure tone discharge patterns were generally consistent with expectations based on previous studies with other stimuli. The measurements provided additional insight into the hierarchical processing stages that result in the highly patterned responses of IC neurons to harmonic and mistuned tones.
Affiliation(s)
- Donal G Sinex
- Utah State University, Department of Psychology, 2810 Old Main Hill, Logan, UT 84322-2810, USA.