1. Mammalian octopus cells are direction selective to frequency sweeps by excitatory synaptic sequence detection. Proc Natl Acad Sci U S A 2022; 119:e2203748119. PMID: 36279465; PMCID: PMC9636937; DOI: 10.1073/pnas.2203748119.
Abstract
Octopus cells are remarkable projection neurons of the mammalian cochlear nucleus, with extremely fast membranes and wide-frequency tuning. They are considered prime examples of coincidence detectors but are poorly characterized in vivo. We discover that octopus cells are selective to frequency sweep direction, a feature that is absent in their auditory nerve inputs. In vivo intracellular recordings reveal that direction selectivity does not derive from across-frequency coincidence detection but hinges on the amplitudes and activation sequence of auditory nerve inputs tuned to clusters of hot spot frequencies. A simple biophysical octopus cell model excited with real nerve spike trains recreates direction selectivity through interaction of intrinsic membrane conductances with the activation sequence of clustered excitatory inputs. We conclude that octopus cells are sequence detectors, sensitive to temporal patterns across cochlear frequency channels. The detection of sequences rather than coincidences is a much simpler but powerful operation to extract temporal information.
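The sequence-detection mechanism described above can be illustrated with a toy leaky-integrator model. All numbers below (per-channel lags, membrane time constant, EPSP sizes) are invented for illustration, not values from the study: when the activation order of the inputs cancels their per-channel lags, the EPSPs pile up within the short membrane time constant; the reverse order spreads them out. Which physical sweep direction is preferred in real cells depends on the actual delays.

```python
import math

def peak_depolarisation(arrival_times, tau=0.2e-3, dt=1e-5, t_end=3e-3):
    """Leaky integrator with an octopus-like fast membrane (tau ~0.2 ms).
    Sums unit EPSPs at the given arrival times; returns the peak voltage."""
    events = sorted(arrival_times)
    v, peak, i, t = 0.0, 0.0, 0, 0.0
    decay = math.exp(-dt / tau)
    while t < t_end:
        v *= decay                       # passive leak between time steps
        while i < len(events) and events[i] <= t:
            v += 1.0                     # unit EPSP from one input cluster
            i += 1
        peak = max(peak, v)
        t += dt
    return peak

# Hypothetical travel lags per input cluster, low CF slowest (as with the
# cochlear traveling wave); listed low CF -> high CF.
lags = [0.6e-3, 0.4e-3, 0.2e-3, 0.0]

# An upward sweep activates the low-CF (slow) input first, so activation
# order cancels the lags; a downward sweep doubles the arrival spread.
up   = [0.0, 0.2e-3, 0.4e-3, 0.6e-3]
down = [0.6e-3, 0.4e-3, 0.2e-3, 0.0]

peak_up   = peak_depolarisation([a + l for a, l in zip(up, lags)])
peak_down = peak_depolarisation([a + l for a, l in zip(down, lags)])
print(peak_up, peak_down)   # aligned order depolarises far more
```

The contrast between the two peaks is the direction selectivity: no across-frequency coincidence window is needed, only the interaction of input sequence with a fast leak.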
2. Guest DR, Oxenham AJ. Human discrimination and modeling of high-frequency complex tones shed light on the neural codes for pitch. PLoS Comput Biol 2022; 18:e1009889. PMID: 35239639; PMCID: PMC8923464; DOI: 10.1371/journal.pcbi.1009889.
Abstract
Accurate pitch perception of harmonic complex tones is widely believed to rely on temporal fine structure information conveyed by the precise phase-locked responses of auditory-nerve fibers. However, accurate pitch perception remains possible even when spectrally resolved harmonics are presented at frequencies beyond the putative limits of neural phase locking, and it is unclear whether residual temporal information, or a coarser rate-place code, underlies this ability. We addressed this question by measuring human pitch discrimination at low and high frequencies for harmonic complex tones, presented either in isolation or in the presence of concurrent complex-tone maskers. We found that concurrent complex-tone maskers impaired performance at both low and high frequencies, although the impairment introduced by adding maskers at high frequencies relative to low frequencies differed between the tested masker types. We then combined simulated auditory-nerve responses to our stimuli with ideal-observer analysis to quantify the extent to which performance was limited by peripheral factors. We found that the worsening of both frequency discrimination and F0 discrimination at high frequencies could be well accounted for (in relative terms) by optimal decoding of all available information at the level of the auditory nerve. A Python package is provided to reproduce these results, and to simulate responses to acoustic stimuli from the three previously published models of the human auditory nerve used in our analyses.
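The ideal-observer logic can be sketched with a toy rate-place model. The tuning widths, rates, and channel count below are invented for illustration and are far simpler than the published auditory-nerve models: Poisson spike counts from log-Gaussian-tuned channels are decoded with the optimal likelihood-ratio rule in a two-interval F0 discrimination task.

```python
import math, random

def rate_profile(f0, cfs, n_harm=10, bw=0.05, gain=50.0):
    """Mean spike count per channel: Gaussian log-frequency tuning summed
    over the tone's harmonics, plus a spontaneous count of 1."""
    prof = []
    for cf in cfs:
        r = 1.0
        for h in range(1, n_harm + 1):
            d = math.log(h * f0 / cf)
            r += gain * math.exp(-(d / bw) ** 2)
        prof.append(r)
    return prof

def poisson(lam, rng):
    # Knuth's method; fine for the modest mean counts used here
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(0)
cfs = [150.0 * (2500.0 / 150.0) ** (k / 39.0) for k in range(40)]
f0, df = 200.0, 0.02                      # 2% F0 difference to detect
lam_lo = rate_profile(f0, cfs)
lam_hi = rate_profile(f0 * (1 + df), cfs)

def loglik_hi_vs_lo(counts):
    """Poisson log-likelihood ratio for 'higher F0' vs 'lower F0'."""
    return sum(n * math.log(h / l) - (h - l)
               for n, l, h in zip(counts, lam_lo, lam_hi))

trials, correct = 200, 0
for _ in range(trials):
    obs_lo = [poisson(l, rng) for l in lam_lo]
    obs_hi = [poisson(l, rng) for l in lam_hi]
    if loglik_hi_vs_lo(obs_hi) > loglik_hi_vs_lo(obs_lo):
        correct += 1
pc = correct / trials
print(pc)   # proportion correct for the optimal rate-place decoder
```

The same scaffold extends naturally to temporal decoders by replacing the count statistics with spike-timing statistics, which is essentially the comparison the paper carries out with realistic nerve models.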
Affiliation(s)
- Daniel R. Guest
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Andrew J. Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
3. Goldsworthy RL, Bissmeyer SRS, Camarena A. Advantages of Pulse Rate Compared to Modulation Frequency for Temporal Pitch Perception in Cochlear Implant Users. J Assoc Res Otolaryngol 2022; 23:137-150. PMID: 34981263; PMCID: PMC8782986; DOI: 10.1007/s10162-021-00828-w.
Abstract
Most cochlear implants encode the fundamental frequency of periodic sounds by amplitude modulation of constant-rate pulsatile stimulation. Pitch perception provided by such stimulation strategies is markedly poor. Two experiments are reported here that consider potential advantages of pulse rate compared to modulation frequency for providing stimulation timing cues for pitch. The first experiment examines beat frequency distortion that occurs when modulating constant-rate pulsatile stimulation. This distortion has been reported on previously, but the results presented here indicate that distortion occurs for higher stimulation rates than previously reported. The second experiment examines pitch resolution as provided by pulse rate compared to modulation frequency. The results indicate that pitch discrimination is better with pulse rate than with modulation frequency. The advantage was large for rates near what has been suggested as the upper limit of temporal pitch perception conveyed by cochlear implants. The results are relevant to sound processing design for cochlear implants particularly for algorithms that encode fundamental frequency into deep envelope modulations or into precisely timed pulsatile stimulation.
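The beat distortion examined in the first experiment can be demonstrated numerically. The carrier rate, modulation frequency, and duration below are illustrative choices, not the study's parameters: because the modulator is sampled only at pulse times, modulation cycles are sampled at drifting phases whenever the modulation frequency does not divide the pulse rate, so the peak amplitude reached in each cycle fluctuates, which is the beat.

```python
import math

rate, fmod, dur = 1000, 480, 0.5   # pps carrier, modulator (Hz), seconds

# Constant-rate pulse train whose amplitudes sample a sinusoidal modulator.
times = [n / rate for n in range(int(dur * rate))]
amps  = [0.5 * (1.0 + math.sin(2 * math.pi * fmod * t)) for t in times]

# Track the peak amplitude attained within each modulation cycle.
peak_per_cycle = {}
for t, a in zip(times, amps):
    c = int(t * fmod)                  # index of the modulation cycle
    peak_per_cycle[c] = max(peak_per_cycle.get(c, 0.0), a)

vals = list(peak_per_cycle.values())
print(min(vals), max(vals))   # wide spread of per-cycle peaks -> beat
```

A variable-rate strategy that places one full-amplitude pulse per fundamental period has no such sampling mismatch, which is one intuition for the pulse-rate advantage the experiments report.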
Affiliation(s)
- Raymond L Goldsworthy
- Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Susan R S Bissmeyer
- Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Andres Camarena
- Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, USA
4. Goldsworthy RL, Camarena A, Bissmeyer SRS. Pitch perception is more robust to interference and better resolved when provided by pulse rate than by modulation frequency of cochlear implant stimulation. Hear Res 2021; 409:108319. PMID: 34340020; PMCID: PMC9343238; DOI: 10.1016/j.heares.2021.108319.
Abstract
Cochlear implants are medical devices that have been used to restore hearing to more than half a million people worldwide. Most recipients achieve high levels of speech comprehension through these devices, but speech comprehension in background noise and music appreciation in general are markedly poor compared to normal hearing. A key aspect of hearing that is notably diminished in cochlear implant outcomes is the sense of pitch provided by these devices. Pitch perception is an important factor affecting speech comprehension in background noise and is critical for music perception. The present article summarizes two experiments that examine the robustness and resolution of pitch perception as provided by cochlear implant stimulation timing. The driving hypothesis is that pitch conveyed by stimulation timing cues is more robust and better resolved when provided by variable pulse rates than by modulation frequency of constant-rate stimulation. Experiment 1 examines the robustness for hearing a large, one-octave, pitch difference in the presence of interfering electrical stimulation. With robustness to interference characterized for an otherwise easily discernible pitch difference, Experiment 2 examines the resolution of discrimination thresholds in the presence of interference as conveyed by modulation frequency or by pulse rate. These experiments test for an advantage of stimulation with precise temporal cues. The results indicate that pitch provided by pulse rate is both more robust to interference and is better resolved compared to when provided by modulation frequency. These results should inform the development of new sound processing strategies for cochlear implants designed to encode fundamental frequency of sounds into precise temporal stimulation.
Affiliation(s)
- Raymond L Goldsworthy
- Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States.
- Andres Camarena
- Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States; Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, United States
- Susan R S Bissmeyer
- Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States; Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, United States
5. He W, Ren T. The origin of mechanical harmonic distortion within the organ of Corti in living gerbil cochleae. Commun Biol 2021; 4:1008. PMID: 34433876; PMCID: PMC8387486; DOI: 10.1038/s42003-021-02540-0.
Abstract
Although auditory harmonic distortion has been demonstrated psychophysically in humans and electrophysiologically in experimental animals, the cellular origin of the mechanical harmonic distortion remains unclear. To demonstrate the outer hair cell-generated harmonics within the organ of Corti, we measured sub-nanometer vibrations of the reticular lamina from the apical ends of the outer hair cells in living gerbil cochleae using a custom-built heterodyne low-coherence interferometer. The harmonics in the reticular lamina vibration are significantly larger and have broader spectra and shorter latencies than those in the basilar membrane vibration. The latency of the second harmonic is significantly greater than that of the fundamental at low stimulus frequencies. These data indicate that the mechanical harmonics are generated by the outer hair cells over a broad cochlear region and propagate from the generation sites to their own best-frequency locations.
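The kind of analysis described, amplitude and latency of the fundamental and second harmonic of a measured vibration, can be sketched with a single-bin DFT ("lock-in") computation. The waveform, amplitudes, and delays below are invented stand-ins for interferometer data, not measurements from the study.

```python
import math

fs, f0, dur = 100_000, 1000.0, 0.05
tau1, tau2 = 0.5e-4, 1.0e-4          # assumed phase delays (s), illustrative
n = int(fs * dur)

# Synthetic "vibration": fundamental plus a small second harmonic, each
# delayed by its own latency.
x = [math.sin(2 * math.pi * f0 * (i / fs - tau1))
     + 0.1 * math.sin(2 * math.pi * 2 * f0 * (i / fs - tau2))
     for i in range(n)]

def lockin(x, fs, f):
    """Single-bin DFT: amplitude and phase of the component at frequency f."""
    n = len(x)
    re = 2.0 / n * sum(v * math.cos(2 * math.pi * f * i / fs)
                       for i, v in enumerate(x))
    im = 2.0 / n * sum(v * math.sin(2 * math.pi * f * i / fs)
                       for i, v in enumerate(x))
    return math.hypot(re, im), math.atan2(im, re)

a1, p1 = lockin(x, fs, f0)
a2, p2 = lockin(x, fs, 2 * f0)
# For sin(2*pi*f*(t - tau)) the recovered angle is pi/2 + 2*pi*f*tau,
# so the phase delay at each harmonic is:
lat1 = (p1 - math.pi / 2) / (2 * math.pi * f0)
lat2 = (p2 - math.pi / 2) / (2 * math.pi * 2 * f0)
print(a1, a2, lat1, lat2)
```

Comparing `lat2` against `lat1` across measurement sites is, in spirit, how a longer second-harmonic latency at low stimulus frequencies would be quantified; phase delays are only unambiguous within one period, so real analyses must also unwrap phase across frequency.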
Affiliation(s)
- Wenxuan He
- Oregon Hearing Research Center, Department of Otolaryngology, Oregon Health & Science University, Portland, OR, USA
- Tianying Ren
- Oregon Hearing Research Center, Department of Otolaryngology, Oregon Health & Science University, Portland, OR, USA.
6. de Cheveigné A. Harmonic Cancellation: A Fundamental of Auditory Scene Analysis. Trends Hear 2021; 25:23312165211041422. PMID: 34698574; PMCID: PMC8552394; DOI: 10.1177/23312165211041422.
Abstract
This paper reviews the hypothesis of harmonic cancellation, according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of auditory scene analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that do or do not support it. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results with the aim of understanding the inconsistencies and reaching a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
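The core time-domain operation the hypothesis appeals to can be sketched as a delay-and-subtract comb filter tuned to the masker period: y(t) = x(t) - x(t - T) nulls every harmonic of 1/T while passing most other frequencies. Signal parameters below are illustrative.

```python
import math

fs, n = 10_000, 4000
f0_masker, f_target = 100, 350          # target is not a harmonic of 100 Hz

masker = [sum(math.sin(2 * math.pi * f0_masker * h * i / fs)
              for h in range(1, 6)) for i in range(n)]
target = [0.5 * math.sin(2 * math.pi * f_target * i / fs) for i in range(n)]
mix = [m + t for m, t in zip(masker, target)]

def cancel(x, period):
    """Harmonic cancellation: subtract a copy delayed by one masker period."""
    return [x[i] - x[i - period] for i in range(period, len(x))]

period = fs // f0_masker                # 100 samples = 10 ms
rms = lambda x: math.sqrt(sum(v * v for v in x) / len(x))

masker_residue = cancel(masker, period)  # masker alone: cancels exactly
mix_out = cancel(mix, period)            # mixture: mostly the target survives
print(rms(masker_residue), rms(mix_out))
```

The filter's gain at frequency f is 2·|sin(pi·f·T)|, so a harmonic masker vanishes while the 350 Hz target passes at roughly full amplitude; the neural proposal is that brainstem circuits with accurate delays could implement something functionally similar.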
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des systèmes perceptifs, CNRS, Paris, France
- Département d’études cognitives, École normale supérieure, PSL University, Paris, France
- UCL Ear Institute, London, UK
7. Robust Rate-Place Coding of Resolved Components in Harmonic and Inharmonic Complex Tones in Auditory Midbrain. J Neurosci 2020; 40:2080-2093. PMID: 31996454; DOI: 10.1523/jneurosci.2337-19.2020.
Abstract
Harmonic complex tones (HCTs) commonly occurring in speech and music evoke a strong pitch at their fundamental frequency (F0), especially when they contain harmonics individually resolved by the cochlea. When all frequency components of an HCT are shifted by the same amount, the pitch of the resulting inharmonic tone (IHCT) can also shift, although the envelope repetition rate is unchanged. A rate-place code, whereby resolved harmonics are represented by local maxima in firing rates along the tonotopic axis, has been characterized in the auditory nerve and primary auditory cortex, but little is known about intermediate processing stages. We recorded single-neuron responses to HCT and IHCT with varying F0 and sound level in the inferior colliculus (IC) of unanesthetized rabbits of both sexes. Many neurons showed peaks in firing rate when a low-numbered harmonic aligned with the neuron's characteristic frequency, demonstrating "rate-place" coding. The IC rate-place code was most prevalent for F0 > 800 Hz, was only moderately dependent on sound level over a 40 dB range, and was not sensitive to stimulus harmonicity. A spectral receptive-field model incorporating broadband inhibition better predicted the neural responses than a purely excitatory model, suggesting an enhancement of the rate-place representation by inhibition. Some IC neurons showed facilitation in response to HCT relative to pure tones, similar to cortical "harmonic template neurons" (Feng and Wang, 2017), but to a lesser degree. Our findings shed light on the transformation of rate-place coding of resolved harmonics along the auditory pathway.

Significance Statement: Harmonic complex tones are ubiquitous in speech and music and produce strong pitch percepts when they contain frequency components that are individually resolved by the cochlea. Here, we characterize a "rate-place" code for resolved harmonics in the auditory midbrain that is more robust across sound levels than the peripheral rate-place code and insensitive to the harmonic relationships among frequency components. We use a computational model to show that inhibition may play an important role in shaping the rate-place code. Our study fills a major gap in understanding the transformations in neural representations of resolved harmonics along the auditory pathway.
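The excitatory-plus-broadband-inhibition idea can be sketched in a few lines. The Gaussian bandwidths and inhibitory weight below are invented for illustration, not fitted values from the study; the point is only that subtracting a broad, roughly flat inhibitory drive raises the relative contrast between on-harmonic peaks and between-harmonic troughs.

```python
import math

def rf_rate(cf, freqs, w, bw_e=0.10, bw_i=0.30):
    """Rate from a spectral receptive field: narrow excitatory Gaussian on a
    log-frequency axis minus broad inhibition with weight w, half-rectified."""
    e = sum(math.exp(-(math.log(f / cf) / bw_e) ** 2) for f in freqs)
    i = sum(math.exp(-(math.log(f / cf) / bw_i) ** 2) for f in freqs)
    return max(e - w * i, 0.0)

f0 = 1000.0
freqs = [h * f0 for h in range(1, 11)]            # 10-harmonic complex tone
cf_peak = 6 * f0                                   # CF on harmonic 6
cf_trough = math.sqrt(6 * 7) * f0                  # CF between harmonics 6 and 7

def contrast(w):
    p, t = rf_rate(cf_peak, freqs, w), rf_rate(cf_trough, freqs, w)
    return (p - t) / (p + t)

print(contrast(0.0), contrast(0.3))   # inhibition sharpens the rate-place peak
```

At harmonic 6 the excitatory Gaussians barely resolve the component, so the purely excitatory profile is nearly flat; broadband inhibition subtracts the common baseline and leaves a much clearer peak-to-trough contrast.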
8. Graves JE, Oxenham AJ. Pitch discrimination with mixtures of three concurrent harmonic complexes. J Acoust Soc Am 2019; 145:2072. PMID: 31046318; PMCID: PMC6469983; DOI: 10.1121/1.5096639.
Abstract
In natural listening contexts, especially in music, it is common to hear three or more simultaneous pitches, but few empirical or theoretical studies have addressed how this is achieved. Place and pattern-recognition theories of pitch require at least some harmonics to be spectrally resolved for pitch to be extracted, but it is unclear how often such conditions exist when multiple complex tones are presented together. In three behavioral experiments, mixtures of three concurrent complexes were filtered into a single bandpass spectral region, and the relationship between the fundamental frequencies and spectral region was varied in order to manipulate the extent to which harmonics were resolved either before or after mixing. In experiment 1, listeners discriminated major from minor triads (a difference of 1 semitone in one note of the triad). In experiments 2 and 3, listeners compared the pitch of a probe tone with that of a subsequent target, embedded within two other tones. All three experiments demonstrated above-chance performance, even in conditions where the combinations of harmonic components were unlikely to be resolved after mixing, suggesting that fully resolved harmonics may not be necessary to extract the pitch from multiple simultaneous complexes.
Affiliation(s)
- Jackson E Graves
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
9. Ananthakrishnan S, Krishnan A. Human frequency following responses to iterated rippled noise with positive and negative gain: Differential sensitivity to waveform envelope and temporal fine-structure. Hear Res 2018; 367:113-123. PMID: 30096491; PMCID: PMC6130915; DOI: 10.1016/j.heares.2018.07.009.
Abstract
The perceived pitch of iterated rippled noise (IRN) with negative gain (IRNn) is an octave lower than that of IRN with positive gain (IRNp). IRNp and IRNn have identical waveform envelopes (ENV) but differing temporal fine structure (TFS), which likely accounts for this perceived pitch difference. Here, we examine whether differences in the temporal pattern of phase-locked activity reflected in the human brainstem Frequency Following Response (FFR) elicited by IRNp and IRNn can account for the differences in perceived pitch for the two stimuli. FFRs using a single onset polarity were measured in 13 normal-hearing adult listeners in response to IRNp and IRNn stimuli with 2 ms and 4 ms delay. Autocorrelation functions (ACFs) and Fast Fourier Transforms (FFTs) were used to evaluate the dominant periodicity and spectral pattern (harmonic spacing) in the phase-locked FFR neural activity. For both delays, the harmonic spacing in the spectra corresponded more strongly with the perceived lowering of pitch from IRNp to IRNn than did the ACFs. These results suggest that the FFR elicited by a single-polarity stimulus reflects phase-locking to both stimulus ENV and TFS. A post-hoc experiment evaluating the FFR phase-locked activity to ENV (FFRENV) and TFS (FFRTFS) elicited by IRNp and IRNn confirmed that only the phase-locked activity to the TFS, reflected in FFRTFS, showed differences in both spectra and ACF that closely matched the pitch difference between the two stimuli. The results of the post-hoc experiment suggest that pitch-relevant information is preserved in the temporal pattern of phase-locked activity and that the differences in stimulus ENV and TFS driving the pitch percepts of IRNp and IRNn are preserved in the brainstem neural response. The scalp-recorded FFR may thus provide a noninvasive analytic tool to evaluate the relative contributions of envelope and temporal fine structure in the neural representation of complex sounds in humans.
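The IRNp/IRNn construction and the octave relation can be sketched with an iterated delay-and-add recipe. This toy version applies the gain to the running signal at each iteration, uses unfiltered white Gaussian noise, and works in raw sample delays, so it is far cruder than actual stimulus generation; it does, however, reproduce the diagnostic autocorrelation signs: positive gain yields a positive ACF peak at the delay d, negative gain yields a dip at d and a positive peak at 2d, consistent with a pitch an octave lower.

```python
import random

def irn(noise, delay, gain, iters):
    """Iterated rippled noise: repeatedly add a delayed, scaled copy."""
    y = list(noise)
    for _ in range(iters):
        y = [y[i] + (gain * y[i - delay] if i >= delay else 0.0)
             for i in range(len(y))]
    return y

def norm_ac(x, lag):
    """Normalised autocorrelation at one lag."""
    return (sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            / sum(v * v for v in x))

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
d = 50                                   # delay in samples

irnp = irn(noise, d, +1.0, 4)            # positive gain
irnn = irn(noise, d, -1.0, 4)            # negative gain

print(norm_ac(irnp, d))                  # strong positive peak at d
print(norm_ac(irnn, d), norm_ac(irnn, 2 * d))   # dip at d, peak at 2d
```

With four iterations the expected normalised ACF values follow the binomial filter coefficients: roughly +0.8 at d for IRNp, and -0.8 at d with +0.4 at 2d for IRNn, which is why the dominant positive periodicity of IRNn sits at twice the delay.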
Affiliation(s)
- Saradha Ananthakrishnan
- Department of Audiology, Speech-Language Pathology, and Deaf Studies, Towson University, Towson, MD, 21252, USA
- Ananthanarayan Krishnan
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN, 47906, USA.
10. Settibhaktini H, Chintanpalli A. Modeling the level-dependent changes of concurrent vowel scores. J Acoust Soc Am 2018; 143:440. PMID: 29390795; PMCID: PMC6226212; DOI: 10.1121/1.5021330.
Abstract
The difference in fundamental frequency (F0) between talkers is an important cue for speaker segregation. To understand how this cue varies across sound level, Chintanpalli, Ahlstrom, and Dubno [(2014). J. Assoc. Res. Otolaryngol. 15, 823-837] collected level-dependent changes in concurrent-vowel identification scores for same- and different-F0 conditions in younger adults with normal hearing. Modeling suggested that level-dependent changes in phase locking of auditory-nerve (AN) fibers to formants and F0s may contribute to concurrent-vowel identification scores; however, identification scores were not predicted to test this suggestion directly. The current study predicts these identification scores using the temporal responses of a computational AN model and a modified version of Meddis and Hewitt's [(1992). J. Acoust. Soc. Am. 91, 233-245] F0-based segregation algorithm. The model successfully captured the level-dependent changes in identification scores of both vowels with and without F0 difference, as well as identification scores for one vowel correct. The model's F0-based vowel segregation was controlled using the actual F0-benefit across levels such that the predicted F0-benefit matched qualitatively with the actual F0-benefit as a function of level. The quantitative predictions from this F0-based segregation algorithm demonstrate that temporal responses of AN fibers to vowel formants and F0s can account for variations in identification scores across sound level and F0-difference conditions in a concurrent-vowel task.
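The F0-guided segregation step can be caricatured as assigning each peripheral channel to whichever vowel's period dominates its autocorrelation. The pure-tone "channels" below stand in for simulated AN responses and the two-F0 choice rule is a drastic simplification of the Meddis and Hewitt algorithm; the sketch only shows why channel-wise periodicity suffices to partition resolved harmonics between two F0s.

```python
import math

fs, dur = 8000, 0.2
lag_a, lag_b = fs // 100, fs // 125      # 80 and 64 samples (10 and 8 ms)

def channel(f):
    """Stand-in for one AN channel's response: a single resolved harmonic."""
    return [math.sin(2 * math.pi * f * i / fs) for i in range(int(fs * dur))]

def norm_ac(x, lag):
    n = len(x) - lag
    return (sum(x[i] * x[i + lag] for i in range(n))
            / sum(x[i] * x[i] for i in range(n)))

# Channels dominated by resolved harmonics of vowel A (F0=100 Hz) or B (125 Hz)
channels = {100.0: "A", 200.0: "A", 300.0: "A", 400.0: "A",
            125.0: "B", 250.0: "B", 375.0: "B"}

correct = 0
for f, label in channels.items():
    x = channel(f)
    guess = "A" if norm_ac(x, lag_a) > norm_ac(x, lag_b) else "B"
    correct += (guess == label)
print(correct, "of", len(channels), "channels assigned to the correct F0")
```

Each harmonic of an F0 has a unit autocorrelation at that F0's period but generally not at the competing period, so the period comparison sorts the channels; the full model then builds each vowel's spectrum from its assigned channels before template matching.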
Affiliation(s)
- Harshavardhan Settibhaktini
- Department of Electrical and Electronics Engineering, Birla Institute of Technology and Science, Pilani Campus, Vidya Vihar, Pilani, Rajasthan, 333031, India
- Ananthakrishna Chintanpalli
- Department of Electrical and Electronics Engineering, Birla Institute of Technology and Science, Pilani Campus, Vidya Vihar, Pilani, Rajasthan, 333031, India
11. Chintanpalli A, Ahlstrom JB, Dubno JR. Effects of age and hearing loss on concurrent vowel identification. J Acoust Soc Am 2016; 140:4142. PMID: 28040038; PMCID: PMC5848863; DOI: 10.1121/1.4968781.
Abstract
Differences in formant frequencies and fundamental frequencies (F0) are important cues for segregating and identifying two simultaneous vowels. This study assessed age- and hearing-loss-related changes in the use of these cues for recognition of one or both vowels in a pair and determined differences related to vowel identity and specific vowel pairings. Younger adults with normal hearing, older adults with normal hearing, and older adults with hearing loss listened to different-vowel and identical-vowel pairs that varied in F0 differences. Identification of both vowels as a function of F0 difference revealed that increased age affects the use of F0 and formant difference cues for different-vowel pairs. Hearing loss further reduced the use of these cues, which was not attributable to lower vowel sensation levels. High scores for one vowel in the pair and no effect of F0 differences suggested that F0 cues are important only for identifying both vowels. In contrast to mean scores, widely varying differences in effects of F0 cues, age, and hearing loss were observed for particular vowels and vowel pairings. These variations in identification of vowel pairs were not explained by acoustical models based on the location and level of formants within the two vowels.
Affiliation(s)
- Ananthakrishna Chintanpalli
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500, USA
- Jayne B Ahlstrom
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500, USA
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500, USA
12. Neural Segregation of Concurrent Speech: Effects of Background Noise and Reverberation on Auditory Scene Analysis in the Ventral Cochlear Nucleus. Adv Exp Med Biol 2016. PMID: 27080680; DOI: 10.1007/978-3-319-25474-6_41.
Abstract
Concurrent complex sounds (e.g., two voices speaking at once) are perceptually disentangled into separate "auditory objects". This neural processing often occurs in the presence of acoustic-signal distortions from noise and reverberation (e.g., in a busy restaurant). A difference in periodicity between sounds is a strong segregation cue under quiet, anechoic conditions. However, noise and reverberation exert differential effects on speech intelligibility under "cocktail-party" listening conditions. Previous neurophysiological studies have concentrated on understanding auditory scene analysis under ideal listening conditions. Here, we examine the effects of noise and reverberation on periodicity-based neural segregation of concurrent vowels /a/ and /i/, in the responses of single units in the guinea-pig ventral cochlear nucleus (VCN): the first processing station of the auditory brain stem. In line with human psychoacoustic data, we find reverberation significantly impairs segregation when vowels have an intonated pitch contour, but not when they are spoken on a monotone. In contrast, noise impairs segregation independent of intonation pattern. These results are informative for models of speech processing under ecologically valid listening conditions, where noise and reverberation abound.
13. Neural Representation of Concurrent Vowels in Macaque Primary Auditory Cortex. eNeuro 2016; 3:eN-NWR-0071-16. PMID: 27294198; PMCID: PMC4901243; DOI: 10.1523/eneuro.0071-16.2016.
Abstract
Successful speech perception in real-world environments requires that the auditory system segregate competing voices that overlap in frequency and time into separate streams. Vowels are major constituents of speech and are comprised of frequencies (harmonics) that are integer multiples of a common fundamental frequency (F0). The pitch and identity of a vowel are determined by its F0 and spectral envelope (formant structure), respectively. When two spectrally overlapping vowels differing in F0 are presented concurrently, they can be readily perceived as two separate “auditory objects” with pitches at their respective F0s. A difference in pitch between two simultaneous vowels provides a powerful cue for their segregation, which in turn, facilitates their individual identification. The neural mechanisms underlying the segregation of concurrent vowels based on pitch differences are poorly understood. Here, we examine neural population responses in macaque primary auditory cortex (A1) to single and double concurrent vowels (/a/ and /i/) that differ in F0 such that they are heard as two separate auditory objects with distinct pitches. We find that neural population responses in A1 can resolve, via a rate-place code, lower harmonics of both single and double concurrent vowels. Furthermore, we show that the formant structures, and hence the identities, of single vowels can be reliably recovered from the neural representation of double concurrent vowels. We conclude that A1 contains sufficient spectral information to enable concurrent vowel segregation and identification by downstream cortical areas.
14. Entracking as a Brain Stem Code for Pitch: The Butte Hypothesis. Adv Exp Med Biol 2016. PMID: 27080675; DOI: 10.1007/978-3-319-25474-6_36.
Abstract
The basic nature of pitch is much debated. A robust code for pitch exists in the auditory nerve in the form of an across-fiber pooled interspike interval (ISI) distribution, which resembles the stimulus autocorrelation. An unsolved question is how this representation can be "read out" by the brain. A new view is proposed in which a known brain-stem property plays a key role in the coding of periodicity, which I refer to as "entracking", a contraction of "entrained phase-locking". It is proposed that a scalar rather than vector code of periodicity exists by virtue of coincidence detectors that code the dominant ISI directly into spike rate through entracking. Perfect entracking means that a neuron fires one spike per stimulus-waveform repetition period, so that firing rate equals the repetition frequency. Key properties are invariance with SPL and generalization across stimuli. The main limitation in this code is the upper limit of firing (~ 500 Hz). It is proposed that entracking provides a periodicity tag which is superimposed on a tonotopic analysis: at low SPLs and fundamental frequencies > 500 Hz, a spectral or place mechanism codes for pitch. With increasing SPL the place code degrades but entracking improves and first occurs in neurons with low thresholds for the spectral components present. The prediction is that populations of entracking neurons, extended across characteristic frequency, form plateaus ("buttes") of firing rate tied to periodicity.
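Perfect entracking, one spike per repetition period so that firing rate equals repetition frequency, can be caricatured as counting upward threshold crossings of the stimulus waveform. The stimulus, sampling rate, and threshold below are illustrative, and real entracking is proposed to arise from coincidence detection on auditory-nerve input rather than a crossing rule; the sketch only demonstrates the two key properties named above, rate = F0 and invariance with level.

```python
import math

def entracked_rate(x, fs, thresh):
    """Idealised 'entracking' unit: one spike per upward threshold crossing."""
    spikes = sum(1 for a, b in zip(x, x[1:]) if a <= thresh < b)
    return spikes / (len(x) / fs)

fs, dur = 20_000, 0.5

def tone(f0, gain):
    # F0 plus one harmonic: a single dominant peak per fundamental period
    return [gain * (math.sin(2 * math.pi * f0 * i / fs)
                    + 0.5 * math.sin(4 * math.pi * f0 * i / fs))
            for i in range(int(fs * dur))]

for f0 in (200.0, 320.0):
    r_soft = entracked_rate(tone(f0, 1.0), fs, 0.5)
    r_loud = entracked_rate(tone(f0, 5.0), fs, 0.5)   # much higher level
    print(f0, r_soft, r_loud)   # rate equals F0, independent of level
```

Because this waveform rises through any fixed threshold exactly once per fundamental period, the output rate tags the periodicity directly, the scalar code the hypothesis proposes, up to the biological firing-rate ceiling (~500 Hz) noted in the abstract.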
15. On the Relevance of Natural Stimuli for the Study of Brainstem Correlates: The Example of Consonance Perception. PLoS One 2015; 10:e0145439. PMID: 26720000; PMCID: PMC4697839; DOI: 10.1371/journal.pone.0145439.
Abstract
Some combinations of musical tones sound pleasing to Western listeners, and are termed consonant, while others sound discordant, and are termed dissonant. The perceptual phenomenon of consonance has been traced to the acoustic property of harmonicity. It has been repeatedly shown that neural correlates of consonance can be found as early as the auditory brainstem as reflected in the harmonicity of the scalp-recorded frequency-following response (FFR). “Neural Pitch Salience” (NPS) measured from FFRs—essentially a time-domain equivalent of the classic pattern recognition models of pitch—has been found to correlate with behavioral judgments of consonance for synthetic stimuli. Following the idea that the auditory system has evolved to process behaviorally relevant natural sounds, and in order to test the generalizability of this finding made with synthetic tones, we recorded FFRs for consonant and dissonant intervals composed of synthetic and natural stimuli. We found that NPS correlated with behavioral judgments of consonance and dissonance for synthetic but not for naturalistic sounds. These results suggest that while some form of harmonicity can be computed from the auditory brainstem response, the general percept of consonance and dissonance is not captured by this measure. It might either be represented in the brainstem in a different code (such as place code) or arise at higher levels of the auditory pathway. Our findings further illustrate the importance of using natural sounds, as a complementary tool to fully-controlled synthetic sounds, when probing auditory perception.
|
16
|
Perception and coding of interaural time differences with bilateral cochlear implants. Hear Res 2015; 322:138-50. [DOI: 10.1016/j.heares.2014.10.004]
|
17
|
Sayles M, Stasiak A, Winter IM. Reverberation impairs brainstem temporal representations of voiced vowel sounds: challenging "periodicity-tagged" segregation of competing speech in rooms. Front Syst Neurosci 2015; 8:248. [PMID: 25628545 PMCID: PMC4290552 DOI: 10.3389/fnsys.2014.00248]
Abstract
The auditory system typically processes information from concurrently active sound sources (e.g., two voices speaking at once), in the presence of multiple delayed, attenuated and distorted sound-wave reflections (reverberation). Brainstem circuits help segregate these complex acoustic mixtures into “auditory objects.” Psychophysical studies demonstrate a strong interaction between reverberation and fundamental-frequency (F0) modulation, leading to impaired segregation of competing vowels when segregation is on the basis of F0 differences. Neurophysiological studies of complex-sound segregation have concentrated on sounds with steady F0s, in anechoic environments. However, F0 modulation and reverberation are quasi-ubiquitous. We examine the ability of 129 single units in the ventral cochlear nucleus (VCN) of the anesthetized guinea pig to segregate the concurrent synthetic vowel sounds /a/ and /i/, based on temporal discharge patterns under closed-field conditions. We address the effects of added real-room reverberation, F0 modulation, and the interaction of these two factors, on brainstem neural segregation of voiced speech sounds. A firing-rate representation of single-vowels' spectral envelopes is robust to the combination of F0 modulation and reverberation: local firing-rate maxima and minima across the tonotopic array code vowel-formant structure. However, single-vowel F0-related periodicity information in shuffled inter-spike interval distributions is significantly degraded in the combined presence of reverberation and F0 modulation. Hence, segregation of double-vowels' spectral energy into two streams (corresponding to the two vowels), on the basis of temporal discharge patterns, is impaired by reverberation; specifically when F0 is modulated. All unit types (primary-like, chopper, onset) are similarly affected. 
These results offer neurophysiological insights into the perceptual organization of complex acoustic scenes under realistically challenging listening conditions.
Affiliation(s)
- Mark Sayles, Arkadiusz Stasiak, Ian M Winter: Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
|
18
|
Fishman YI, Steinschneider M, Micheyl C. Neural representation of concurrent harmonic sounds in monkey primary auditory cortex: implications for models of auditory scene analysis. J Neurosci 2014; 34:12425-43. [PMID: 25209282 PMCID: PMC4160777 DOI: 10.1523/jneurosci.0025-14.2014]
Abstract
The ability to attend to a particular sound in a noisy environment is an essential aspect of hearing. To accomplish this feat, the auditory system must segregate sounds that overlap in frequency and time. Many natural sounds, such as human voices, consist of harmonics of a common fundamental frequency (F0). Such harmonic complex tones (HCTs) evoke a pitch corresponding to their F0. A difference in pitch between simultaneous HCTs provides a powerful cue for their segregation. The neural mechanisms underlying concurrent sound segregation based on pitch differences are poorly understood. Here, we examined neural responses in monkey primary auditory cortex (A1) to two concurrent HCTs that differed in F0 such that they are heard as two separate "auditory objects" with distinct pitches. We found that A1 can resolve, via a rate-place code, the lower harmonics of both HCTs, a prerequisite for deriving their pitches and for their perceptual segregation. Onset asynchrony between the HCTs enhanced the neural representation of their harmonics, paralleling their improved perceptual segregation in humans. Pitches of the concurrent HCTs could also be temporally represented by neuronal phase-locking at their respective F0s. Furthermore, a model of A1 responses using harmonic templates could qualitatively reproduce psychophysical data on concurrent sound segregation in humans. Finally, we identified a possible intracortical homolog of the "object-related negativity" recorded noninvasively in humans, which correlates with the perceptual segregation of concurrent sounds. Findings indicate that A1 contains sufficient spectral and temporal information for segregating concurrent sounds based on differences in pitch.
Affiliation(s)
- Yonatan I Fishman, Mitchell Steinschneider: Departments of Neurology and Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Christophe Micheyl: Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455, and Starkey Hearing Research Center, Berkeley, California 94704
|
19
|
Chintanpalli A, Ahlstrom JB, Dubno JR. Computational model predictions of cues for concurrent vowel identification. J Assoc Res Otolaryngol 2014; 15:823-37. [PMID: 25002128 DOI: 10.1007/s10162-014-0475-7]
Abstract
Although differences in fundamental frequencies (F0s) between vowels are beneficial for their segregation and identification, listeners can still segregate and identify simultaneous vowels that have identical F0s, suggesting that additional cues are contributing, including formant frequency differences. The current perception and computational modeling study was designed to assess the contribution of F0 and formant difference cues for concurrent vowel identification. Younger adults with normal hearing listened to concurrent vowels over a wide range of levels (25-85 dB SPL) for conditions in which F0 was the same or different between vowel pairs. Vowel identification scores were poorer at the lowest and highest levels for each F0 condition, and F0 benefit was reduced at the lowest level as compared to higher levels. To understand the neural correlates underlying level-dependent changes in vowel identification, a computational auditory-nerve model was used to estimate formant and F0 difference cues under the same listening conditions. Template contrast and average localized synchronized rate predicted level-dependent changes in the strength of phase locking to F0s and formants of concurrent vowels, respectively. At lower levels, poorer F0 benefit may be attributed to poorer phase locking to both F0s, which resulted from lower firing rates of auditory-nerve fibers. At higher levels, poorer identification scores may relate to poorer phase locking to the second formant, due to synchrony capture by lower formants. These findings suggest that concurrent vowel identification may be partly influenced by level-dependent changes in phase locking of auditory-nerve fibers to F0s and formants of both vowels.
Affiliation(s)
- Ananthakrishna Chintanpalli: Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, SC 29425-5500, USA
|
20
|
Butler BE, Trainor LJ. Brief pitch-priming facilitates infants’ discrimination of pitch-evoking noise: Evidence from event-related potentials. Brain Cogn 2013; 83:271-8. [DOI: 10.1016/j.bandc.2013.09.002]
|
21
|
Chintanpalli A, Heinz MG. The use of confusion patterns to evaluate the neural basis for concurrent vowel identification. J Acoust Soc Am 2013; 134:2988-3000. [PMID: 24116434 PMCID: PMC3799688 DOI: 10.1121/1.4820888]
Abstract
Normal-hearing listeners take advantage of differences in fundamental frequency (F0) to segregate competing talkers. Computational modeling using an F0-based segregation algorithm and auditory-nerve temporal responses captures the gradual improvement in concurrent-vowel identification with increasing F0 difference. This result has been taken to suggest that F0-based segregation is the basis for this improvement; however, evidence suggests that other factors may also contribute. The present study further tested models of concurrent-vowel identification by evaluating their ability to predict the specific confusions made by listeners. Measured human confusions consisted of at most one to three confusions per vowel pair, typically from an error in only one of the two vowels. An improvement due to F0 difference was correlated with spectral differences between vowels; however, simple models based on acoustic and cochlear spectral patterns predicted some confusions not made by human listeners. In contrast, a neural temporal model was better at predicting listener confusion patterns. However, the full F0-based segregation algorithm using these neural temporal analyses was inconsistent across F0 difference in capturing listener confusions, being worse for smaller differences. The inability of this commonly accepted model to fully account for listener confusions suggests that other factors besides F0 segregation are likely to contribute.
|
22
|
Leclerc I, Dajani HR, Giguère C. Differences in shimmer across formant regions. J Voice 2013; 27:685-90. [PMID: 24070592 DOI: 10.1016/j.jvoice.2013.05.002]
Abstract
OBJECTIVES: Objective acoustic measures used to analyze phonatory dysfunction include shimmer and jitter. These measures are limited in that they do not take auditory processing into account. However, previous studies have indicated that shimmer may be processed differently along the tonotopic axis of the ear and, in particular, may be perceptually and physiologically significant around the third and fourth formants. METHODS: This study investigated the relationship between shimmer around the first four formants (F1-F4) and in the broadband unfiltered speech waveform for 18 normal speakers from the voice disorders database of KayPENTAX. The voice samples were filtered around each formant with a bandwidth of 400 Hz, and shimmer was then assessed using five different built-in measures from the Praat software. RESULTS: Comparison-of-means tests revealed that shimmer increases significantly with formant frequency from F1 to F4, for all shimmer measures. Furthermore, for all shimmer measures, shimmer in the unfiltered speech was significantly and more strongly correlated with shimmer around F1 (r = 0.45-0.61) and F2 (r = 0.69-0.74), significantly but more weakly correlated with shimmer around F4 (r = 0.42-0.47), and not significantly correlated with shimmer around F3. CONCLUSIONS: The findings indicate that shimmer differs around the different formants and that shimmer information around F3 and F4 is not well captured by standard shimmer measurements based on the broadband unfiltered waveform.
Affiliation(s)
- Isabelle Leclerc: Audiology and Speech-Language Pathology Program, University of Ottawa, Ottawa, Ontario, Canada
|
23
|
Neural representation of harmonic complex tones in primary auditory cortex of the awake monkey. J Neurosci 2013; 33:10312-23. [PMID: 23785145 DOI: 10.1523/jneurosci.0020-13.2013]
Abstract
Many natural sounds are periodic and consist of frequencies (harmonics) that are integer multiples of a common fundamental frequency (F0). Such harmonic complex tones (HCTs) evoke a pitch corresponding to their F0, which plays a key role in the perception of speech and music. "Pitch-selective" neurons have been identified in non-primary auditory cortex of marmoset monkeys. Noninvasive studies point to a putative "pitch center" located in a homologous cortical region in humans. It remains unclear whether there is sufficient spectral and temporal information available at the level of primary auditory cortex (A1) to enable reliable pitch extraction in non-primary auditory cortex. Here we evaluated multiunit responses to HCTs in A1 of awake macaques using a stimulus design employed in auditory nerve studies of pitch encoding. The F0 of the HCTs was varied in small increments, such that harmonics of the HCTs fell either on the peak or on the sides of the neuronal pure tone tuning functions. Resultant response-amplitude-versus-harmonic-number functions ("rate-place profiles") displayed a periodic pattern reflecting the neuronal representation of individual HCT harmonics. Consistent with psychoacoustic findings in humans, lower harmonics were better resolved in rate-place profiles than higher harmonics. Lower F0s were also temporally represented by neuronal phase-locking to the periodic waveform of the HCTs. Findings indicate that population responses in A1 contain sufficient spectral and temporal information for extracting the pitch of HCTs by neurons in downstream cortical areas that receive their input from A1.
|
24
|
Henry KS, Heinz MG. Effects of sensorineural hearing loss on temporal coding of narrowband and broadband signals in the auditory periphery. Hear Res 2013; 303:39-47. [PMID: 23376018 DOI: 10.1016/j.heares.2013.01.014]
Abstract
People with sensorineural hearing loss have substantial difficulty understanding speech under degraded listening conditions. Behavioral studies suggest that this difficulty may be caused by changes in auditory processing of the rapidly-varying temporal fine structure (TFS) of acoustic signals. In this paper, we review the presently known effects of sensorineural hearing loss on processing of TFS and slower envelope modulations in the peripheral auditory system of mammals. Cochlear damage has relatively subtle effects on phase locking by auditory-nerve fibers to the temporal structure of narrowband signals under quiet conditions. In background noise, however, sensorineural loss does substantially reduce phase locking to the TFS of pure-tone stimuli. For auditory processing of broadband stimuli, sensorineural hearing loss has been shown to severely alter the neural representation of temporal information along the tonotopic axis of the cochlea. Notably, auditory-nerve fibers innervating the high-frequency part of the cochlea grow increasingly responsive to low-frequency TFS information and less responsive to temporal information near their characteristic frequency (CF). Cochlear damage also increases the correlation of the response to TFS across fibers of varying CF, decreases the traveling-wave delay between TFS responses of fibers with different CFs, and can increase the range of temporal modulation frequencies encoded in the periphery for broadband sounds. Weaker neural coding of temporal structure in background noise and degraded coding of broadband signals along the tonotopic axis of the cochlea are expected to contribute considerably to speech perception problems in people with sensorineural hearing loss. This article is part of a Special Issue entitled "Annual Reviews 2013".
Affiliation(s)
- Kenneth S Henry: Department of Speech, Language, and Hearing Sciences, Purdue University, 500 Oval Drive, West Lafayette, IN 47907, USA
|
25
|
Olde Scheper TV, Mansvelder HD, van Ooyen A. Short term depression unmasks the ghost frequency. PLoS One 2012; 7:e50189. [PMID: 23227159 PMCID: PMC3515566 DOI: 10.1371/journal.pone.0050189]
Abstract
Short Term Plasticity (STP) has been shown to exist extensively in synapses throughout the brain. Its function is more or less clear in the sense that it alters the probability of synaptic transmission at short time scales. However, it is still unclear what effect STP has on the dynamics of neural networks. We show, using a novel dynamic STP model, that Short Term Depression (STD) can affect the phase of frequency-coded input such that small networks can perform temporal signal summation and determination with high accuracy. We show that this property of STD can readily solve the problem of the ghost frequency, the perceived pitch of a harmonic complex in the absence of the base frequency. Additionally, we demonstrate that this property can explain dynamics in larger networks. By means of two models, one of chopper neurons in the Ventral Cochlear Nucleus and one of a cortical microcircuit with inhibitory Martinotti neurons, it is shown that the dynamics in these microcircuits can reliably be reproduced using STP. Our model of STP gives important insights into the potential roles of STP in self-regulation of cortical activity and long-range afferent input in neuronal microcircuits.
Affiliation(s)
- Tjeerd V Olde Scheper: Department of Integrative Neurophysiology, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
|
26
|
Wang GI, Delgutte B. Sensitivity of cochlear nucleus neurons to spatio-temporal changes in auditory nerve activity. J Neurophysiol 2012; 108:3172-95. [PMID: 22972956 DOI: 10.1152/jn.00160.2012]
Abstract
The spatio-temporal pattern of auditory nerve (AN) activity, representing the relative timing of spikes across the tonotopic axis, contains cues to perceptual features of sounds such as pitch, loudness, timbre, and spatial location. These spatio-temporal cues may be extracted by neurons in the cochlear nucleus (CN) that are sensitive to relative timing of inputs from AN fibers innervating different cochlear regions. One possible mechanism for this extraction is "cross-frequency" coincidence detection (CD), in which a central neuron converts the degree of coincidence across the tonotopic axis into a rate code by preferentially firing when its AN inputs discharge in synchrony. We used Huffman stimuli (Carney LH. J Neurophysiol 64: 437-456, 1990), which have a flat power spectrum but differ in their phase spectra, to systematically manipulate relative timing of spikes across tonotopically neighboring AN fibers without changing overall firing rates. We compared responses of CN units to Huffman stimuli with responses of model CD cells operating on spatio-temporal patterns of AN activity derived from measured responses of AN fibers with the principle of cochlear scaling invariance. We used the maximum likelihood method to determine the CD model cell parameters most likely to produce the measured CN unit responses, and thereby could distinguish units behaving like cross-frequency CD cells from those consistent with same-frequency CD (in which all inputs would originate from the same tonotopic location). We find that certain CN unit types, especially those associated with globular bushy cells, have responses consistent with cross-frequency CD cells. A possible functional role of a cross-frequency CD mechanism in these CN units is to increase the dynamic range of binaural neurons that process cues for sound localization.
Affiliation(s)
- Grace I Wang: Eaton-Peabody Laboratories, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
|
27
|
Abstract
Certain chords are preferred by listeners behaviorally and also occur with higher regularity in musical composition. Event-related potentials index the perceived consonance (i.e., pleasantness) of musical pitch relationships, providing a cortical neural correlate for such behavioral preferences. Here, we show that correlates of these harmonic preferences exist at subcortical stages of audition. Brainstem frequency-following responses were measured in response to four prototypical musical triads. Pitch salience computed from frequency-following responses correctly predicted the ordering of triadic harmony stipulated by music theory (i.e., major > minor >> diminished > augmented). Moreover, neural response magnitudes showed high correspondence with listeners' perceptual ratings of the same chords. Results suggest that preattentive stages of pitch processing may contribute to perceptual judgments of musical harmony.
|
28
|
Bidelman GM, Heinz MG. Auditory-nerve responses predict pitch attributes related to musical consonance-dissonance for normal and impaired hearing. J Acoust Soc Am 2011; 130:1488-1502. [PMID: 21895089 PMCID: PMC3188968 DOI: 10.1121/1.3605559]
Abstract
Human listeners prefer consonant over dissonant musical intervals and the perceived contrast between these classes is reduced with cochlear hearing loss. Population-level activity of normal and impaired model auditory-nerve (AN) fibers was examined to determine (1) if peripheral auditory neurons exhibit correlates of consonance and dissonance and (2) if the reduced perceptual difference between these qualities observed for hearing-impaired listeners can be explained by impaired AN responses. In addition, acoustical correlates of consonance-dissonance were also explored including periodicity and roughness. Among the chromatic pitch combinations of music, consonant intervals/chords yielded more robust neural pitch-salience magnitudes (determined by harmonicity/periodicity) than dissonant intervals/chords. In addition, AN pitch-salience magnitudes correctly predicted the ordering of hierarchical pitch and chordal sonorities described by Western music theory. Cochlear hearing impairment compressed pitch salience estimates between consonant and dissonant pitch relationships. The reduction in contrast of neural responses following cochlear hearing loss may explain the inability of hearing-impaired listeners to distinguish musical qualia as clearly as normal-hearing individuals. Of the neural and acoustic correlates explored, AN pitch salience was the best predictor of behavioral data. Results ultimately show that basic pitch relationships governing music are already present in initial stages of neural processing at the AN level.
Affiliation(s)
- Gavin M Bidelman: Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
|
29
|
Laroche M, Dajani HR, Marcoux AM. Contribution of Resolved and Unresolved Harmonic Regions to Brainstem Speech-Evoked Responses in Quiet and in Background Noise. Audiol Res 2011; 2:e7. [PMID: 26557316 PMCID: PMC4627165 DOI: 10.4081/audiores.2011.e7]
Abstract
Speech auditory brainstem responses (speech ABR) reflect activity that is phase-locked to the harmonics of the fundamental frequency (F0) up to at least the first formant (F1). Recent evidence suggests that responses at F0 in the presence of noise are more robust than responses at F1, and are also dissociated in some learning-impaired children. Peripheral auditory processing can be broadly divided into resolved and unresolved harmonic regions. This study investigates the contribution of these two regions to the speech ABR, and their susceptibility to noise. We recorded, in quiet and in background white noise, evoked responses in twelve normal-hearing adults in response to three variants of a synthetic vowel: i) Allformants, which contains all of the first three formants, ii) F1Only, which is dominated by resolved harmonics, and iii) F2&F3Only, which is dominated by unresolved harmonics. There were no statistically significant differences in the response at F0 across the three variants of the stimulus in quiet, nor did the noise affect this response with the Allformants and F1Only variants. On the other hand, the response at F0 with the F2&F3Only variant was significantly weaker in noise than with the two other variants (p<0.001). For the response at F1, there was no difference between the Allformants and F1Only variants in quiet, but the response was, as expected, weaker with the F2&F3Only variant (p<0.01). The addition of noise significantly weakened the response at F1 with the F1Only variant (p<0.05), but this weakening only tended towards significance with the Allformants variant (p=0.07). The results of this study indicate that resolved and unresolved harmonics are processed in different but interacting pathways that converge in the upper brainstem. The results also support earlier work on the differential susceptibility of responses at F0 and F1 to added noise.
Affiliation(s)
- M Laroche, H R Dajani: School of Information Technology and Engineering, University of Ottawa, ON, Canada
- A M Marcoux: Audiology and Speech-Language Pathology Program, University of Ottawa, ON, Canada
|
30
|
Neural encoding in the human brainstem relevant to the pitch of complex tones. Hear Res 2011; 275:110-9. [DOI: 10.1016/j.heares.2010.12.008]
|
31
|
Santurette S, Dau T. The role of temporal fine structure information for the low pitch of high-frequency complex tones. J Acoust Soc Am 2011; 129:282-292. [PMID: 21303009 DOI: 10.1121/1.3518718]
Abstract
The fused low pitch evoked by complex tones containing only unresolved high-frequency components demonstrates the ability of the human auditory system to extract pitch using a temporal mechanism in the absence of spectral cues. However, the temporal features used by such a mechanism have been a matter of debate. For stimuli with components lying exclusively in high-frequency spectral regions, the slowly varying temporal envelope of sounds is often assumed to be the only information contained in auditory temporal representations, and it has remained controversial to what extent the fast amplitude fluctuations, or temporal fine structure (TFS), of the conveyed signal can be processed. Using a pitch matching paradigm, the present study found that the low pitch of inharmonic transposed tones with unresolved components was consistent with the timing between the most prominent TFS maxima in their waveforms, rather than envelope maxima. Moreover, envelope cues did not take over as the absolute frequency or rank of the lowest component was raised and TFS cues thus became less effective. Instead, the low pitch became less salient. This suggests that complex pitch perception does not rely on envelope coding as such, and that TFS representation might persist at higher frequencies than previously thought.
Collapse
Affiliation(s)
- Sébastien Santurette
- Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DTU Bygning 352, Ørsteds Plads, 2800 Kongens Lyngby, Denmark.
Collapse
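The contrast above between envelope timing and fine-structure timing can be sketched numerically. The following is a minimal illustration, not the paper's pitch-matching paradigm: it builds a transposed tone (a high-frequency carrier multiplied by a half-wave-rectified low-frequency modulator) with an inharmonic, non-integer carrier rank, then compares the intervals between envelope maxima with the intervals between the most prominent fine-structure maxima. All parameters are chosen for convenience of the demo.

```python
import numpy as np

fs = 48_000
fm = 125.0                      # modulator rate: the envelope repeats every 8 ms
fc = 8.25 * fm                  # inharmonic carrier (1031.25 Hz, non-integer rank)
t = np.arange(0, 0.2, 1 / fs)

env = np.maximum(np.sin(2 * np.pi * fm * t), 0.0)   # half-wave rectified modulator
x = env * np.sin(2 * np.pi * fc * t)                # transposed tone

spc = int(fs / fm)              # samples per modulation cycle

def biggest_peak_times(sig):
    """Time of the largest maximum within each modulation cycle."""
    return np.array([(k * spc + np.argmax(sig[k * spc:(k + 1) * spc])) / fs
                     for k in range(len(sig) // spc)])

pitch_env = 1.0 / np.median(np.diff(biggest_peak_times(env)))  # envelope maxima
pitch_tfs = 1.0 / np.median(np.diff(biggest_peak_times(x)))    # TFS maxima
```

The envelope maxima recur at exactly the modulator rate (125 Hz), while the most prominent fine-structure maxima are typically spaced an integer number of carrier periods apart (8 cycles of 1031.25 Hz, about 129 Hz), so a TFS-based mechanism predicts a low pitch shifted away from the envelope rate, in the direction the abstract describes.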
|
32
|
Cedolin L, Delgutte B. Spatiotemporal representation of the pitch of harmonic complex tones in the auditory nerve. J Neurosci 2010; 30:12712-24. [PMID: 20861376 PMCID: PMC2957107 DOI: 10.1523/jneurosci.6365-09.2010] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/23/2009] [Revised: 07/28/2010] [Accepted: 07/31/2010] [Indexed: 11/21/2022] Open
Abstract
The pitch of harmonic complex tones plays an important role in speech and music perception and the analysis of auditory scenes, yet traditional rate-place and temporal models for pitch processing provide only an incomplete description of the psychophysical data. To test physiologically a model based on spatiotemporal pitch cues created by the cochlear traveling wave (Shamma, 1985), we recorded from single fibers in the auditory nerve of anesthetized cat in response to harmonic complex tones with missing fundamentals and equal-amplitude harmonics. We used the principle of scaling invariance in cochlear mechanics to infer the spatiotemporal response pattern to a given stimulus from a series of measurements made in a single fiber as a function of fundamental frequency F0. We found that spatiotemporal cues to resolved harmonics are available for F0 values between 350 and 1100 Hz and that these cues are more robust than traditional rate-place cues at high stimulus levels. The lower F0 limit is determined by the limited frequency selectivity of the cochlea, whereas the upper limit is caused by the degradation of phase locking to the stimulus fine structure at high frequencies. The spatiotemporal representation is consistent with the upper F0 limit to the perception of the pitch of complex tones with a missing fundamental, and its effectiveness does not depend on the relative phase between resolved harmonics. The spatiotemporal representation is thus consistent with key trends in human psychophysics.
Collapse
Affiliation(s)
- Leonardo Cedolin
- Eaton–Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts 02114, and
- Speech and Hearing Bioscience and Technology Program, Harvard–Massachusetts Institute of Technology Division of Health Sciences and Technology, Cambridge, Massachusetts 02139
- Bertrand Delgutte
- Eaton–Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston, Massachusetts 02114, and
- Research Laboratory of Electronics, Massachusetts Institute of Technology, and
Collapse
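The scaling-invariance principle used above can be demonstrated with a toy model. The sketch below assumes an exactly scaling-invariant, constant-Q gammatone-like filter of the form h_cf(t) = cf·g(cf·t) (real cochleae obey this only approximately, which is what the paper exploits): the response of a fiber at CF·α to a tone at F0·α equals the time-compressed response of a fiber at CF to F0, so stepping F0 in a single fiber traces out the spatiotemporal pattern across fibers. Filter shape and parameters are illustrative assumptions.

```python
import numpy as np

fs = 100_000

def fiber_response(cf, f0, dur=0.06):
    """Tone response of a constant-Q, scaling-invariant gammatone-like filter."""
    t = np.arange(0, dur, 1 / fs)
    u = cf * t[t < 30.0 / cf]            # impulse response in CF-normalized time
    h = cf * u ** 3 * np.exp(-2 * np.pi * u / 9.0) * np.cos(2 * np.pi * u)
    s = np.cos(2 * np.pi * f0 * t)
    return np.convolve(s, h)[: len(t)] / fs, t

alpha = 1.25                             # scale CF and F0 together
y1, t1 = fiber_response(2000.0, 1900.0)
y2, t2 = fiber_response(alpha * 2000.0, alpha * 1900.0)

# Scaling invariance predicts y2(t) = y1(alpha * t): compare in steady state.
win = (t2 > 0.02) & (t2 < 0.045)
y1_warped = np.interp(alpha * t2[win], t1, y1)
rel_err = np.sqrt(np.mean((y2[win] - y1_warped) ** 2) / np.mean(y1_warped ** 2))
```

With exact scaling invariance the relative error is limited only by discretization, which is the sense in which a single-fiber F0 series can stand in for measurements across the cochlear place axis.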
|
33
|
Micheyl C, Keebler MV, Oxenham AJ. Pitch perception for mixtures of spectrally overlapping harmonic complex tones. J Acoust Soc Am 2010; 128:257-69. [PMID: 20649221 PMCID: PMC2921428 DOI: 10.1121/1.3372751] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/04/2008] [Revised: 03/02/2010] [Accepted: 03/04/2010] [Indexed: 05/29/2023]
Abstract
This study measured difference limens for fundamental frequency (DLF0s) for a target harmonic complex in the presence of a simultaneous spectrally overlapping harmonic masker. The resolvability of the target harmonics was manipulated by bandpass filtering the stimuli into a low (800-2400 Hz) or high (1600-3200 Hz) spectral region, using different nominal F0s for the targets (100, 200, and 400 Hz), and different masker F0s (0, +9, or -9 semitones) relative to the target. Three different modes of masker presentation, relative to the target, were tested: ipsilateral, contralateral, and dichotic, with a higher masker level in the contralateral ear. Ipsilateral and dichotic maskers generally caused marked elevations in DLF0s compared to both the unmasked and contralateral masker conditions. Analyses based on excitation patterns revealed that ipsilaterally masked F0 difference limens were small (<2%) only when the excitation patterns evoked by the target-plus-masker mixture contained several salient (>1 dB) peaks at or close to target harmonic frequencies, even though these peaks were rarely produced by the target alone. The findings are discussed in terms of place- or place-time mechanisms of pitch perception.
Collapse
Affiliation(s)
- Christophe Micheyl
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455-0344, USA.
Collapse
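The excitation-pattern analysis described above can be approximated with a simple single-parameter roex(p) filter bank; this is an illustrative stand-in, not the calibrated model used in the study. The sketch computes an excitation pattern for the low-region target alone (F0 = 200 Hz, harmonics 4-12) and flags local maxima that stand at least 1 dB above the neighboring dips, mirroring the >1 dB salience criterion quoted in the abstract.

```python
import numpy as np

def erb(f):
    """Glasberg & Moore equivalent rectangular bandwidth (Hz)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def excitation_db(freqs, amps, cfs):
    """Component power summed through roex(p) filters centered at each CF."""
    ep = np.empty_like(cfs)
    for i, cf in enumerate(cfs):
        p = 4.0 * cf / erb(cf)                 # roex slope parameter
        g = np.abs(freqs - cf) / cf            # normalized frequency deviation
        w = (1.0 + p * g) * np.exp(-p * g)     # roex(p) weighting
        ep[i] = np.sum(w * amps ** 2)
    return 10.0 * np.log10(ep)

def salient_peaks(ep, cfs, min_db=1.0):
    """CFs of local maxima standing >= min_db above the nearest dip on each side."""
    peaks = []
    for i in range(1, len(ep) - 1):
        if not (ep[i] > ep[i - 1] and ep[i] >= ep[i + 1]):
            continue
        j = i
        while j > 0 and ep[j - 1] <= ep[j]:
            j -= 1                              # walk down to the left valley
        k = i
        while k < len(ep) - 1 and ep[k + 1] <= ep[k]:
            k += 1                              # walk down to the right valley
        if ep[i] - ep[j] >= min_db and ep[i] - ep[k] >= min_db:
            peaks.append(cfs[i])
    return peaks

# Target alone: F0 = 200 Hz, harmonics 4-12 (the 800-2400 Hz low region).
freqs = 200.0 * np.arange(4, 13)
cfs = np.geomspace(500.0, 3000.0, 500)
peaks = salient_peaks(excitation_db(freqs, np.ones_like(freqs), cfs), cfs)
```

Under this toy model the lowest target harmonics each produce a salient peak; adding a spectrally overlapping masker at a different F0 fills the dips between them, which is the mechanism the abstract invokes to explain the elevated ipsilaterally masked DLF0s.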
|
34
|
Nakamoto KT, Shackleton TM, Palmer AR. Responses in the inferior colliculus of the guinea pig to concurrent harmonic series and the effect of inactivation of descending controls. J Neurophysiol 2010; 103:2050-61. [PMID: 20147418 DOI: 10.1152/jn.00451.2009] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
One of the fundamental questions of auditory research is how sounds are segregated because, in natural environments, multiple sounds tend to occur at the same time. Concurrent sounds, such as two talkers, physically add together and arrive at the ear as a single input sound wave. The auditory system easily segregates this input into a coherent perception of each of the multiple sources. A common feature of speech and communication calls is their harmonic structure and in this report we used two harmonic complexes to study the role of the corticofugal pathway in the processing of concurrent sounds. We demonstrate that, in the inferior colliculus (IC) of the anesthetized guinea pig, deactivation of the auditory cortex altered the temporal and/or the spike response to the concurrent, monaural harmonic complexes. More specifically, deactivating the auditory cortex altered the representation of the relative level of the complexes. This suggests that the auditory cortex modulates the representation of the level of two harmonic complexes in the IC. Since sound level is a cue used in the segregation of auditory input, the corticofugal pathway may play a role in this segregation.
Collapse
Affiliation(s)
- Kyle T Nakamoto
- College of Medicine, Northeastern Ohio Universities, 4209 State Rt. 44, P.O. Box 95, Rootstown, OH 44272-0095, USA.
Collapse
|
35
|
Neural correlates of consonance, dissonance, and the hierarchy of musical pitch in the human brainstem. J Neurosci 2009; 29:13165-71. [PMID: 19846704 DOI: 10.1523/jneurosci.3900-09.2009] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Consonant and dissonant pitch relationships in music provide the foundation of melody and harmony, the building blocks of Western tonal music. We hypothesized that phase-locked neural activity within the brainstem may preserve information relevant to these important perceptual attributes of music. To this end, we measured brainstem frequency-following responses (FFRs) from nonmusicians in response to the dichotic presentation of nine musical intervals that varied in their degree of consonance and dissonance. Neural pitch salience was computed for each response using temporally based autocorrelation and harmonic pitch sieve analyses. Brainstem responses to consonant intervals were more robust and yielded stronger pitch salience than those to dissonant intervals. In addition, the ordering of neural pitch salience across musical intervals followed the hierarchical arrangement of pitch stipulated by Western music theory. Finally, pitch salience derived from neural data showed high correspondence with behavioral consonance judgments (r = 0.81). These results suggest that brainstem neural mechanisms mediating pitch processing show preferential encoding of consonant musical relationships and, furthermore, preserve the hierarchical pitch relationships found in music, even for individuals without formal musical training. We infer that the basic pitch relationships governing music may be rooted in low-level sensory processing and that an encoding scheme that favors consonant pitch relationships may be one reason why such intervals are preferred behaviorally.
Collapse
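The neural pitch salience measure above combines an autocorrelation with a harmonic pitch sieve. A minimal waveform-level sketch of that combination (not the FFR analysis pipeline itself, and with illustrative stimulus parameters): the salience of a candidate F0 is the mean normalized autocorrelation height at integer multiples of the candidate period. A consonant perfect fifth then yields a higher, better-defined salience maximum than a dissonant minor second, in line with the ordering the abstract reports.

```python
import numpy as np

fs = 20_000
t = np.arange(0, 0.4, 1 / fs)

def complex_tone(f0, n_harm=4):
    return sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, n_harm + 1))

def salience(x, f0_grid, n_periods=8):
    """Mean normalized ACF height at sieve lags k/f0, k = 1..n_periods."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf = acf / acf[0]
    lags = np.arange(len(acf))
    sal = []
    for f0 in f0_grid:
        sieve = np.arange(1, n_periods + 1) * fs / f0   # fractional lags
        sal.append(np.interp(sieve, lags, acf).mean())
    return np.array(sal)

grid = np.arange(80.0, 301.0, 1.0)
s_fifth = salience(complex_tone(220.0) + complex_tone(330.0), grid)  # perfect fifth
s_m2 = salience(complex_tone(220.0) + complex_tone(233.08), grid)    # minor second
```

For the fifth, every component is a multiple of 110 Hz, so the sieve lags all land on autocorrelation peaks; for the minor second no candidate F0 in the range captures both series, so the best salience stays markedly lower.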
|
36
|
Pitch, harmonicity and concurrent sound segregation: psychoacoustical and neurophysiological findings. Hear Res 2009; 266:36-51. [PMID: 19788920 DOI: 10.1016/j.heares.2009.09.012] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/08/2009] [Revised: 09/23/2009] [Accepted: 09/24/2009] [Indexed: 11/18/2022]
Abstract
Harmonic complex tones are a particularly important class of sounds found in both speech and music. Although these sounds contain multiple frequency components, they are usually perceived as a coherent whole, with a pitch corresponding to the fundamental frequency (F0). However, when two or more harmonic sounds occur concurrently, e.g., at a cocktail party or in a symphony, the auditory system must separate harmonics and assign them to their respective F0s so that a coherent and veridical representation of the different sound sources is formed. Here we review both psychophysical and neurophysiological (single-unit and evoked-potential) findings, which provide some insight into how, and how well, the auditory system accomplishes this task. A survey of computational models designed to estimate multiple F0s and segregate concurrent sources is followed by a review of the empirical literature on the perception and neural coding of concurrent harmonic sounds, including vowels, as well as findings obtained using single complex tones with mistuned harmonics.
Collapse
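The harmonic-sieve idea surveyed above can be sketched as a greedy two-source F0 estimator: score candidate F0s by how well the observed spectral components line up with a sieve of harmonics, take the best candidate, strip the components it claims, and score again. The scoring rule (match quality minus a penalty for empty sieve slots) and all parameters below are illustrative assumptions, not a specific model from the review.

```python
import numpy as np

def sieve_score(components, f0, tol=0.03):
    """Quality-weighted harmonic matches minus a penalty for empty sieve slots."""
    n = np.maximum(np.round(components / f0), 1).astype(int)
    dev = np.abs(components - n * f0)
    quality = np.clip(1.0 - dev / (tol * components), 0.0, None)
    best = {}                          # best-matching component per harmonic slot
    for slot, q in zip(n, quality):
        if q > 0:
            best[slot] = max(best.get(slot, 0.0), q)
    slots = int(np.max(components) // f0)
    return sum(best.values()) - 0.5 * max(slots - len(best), 0)

def estimate_two_f0s(components, f0_grid, tol=0.03):
    """Greedy two-source estimate: best sieve, strip its harmonics, repeat."""
    scores = [sieve_score(components, f, tol) for f in f0_grid]
    f0a = f0_grid[int(np.argmax(scores))]
    n = np.maximum(np.round(components / f0a), 1)
    rest = components[np.abs(components - n * f0a) >= tol * components]
    scores2 = [sieve_score(rest, f, tol) for f in f0_grid]
    f0b = f0_grid[int(np.argmax(scores2))]
    return f0a, f0b

# Two overlapping harmonic series: a 100-Hz and a 133-Hz complex, harmonics 1-8.
mix = np.concatenate([100.0 * np.arange(1, 9), 133.0 * np.arange(1, 9)])
f0a, f0b = estimate_two_f0s(mix, np.arange(50.0, 200.5, 0.5))
```

Even this crude sieve recovers both F0s from the mixed component list; the models reviewed in the article refine the same idea with amplitude weighting, spectral smoothness constraints, and temporal information.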
|
37
|
Abstract
By measuring the auditory brainstem response to two musical intervals, the major sixth (E3 and G2) and the minor seventh (E3 and F#2), we found that musicians have a more specialized sensory system for processing behaviorally relevant aspects of sound. Musicians had heightened responses to the harmonics of the upper tone (E), as well as certain combination tones (sum tones) generated by nonlinear processing in the auditory system. In music, the upper note is typically carried by the upper voice, and the enhancement of the upper tone likely reflects musicians' extensive experience attending to the upper voice. Neural phase locking to the temporal periodicity of the amplitude-modulated envelope, which underlies the perception of musical harmony, was also more precise in musicians than nonmusicians. Neural enhancements were strongly correlated with years of musical training, and our findings, therefore, underscore the role that long-term experience with music plays in shaping auditory sensory encoding.
Collapse
|
38
|
Oxenham AJ. Pitch perception and auditory stream segregation: implications for hearing loss and cochlear implants. Trends Amplif 2008; 12:316-31. [PMID: 18974203 PMCID: PMC2901529 DOI: 10.1177/1084713808325881] [Citation(s) in RCA: 140] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Pitch is important for speech and music perception, and may also play a crucial role in our ability to segregate sounds that arrive from different sources. This article reviews some basic aspects of pitch coding in the normal auditory system and explores the implications for pitch perception in people with hearing impairments and cochlear implants. Data from normal-hearing listeners suggest that the low-frequency, low-numbered harmonics within complex tones are of prime importance in pitch perception and in the perceptual segregation of competing sounds. The poorer frequency selectivity experienced by many hearing-impaired listeners leads to less access to individual harmonics, and the coding schemes currently employed in cochlear implants provide little or no representation of individual harmonics. These deficits in the coding of harmonic sounds may underlie some of the difficulties experienced by people with hearing loss and cochlear implants, and may point to future areas where sound representation in auditory prostheses could be improved.
Collapse
|