1. Kalra L, Altman S, Bee MA. Perceptually salient differences in a species recognition cue do not promote auditory streaming in eastern grey treefrogs (Hyla versicolor). J Comp Physiol A. 2024. PMID: 38733407. DOI: 10.1007/s00359-024-01702-9
Abstract
Auditory streaming underlies a receiver's ability to organize complex mixtures of auditory input into distinct perceptual "streams" that represent different sound sources in the environment. During auditory streaming, sounds produced by the same source are integrated through time into a single, coherent auditory stream that is perceptually segregated from other concurrent sounds. Based on human psychoacoustic studies, one hypothesis regarding auditory streaming is that any sufficiently salient perceptual difference may lead to stream segregation. Here, we used the eastern grey treefrog, Hyla versicolor, to test this hypothesis in the context of vocal communication in a non-human animal. In this system, females choose their mate based on perceiving species-specific features of a male's pulsatile advertisement calls in social environments (choruses) characterized by mixtures of overlapping vocalizations. We employed an experimental paradigm from human psychoacoustics to design interleaved pulsatile sequences (ABAB…) that mimicked key features of the species' advertisement call, and in which alternating pulses differed in pulse rise time, which is a robust species recognition cue in eastern grey treefrogs. Using phonotaxis assays, we found no evidence that perceptually salient differences in pulse rise time promoted the segregation of interleaved pulse sequences into distinct auditory streams. These results do not support the hypothesis that any perceptually salient acoustic difference can be exploited as a cue for stream segregation in all species. We discuss these findings in the context of cues used for species recognition and auditory streaming.
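To make the stimulus design concrete, the sketch below generates an interleaved ABAB pulse sequence in which alternating pulses differ only in rise time, with total pulse duration held constant. All parameter values (pulse frequency, duration, rates, rise times) are hypothetical placeholders, not the paper's actual stimulus specifications.

```python
import numpy as np

def pulse(fs, dur=0.010, freq=2200.0, rise=0.0025, fall=0.0075):
    """One sine-tone pulse with linear onset (rise) and offset (fall) ramps."""
    t = np.arange(int(dur * fs)) / fs
    env = np.minimum(1.0, np.minimum(t / rise, (dur - t) / fall))
    return np.clip(env, 0.0, 1.0) * np.sin(2 * np.pi * freq * t)

def abab_sequence(fs=44100, n_pairs=20, period=0.025,
                  rise_a=0.0025, rise_b=0.0075, dur=0.010):
    """Interleave A and B pulses that differ only in rise time."""
    out = np.zeros(int(2 * n_pairs * period * fs))
    hop = int(period * fs)
    for i in range(2 * n_pairs):
        rise = rise_a if i % 2 == 0 else rise_b
        p = pulse(fs, dur=dur, rise=rise, fall=dur - rise)  # constant total duration
        out[i * hop : i * hop + len(p)] += p
    return out

stimulus = abab_sequence()
```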
Affiliation(s)
- Lata Kalra
- Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN, 55108, USA.
- Shoshana Altman
- Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN, 55108, USA
- Mark A Bee
- Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN, 55108, USA
2. Simmons JA, Hom KN, Simmons AM. Temporal coherence of harmonic frequencies affects echo detection in the big brown bat, Eptesicus fuscus. J Acoust Soc Am. 2023;154:3321-3327. PMID: 37983295. DOI: 10.1121/10.0022444
Abstract
Echolocating big brown bats (Eptesicus fuscus) broadcast frequency modulated (FM) ultrasonic pulses containing two prominent harmonic sweeps (FM1, FM2). Both harmonics typically return as echoes at the same absolute time delay following the broadcast, making them coherent. Electronically splitting FM1 and FM2 allows their time delays to be controlled separately, making them non-coherent. Earlier work shows that big brown bats discriminate coherent from split harmonic, non-coherent echoes and that disruptions of harmonic coherence produce blurry acoustic images. A psychophysical experiment on two trained big brown bats tested the hypothesis that detection thresholds for split harmonic, non-coherent echoes are higher than those for coherent echoes. Thresholds of the two bats for detecting 1-glint echoes with coherent harmonics were around 35 and 36 dB sound pressure level, respectively, while thresholds for split harmonic echoes were about 10 dB higher. When the delay of FM2 in split harmonic echoes was shortened by 75 μs to offset neural amplitude-latency trading and restore coherence in the auditory representation, thresholds decreased to those estimated for coherent echoes. These results show that echo detection is affected by loss of harmonic coherence, consistent with the proposed broader role of coherence across frequencies for auditory perception.
Affiliation(s)
- James A Simmons
- Department of Neuroscience and Carney Institute for Brain Science, Brown University, 185 Meeting Street, Providence, Rhode Island 02912, USA
- Kelsey N Hom
- Department of Neuroscience and Carney Institute for Brain Science, Brown University, 185 Meeting Street, Providence, Rhode Island 02912, USA
- Andrea Megela Simmons
- Department of Neuroscience and Carney Institute for Brain Science, Brown University, 185 Meeting Street, Providence, Rhode Island 02912, USA
- Department of Cognitive, Linguistic and Psychological Sciences, Brown University, 190 Thayer Street, Providence, Rhode Island 02912, USA
3. Shen Y, Langley L. Spectral weighting for sentence recognition in steady-state and amplitude-modulated noise. JASA Express Lett. 2023. PMID: 37125871. PMCID: PMC10155216. DOI: 10.1121/10.0017934
Abstract
Spectral weights in octave-frequency bands from 0.25 to 4 kHz were estimated for speech-in-noise recognition using two sentence materials (i.e., the IEEE and AzBio sentences). The masking noise was either unmodulated or sinusoidally amplitude-modulated at 8 Hz. The estimated spectral weights did not vary significantly across two test sessions and were similar for the two sentence materials. Amplitude-modulating the masker increased the weight at 2 kHz and decreased the weight at 0.25 kHz, which may support an upward shift in spectral weights for temporally fluctuating maskers.
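The logic of spectral-weight estimation can be illustrated with a toy correlational simulation: per-band SNRs are perturbed from trial to trial, and regressing response correctness on those perturbations recovers the relative weights. This is a generic schematic with synthetic data and made-up weights; the paper's actual estimation procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
bands = [0.25, 0.5, 1.0, 2.0, 4.0]                 # octave-band centers, kHz
true_w = np.array([0.05, 0.15, 0.25, 0.35, 0.20])  # hypothetical listener weights

n_trials = 5000
jitter = rng.normal(0.0, 3.0, (n_trials, len(bands)))  # per-band SNR jitter, dB
drive = jitter @ true_w                                 # internal decision variable
p_correct = 1.0 / (1.0 + np.exp(-drive))                # logistic psychometric link
correct = rng.random(n_trials) < p_correct

# Point-biserial correlation per band as a simple weight estimate
w_hat = np.array([np.corrcoef(jitter[:, k], correct)[0, 1]
                  for k in range(len(bands))])
w_hat /= w_hat.sum()
print(dict(zip(bands, np.round(w_hat, 2))))  # approximates true_w up to noise
```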
Affiliation(s)
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105-6246
- Lauren Langley
- Department of Speech and Hearing Sciences, University of Washington, 1417 Northeast 42nd Street, Seattle, Washington 98105-6246
4. Weise A, Grimm S, Rimmele JM, Schröger E. Auditory representations for long lasting sounds: insights from event-related brain potentials and neural oscillations. Brain Lang. 2023;237:105221. PMID: 36623340. DOI: 10.1016/j.bandl.2022.105221
Abstract
The basic features of short sounds, such as frequency and intensity, including their temporal dynamics, are integrated into a unitary representation. Knowledge of how our brain processes long-lasting sounds is scarce. We review research utilizing the Mismatch Negativity event-related potential and neural oscillatory activity for studying representations of long-lasting simple versus complex sounds, such as sinusoidal tones versus speech. There is evidence for a temporal constraint in the formation of auditory representations: auditory edges like sound onsets within long-lasting sounds open a temporal window of about 350 ms in which the sound's dynamics are integrated into a representation, while information beyond that window contributes less to that representation. This integration window segments the auditory input into short chunks. We argue that the representations established in adjacent integration windows can be concatenated into an auditory representation of a long sound, thus overcoming the temporal constraint.
Affiliation(s)
- Annekathrin Weise
- Department of Psychology, Ludwig-Maximilians-University Munich, Germany; Wilhelm Wundt Institute for Psychology, Leipzig University, Germany.
- Sabine Grimm
- Wilhelm Wundt Institute for Psychology, Leipzig University, Germany.
- Johanna Maria Rimmele
- Department of Neuroscience, Max-Planck-Institute for Empirical Aesthetics, Germany; Center for Language, Music and Emotion, New York University, Max Planck Institute, Department of Psychology, 6 Washington Place, New York, NY 10003, United States.
- Erich Schröger
- Wilhelm Wundt Institute for Psychology, Leipzig University, Germany.
5. Lanzilotti C, Andéol G, Micheyl C, Scannella S. Cocktail party training induces increased speech intelligibility and decreased cortical activity in bilateral inferior frontal gyri: a functional near-infrared study. PLoS One. 2022;17:e0277801. PMID: 36454948. PMCID: PMC9714910. DOI: 10.1371/journal.pone.0277801
Abstract
The human brain networks responsible for selectively listening to a voice amid other talkers remain to be clarified. The present study aimed to investigate relationships between cortical activity and performance in a speech-in-speech task, before (Experiment I) and after training-induced improvements (Experiment II). In Experiment I, 74 participants performed a speech-in-speech task while their cortical activity was measured using a functional near-infrared spectroscopy (fNIRS) device. One target talker and one masker talker were simultaneously presented at three different target-to-masker ratios (TMRs): adverse, intermediate and favorable. Behavioral results show that performance increased monotonically with TMR in some participants but failed to decrease, or even improved, in the adverse-TMR condition for others. On the neural level, an extensive brain network including frontal (left prefrontal cortex, right dorsolateral prefrontal cortex and bilateral inferior frontal gyri) and temporal (bilateral auditory cortex) regions was more solicited by the intermediate condition than by the other two. Additionally, bilateral frontal gyri and left auditory cortex activities were found to be positively correlated with behavioral performance in the adverse-TMR condition. In Experiment II, 27 participants, whose performance was the poorest in the adverse-TMR condition of Experiment I, were trained to improve performance in that condition. Results show significant performance improvements along with decreased activity in bilateral inferior frontal gyri, the right dorsolateral prefrontal cortex, the left inferior parietal cortex and the right auditory cortex in the adverse-TMR condition after training. Arguably, lower neural activity reflects more efficient masker inhibition after speech-in-speech training. As speech-in-noise tasks also engage frontal and temporal regions, we suggest that, regardless of the type of masking (speech or noise), the complexity of the task will prompt the recruitment of a similar brain network. Furthermore, the initially substantial cognitive recruitment is reduced after training, leading to an economy of cognitive resources.
Affiliation(s)
- Cosima Lanzilotti
- Département Neuroscience et Sciences Cognitives, Institut de Recherche Biomédicale des Armées, Brétigny sur Orge, France
- ISAE-SUPAERO, Université de Toulouse, Toulouse, France
- Thales SIX GTS France, Gennevilliers, France
- Guillaume Andéol
- Département Neuroscience et Sciences Cognitives, Institut de Recherche Biomédicale des Armées, Brétigny sur Orge, France
6. Thomassen S, Hartung K, Einhäuser W, Bendixen A. Low-high-low or high-low-high? Pattern effects on sequential auditory scene analysis. J Acoust Soc Am. 2022;152:2758. PMID: 36456271. DOI: 10.1121/10.0015054
Abstract
Sequential auditory scene analysis (ASA) is often studied using sequences of two alternating tones, such as ABAB or ABA_, with "_" denoting a silent gap, and "A" and "B" sine tones differing in frequency (nominally low and high). Many studies implicitly assume that the specific arrangement (ABAB vs ABA_, as well as low-high-low vs high-low-high within ABA_) plays a negligible role, such that decisions about the tone pattern can be governed by other considerations. To explicitly test this assumption, a systematic comparison of different tone patterns for two-tone sequences was performed in three different experiments. Participants were asked to report whether they perceived the sequences as originating from a single sound source (integrated) or from two interleaved sources (segregated). Results indicate that core findings of sequential ASA, such as an effect of frequency separation on the proportion of integrated and segregated percepts, are similar across the different patterns during prolonged listening. However, at sequence onset, the integrated percept was more likely to be reported by the participants in ABA_low-high-low than in ABA_high-low-high sequences. This asymmetry is important for models of sequential ASA, since the formation of percepts at onset is an integral part of understanding how auditory interpretations build up.
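For readers unfamiliar with the paradigm, the sketch below synthesizes an ABA_ triplet sequence; the low_high_low flag switches between the two arrangements compared in the paper. Parameter values are illustrative placeholders, not the study's actual settings.

```python
import numpy as np

def tone(freq, dur, fs, ramp=0.005):
    """Pure tone with linear on/off ramps."""
    t = np.arange(int(dur * fs)) / fs
    env = np.minimum(1.0, np.minimum(t / ramp, (dur - t) / ramp))
    return env * np.sin(2 * np.pi * freq * t)

def aba_sequence(f_low=400.0, df_semitones=6.0, n_triplets=30,
                 tone_dur=0.125, fs=44100, low_high_low=True):
    """ABA_ triplets (A, B, A, silent gap); df sets the A-B frequency separation."""
    f_high = f_low * 2 ** (df_semitones / 12.0)
    a, b = (f_low, f_high) if low_high_low else (f_high, f_low)
    gap = np.zeros(int(tone_dur * fs))
    triplet = np.concatenate([tone(a, tone_dur, fs), tone(b, tone_dur, fs),
                              tone(a, tone_dur, fs), gap])
    return np.tile(triplet, n_triplets)
```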
Affiliation(s)
- Sabine Thomassen
- Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
- Kevin Hartung
- Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
- Wolfgang Einhäuser
- Physics of Cognition Group, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
- Alexandra Bendixen
- Cognitive Systems Lab, Faculty of Natural Sciences, Chemnitz University of Technology, 09107 Chemnitz, Germany
7. Brodbeck C, Simon JZ. Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Front Neurosci. 2022;16:828546. PMID: 36003957. PMCID: PMC9393379. DOI: 10.3389/fnins.2022.828546
Abstract
Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have pitch differing in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked both the presence of pitch and the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether pitch was simultaneously present in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker's speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.
Affiliation(s)
- Christian Brodbeck
- Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Jonathan Z. Simon
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, College Park, MD, United States
8.
Abstract
Hearing in noise is a core problem in audition, and a challenge for hearing-impaired listeners, yet the underlying mechanisms are poorly understood. We explored whether harmonic frequency relations, a signature property of many communication sounds, aid hearing in noise for normal hearing listeners. We measured detection thresholds in noise for tones and speech synthesized to have harmonic or inharmonic spectra. Harmonic signals were consistently easier to detect than otherwise identical inharmonic signals. Harmonicity also improved discrimination of sounds in noise. The largest benefits were observed for two-note up-down "pitch" discrimination and melodic contour discrimination, both of which could be performed equally well with harmonic and inharmonic tones in quiet, but which showed large harmonic advantages in noise. The results show that harmonicity facilitates hearing in noise, plausibly by providing a noise-robust pitch cue that aids detection and discrimination.
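A common way to construct matched inharmonic stimuli is to jitter each component of a harmonic complex away from its harmonic frequency. The sketch below is illustrative; the jitter scheme and parameters are assumptions, not necessarily the study's synthesis procedure.

```python
import numpy as np

def complex_tone(f0=200.0, n_harm=10, dur=0.5, fs=44100, jitter=0.0, seed=0):
    """Harmonic complex (jitter=0) or an inharmonic variant in which each
    component is displaced by up to +/- jitter * f0 (fixed per tone)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(dur * fs)) / fs
    x = np.zeros_like(t)
    for k in range(1, n_harm + 1):
        f = k * f0 + rng.uniform(-jitter, jitter) * f0
        x += np.sin(2 * np.pi * f * t + rng.uniform(0, 2 * np.pi))
    return x / n_harm

harmonic = complex_tone(jitter=0.0)
inharmonic = complex_tone(jitter=0.3)  # otherwise identical to the harmonic tone
```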
9. Guest DR, Oxenham AJ. Human discrimination and modeling of high-frequency complex tones shed light on the neural codes for pitch. PLoS Comput Biol. 2022;18:e1009889. PMID: 35239639. PMCID: PMC8923464. DOI: 10.1371/journal.pcbi.1009889
Abstract
Accurate pitch perception of harmonic complex tones is widely believed to rely on temporal fine structure information conveyed by the precise phase-locked responses of auditory-nerve fibers. However, accurate pitch perception remains possible even when spectrally resolved harmonics are presented at frequencies beyond the putative limits of neural phase locking, and it is unclear whether residual temporal information, or a coarser rate-place code, underlies this ability. We addressed this question by measuring human pitch discrimination at low and high frequencies for harmonic complex tones, presented either in isolation or in the presence of concurrent complex-tone maskers. We found that concurrent complex-tone maskers impaired performance at both low and high frequencies, although the impairment introduced by adding maskers at high frequencies relative to low frequencies differed between the tested masker types. We then combined simulated auditory-nerve responses to our stimuli with ideal-observer analysis to quantify the extent to which performance was limited by peripheral factors. We found that the worsening of both frequency discrimination and F0 discrimination at high frequencies could be well accounted for (in relative terms) by optimal decoding of all available information at the level of the auditory nerve. A Python package is provided to reproduce these results, and to simulate responses to acoustic stimuli from the three previously published models of the human auditory nerve used in our analyses.
Affiliation(s)
- Daniel R. Guest
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Andrew J. Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
10. Sound source separation mechanisms of different deep networks explained from the perspective of auditory perception. Appl Sci (Basel). 2022. DOI: 10.3390/app12020832
Abstract
Thanks to the development of deep learning, various sound source separation networks have been proposed and made significant progress. However, the study on the underlying separation mechanisms is still in its infancy. In this study, deep networks are explained from the perspective of auditory perception mechanisms. For separating two arbitrary sound sources from monaural recordings, three different networks with different parameters are trained and achieve excellent performances. The networks’ output can obtain an average scale-invariant signal-to-distortion ratio improvement (SI-SDRi) higher than 10 dB, comparable with the human performance to separate natural sources. More importantly, the most intuitive principle—proximity—is explored through simultaneous and sequential organization experiments. Results show that regardless of network structures and parameters, the proximity principle is learned spontaneously by all networks. If components are proximate in frequency or time, they are not easily separated by networks. Moreover, the frequency resolution at low frequencies is better than at high frequencies. These behavior characteristics of all three networks are highly consistent with those of the human auditory system, which implies that the learned proximity principle is not accidental, but the optimal strategy selected by networks and humans when facing the same task. The emergence of the auditory-like separation mechanisms provides the possibility to develop a universal system that can be adapted to all sources and scenes.
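The SI-SDR figure of merit cited above has a standard definition: project the estimate onto the reference to obtain the scaled target, then compare the target's energy to that of the residual. The improvement (SI-SDRi) is the SI-SDR of the separated output minus that of the unprocessed mixture. A minimal implementation:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference   # scaled reference captured by the estimate
    noise = estimate - target    # everything else counts as distortion
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))

# SI-SDRi for one source:
# si_sdr(separated, source) - si_sdr(mixture, source)
```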
11. Mehrkian S, Moossavi A, Gohari N, Nazari MA, Bakhshi E, Alain C. Long latency auditory evoked potentials and object-related negativity based on harmonicity in hearing-impaired children. Neurosci Res. 2022;178:52-59. PMID: 35007647. DOI: 10.1016/j.neures.2022.01.001
Abstract
Hearing-impaired children (HIC) have difficulty understanding speech in noise, which may be due to difficulty parsing concurrent sound objects based on harmonicity cues. Using long latency auditory evoked potentials (LLAEPs) and the object-related negativity (ORN), a neural metric of concurrent sound segregation, this study investigated the sensitivity of HIC in processing harmonic relations. The participants were 14 normal-hearing children (NHC) with an average age of 7.82 ± 1.31 years and 17 HIC with an average age of 7.98 ± 1.25 years. They were presented with a sequence of 200 Hz harmonic complex tones that had either all harmonics in tune or the third harmonic mistuned by 2%, 4%, 8%, or 16% of its original value while neuroelectric brain activity was recorded. The analysis of scalp-recorded LLAEPs revealed lower N2 amplitudes elicited by the tuned stimuli in HIC than in controls. The ORN, isolated in the difference wave between LLAEPs elicited by tuned and mistuned stimuli, was delayed and smaller in HIC than in NHC. This study showed deficits in processing harmonic relations in HIC, which may contribute to their difficulty in understanding speech in noise. As a result, top-down and bottom-up rehabilitation approaches aimed at improving the processing of basic acoustic characteristics, including harmonics, are recommended for children with hearing loss.
Affiliation(s)
- Saeideh Mehrkian
- Department of Audiology, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Abdollah Moossavi
- Department of Otolaryngology and Head and Neck Surgery, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Nasrin Gohari
- Department of Audiology, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Mohammad Ali Nazari
- Department of Neuroscience, Faculty of Advanced Technologies in Medicine, Iran University of Medical Sciences, Tehran, Iran
- Enayatollah Bakhshi
- Department of Biostatistics and Epidemiology, University of Social Welfare and Rehabilitation Sciences, Tehran, Iran
- Claude Alain
- The Rotman Research Institute, Baycrest Centre for Geriatric Care, University of Toronto, Canada, and Department of Psychology, University of Toronto, Canada
12. Wagner JD, Gelman A, Hancock KE, Chung Y, Delgutte B. Rabbits use both spectral and temporal cues to discriminate the fundamental frequency of harmonic complexes with missing fundamentals. J Neurophysiol. 2022;127:290-312. PMID: 34879207. PMCID: PMC8759963. DOI: 10.1152/jn.00366.2021
Abstract
The pitch of harmonic complex tones (HCTs) common in speech, music, and animal vocalizations plays a key role in the perceptual organization of sound. Unraveling the neural mechanisms of pitch perception requires animal models, but little is known about complex pitch perception by animals, and some species appear to use different pitch mechanisms than humans. Here, we tested rabbits' ability to discriminate the fundamental frequency (F0) of HCTs with missing fundamentals, using a behavioral paradigm inspired by foraging behavior in which rabbits learned to harness a spatial gradient in F0 to find the location of a virtual target within a room for a food reward. Rabbits were initially trained to discriminate HCTs with F0s in the range 400-800 Hz and with harmonics covering a wide frequency range (800-16,000 Hz) and were then tested with stimuli differing in spectral composition to test the role of harmonic resolvability (experiment 1), in F0 range (experiment 2), or in both F0 and spectral content (experiment 3). Together, these experiments show that rabbits can discriminate HCTs over a wide F0 range (200-1,600 Hz) encompassing the range of conspecific vocalizations and can use either the spectral pattern of harmonics resolved by the cochlea for higher F0s or temporal envelope cues resulting from interaction between unresolved harmonics for lower F0s. The qualitative similarity of these results to human performance supports the use of rabbits as an animal model for studies of pitch mechanisms, provided that species differences in cochlear frequency selectivity and in the F0 range of vocalizations are taken into account.

NEW & NOTEWORTHY: Understanding the neural mechanisms of pitch perception requires experiments in animal models, but little is known about pitch perception by animals. Here we show that rabbits, a popular animal in auditory neuroscience, can discriminate complex sounds differing in pitch using either spectral cues or temporal cues. The results suggest that the role of spectral cues in pitch perception by animals may have been underestimated by predominantly testing low frequencies in the range of the human voice.
Affiliation(s)
- Joseph D. Wagner
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, Massachusetts
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Alice Gelman
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, Massachusetts
- Kenneth E. Hancock
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, Massachusetts
- Department of Otolaryngology, Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts
- Yoojin Chung
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, Massachusetts
- Department of Otolaryngology, Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts
- Bertrand Delgutte
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear, Boston, Massachusetts
- Department of Otolaryngology, Head and Neck Surgery, Harvard Medical School, Boston, Massachusetts
13. Etard O, Messaoud RB, Gaugain G, Reichenbach T. No evidence of attentional modulation of the neural response to the temporal fine structure of continuous musical pieces. J Cogn Neurosci. 2021;34:411-424. PMID: 35015867. DOI: 10.1162/jocn_a_01811
Abstract
Speech and music are spectrotemporally complex acoustic signals that are highly relevant for humans. Both contain a temporal fine structure that is encoded in the neural responses of subcortical and cortical processing centers. The subcortical response to the temporal fine structure of speech has recently been shown to be modulated by selective attention to one of two competing voices. Music similarly often consists of several simultaneous melodic lines, and a listener can selectively attend to a particular one at a time. However, the neural mechanisms that enable such selective attention remain largely enigmatic, not least since most investigations to date have focused on short and simplified musical stimuli. Here, we studied the neural encoding of classical musical pieces in human volunteers, using scalp EEG recordings. We presented volunteers with continuous musical pieces composed of one or two instruments. In the latter case, the participants were asked to selectively attend to one of the two competing instruments and to perform a vibrato identification task. We used linear encoding and decoding models to relate the recorded EEG activity to the stimulus waveform. We show that we can measure neural responses to the temporal fine structure of melodic lines played by a single instrument, at the population level as well as for most individual participants. The neural response peaks at a latency of 7.6 msec and is not measurable past 15 msec. When analyzing the neural responses to the temporal fine structure elicited by competing instruments, we found no evidence of attentional modulation. We observed, however, that low-frequency neural activity exhibited a modulation consistent with the behavioral task at latencies from 100 to 160 msec, in a similar manner to the attentional modulation observed in continuous speech (N100). Our results show that, much like speech, the temporal fine structure of music is tracked by neural activity. In contrast to speech, however, this response appears unaffected by selective attention in the context of our experiment.
14. Lebovich L, Yunerman M, Scaiewicz V, Loewenstein Y, Rokni D. Paradoxical relationship between speed and accuracy in olfactory figure-background segregation. PLoS Comput Biol. 2021;17:e1009674. PMID: 34871306. PMCID: PMC8675919. DOI: 10.1371/journal.pcbi.1009674
Abstract
In natural settings, many stimuli impinge on our sensory organs simultaneously. Parsing these sensory stimuli into perceptual objects is a fundamental task faced by all sensory systems. Similar to other sensory modalities, increased odor backgrounds decrease the detectability of target odors by the olfactory system. The mechanisms by which background odors interfere with the detection and identification of target odors are unknown. Here we utilized the framework of the Drift Diffusion Model (DDM) to consider possible interference mechanisms in an odor detection task. We first considered pure effects of background odors on either signal or noise in the decision-making dynamics and showed that these produce different predictions about decision accuracy and speed. To test these predictions, we trained mice to detect target odors that are embedded in random background mixtures in a two-alternative choice task. In this task, the inter-trial interval was independent of behavioral reaction times to avoid motivating rapid responses. We found that increased backgrounds reduce mouse performance but paradoxically also decrease reaction times, suggesting that noise in the decision-making process is increased by backgrounds. We further assessed the contributions of background effects on both noise and signal by fitting the DDM to the behavioral data. The models showed that background odors affect both the signal and the noise, but that the paradoxical relationship between trial difficulty and reaction time is caused by the added noise.

Sensory systems are constantly stimulated by signals from many objects in the environment. Segmentation of important signals from the cluttered background is therefore a task faced by all sensory systems. For many mammals, the sense of smell is the primary sense that guides many daily behaviors. As such, the olfactory system must be able to detect and identify odors of interest against varying and dynamic backgrounds. Here we studied how background odors interfere with the detection of target odors. We trained mice on a task in which they are presented with odor mixtures and are required to report whether they include either of two target odors. We analyzed the behavioral data using a common model of sensory-guided decision-making, the drift diffusion model. In this model, decisions are influenced by two elements: a drift, which is the signal produced by the stimulus, and noise. We show that the addition of background odors has a dual effect: a reduction in the drift as well as an increase in the noise. The increased noise also causes more rapid decisions, thereby producing a paradoxical relationship between trial difficulty and decision speed; mice make faster decisions on more difficult trials.
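The key drift diffusion intuition, that adding noise under a fixed decision bound lowers accuracy while also shortening decision times, can be checked with a few lines of simulation. All constants are illustrative and unrelated to the paper's fitted parameters.

```python
import numpy as np

def ddm_trial(drift, noise, bound=1.0, dt=0.001, max_t=5.0, rng=None):
    """One drift-diffusion trial; returns (correct, decision_time)."""
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < bound and t < max_t:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return x >= bound, t

def summarize(drift, noise, n=2000, seed=1):
    rng = np.random.default_rng(seed)
    trials = [ddm_trial(drift, noise, rng=rng) for _ in range(n)]
    return (np.mean([c for c, _ in trials]),   # accuracy
            np.mean([t for _, t in trials]))   # mean decision time

print(summarize(drift=1.0, noise=1.0))  # less noise: slower but more accurate
print(summarize(drift=1.0, noise=2.0))  # more noise: faster but less accurate
```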
Affiliation(s)
- Lior Lebovich
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University, Jerusalem, Israel
- Michael Yunerman
- Department of Medical Neurobiology, School of Medicine and IMRIC, The Hebrew University of Jerusalem, Jerusalem, Israel
- Viviana Scaiewicz
- Department of Medical Neurobiology, School of Medicine and IMRIC, The Hebrew University of Jerusalem, Jerusalem, Israel
- Yonatan Loewenstein
- The Edmond and Lily Safra Center for Brain Sciences, The Hebrew University, Jerusalem, Israel
- The Alexander Silberman Institute of Life Sciences, The Hebrew University, Jerusalem, Israel
- Department of Cognitive Sciences and The Federmann Center for the Study of Rationality, The Hebrew University, Jerusalem, Israel
- Dan Rokni
- Department of Medical Neurobiology, School of Medicine and IMRIC, The Hebrew University of Jerusalem, Jerusalem, Israel
15. Allen KM, Salles A, Park S, Elhilali M, Moss CF. Effect of background clutter on neural discrimination in the bat auditory midbrain. J Neurophysiol. 2021;126:1772-1782. PMID: 34669503. PMCID: PMC8794058. DOI: 10.1152/jn.00109.2021
Abstract
The discrimination of complex sounds is a fundamental function of the auditory system. This operation must be robust in the presence of noise and acoustic clutter. Echolocating bats are auditory specialists that discriminate sonar objects in acoustically complex environments. Bats produce brief signals, interrupted by periods of silence, rendering echo snapshots of sonar objects. Sonar object discrimination requires that bats process spatially and temporally overlapping echoes to make split-second decisions. The mechanisms that enable this discrimination are not well understood, particularly in complex environments. We explored the neural underpinnings of sonar object discrimination in the presence of acoustic scattering caused by physical clutter. We performed electrophysiological recordings in the inferior colliculus (IC) of awake big brown bats in response to broadcasts of prerecorded echoes from physical objects. We acquired single-unit responses to these echoes and discovered a subpopulation of IC neurons that encode acoustic features that can be used to discriminate between sonar objects. We further investigated the effects of environmental clutter on this population's encoding of acoustic features. We discovered that the effect of background clutter on sonar object discrimination is highly variable and depends on object properties and target-clutter spatiotemporal separation. In many conditions, clutter impaired discrimination of sonar objects. However, in some instances clutter enhanced acoustic features of echo returns, enabling higher levels of discrimination. This finding suggests that environmental clutter may augment acoustic cues used for sonar target discrimination and provides further evidence, in a growing body of literature, that noise is not universally detrimental to sensory encoding.

NEW & NOTEWORTHY: Bats are powerful animal models for investigating the encoding of auditory objects under acoustically challenging conditions. Although past work has considered the effect of acoustic clutter on sonar target detection, less is known about target discrimination in clutter. Our work shows that the neural encoding of auditory objects was affected by clutter in a distance-dependent manner. These findings advance knowledge of auditory object detection and discrimination and noise-dependent stimulus enhancement.
Affiliation(s)
- Kathryne M Allen
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, Maryland
- Angeles Salles
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, Maryland
- Sangwook Park
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland
- Mounya Elhilali
- Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland
- Cynthia F Moss
- Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, Maryland
- Department of Neuroscience, Johns Hopkins University, Baltimore, Maryland
- Department of Mechanical Engineering, Johns Hopkins University, Baltimore, Maryland
16. Viswanathan V, Shinn-Cunningham BG, Heinz MG. Temporal fine structure influences voicing confusions for consonant identification in multi-talker babble. J Acoust Soc Am. 2021;150:2664. PMID: 34717498. PMCID: PMC8514254. DOI: 10.1121/10.0006527
Abstract
To understand the mechanisms of speech perception in everyday listening environments, it is important to elucidate the relative contributions of different acoustic cues in transmitting phonetic content. Previous studies suggest that the envelope of speech in different frequency bands conveys most speech content, while the temporal fine structure (TFS) can aid in segregating target speech from background noise. However, the role of TFS in conveying phonetic content beyond what envelopes convey for intact speech in complex acoustic scenes is poorly understood. The present study addressed this question using online psychophysical experiments to measure the identification of consonants in multi-talker babble for intelligibility-matched intact and 64-channel envelope-vocoded stimuli. Consonant confusion patterns revealed that listeners were more biased in the vocoded (versus intact) condition toward reporting that they heard an unvoiced consonant, despite envelope and place cues being largely preserved. This result was replicated when babble instances were varied across independent experiments, suggesting that TFS conveys voicing information beyond what is conveyed by envelopes for intact speech in babble. Given that multi-talker babble is a masker that is ubiquitous in everyday environments, this finding has implications for the design of assistive listening devices such as cochlear implants.
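For context, a generic tone-excited envelope vocoder of the kind referenced above filters the signal into bands, extracts each band's envelope, and uses it to modulate a carrier at the band center, discarding the original temporal fine structure. This is an illustrative sketch, not the authors' exact processing chain.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope_vocode(x, fs, n_channels=64, f_lo=80.0, f_hi=8000.0):
    """Replace TFS with sine carriers while preserving per-band envelopes."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                   # Hilbert envelope
        out += env * np.sin(2 * np.pi * np.sqrt(lo * hi) * t)
    return out
```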
Affiliation(s)
- Vibha Viswanathan
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- Michael G. Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
17. Viswanathan V, Bharadwaj HM, Shinn-Cunningham BG, Heinz MG. Modulation masking and fine structure shape neural envelope coding to predict speech intelligibility across diverse listening conditions. J Acoust Soc Am. 2021;150:2230. PMID: 34598642. PMCID: PMC8483789. DOI: 10.1121/10.0006385
Abstract
A fundamental question in the neuroscience of everyday communication is how scene acoustics shape the neural processing of attended speech sounds and in turn impact speech intelligibility. While it is well known that the temporal envelopes in target speech are important for intelligibility, how the neural encoding of target-speech envelopes is influenced by background sounds or other acoustic features of the scene is unknown. Here, we combine human electroencephalography with simultaneous intelligibility measurements to address this key gap. We find that the neural envelope-domain signal-to-noise ratio in target-speech encoding, which is shaped by masker modulations, predicts intelligibility over a range of strategically chosen realistic listening conditions unseen by the predictive model. This provides neurophysiological evidence for modulation masking. Moreover, using high-resolution vocoding to carefully control peripheral envelopes, we show that target-envelope coding fidelity in the brain depends not only on envelopes conveyed by the cochlea, but also on the temporal fine structure (TFS), which supports scene segregation. Our results are consistent with the notion that temporal coherence of sound elements across envelopes and/or TFS influences scene analysis and attentive selection of a target sound. Our findings also inform speech-intelligibility models and technologies attempting to improve real-world speech communication.
Affiliation(s)
- Vibha Viswanathan
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, Indiana 47907, USA
- Hari M Bharadwaj
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
- Michael G Heinz
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907, USA
18. Homma NY, Bajo VM. Lemniscal corticothalamic feedback in auditory scene analysis. Front Neurosci. 2021;15:723893. PMID: 34489635. PMCID: PMC8417129. DOI: 10.3389/fnins.2021.723893
Abstract
Sound information is transmitted from the ear to central auditory stations of the brain via several nuclei. In addition to these ascending pathways, there exist descending projections that can influence the information processing at each of these nuclei. A major descending pathway in the auditory system is the feedback projection from layer VI of the primary auditory cortex (A1) to the ventral division of the medial geniculate body (MGBv) in the thalamus. The corticothalamic axons have small glutamatergic terminals that can modulate thalamic processing and thalamocortical information transmission. Corticothalamic neurons also provide input to GABAergic neurons of the thalamic reticular nucleus (TRN), which receives collaterals from the ascending thalamic axons. The balance of corticothalamic and TRN inputs has been shown to refine frequency tuning, firing patterns, and gating of MGBv neurons. Therefore, the thalamus is not merely a relay stage in the chain of auditory nuclei but participates in complex aspects of sound processing that include top-down modulations. In this review, we aim (i) to examine how lemniscal corticothalamic feedback modulates responses in MGBv neurons, and (ii) to explore how this feedback contributes to auditory scene analysis, particularly frequency and harmonic perception. Finally, we discuss potential implications of the role of corticothalamic feedback in music and speech perception, where precise spectral and temporal processing is essential.
Affiliation(s)
- Natsumi Y. Homma
- Center for Integrative Neuroscience, University of California, San Francisco, San Francisco, CA, United States
- Coleman Memorial Laboratory, Department of Otolaryngology – Head and Neck Surgery, University of California, San Francisco, San Francisco, CA, United States
- Victoria M. Bajo
- Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
19. Demany L, Monteiro G, Semal C, Shamma S, Carlyon RP. The perception of octave pitch affinity and harmonic fusion have a common origin. Hear Res. 2021;404:108213. PMID: 33662686. PMCID: PMC7614450. DOI: 10.1016/j.heares.2021.108213
Abstract
Musicians say that the pitches of tones with a frequency ratio of 2:1 (one octave) have a distinctive affinity, even if the tones do not have common spectral components. It has been suggested, however, that this affinity judgment has no biological basis and originates instead from an acculturation process ‒ the learning of musical rules unrelated to auditory physiology. We measured, in young amateur musicians, the perceptual detectability of octave mistunings for tones presented alternately (melodic condition) or simultaneously (harmonic condition). In the melodic condition, mistuning was detectable only by means of explicit pitch comparisons. In the harmonic condition, listeners could use a different and more efficient perceptual cue: in the absence of mistuning, the tones fused into a single sound percept; mistunings decreased fusion. Performance was globally better in the harmonic condition, in line with the hypothesis that listeners used a fusion cue in this condition; this hypothesis was also supported by results showing that an illusory simultaneity of the tones was much less advantageous than a real simultaneity. In the two conditions, mistuning detection was generally better for octave compressions than for octave stretchings. This asymmetry varied across listeners, but crucially the listener-specific asymmetries observed in the two conditions were highly correlated. Thus, the perception of the melodic octave appeared to be closely linked to the phenomenon of harmonic fusion. As harmonic fusion is thought to be determined by biological factors rather than factors related to musical culture or training, we argue that octave pitch affinity also has, at least in part, a biological basis.
Affiliation(s)
- Laurent Demany
- Institut de Neurosciences Cognitives et Intégratives d'Aquitaine, CNRS, EPHE, and Université de Bordeaux, Bordeaux, France.
- Guilherme Monteiro
- Institut de Neurosciences Cognitives et Intégratives d'Aquitaine, CNRS, EPHE, and Université de Bordeaux, Bordeaux, France
- Catherine Semal
- Institut de Neurosciences Cognitives et Intégratives d'Aquitaine, CNRS, EPHE, and Université de Bordeaux, Bordeaux, France; Bordeaux INP, Bordeaux, France.
- Shihab Shamma
- Institute for Systems Research, University of Maryland, College Park, MD, United States; Département d'Etudes Cognitives, Ecole Normale Supérieure, Paris, France.
- Robert P Carlyon
- Cambridge Hearing Group, MRC Cognition and Brain Sciences Unit, Cambridge, United Kingdom.
20. de Cheveigné A. Harmonic cancellation: a fundamental of auditory scene analysis. Trends Hear. 2021;25:23312165211041422. PMID: 34698574. PMCID: PMC8552394. DOI: 10.1177/23312165211041422
Abstract
This paper reviews the hypothesis of harmonic cancellation according to which an interfering sound is suppressed or canceled on the basis of its harmonicity (or periodicity in the time domain) for the purpose of Auditory Scene Analysis. It defines the concept, discusses theoretical arguments in its favor, and reviews experimental results that support it, or not. If correct, the hypothesis may draw on time-domain processing of temporally accurate neural representations within the brainstem, as required also by the classic equalization-cancellation model of binaural unmasking. The hypothesis predicts that a target sound corrupted by interference will be easier to hear if the interference is harmonic than inharmonic, all else being equal. This prediction is borne out in a number of behavioral studies, but not all. The paper reviews those results, with the aim to understand the inconsistencies and come up with a reliable conclusion for, or against, the hypothesis of harmonic cancellation within the auditory system.
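In signal terms, the hypothesis maps onto a delay-and-subtract comb filter tuned to the masker's period: components periodic at the masker F0 are nulled, while a differently-pitched target largely passes. A minimal sketch with illustrative parameters:

```python
import numpy as np

def harmonic_cancel(x, fs, f0):
    """Comb filter y[n] = x[n] - x[n - T], with T = fs / f0 samples.
    Nulls every harmonic of f0; other frequencies pass (with coloration)."""
    T = int(round(fs / f0))
    y = x.copy()
    y[T:] -= x[:-T]
    return y

fs = 16000
t = np.arange(fs) / fs
masker = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 11))  # 100 Hz complex
target = np.sin(2 * np.pi * 430 * t)                                 # non-harmonic target
out = harmonic_cancel(masker + target, fs, f0=100.0)
# The masker's harmonics are strongly attenuated; the 430 Hz target survives.
```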
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des systèmes perceptifs, CNRS, Paris, France
- Département d’études cognitives, École normale supérieure, PSL University, Paris, France
- UCL Ear Institute, London, UK
21. Adaptation to pitch-altered feedback is independent of one's own voice pitch sensitivity. Sci Rep. 2020;10:16860. PMID: 33033324. PMCID: PMC7544828. DOI: 10.1038/s41598-020-73932-1
Abstract
Monitoring voice pitch is a fine-tuned process in daily conversations, as conveying accurately the linguistic and affective cues in a given utterance depends on the precise control of phonation and intonation. This monitoring is thought to depend on whether the error is treated as self-generated or externally generated, resulting in either a correction or an inflation of errors. The present study reports on two separate paradigms of adaptation to altered feedback, exploring whether participants behave in a more cohesive manner once the error is of perceptually comparable size. The vocal behavior of normal-hearing and fluent speakers was recorded in response to a personalized size of pitch shift versus a non-specific size, one semitone. The personalized size of shift was determined based on the just-noticeable difference in fundamental frequency (F0) of each participant's voice. Here we show that both tasks successfully demonstrated opposing responses to a constant and predictable F0 perturbation (present from production onset), but these effects barely carried over once the feedback was back to normal, depicting a pattern that bears some resemblance to compensatory responses. Experiencing an F0 shift that is perceived as self-generated (because it was precisely just-noticeable) is not enough to force speakers to behave more consistently and more homogeneously in an opposing manner. On the contrary, our results suggest that neither the type nor the magnitude of the response depends in any trivial way on the sensitivity of participants to their own voice pitch. Based on this finding, we speculate that error correction could occur even with a bionic ear, typically even when F0 cues are too subtle for cochlear implant users to detect accurately.
22. Paredes-Gallardo A, Dau T, Marozeau J. Auditory stream segregation can be modeled by neural competition in cochlear implant listeners. Front Comput Neurosci. 2019;13:42. PMID: 31333438. PMCID: PMC6616076. DOI: 10.3389/fncom.2019.00042
Abstract
Auditory stream segregation is a perceptual process by which the human auditory system groups sounds from different sources into perceptually meaningful elements (e.g., a voice or a melody). The perceptual segregation of sounds is important, for example, for the understanding of speech in noisy scenarios, a particularly challenging task for listeners with a cochlear implant (CI). It has been suggested that some aspects of stream segregation may be explained by relatively basic neural mechanisms at a cortical level. During the past decades, a variety of models have been proposed to account for the data from stream segregation experiments in normal-hearing (NH) listeners. However, little attention has been given to corresponding findings in CI listeners. The present study investigated whether a neural model of sequential stream segregation, proposed to describe the behavioral effects observed in NH listeners, can account for behavioral data from CI listeners. The model operates on the stimulus features at the cortical level and includes a competition stage between the neuronal units encoding the different percepts. The competition arises from a combination of mutual inhibition, adaptation, and additive noise. The model was found to capture the main trends in the behavioral data from CI listeners, such as the larger probability of a segregated percept with increasing feature difference between the sounds, as well as the build-up effect. Importantly, this was achieved without any modification to the model's competition stage, suggesting that stream segregation could be mediated by a similar mechanism in both groups of listeners.
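The competition stage described above can be caricatured in a few lines: two percept units (integrated vs. segregated) receive feature-dependent inputs, inhibit each other, adapt slowly, and are perturbed by additive noise. Every constant below is illustrative, not taken from the paper.

```python
import numpy as np

def competition_model(d_feature=0.3, dur=20.0, dt=0.001, seed=0):
    """Return the proportion of time the 'segregated' unit dominates."""
    rng = np.random.default_rng(seed)
    n = int(dur / dt)
    r = np.zeros((n, 2))                 # firing rates [integrated, segregated]
    a = np.zeros(2)                      # adaptation variables
    inputs = np.array([1.0 - d_feature, 1.0 + d_feature])
    beta, phi, tau_r, tau_a, sigma = 2.0, 0.5, 0.01, 2.0, 0.05
    gain = lambda u: 1.0 / (1.0 + np.exp(-8.0 * (u - 0.5)))
    for i in range(1, n):
        drive = inputs - beta * r[i - 1, ::-1] - phi * a   # mutual inhibition
        r[i] = (r[i - 1] + dt / tau_r * (-r[i - 1] + gain(drive))
                + sigma * np.sqrt(dt) * rng.standard_normal(2))
        a += dt / tau_a * (-a + r[i])                      # slow adaptation
    return float(np.mean(r[:, 1] > r[:, 0]))

for d in (0.1, 0.3, 0.5):   # larger feature difference -> more segregation
    print(d, round(competition_model(d), 2))
```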
Affiliation(s)
- Andreu Paredes-Gallardo
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Torsten Dau
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Jeremy Marozeau
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
23. Guest DR, Oxenham AJ. The role of pitch and harmonic cancellation when listening to speech in harmonic background sounds. J Acoust Soc Am. 2019;145:3011. PMID: 31153349. PMCID: PMC6529328. DOI: 10.1121/1.5102169
Abstract
Fundamental frequency differences (ΔF0) between competing talkers aid in the perceptual segregation of the talkers (ΔF0 benefit), but the underlying mechanisms remain incompletely understood. A model of ΔF0 benefit based on harmonic cancellation proposes that a masker's periodicity can be used to cancel (i.e., filter out) its neural representation. Earlier work suggested that an octave ΔF0 provided little benefit, an effect predicted by harmonic cancellation due to the shared periodicity of masker and target. Alternatively, this effect can be explained by spectral overlap between the harmonic components of the target and masker. To assess these competing explanations, speech intelligibility of a monotonized target talker, masked by a speech-shaped harmonic complex tone, was measured as a function of ΔF0, masker spectrum (all harmonics or odd harmonics only), and masker temporal envelope (amplitude modulated or unmodulated). Removal of the masker's even harmonics when the target was one octave above the masker improved speech reception thresholds by about 5 dB. Because this manipulation eliminated spectral overlap between target and masker components but preserved shared periodicity, the finding is consistent with the explanation for the lack of ΔF0 benefit at the octave based on spectral overlap, but not with the explanation based on harmonic cancellation.
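The harmonic cancellation mechanism under test is often idealized as a comb filter that subtracts the signal delayed by one masker period; the sketch below is that textbook idealization with illustrative frequencies, not the paper's stimulus processing:

```python
import numpy as np

def cancel_periodic_masker(x, fs, masker_f0):
    """Delayed subtraction y[n] = x[n] - x[n - T]: notches masker_f0 and all its harmonics."""
    period = int(round(fs / masker_f0))
    y = x.copy()
    y[period:] -= x[:-period]
    return y

fs = 16000
t = np.arange(fs) / fs
masker = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 11))  # 100-Hz harmonic masker
target = np.sin(2 * np.pi * 210 * t)                                 # target off the masker's harmonic grid
out = cancel_periodic_masker(masker + target, fs, 100.0)
# The masker's harmonics fall in the comb's notches and are cancelled; the 210-Hz
# target survives. A target exactly one octave above (200 Hz) would sit on a notch
# and be cancelled too, which is why cancellation predicts no benefit at the octave.
```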
Collapse
|
24
|
Shamma S, Dutta K. Spectro-temporal templates unify the pitch percepts of resolved and unresolved harmonics. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2019; 145:615. [PMID: 30823787 PMCID: PMC6910008 DOI: 10.1121/1.5088504] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/09/2018] [Revised: 12/07/2018] [Accepted: 01/09/2019] [Indexed: 06/09/2023]
Abstract
Pitch is a fundamental attribute in auditory perception involved in source identification and segregation, music, and speech understanding. Pitch percepts are intimately related to the harmonic resolvability of sound. When harmonics are well-resolved, the induced pitch is usually salient and precise, and several models relying on autocorrelations or harmonic spectral templates can account for these percepts. However, when harmonics are not completely resolved, the pitch percept becomes less salient and poorly discriminated, with an upper range limited to a few hundred hertz, and spectral templates fail to convey the percept since only temporal cues are available. Here, a biologically motivated model is presented that combines spectral and temporal cues to account for both percepts. The model explains how temporal analysis to estimate the pitch of the unresolved harmonics is performed by bandpass filters implemented by resonances in the dendritic trees of neurons in the early auditory pathway. It is demonstrated that organizing and exploiting such dendritic tuning can occur spontaneously in response to white noise. This paper then shows how temporal cues of unresolved harmonics may be integrated with spectrally resolved harmonics, creating spectro-temporal harmonic templates for all pitch percepts. Finally, the model extends its account of monaural pitch percepts to pitches evoked by dichotic binaural stimuli.
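The spectral-template side of such a model is commonly approximated by a harmonic sieve that scores candidate F0s against the magnitude spectrum. A minimal sketch follows; the 1/k weighting (used here to damp subharmonic matches) and all parameter values are illustrative assumptions, not the paper's spectro-temporal template:

```python
import numpy as np

def template_score(spectrum, freqs, f0, n_harm=10, width=0.01):
    """Sum peak energy near integer multiples of f0, weighted 1/k to damp subharmonics."""
    score = 0.0
    for k in range(1, n_harm + 1):
        sel = np.abs(freqs - k * f0) < width * k * f0
        if sel.any():
            score += spectrum[sel].max() / k
    return score

fs, n = 16000, 8192
t = np.arange(n) / fs
x = sum(np.sin(2 * np.pi * 200 * k * t) for k in range(1, 6))  # resolved 200-Hz complex
spectrum = np.abs(np.fft.rfft(x * np.hanning(n)))
freqs = np.fft.rfftfreq(n, 1.0 / fs)
candidates = np.arange(80.0, 400.0, 1.0)
best = max(candidates, key=lambda f: template_score(spectrum, freqs, f))
print(best)  # lands near 200 Hz, within the sieve's tolerance
```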
Collapse
Affiliation(s)
- Shihab Shamma
- Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
| | - Kelsey Dutta
- Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
| |
Collapse
|
25
|
Stuckenberg MV, Nayak CV, Meyer BT, Völker C, Hohmann V, Bendixen A. Age Effects on Concurrent Speech Segregation by Onset Asynchrony. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:177-189. [PMID: 30534994 DOI: 10.1044/2018_jslhr-h-18-0064] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Purpose For elderly listeners, it is more challenging to listen to 1 voice surrounded by other voices than for young listeners. This could be caused by a reduced ability to use acoustic cues, such as slight differences in onset time, for the segregation of concurrent speech signals. Here, we study whether the ability to benefit from onset asynchrony differs between young (18-33 years) and elderly (55-74 years) listeners. Method We investigated young (normal hearing, N = 20) and elderly (mildly hearing impaired, N = 26) listeners' ability to segregate 2 vowels with onset asynchronies ranging from 20 to 100 ms. Behavioral measures were complemented by a specific event-related brain potential component, the object-related negativity, indicating the perception of 2 distinct auditory objects. Results Elderly listeners' behavioral performance (identification accuracy of the 2 vowels) was considerably poorer than young listeners'. However, both age groups showed the same amount of improvement with increasing onset asynchrony. Object-related negativity amplitude also increased similarly in both age groups. Conclusion Both age groups benefit to a similar extent from onset asynchrony as a cue for concurrent speech segregation during active (behavioral measurement) and during passive (electroencephalographic measurement) listening.
Collapse
Affiliation(s)
- Maria V Stuckenberg
- Cluster of Excellence "Hearing4all," Carl von Ossietzky University of Oldenburg, Germany
- Department of Psychology, University of Leipzig, Germany
- Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
| | - Chaitra V Nayak
- Cluster of Excellence "Hearing4all," Carl von Ossietzky University of Oldenburg, Germany
| | - Bernd T Meyer
- Cluster of Excellence "Hearing4all," Carl von Ossietzky University of Oldenburg, Germany
| | - Christoph Völker
- Cluster of Excellence "Hearing4all," Carl von Ossietzky University of Oldenburg, Germany
| | - Volker Hohmann
- Cluster of Excellence "Hearing4all," Carl von Ossietzky University of Oldenburg, Germany
| | - Alexandra Bendixen
- Cluster of Excellence "Hearing4all," Carl von Ossietzky University of Oldenburg, Germany
- Faculty of Natural Sciences, Chemnitz University of Technology, Germany
| |
Collapse
|
26
|
Auditory Figure-Ground Segregation Is Impaired by High Visual Load. J Neurosci 2018; 39:1699-1708. [PMID: 30541915 PMCID: PMC6391559 DOI: 10.1523/jneurosci.2518-18.2018] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2018] [Revised: 11/19/2018] [Accepted: 11/19/2018] [Indexed: 11/21/2022] Open
Abstract
Figure-ground segregation is fundamental to listening in complex acoustic environments. An ongoing debate pertains to whether segregation requires attention or is "automatic" and preattentive. In this magnetoencephalography study, we tested a prediction derived from load theory of attention (e.g., Lavie, 1995) that segregation requires attention but can benefit from the automatic allocation of any "leftover" capacity under low load. Complex auditory scenes were modeled with stochastic figure-ground stimuli (Teki et al., 2013), which occasionally contained repeated frequency component "figures." Naive human participants (both sexes) passively listened to these signals while performing a visual attention task of either low or high load. While clear figure-related neural responses were observed under conditions of low load, high visual load substantially reduced the neural response to the figure in auditory cortex (planum temporale, Heschl's gyrus). We conclude that fundamental figure-ground segregation in hearing is not automatic but draws on resources that are shared across vision and audition.

SIGNIFICANCE STATEMENT This work resolves a long-standing question of whether figure-ground segregation, a fundamental process of auditory scene analysis, requires attention or is underpinned by automatic, encapsulated computations. Task-irrelevant sounds were presented during performance of a visual search task. We revealed a clear magnetoencephalography neural signature of figure-ground segregation in conditions of low visual load, which was substantially reduced in conditions of high visual load. This demonstrates that, although attention does not need to be actively allocated to sound for auditory segregation to occur, segregation depends on shared computational resources across vision and hearing. The findings further highlight that visual load can impair the computational capacity of the auditory system, even when it does not simply dampen auditory responses as a whole.
Collapse
|
27
|
Easwar V, Banyard A, Aiken SJ, Purcell DW. Phase‐locked responses to the vowel envelope vary in scalp‐recorded amplitude due to across‐frequency response interactions. Eur J Neurosci 2018; 48:3126-3145. [DOI: 10.1111/ejn.14161] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/12/2018] [Revised: 08/22/2018] [Accepted: 08/28/2018] [Indexed: 01/26/2023]
Affiliation(s)
- Vijayalakshmi Easwar
- Communication Sciences & Disorders and Waisman CenterUniversity of Wisconsin Madison Wisconsin
- National Center for AudiologyWestern University London Ontario Canada
| | - Ashlee Banyard
- Communication Sciences and Disorders, Western University, London, Ontario, Canada
| | - Steven J. Aiken
- School of Human Communication Disorders, Dalhousie University, Halifax, Nova Scotia, Canada
| | - David W. Purcell
- National Center for Audiology, Western University, London, Ontario, Canada
- Communication Sciences and Disorders, Western University, London, Ontario, Canada
| |
Collapse
|
28
|
Thomassen S, Bendixen A. Assessing the background decomposition of a complex auditory scene with event-related brain potentials. Hear Res 2018; 370:120-129. [PMID: 30368055 DOI: 10.1016/j.heares.2018.09.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/07/2018] [Revised: 09/17/2018] [Accepted: 09/30/2018] [Indexed: 11/26/2022]
Abstract
A listener who focusses on a sound source of interest must continuously integrate the sounds emitted by the attended source and ignore the sounds emitted by the remaining sources in the auditory scene. Little is known about how the ignored sound sources in the background are mentally represented after the source of interest has formed the perceptual foreground. This is due to a key methodological challenge: the background representation is by definition not overtly reportable. Here we developed a paradigm based on event-related brain potentials (ERPs) to assess the mental representation of background sounds. Participants listened to sequences of three repeatedly presented tones arranged in an ascending order (low, middle, high frequency). They were instructed to detect intensity deviants in one of the tones, creating the perceptual foreground. The remaining two background tones contained timing and location deviants. Those deviants were set up such that mismatch negativity (MMN) components would be elicited in distinct ways if the background was decomposed into two separate sound streams (background segregation) or if it was not further decomposed (background integration). Results provide MMN-based evidence for background segregation and integration in parallel. This suggests that mental representations of background integration and segregation can be concurrently available, and that collecting empirical evidence for only one of these background organization alternatives might lead to erroneous conclusions.
Collapse
Affiliation(s)
- Sabine Thomassen
- Institute of Physics, School of Natural Sciences, Chemnitz University of Technology, Reichenhainer Str. 70, D-09126, Chemnitz, Germany; Auditory Psychophysiology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstr. 114-118, D-26129, Oldenburg, Germany.
| | - Alexandra Bendixen
- Institute of Physics, School of Natural Sciences, Chemnitz University of Technology, Reichenhainer Str. 70, D-09126, Chemnitz, Germany; Institute of Psychology, University of Leipzig, Neumarkt 9-19, D-04109, Leipzig, Germany.
| |
Collapse
|
29
|
Kanjlia S, Feigenson L, Bedny M. Numerical cognition is resilient to dramatic changes in early sensory experience. Cognition 2018; 179:111-120. [PMID: 29935427 PMCID: PMC6701182 DOI: 10.1016/j.cognition.2018.06.004] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2017] [Revised: 06/01/2018] [Accepted: 06/05/2018] [Indexed: 01/29/2023]
Abstract
Humans and non-human animals can approximate large visual quantities without counting. The approximate number representations underlying this ability are noisy, with the amount of noise proportional to the quantity being represented. Numerate humans also have access to a separate system for representing exact quantities using number symbols and words; it is this second, exact system that supports most of formal mathematics. Although numerical approximation abilities and symbolic number abilities are distinct in representational format and in their phylogenetic and ontogenetic histories, they appear to be linked throughout development: individuals who can more precisely discriminate quantities without counting are better at math. The origins of this relationship are debated. On the one hand, symbolic number abilities may be directly linked to, perhaps even rooted in, numerical approximation abilities. On the other hand, the relationship between the two systems may simply reflect their independent relationships with visual abilities. To test this possibility, we asked whether approximate number and symbolic math abilities are linked in congenitally blind individuals who have never experienced visual sets or used visual strategies to learn math. Congenitally blind and blindfolded sighted participants completed an auditory numerical approximation task, as well as a symbolic arithmetic task and non-math control tasks. We found that the precision of approximate number representations was identical across congenitally blind and sighted groups, suggesting that the development of the Approximate Number System (ANS) does not depend on visual experience. Crucially, the relationship between numerical approximation and symbolic math abilities is preserved in congenitally blind individuals. These data support the idea that the Approximate Number System and symbolic number abilities are intrinsically linked, rather than indirectly linked through visual abilities.
Collapse
Affiliation(s)
- Shipra Kanjlia
- Department of Psychological and Brain Sciences, Johns Hopkins University, United States.
| | - Lisa Feigenson
- Department of Psychological and Brain Sciences, Johns Hopkins University, United States
| | - Marina Bedny
- Department of Psychological and Brain Sciences, Johns Hopkins University, United States
| |
Collapse
|
30
|
Smith SS, Chintanpalli A, Heinz MG, Sumner CJ. Revisiting Models of Concurrent Vowel Identification: The Critical Case of No Pitch Differences. ACTA ACUST UNITED AC 2018; 104:922-925. [PMID: 30369861 DOI: 10.3813/aaa.919244] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
When presented with two vowels simultaneously, humans are often able to identify the constituent vowels. Computational models exist that simulate this ability; however, they predict listener confusions poorly, particularly in the case where the two vowels have the same fundamental frequency. Presented here is a model that is uniquely able to predict the combined representation of concurrent vowels. The given model is able to predict listeners' systematic perceptual decisions to a high degree of accuracy.
Collapse
Affiliation(s)
- Samuel S Smith
- Medical Research Council Institute of Hearing Research, University of Nottingham, NG7 2RD, UK.
| | - Ananthakrishna Chintanpalli
- Department of Electrical and Electronics Engineering, Birla Institute of Technology & Science, Pilani-333 031, Rajasthan, India
| | - Michael G Heinz
- Department of Speech, Language and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907-2028, USA
| | - Christian J Sumner
- Medical Research Council Institute of Hearing Research, University of Nottingham, NG7 2RD, UK.
| |
Collapse
|
31
|
Felix RA, Gourévitch B, Portfors CV. Subcortical pathways: Towards a better understanding of auditory disorders. Hear Res 2018; 362:48-60. [PMID: 29395615 PMCID: PMC5911198 DOI: 10.1016/j.heares.2018.01.008] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/28/2017] [Revised: 12/11/2017] [Accepted: 01/16/2018] [Indexed: 01/13/2023]
Abstract
Hearing loss is a significant problem that affects at least 15% of the population. This percentage, however, is likely significantly higher because of a variety of auditory disorders that are not identifiable through traditional tests of peripheral hearing ability. In these disorders, individuals have difficulty understanding speech, particularly in noisy environments, even though the sounds are loud enough to hear. The underlying mechanisms leading to such deficits are not well understood. To enable the development of suitable treatments to alleviate or prevent such disorders, the affected processing pathways must be identified. Historically, mechanisms underlying speech processing have been thought to be a property of the auditory cortex and thus the study of auditory disorders has largely focused on cortical impairments and/or cognitive processes. As we review here, however, there is strong evidence to suggest that, in fact, deficits in subcortical pathways play a significant role in auditory disorders. In this review, we highlight the role of the auditory brainstem and midbrain in processing complex sounds and discuss how deficits in these regions may contribute to auditory dysfunction. We discuss current research with animal models of human hearing and then consider human studies that implicate impairments in subcortical processing that may contribute to auditory disorders.
Collapse
Affiliation(s)
- Richard A Felix
- School of Biological Sciences and Integrative Physiology and Neuroscience, Washington State University, Vancouver, WA, USA
| | - Boris Gourévitch
- Unité de Génétique et Physiologie de l'Audition, UMRS 1120 INSERM, Institut Pasteur, Université Pierre et Marie Curie, F-75015, Paris, France; CNRS, France
| | - Christine V Portfors
- School of Biological Sciences and Integrative Physiology and Neuroscience, Washington State University, Vancouver, WA, USA.
| |
Collapse
|
32
|
Eipert L, Klinge-Strahl A, Klump GM. Processing of interaural phase differences in components of harmonic and mistuned complexes in the inferior colliculus of the Mongolian gerbil. Eur J Neurosci 2018; 47:1242-1251. [PMID: 29603825 DOI: 10.1111/ejn.13922] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2017] [Revised: 02/19/2018] [Accepted: 03/22/2018] [Indexed: 11/30/2022]
Abstract
Harmonicity and spatial location provide eminent cues for the perceptual grouping of sounds. In general, harmonicity is a strong grouping cue. In contrast, spatial cues such as interaural phase or time difference provide for strong grouping of stimulus sequences but weak grouping for simultaneously presented sounds. By studying the neuronal basis underlying the interaction of these cues in processing simultaneous sounds using van Rossum spike train distance measures, we aim to explain the interaction observed in psychophysical experiments. Responses to interaural phase differences imposed on single components of harmonic and mistuned complex tones as well as noise delay functions were recorded as multiunit responses from the inferior colliculus of Mongolian gerbils. Results revealed a better representation of interaural phase differences if imposed on a harmonic rather than a mistuned frequency component of a complex tone. The representation of interaural phase differences was better for long integration-time windows approximately reflecting firing rates rather than short integration-time windows reflecting the temporal pattern of the stimulus-driven response. We found only a weak impact of interaural phase differences if combined with mistuning of a component in a harmonic tone complex.
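The van Rossum distance used here maps each spike train onto an exponentially filtered trace and takes an L2 difference, with the kernel time constant playing the role of the integration-time window; a minimal sketch with illustrative parameters:

```python
import numpy as np

def van_rossum_distance(train_a, train_b, tau=0.01, dt=1e-4, t_max=1.0):
    """Exponentially filter two spike trains and return the normalized L2 distance."""
    t = np.arange(0.0, t_max, dt)
    def filtered(spikes):
        trace = np.zeros_like(t)
        for s in spikes:
            trace += (t >= s) * np.exp(-np.maximum(t - s, 0.0) / tau)
        return trace
    diff = filtered(train_a) - filtered(train_b)
    return np.sqrt(np.sum(diff ** 2) * dt / tau)

a = [0.100, 0.210, 0.305]  # spike times in seconds (toy data)
b = [0.105, 0.240, 0.300]
# Short tau stresses spike timing; long tau approaches a firing-rate comparison,
# mirroring the short vs. long integration-time windows contrasted in the study.
print(van_rossum_distance(a, b, tau=0.005), van_rossum_distance(a, b, tau=0.2))
```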
Collapse
Affiliation(s)
- Lena Eipert
- Animal Physiology and Behavior Group, Department for Neuroscience, School for Medicine and Health Sciences, Carl-von-Ossietzky University Oldenburg, 26111, Oldenburg, Germany; Cluster of Excellence Hearing4all, Carl-von-Ossietzky University Oldenburg, 26111, Oldenburg, Germany
| | - Astrid Klinge-Strahl
- Animal Physiology and Behavior Group, Department for Neuroscience, School for Medicine and Health Sciences, Carl-von-Ossietzky University Oldenburg, 26111, Oldenburg, Germany
| | - Georg M Klump
- Animal Physiology and Behavior Group, Department for Neuroscience, School for Medicine and Health Sciences, Carl-von-Ossietzky University Oldenburg, 26111, Oldenburg, Germany; Cluster of Excellence Hearing4all, Carl-von-Ossietzky University Oldenburg, 26111, Oldenburg, Germany
| |
Collapse
|
33
|
Abstract
The cocktail party problem requires listeners to infer individual sound sources from mixtures of sound. The problem can be solved only by leveraging regularities in natural sound sources, but little is known about how such regularities are internalized. We explored whether listeners learn source "schemas" (the abstract structure shared by different occurrences of the same type of sound source) and use them to infer sources from mixtures. We measured the ability of listeners to segregate mixtures of time-varying sources. In each experiment a subset of trials contained schema-based sources generated from a common template by transformations (transposition and time dilation) that introduced acoustic variation but preserved abstract structure. Across several tasks and classes of sound sources, schema-based sources consistently aided source separation, in some cases producing rapid improvements in performance over the first few exposures to a schema. Learning persisted across blocks that did not contain the learned schema, and listeners were able to learn and use multiple schemas simultaneously. No learning was evident when schemas were presented in the task-irrelevant (i.e., distractor) source. However, learning from task-relevant stimuli showed signs of being implicit, in that listeners were no more likely to report that sources recurred in experiments containing schema-based sources than in control experiments containing no schema-based sources. The results implicate a mechanism for rapidly internalizing abstract sound structure, facilitating accurate perceptual organization of sound sources that recur in the environment.
Collapse
|
34
|
Paredes-Gallardo A, Madsen SMK, Dau T, Marozeau J. The Role of Place Cues in Voluntary Stream Segregation for Cochlear Implant Users. Trends Hear 2018; 22:2331216517750262. [PMID: 29347886 PMCID: PMC5777547 DOI: 10.1177/2331216517750262] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2017] [Accepted: 11/28/2017] [Indexed: 11/15/2022] Open
Abstract
Sequential stream segregation by cochlear implant (CI) listeners was investigated using a temporal delay detection task composed of a sequence of regularly presented bursts of pulses on a single electrode (B) interleaved with an irregular sequence (A) presented on a different electrode. In half of the trials, a delay was added to the last burst of the regular B sequence, and the listeners were asked to detect this delay. As a jitter was added to the period between consecutive A bursts, time judgments between the A and B sequences provided an unreliable cue to perform the task. Thus, the segregation of the A and B sequences should improve performance. In Experiment 1, the electrode separation and the sequence duration were varied to clarify whether place cues help CI listeners to voluntarily segregate sounds and whether a two-stream percept needs time to build up. Results suggested that place cues can facilitate the segregation of sequential sounds if enough time is provided to build up a two-stream percept. In Experiment 2, the duration of the sequence was fixed, and only the electrode separation was varied to estimate the fission boundary. Most listeners were able to segregate the sounds for separations of three or more electrodes, and some listeners could segregate sounds coming from adjacent electrodes.
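The timing logic of this delay detection task can be sketched as below; burst counts, periods, jitter range, and delay size are placeholders rather than the study's parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

def burst_onsets(n_bursts=10, period=0.2, jitter=0.0, last_delay=0.0):
    """Onset times of a burst sequence; jitter perturbs each inter-burst interval."""
    intervals = period + rng.uniform(-jitter, jitter, n_bursts - 1)
    onsets = np.concatenate([[0.0], np.cumsum(intervals)])
    onsets[-1] += last_delay          # extra delay on the final burst (target trials)
    return onsets

b = burst_onsets(jitter=0.0, last_delay=0.03)   # regular B sequence on one electrode
a = burst_onsets(jitter=0.05) + 0.1             # jittered A sequence on another electrode
# Because A is jittered, A-to-B intervals give no reliable cue; detecting the delayed
# final B burst is easier if the B stream is perceptually segregated from A.
```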
Collapse
Affiliation(s)
- Andreu Paredes-Gallardo
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Denmark
| | - Sara M. K. Madsen
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Denmark
| | - Torsten Dau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Denmark
| | - Jeremy Marozeau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Denmark
| |
Collapse
|
35
|
Snyder JS, Elhilali M. Recent advances in exploring the neural underpinnings of auditory scene perception. Ann N Y Acad Sci 2017; 1396:39-55. [PMID: 28199022 PMCID: PMC5446279 DOI: 10.1111/nyas.13317] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2016] [Revised: 12/21/2016] [Accepted: 01/08/2017] [Indexed: 11/29/2022]
Abstract
Studies of auditory scene analysis have traditionally relied on paradigms using artificial sounds, and conventional behavioral techniques, to elucidate how we perceptually segregate auditory objects or streams from each other. In the past few decades, however, there has been growing interest in uncovering the neural underpinnings of auditory segregation using human and animal neuroscience techniques, as well as computational modeling. This largely reflects the growth in the fields of cognitive neuroscience and computational neuroscience and has led to new theories of how the auditory system segregates sounds in complex arrays. The current review focuses on neural and computational studies of auditory scene perception published in the last few years. Following the progress that has been made in these studies, we describe (1) theoretical advances in our understanding of the most well-studied aspects of auditory scene perception, namely segregation of sequential patterns of sounds and concurrently presented sounds; (2) the diversification of topics and paradigms that have been investigated; and (3) how new neuroscience techniques (including invasive neurophysiology in awake humans, genotyping, and brain stimulation) have been used in this field.
Collapse
Affiliation(s)
- Joel S. Snyder
- Department of Psychology, University of Nevada, Las Vegas, Las Vegas, Nevada
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland
| |
Collapse
|
36
|
Finton CJ, Keesom SM, Hood KE, Hurley LM. What's in a squeak? Female vocal signals predict the sexual behaviour of male house mice during courtship. Anim Behav 2017. [DOI: 10.1016/j.anbehav.2017.01.021] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
|
37
|
Informational masking and the effects of differences in fundamental frequency and fundamental-frequency contour on phonetic integration in a formant ensemble. Hear Res 2017; 344:295-303. [DOI: 10.1016/j.heares.2016.10.026] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
38
|
Summers RJ, Bailey PJ, Roberts B. WITHDRAWN: Informational masking and the effects of differences in fundamental frequency and fundamental-frequency contour on phonetic integration in a formant ensemble. Hear Res 2017:S0378-5955(16)30380-X. [PMID: 28110077 DOI: 10.1016/j.heares.2016.10.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/22/2016] [Revised: 10/17/2016] [Accepted: 10/21/2016] [Indexed: 10/20/2022]
Affiliation(s)
- Robert J Summers
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK
| | - Peter J Bailey
- Department of Psychology, University of York, Heslington, York YO10 5DD, UK
| | - Brian Roberts
- Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, UK.
| |
Collapse
|
39
|
Thomassen S, Bendixen A. Subjective perceptual organization of a complex auditory scene. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:265. [PMID: 28147594 DOI: 10.1121/1.4973806] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Empirical research on the sequential decomposition of an auditory scene primarily relies on interleaved sound mixtures of only two tone sequences (e.g., ABAB…). This oversimplifies the sound decomposition problem by limiting the number of putative perceptual organizations. The current study used a sound mixture composed of three different tones (ABCABC…) that could be perceptually organized in many different ways. Participants listened to these sequences and reported their subjective perception by continuously choosing one out of 12 visually presented perceptual organization alternatives. Different levels of frequency and spatial separation were implemented to check whether participants' perceptual reports would be systematic and plausible. As hypothesized, while perception switched back and forth in each condition between various perceptual alternatives (multistability), spatial as well as frequency separation generally raised the proportion of segregated and reduced the proportion of integrated alternatives. During segregated percepts, in contrast to the hypothesis, many participants had a tendency to perceive two streams in the foreground, rather than reporting alternatives with a clear foreground-background differentiation. Finally, participants perceived the organization with intermediate feature values (e.g., middle tones of the pattern) segregated in the foreground slightly less often than similar alternatives with outer feature values (e.g., higher tones).
Collapse
Affiliation(s)
- Sabine Thomassen
- Auditory Psychophysiology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstrasse 114-118, D-26129 Oldenburg, Germany
| | - Alexandra Bendixen
- Auditory Psychophysiology Lab, Department of Psychology, Carl von Ossietzky University of Oldenburg, Ammerländer Heerstrasse 114-118, D-26129 Oldenburg, Germany
| |
Collapse
|
40
|
Chintanpalli A, Ahlstrom JB, Dubno JR. Effects of age and hearing loss on concurrent vowel identification. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:4142. [PMID: 28040038 PMCID: PMC5848863 DOI: 10.1121/1.4968781] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/28/2016] [Revised: 11/09/2016] [Accepted: 11/11/2016] [Indexed: 06/06/2023]
Abstract
Differences in formant frequencies and fundamental frequencies (F0) are important cues for segregating and identifying two simultaneous vowels. This study assessed age- and hearing-loss-related changes in the use of these cues for recognition of one or both vowels in a pair and determined differences related to vowel identity and specific vowel pairings. Younger adults with normal hearing, older adults with normal hearing, and older adults with hearing loss listened to different-vowel and identical-vowel pairs that varied in F0 differences. Identification of both vowels as a function of F0 difference revealed that increased age affects the use of F0 and formant difference cues for different-vowel pairs. Hearing loss further reduced the use of these cues, which was not attributable to lower vowel sensation levels. High scores for one vowel in the pair and no effect of F0 differences suggested that F0 cues are important only for identifying both vowels. In contrast to mean scores, widely varying differences in effects of F0 cues, age, and hearing loss were observed for particular vowels and vowel pairings. These variations in identification of vowel pairs were not explained by acoustical models based on the location and level of formants within the two vowels.
Collapse
Affiliation(s)
- Ananthakrishna Chintanpalli
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500, USA
| | - Jayne B Ahlstrom
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500, USA
| | - Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, South Carolina 29425-5500, USA
| |
Collapse
|
41
|
Shen Y. The effect of frequency cueing on the perceptual segregation of simultaneous tones: Bottom-up and top-down contributions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:3496. [PMID: 27908095 PMCID: PMC5848834 DOI: 10.1121/1.4965969] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/01/2016] [Revised: 09/30/2016] [Accepted: 10/08/2016] [Indexed: 06/06/2023]
Abstract
Listeners were presented with two simultaneous tones of different frequencies (more than one octave apart) and asked to identify the tone that was amplitude-modulated while a tonal precursor was presented to cue the frequency of the lower frequency tone. Performance thresholds were estimated based on the duration of the tone-pair. In Exp. I the duration of the precursor varied from 100 to 400 ms and the inter-stimulus interval (ISI) between the precursor and the tone-pair varied from 0 to 1 s. The presence of the precursor facilitated segregation. As the ISI increased, the facilitation effect of the precursor increased for the precursor durations of 100 and 200 ms, but not for the 400-ms precursor duration. When the precursor was presented to the contralateral ear relative to the tone-pair in Exp. II, no significant change to the precursor effect was observed. These observations contradict the predictions of the model based solely on bottom-up processing, suggesting the likely involvement of top-down processes.
Collapse
Affiliation(s)
- Yi Shen
- Department of Speech and Hearing Sciences, Indiana University Bloomington, Bloomington, Indiana 47405, USA
| |
Collapse
|
42
|
Nambi PMA, Mahajan Y, Francis N, Bhat JS. Temporal fine structure mediated recognition of speech in the presence of multitalker babble. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:EL296. [PMID: 27794309 DOI: 10.1121/1.4964416] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
This experiment investigated the mechanisms of temporal fine structure (TFS) mediated speech recognition in multi-talker babble. The signal-to-noise ratio for 50% recognition (SNR-50) for naive listeners was measured when the TFS was retained in its original form (ORIG-TFS), the TFS was time reversed (REV-TFS), and the TFS was replaced by noise (NO-TFS). The original envelope was unchanged. In the REV-TFS condition, periodicity cues for stream segregation were preserved, but envelope recovery was compromised. Both mechanisms were compromised in the NO-TFS condition. The SNR-50 was lowest for ORIG-TFS, followed by REV-TFS, which was lower than NO-TFS. Results suggest both stream segregation and envelope recovery aided TFS-mediated speech recognition.
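Within a single analysis band, the three conditions can be sketched with a Hilbert envelope/TFS split; the band edges and the white-noise stand-in signal are assumptions for illustration (the study processed speech across multiple bands):

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

def env_tfs(x):
    """Split a narrowband signal into its Hilbert envelope and temporal fine structure."""
    analytic = hilbert(x)
    return np.abs(analytic), np.cos(np.angle(analytic))

fs = 16000
sos = butter(4, [500, 1000], btype="bandpass", fs=fs, output="sos")
band = sosfiltfilt(sos, np.random.default_rng(2).standard_normal(fs))  # stand-in for one speech band
env, tfs = env_tfs(band)

orig_tfs = env * tfs                  # ORIG-TFS: original envelope, original TFS
rev_tfs = env * tfs[::-1]             # REV-TFS: original envelope, time-reversed TFS
other = sosfiltfilt(sos, np.random.default_rng(3).standard_normal(fs))
no_tfs = env * env_tfs(other)[1]      # NO-TFS: original envelope, noise-derived TFS
```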
Collapse
Affiliation(s)
- Pitchai Muthu Arivudai Nambi
- Department of Audiology and Speech Language Pathology, Kasturba Medical College (Manipal University), Mangalore, India
| | - Yatin Mahajan
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, Australia
| | - Nikita Francis
- Department of Audiology and Speech Language Pathology, Kasturba Medical College (Manipal University), Mangalore, India
| | - Jayashree S Bhat
- Department of Audiology and Speech Language Pathology, Kasturba Medical College (Manipal University), Mangalore, India
| |
Collapse
|
43
|
Neural Representation of Concurrent Vowels in Macaque Primary Auditory Cortex. eNeuro 2016; 3:eN-NWR-0071-16. [PMID: 27294198 PMCID: PMC4901243 DOI: 10.1523/eneuro.0071-16.2016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2016] [Accepted: 04/15/2016] [Indexed: 11/30/2022] Open
Abstract
Successful speech perception in real-world environments requires that the auditory system segregate competing voices that overlap in frequency and time into separate streams. Vowels are major constituents of speech and are comprised of frequencies (harmonics) that are integer multiples of a common fundamental frequency (F0). The pitch and identity of a vowel are determined by its F0 and spectral envelope (formant structure), respectively. When two spectrally overlapping vowels differing in F0 are presented concurrently, they can be readily perceived as two separate “auditory objects” with pitches at their respective F0s. A difference in pitch between two simultaneous vowels provides a powerful cue for their segregation, which in turn, facilitates their individual identification. The neural mechanisms underlying the segregation of concurrent vowels based on pitch differences are poorly understood. Here, we examine neural population responses in macaque primary auditory cortex (A1) to single and double concurrent vowels (/a/ and /i/) that differ in F0 such that they are heard as two separate auditory objects with distinct pitches. We find that neural population responses in A1 can resolve, via a rate-place code, lower harmonics of both single and double concurrent vowels. Furthermore, we show that the formant structures, and hence the identities, of single vowels can be reliably recovered from the neural representation of double concurrent vowels. We conclude that A1 contains sufficient spectral information to enable concurrent vowel segregation and identification by downstream cortical areas.
Collapse
|
44
|
Bonnard D, Dauman R, Semal C, Demany L. Harmonic fusion and pitch affinity: Is there a direct link? Hear Res 2016; 333:247-254. [DOI: 10.1016/j.heares.2015.08.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 04/06/2015] [Revised: 08/19/2015] [Accepted: 08/27/2015] [Indexed: 10/23/2022]
|
45
|
Winkler I, Schröger E. Auditory perceptual objects as generative models: Setting the stage for communication by sound. BRAIN AND LANGUAGE 2015; 148:1-22. [PMID: 26184883 DOI: 10.1016/j.bandl.2015.05.003] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/20/2014] [Revised: 03/03/2015] [Accepted: 05/03/2015] [Indexed: 06/04/2023]
Abstract
Communication by sounds requires that the communication channels (i.e., speech/speakers and other sound sources) have been established. This allows the listener to separate concurrently active sound sources, to track their identity, to assess the type of message arriving from them, and to decide whether and when to react (e.g., reply to the message). We propose that these functions rely on a common generative model of the auditory environment. This model predicts upcoming sounds on the basis of representations describing temporal/sequential regularities. Predictions help to identify the continuation of previously discovered sound sources and to detect the emergence of new sources, as well as changes in the behavior of the known ones. The model produces auditory event representations which provide a full sensory description of the sounds, including their relation to the auditory context and the current goals of the organism. Event representations can be consciously perceived and serve as objects in various cognitive operations.
Collapse
Affiliation(s)
- István Winkler
- Institute of Cognitive Neuroscience and Psychology, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Hungary; Institute of Psychology, University of Szeged, Hungary.
| | - Erich Schröger
- Institute for Psychology, University of Leipzig, Germany.
| |
Collapse
|
46
|
Woods KJP, McDermott JH. Attentive Tracking of Sound Sources. Curr Biol 2015; 25:2238-46. [PMID: 26279234 DOI: 10.1016/j.cub.2015.07.043] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2015] [Revised: 06/17/2015] [Accepted: 07/15/2015] [Indexed: 10/23/2022]
Abstract
Auditory scenes often contain concurrent sound sources, but listeners are typically interested in just one of these and must somehow select it for further processing. One challenge is that real-world sounds such as speech vary over time and as a consequence often cannot be separated or selected based on particular values of their features (e.g., high pitch). Here we show that human listeners can circumvent this challenge by tracking sounds with a movable focus of attention. We synthesized pairs of voices that changed in pitch and timbre over random, intertwined trajectories, lacking distinguishing features or linguistic information. Listeners were cued beforehand to attend to one of the voices. We measured their ability to extract this cued voice from the mixture by subsequently presenting the ending portion of one voice and asking whether it came from the cued voice. We found that listeners could perform this task but that performance was mediated by attention: listeners who performed best were also more sensitive to perturbations in the cued voice than in the uncued voice. Moreover, the task was impossible if the source trajectories did not maintain sufficient separation in feature space. The results suggest a locus of attention that can follow a sound's trajectory through a feature space, likely aiding selection and segregation amid similar distractors.
Collapse
Affiliation(s)
- Kevin J P Woods
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, USA.
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, USA
| |
Collapse
|
47
|
Nie Y, Nelson PB. Auditory stream segregation using amplitude modulated bandpass noise. Front Psychol 2015; 6:1151. [PMID: 26300831 PMCID: PMC4528102 DOI: 10.3389/fpsyg.2015.01151] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/14/2015] [Accepted: 07/23/2015] [Indexed: 12/23/2022] Open
Abstract
The purpose of this study was to investigate the roles of spectral overlap and amplitude modulation (AM) rate for stream segregation of noise signals, as well as to test the build-up effect based on these two cues. Segregation ability was evaluated using an objective paradigm with listeners' attention focused on stream segregation. Stimulus sequences consisted of two interleaved sets of bandpass noise bursts (A and B bursts). The A and B bursts differed in spectrum, AM rate, or both. The amount of the difference between the two sets of noise bursts was varied. Long and short sequences were studied to investigate the build-up effect for segregation based on spectral and AM-rate differences. Results showed the following: (1) Stream segregation ability increased with greater spectral separation. (2) Larger AM-rate separations were associated with stronger segregation abilities. (3) Spectral separation was found to elicit the build-up effect for the range of spectral differences assessed in the current study. (4) AM-rate separation interacted with spectral separation, suggesting an additive effect of spectral separation and AM-rate separation on segregation build-up. The findings suggest that, when normal-hearing listeners direct their attention towards segregation, they are able to segregate auditory streams based on reduced spectral contrast cues that vary with the amount of spectral overlap. Further, regardless of the spectral separation, they are able to use AM-rate difference as a secondary, weaker cue. Based on spectral differences, listeners can segregate auditory streams better as the listening duration is prolonged, i.e., sparse spectral cues elicit build-up segregation; however, AM-rate differences only appear to elicit build-up in combination with spectral difference cues.
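A minimal sketch of how such interleaved A and B bursts might be generated; band edges, AM rates, durations, and modulation depth are illustrative values, not the study's:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def am_noise_burst(fs, dur, band, am_rate, depth=1.0, seed=0):
    """Bandpass noise burst, sinusoidally amplitude-modulated at am_rate Hz."""
    x = np.random.default_rng(seed).standard_normal(int(fs * dur))
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x)
    t = np.arange(x.size) / fs
    return x * (1.0 + depth * np.sin(2 * np.pi * am_rate * t)) / (1.0 + depth)

fs = 44100
a = am_noise_burst(fs, 0.06, (1000, 2000), am_rate=80)           # A bursts
b = am_noise_burst(fs, 0.06, (1400, 2800), am_rate=160, seed=1)  # partial spectral overlap, octave AM-rate difference
sequence = np.tile(np.concatenate([a, b]), 10)                   # interleaved ABAB... burst sequence
```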
Collapse
Affiliation(s)
- Yingjiu Nie
- Department of Communication Sciences and Disorders, James Madison University, Harrisonburg, VA, USA
| | - Peggy B Nelson
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, MN, USA
| |
Collapse
|
48
|
Trainor LJ. The origins of music in auditory scene analysis and the roles of evolution and culture in musical creation. Philos Trans R Soc Lond B Biol Sci 2015; 370:20140089. [PMID: 25646512 PMCID: PMC4321130 DOI: 10.1098/rstb.2014.0089] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open
Abstract
Whether music was an evolutionary adaptation that conferred survival advantages or a cultural creation has generated much debate. Consistent with an evolutionary hypothesis, music is unique to humans, emerges early in development and is universal across societies. However, the adaptive benefit of music is far from obvious. Music is highly flexible, generative and changes rapidly over time, consistent with a cultural creation hypothesis. In this paper, it is proposed that much of musical pitch and timing structure adapted to preexisting features of auditory processing that evolved for auditory scene analysis (ASA). Thus, music may have emerged initially as a cultural creation made possible by preexisting adaptations for ASA. However, some aspects of music, such as its emotional and social power, may have subsequently proved beneficial for survival and led to adaptations that enhanced musical behaviour. Ontogenetic and phylogenetic evidence is considered in this regard. In particular, enhanced auditory-motor pathways in humans that enable movement entrainment to music and consequent increases in social cohesion, and pathways enabling music to affect reward centres in the brain should be investigated as possible musical adaptations. It is concluded that the origins of music are complex and probably involved exaptation, cultural creation and evolutionary adaptation.
Collapse
Affiliation(s)
- Laurel J Trainor
- Department of Psychology, Neuroscience and Behaviour, McMaster University, Hamilton, Ontario, Canada; McMaster Institute for Music and the Mind, McMaster University, Hamilton, Ontario, Canada; Rotman Research Institute, Baycrest Hospital, Toronto, Ontario, Canada
| |
Collapse
|
49
|
Sayles M, Stasiak A, Winter IM. Reverberation impairs brainstem temporal representations of voiced vowel sounds: challenging "periodicity-tagged" segregation of competing speech in rooms. Front Syst Neurosci 2015; 8:248. [PMID: 25628545 PMCID: PMC4290552 DOI: 10.3389/fnsys.2014.00248] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/12/2014] [Accepted: 12/18/2014] [Indexed: 11/26/2022] Open
Abstract
The auditory system typically processes information from concurrently active sound sources (e.g., two voices speaking at once), in the presence of multiple delayed, attenuated and distorted sound-wave reflections (reverberation). Brainstem circuits help segregate these complex acoustic mixtures into “auditory objects.” Psychophysical studies demonstrate a strong interaction between reverberation and fundamental-frequency (F0) modulation, leading to impaired segregation of competing vowels when segregation is on the basis of F0 differences. Neurophysiological studies of complex-sound segregation have concentrated on sounds with steady F0s, in anechoic environments. However, F0 modulation and reverberation are quasi-ubiquitous. We examine the ability of 129 single units in the ventral cochlear nucleus (VCN) of the anesthetized guinea pig to segregate the concurrent synthetic vowel sounds /a/ and /i/, based on temporal discharge patterns under closed-field conditions. We address the effects of added real-room reverberation, F0 modulation, and the interaction of these two factors, on brainstem neural segregation of voiced speech sounds. A firing-rate representation of single-vowels' spectral envelopes is robust to the combination of F0 modulation and reverberation: local firing-rate maxima and minima across the tonotopic array code vowel-formant structure. However, single-vowel F0-related periodicity information in shuffled inter-spike interval distributions is significantly degraded in the combined presence of reverberation and F0 modulation. Hence, segregation of double-vowels' spectral energy into two streams (corresponding to the two vowels), on the basis of temporal discharge patterns, is impaired by reverberation; specifically when F0 is modulated. All unit types (primary-like, chopper, onset) are similarly affected. These results offer neurophysiological insights to perceptual organization of complex acoustic scenes under realistically challenging listening conditions.
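The shuffled inter-spike interval analysis pools all-order intervals across repeated presentations of the same stimulus while excluding within-train pairs; a minimal sketch with synthetic 100-Hz-periodic trains (all parameters illustrative, not the recording analysis pipeline):

```python
import numpy as np

def shuffled_interval_histogram(trains, bin_width=5e-5, max_interval=0.02):
    """All-order intervals between spikes from different presentations of one stimulus."""
    edges = np.arange(0.0, max_interval + bin_width, bin_width)
    counts = np.zeros(edges.size - 1)
    for i, a in enumerate(trains):
        for j, b in enumerate(trains):
            if i == j:
                continue  # the "shuffled" part: skip within-train pairs
            d = np.abs(np.subtract.outer(np.asarray(a), np.asarray(b))).ravel()
            counts += np.histogram(d[d <= max_interval], bins=edges)[0]
    return edges[:-1], counts

rng = np.random.default_rng(4)
trains = [np.cumsum(rng.normal(0.01, 0.001, 50)) for _ in range(5)]  # ~100-Hz periodic firing
lags, counts = shuffled_interval_histogram(trains)
# A peak near 1/F0 (10 ms) is the periodicity cue that reverberation combined with
# F0 modulation degrades in these recordings.
```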
Collapse
Affiliation(s)
- Mark Sayles
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
| | - Arkadiusz Stasiak
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
| | - Ian M Winter
- Centre for the Neural Basis of Hearing, The Physiological Laboratory, Department of Physiology, Development and Neuroscience, University of Cambridge, Cambridge, UK
| |
Collapse
|
50
|
Krishnan L, Elhilali M, Shamma S. Segregating complex sound sources through temporal coherence. PLoS Comput Biol 2014; 10:e1003985. [PMID: 25521593 PMCID: PMC4270434 DOI: 10.1371/journal.pcbi.1003985] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2014] [Accepted: 10/14/2014] [Indexed: 11/18/2022] Open
Abstract
A new approach for the segregation of monaural sound mixtures is presented based on the principle of temporal coherence and using auditory cortical representations. Temporal coherence is the notion that perceived sources emit coherently modulated features that evoke highly-coincident neural response patterns. By clustering the feature channels with coincident responses and reconstructing their input, one may segregate the underlying source from the simultaneously interfering signals that are uncorrelated with it. The proposed algorithm requires no prior information or training on the sources. It can, however, gracefully incorporate cognitive functions and influences such as memories of a target source or attention to a specific set of its attributes so as to segregate it from its background. Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses of the physiological mechanisms of this ubiquitous and remarkable perceptual ability, and of its psychophysical manifestations in navigating complex sensory environments.

Humans and many animals can effortlessly navigate complex sensory environments, segregating and attending to one desired target source while suppressing distracting and interfering others. In this paper, we present an algorithmic model that can accomplish this task with no prior information or training on complex signals such as speech mixtures, and speech in noise and music. The model accounts for this ability relying solely on the temporal coherence principle, the notion that perceived sources emit coherently modulated features that evoke coincident cortical response patterns. It further demonstrates how basic cortical mechanisms common to all sensory systems can implement the necessary representations, as well as the adaptive computations necessary to maintain continuity by tracking slowly changing characteristics of different sources in a scene.
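As a toy rendering of the temporal coherence principle, the sketch below correlates channel envelopes, groups the channels coherent with an anchor channel, and masks the mixture; channel counts, modulation rates, and the threshold are assumptions, not the authors' cortical-model implementation:

```python
import numpy as np

def coherence_mask(features, anchor, threshold=0.5):
    """Keep feature channels whose envelopes co-vary with the anchor channel."""
    env = np.abs(features)
    env = env - env.mean(axis=1, keepdims=True)
    norm = np.linalg.norm(env, axis=1) + 1e-12
    corr = env @ env[anchor] / (norm * norm[anchor])   # correlation with the anchor
    return (corr > threshold).astype(float)[:, None]   # binary channel mask

rng = np.random.default_rng(5)
frames = np.arange(200) / 100.0                                      # 2 s at 100 frames/s
src1 = rng.random((32, 1)) * np.abs(np.sin(2 * np.pi * 4 * frames))  # channels comodulated at 4 Hz
src2 = rng.random((32, 1)) * np.abs(np.sin(2 * np.pi * 7 * frames))  # channels comodulated at 7 Hz
mix = np.vstack([src1, src2])                                        # cochleagram-like mixture
mask = coherence_mask(mix, anchor=0)     # channel 0 belongs to the 4-Hz source
segregated = mask * mix                  # crude reconstruction of the anchored source
```

In the full model the grouping would operate on a richer cortical feature set and the reconstruction would be inverted back to a waveform; this sketch only shows the coincidence-and-cluster step.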
Collapse
Affiliation(s)
- Lakshmi Krishnan
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, The Johns Hopkins University, Baltimore, Maryland, United States of America
| | - Shihab Shamma
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Department Etudes Cognitive, Ecole Normale Superieure, Paris, France
| |
Collapse
|