1
Kojima S, Kanoh S. An auditory brain-computer interface based on selective attention to multiple tone streams. PLoS One 2024;19:e0303565. [PMID: 38781127; PMCID: PMC11115270; DOI: 10.1371/journal.pone.0303565]
Abstract
In this study, we attempted to improve brain-computer interface (BCI) systems by means of auditory stream segregation, in which alternately presented tones are perceived as sequences of different tones (streams). A 3-class BCI using three tone sequences, which were perceived as three different tone streams, was investigated and evaluated. Each presented musical tone was generated by a software synthesizer. Eleven subjects took part in the experiment. Stimuli were presented to each user's right ear. Subjects were requested to attend to one of three streams and to count the number of target stimuli in the attended stream. In addition, 64-channel electroencephalogram (EEG) and two-channel electrooculogram (EOG) signals were recorded from participants with a sampling frequency of 1000 Hz. The measured EEG data were classified based on Riemannian geometry to detect the object of the subject's selective attention. P300 activity was elicited by the target stimuli in the segregated tone streams. In five out of eleven subjects, P300 activity was elicited only by the target stimuli included in the attended stream. In a 10-fold cross-validation test, a classification accuracy over 80% for five subjects and over 75% for nine subjects was achieved. For subjects whose accuracy was lower than 75%, either the P300 was also elicited for nonattended streams or the amplitude of the P300 was small. It was concluded that the number of classes in BCI systems based on auditory stream segregation can be increased to three, and that these classes can be detected through a single ear without the aid of any visual modality.
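A minimal sketch of the kind of Riemannian-geometry EEG classification the abstract refers to: per-epoch covariance matrices classified by minimum Riemannian distance to class means (MDM). The function names, shrinkage value, and the log-Euclidean mean used here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigvalsh

def covariances(epochs, shrinkage=1e-6):
    """Per-epoch channel covariance; epochs: (n_trials, n_channels, n_times)."""
    n_ch = epochs.shape[1]
    return np.array([np.cov(e) + shrinkage * np.eye(n_ch) for e in epochs])

def riemann_dist(A, B):
    """Affine-invariant Riemannian distance between SPD matrices A and B."""
    lam = eigvalsh(A, B)                      # generalized eigenvalues of (A, B)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def _logm(C):                                 # matrix log of an SPD matrix
    w, V = np.linalg.eigh(C)
    return (V * np.log(w)) @ V.T

def _expm(S):                                 # matrix exp of a symmetric matrix
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def mdm_fit(covs, y):
    """Log-Euclidean class means (a cheap stand-in for the Frechet mean)."""
    return {c: _expm(np.mean([_logm(C) for C in covs[y == c]], axis=0))
            for c in np.unique(y)}

def mdm_predict(means, covs):
    classes = list(means)
    d = [[riemann_dist(C, means[c]) for c in classes] for C in covs]
    return np.asarray(classes)[np.argmin(d, axis=1)]
```

The affine-invariant distance reduces to the logarithms of the generalized eigenvalues of the two matrices, which is why a single `eigvalsh(A, B)` call suffices.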
Affiliation(s)
- Simon Kojima
- Graduate School of Engineering and Science, Shibaura Institute of Technology, Koto-ku, Tokyo, Japan
- Shin’ichiro Kanoh
- Graduate School of Engineering and Science, Shibaura Institute of Technology, Koto-ku, Tokyo, Japan
- College of Engineering, Shibaura Institute of Technology, Koto-ku, Tokyo, Japan
2
van der Willigen RF, Versnel H, van Opstal AJ. Spectral-temporal processing of naturalistic sounds in monkeys and humans. J Neurophysiol 2024;131:38-63. [PMID: 37965933; DOI: 10.1152/jn.00129.2023]
Abstract
Human speech and vocalizations in animals are rich in joint spectrotemporal (S-T) modulations, wherein acoustic changes in both frequency and time are functionally related. In principle, the primate auditory system could process these complex dynamic sounds based on either an inseparable representation of S-T features or, alternatively, a separable representation. The separability hypothesis implies an independent processing of spectral and temporal modulations. We collected comparative data on the S-T hearing sensitivity in humans and macaque monkeys to a wide range of broadband dynamic spectrotemporal ripple stimuli employing a yes-no signal-detection task. Ripples were systematically varied, as a function of density (spectral modulation frequency), velocity (temporal modulation frequency), or modulation depth, to cover a listener's full S-T modulation sensitivity, derived from a total of 87 psychometric ripple detection curves. Audiograms were measured to control for normal hearing. We determined hearing thresholds, reaction time distributions, and S-T modulation transfer functions (MTFs), both at the ripple detection thresholds and at suprathreshold modulation depths. Our psychophysically derived MTFs are consistent with the hypothesis that both monkeys and humans employ analogous perceptual strategies: S-T acoustic information is primarily processed separably. Singular value decomposition (SVD), however, revealed a small, but consistent, inseparable spectral-temporal interaction. Finally, SVD analysis of the known visual spatiotemporal contrast sensitivity function (CSF) highlights that human vision is space-time inseparable to a much larger extent than is the case for S-T sensitivity in hearing. Thus, the specificity with which the primate brain encodes natural sounds appears to be less strict than is required to adequately deal with natural images. NEW & NOTEWORTHY: We provide comparative data on primate audition of naturalistic sounds comprising hearing thresholds, reaction time distributions, and spectral-temporal modulation transfer functions. Our psychophysical experiments demonstrate that auditory information is primarily processed in a spectral-temporal-independent manner by both monkeys and humans. Singular value decomposition of known visual spatiotemporal contrast sensitivity, in comparison to our auditory spectral-temporal sensitivity, revealed a striking contrast in how the brain encodes natural sounds as opposed to natural images, as vision appears to be space-time inseparable.
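The SVD-based separability analysis described above can be illustrated by treating the measured MTF as a matrix over density and velocity and examining its singular values; a fully separable MTF is rank one. The toy MTF below is an assumption for illustration only.

```python
import numpy as np

# Toy spectral-temporal MTF sampled on a density x velocity grid (illustrative values).
density = np.linspace(0.25, 4.0, 8)     # cyc/oct
velocity = np.linspace(2.0, 64.0, 10)   # Hz
mtf = np.outer(np.exp(-density), np.exp(-velocity / 32.0))   # separable part
mtf += 0.05 * np.random.default_rng(0).random(mtf.shape)     # small interaction part

s = np.linalg.svd(mtf, compute_uv=False)
alpha = s[0] ** 2 / np.sum(s ** 2)   # separability index; 1.0 = perfectly separable
print(f"separability index = {alpha:.3f}")
```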
Affiliation(s)
- Robert F van der Willigen
- Section Neurophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- School of Communication, Media and Information Technology, Rotterdam University of Applied Sciences, Rotterdam, The Netherlands
- Research Center Creating 010, Rotterdam University of Applied Sciences, Rotterdam, The Netherlands
- Huib Versnel
- Section Neurophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Department of Otorhinolaryngology and Head & Neck Surgery, UMC Utrecht Brain Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- A John van Opstal
- Section Neurophysics, Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
3
Xie Y, Ma J. How to discern external acoustic waves in a piezoelectric neuron under noise? J Biol Phys 2022;48:339-353. [PMID: 35948818; PMCID: PMC9411441; DOI: 10.1007/s10867-022-09611-1]
Abstract
Biological neurons remain sensitive to external stimuli, and appropriate firing modes can be triggered to give an effective response to external chemical and physical signals. A piezoelectric neural circuit can perceive external voice and nonlinear vibration by generating an equivalent piezoelectric voltage, which produces an equivalent trans-membrane current for inducing a variety of firing modes in the neural activities. Biological neurons can receive external stimuli from multiple ion channels and synapses synchronously, but the further encoding and priority in mode selection are competitive. In particular, noisy disturbance and electromagnetic radiation make signal identification and mode selection in the firing patterns more difficult when neurons are driven by multi-channel signals. In this paper, two different periodic signals accompanied by noise are used to excite the piezoelectric neural circuit, and the signal processing in the piezoelectric neuron driven by acoustic waves under noise is reproduced and explained. The physical energy of the piezoelectric neural circuit and the Hamilton energy in the neuron driven by mixed signals are calculated to explain the biophysical mechanism of the auditory neuron when external stimuli are applied. It is found that the neuron prefers to respond to the external stimulus with higher physical energy and to the signal that can increase the Hamilton energy of the neuron; for example, stronger inputs inject higher energy and are detected and responded to more sensitively. The involvement of noise is helpful for detecting the external signal via stochastic resonance, and additive noise changes the excitability of the neuron in the same way as an external stimulus. The results indicate that energy controls the firing patterns and mode selection in neurons, and this provides clues for controlling neural activities by injecting appropriate energy into neurons and networks.
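As a rough illustration of the stochastic-resonance effect invoked above (a generic bistable toy model, not the paper's piezoelectric circuit), the spectral power at a weak subthreshold drive frequency typically peaks at an intermediate noise level:

```python
import numpy as np

def power_at_drive(sigma, A=0.25, f=0.05, dt=0.01, n=100_000, seed=1):
    """Euler-Maruyama on dx = (x - x^3 + A sin(2 pi f t)) dt + sigma dW."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = -1.0                                  # start in the left well
    for k in range(n - 1):
        t = k * dt
        drift = x[k] - x[k] ** 3 + A * np.sin(2 * np.pi * f * t)
        x[k + 1] = x[k] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2
    freqs = np.fft.rfftfreq(n, dt)
    return spec[np.argmin(np.abs(freqs - f))]    # power at the drive frequency

for sigma in (0.2, 0.4, 0.6, 0.9, 1.3):          # sweep the noise intensity
    print(f"sigma={sigma:.1f}  power at drive = {power_at_drive(sigma):.3g}")
```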
Affiliation(s)
- Ying Xie
- Department of Physics, Lanzhou University of Technology, Lanzhou, 730050, China
- Jun Ma
- Department of Physics, Lanzhou University of Technology, Lanzhou, 730050, China
- School of Science, Chongqing University of Posts and Telecommunications, Chongqing, 430065, China
4
Wang L, Wang Y, Liu Z, Wu EX, Chen F. A Speech-Level-Based Segmented Model to Decode the Dynamic Auditory Attention States in the Competing Speaker Scenes. Front Neurosci 2022;15:760611. [PMID: 35221885; PMCID: PMC8866945; DOI: 10.3389/fnins.2021.760611]
Abstract
In competing speaker environments, human listeners need to focus or switch their auditory attention according to dynamic intentions. Reliable cortical tracking of the speech envelope is an effective feature for decoding the target speech from the neural signals. Moreover, previous studies revealed that root mean square (RMS)-level-based speech segmentation made a great contribution to target speech perception under the modulation of sustained auditory attention. This study further investigated the effect of RMS-level-based speech segmentation on auditory attention decoding (AAD) performance with both sustained and switched attention in competing speaker auditory scenes. Objective biomarkers derived from the cortical activities were also developed to index the dynamic auditory attention states. In the current study, subjects were asked to concentrate or switch their attention between two competing speaker streams. The neural responses to the higher- and lower-RMS-level speech segments were analyzed via the linear temporal response function (TRF) before and after the attention switched from one speaker stream to the other. Furthermore, the AAD performance decoded by the unified TRF decoding model was compared to that by the speech-RMS-level-based segmented decoding model under the dynamic change of the auditory attention states. The results showed that the weight of the typical TRF component at an approximately 100-ms time lag was sensitive to the switching of auditory attention. Compared to the unified AAD model, the segmented AAD model improved attention decoding performance under both sustained and switched auditory attention modulations in a wide range of signal-to-masker ratios (SMRs). In competing speaker scenes, the TRF weight and AAD accuracy could be used as effective indicators to detect changes of auditory attention. In addition, over a wide range of SMRs (i.e., from 6 to -6 dB in this study), the segmented AAD model showed robust decoding performance even with short decision window lengths, suggesting that this speech-RMS-level-based model has the potential to decode dynamic attention states in realistic auditory scenarios.
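A bare-bones version of the forward TRF model referred to above: ridge regression of EEG on time-lagged copies of the speech envelope. The sampling rate, lag window, and regularization strength are assumptions for illustration, not the authors' settings.

```python
import numpy as np

def lagged(stim, lags):
    """Design matrix whose column j holds stim delayed by lags[j] samples."""
    n, X = len(stim), np.zeros((len(stim), len(lags)))
    for j, L in enumerate(lags):
        if L >= 0:
            X[L:, j] = stim[:n - L]
        else:
            X[:L, j] = stim[-L:]
    return X

def fit_trf(stim, eeg, fs=64, tmin=-0.1, tmax=0.4, lam=1e2):
    """Ridge-regularized forward TRF: eeg(t) ~ sum_L w(L) * stim(t - L)."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = lagged(stim, lags)
    w = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w

rng = np.random.default_rng(0)
env = rng.random(6_000)                                    # toy envelope at 64 Hz
eeg = np.convolve(env, [0.0, 0.2, 0.5, 0.2], "same") + rng.standard_normal(6_000)
lag_s, trf = fit_trf(env, eeg)                             # trf recovers the toy kernel
```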
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Yihan Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Zhixing Liu
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
- Ed X. Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam, Hong Kong SAR, China
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
5
Luberadzka J, Kayser H, Hohmann V. Making sense of periodicity glimpses in a prediction-update loop: a computational model of attentive voice tracking. J Acoust Soc Am 2022;151:712. [PMID: 35232067; PMCID: PMC9088677; DOI: 10.1121/10.0009337]
Abstract
Humans are able to follow a speaker even in challenging acoustic conditions. The perceptual mechanisms underlying this ability remain unclear. A computational model of attentive voice tracking, consisting of four computational blocks: (1) sparse periodicity-based auditory features (sPAF) extraction, (2) foreground-background segregation, (3) state estimation, and (4) top-down knowledge, is presented. The model connects the theories about auditory glimpses, foreground-background segregation, and Bayesian inference. It is implemented with the sPAF, sequential Monte Carlo sampling, and probabilistic voice models. The model is evaluated by comparing it with the human data obtained in the study by Woods and McDermott [Curr. Biol. 25(17), 2238-2246 (2015)], which measured the ability to track one of two competing voices with time-varying parameters [fundamental frequency (F0) and formants (F1,F2)]. Three model versions were tested, which differ in the type of information used for the segregation: version (a) uses the oracle F0, version (b) uses the estimated F0, and version (c) uses the spectral shape derived from the estimated F0 and oracle F1 and F2. Version (a) simulates the optimal human performance in conditions with the largest separation between the voices, version (b) simulates the conditions in which the separation is not sufficient to follow the voices, and version (c) is closest to the human performance for moderate voice separation.
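The sequential Monte Carlo component of such a model can be sketched as a bootstrap particle filter tracking a slowly varying F0; the random-walk dynamics and Gaussian likelihood below are illustrative assumptions, not the paper's sPAF-based observation model.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_part = 200, 500
true_f0 = 150 + np.cumsum(rng.normal(0, 1.0, T))   # toy F0 trajectory (Hz)
obs = true_f0 + rng.normal(0, 8.0, T)              # noisy frame-wise F0 estimates

particles = rng.normal(150, 20, n_part)            # initial particle cloud
est = np.empty(T)
for t in range(T):
    particles = particles + rng.normal(0, 1.5, n_part)    # predict: random walk
    w = np.exp(-0.5 * ((obs[t] - particles) / 8.0) ** 2)  # update: Gaussian likelihood
    w /= w.sum()
    est[t] = w @ particles                                # posterior-mean estimate
    particles = particles[rng.choice(n_part, n_part, p=w)]  # resample
```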
Affiliation(s)
- Joanna Luberadzka
- Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
- Hendrik Kayser
- Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
- Volker Hohmann
- Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
6
Cortical Processing of Binaural Cues as Shown by EEG Responses to Random-Chord Stereograms. J Assoc Res Otolaryngol 2021;23:75-94. [PMID: 34904205; PMCID: PMC8783002; DOI: 10.1007/s10162-021-00820-4]
Abstract
Spatial hearing facilitates the perceptual organization of complex soundscapes into accurate mental representations of sound sources in the environment. Yet, the role of binaural cues in auditory scene analysis (ASA) has received relatively little attention in recent neuroscientific studies employing novel, spectro-temporally complex stimuli. This may be because a stimulation paradigm that provides binaurally derived grouping cues of sufficient spectro-temporal complexity has not yet been established for neuroscientific ASA experiments. Random-chord stereograms (RCS) are a class of auditory stimuli that exploit spectro-temporal variations in the interaural envelope correlation of noise-like sounds with interaurally coherent fine structure; they evoke salient auditory percepts that emerge only under binaural listening. Here, our aim was to assess the usability of the RCS paradigm for indexing binaural processing in the human brain. To this end, we recorded EEG responses to RCS stimuli from 12 normal-hearing subjects. The stimuli consisted of an initial 3-s noise segment with interaurally uncorrelated envelopes, followed by another 3-s segment, where envelope correlation was modulated periodically according to the RCS paradigm. Modulations were applied either across the entire stimulus bandwidth (wideband stimuli) or in temporally shifting frequency bands (ripple stimulus). Event-related potentials and inter-trial phase coherence analyses of the EEG responses showed that the introduction of the 3- or 5-Hz wideband modulations produced a prominent change-onset complex and ongoing synchronized responses to the RCS modulations. In contrast, the ripple stimulus elicited a change-onset response but no response to ongoing RCS modulation. Frequency-domain analyses revealed increased spectral power at the fundamental frequency and the first harmonic of wideband RCS modulations. RCS stimulation yields robust EEG measures of binaurally driven auditory reorganization and has potential to provide a flexible stimulation paradigm suitable for isolating binaural effects in ASA experiments.
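One of the EEG measures used above, inter-trial phase coherence (ITPC), can be computed as the resultant length of single-trial phase angles at the modulation frequency of interest; the filter settings here are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def itpc(trials, fs, f0, half_bw=1.0):
    """trials: (n_trials, n_times) EEG; ITPC over time at ~f0 Hz (0 = random, 1 = locked)."""
    lo, hi = (f0 - half_bw) / (fs / 2), (f0 + half_bw) / (fs / 2)
    b, a = butter(4, [lo, hi], btype="bandpass")          # narrow band around f0
    phase = np.angle(hilbert(filtfilt(b, a, trials, axis=1), axis=1))
    return np.abs(np.mean(np.exp(1j * phase), axis=0))    # circular mean length
```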
7
Wang L, Wu EX, Chen F. EEG-based auditory attention decoding using speech-level-based segmented computational models. J Neural Eng 2021;18. [PMID: 33957606; DOI: 10.1088/1741-2552/abfeba]
Abstract
Objective. Auditory attention in complex scenarios can be decoded by electroencephalography (EEG)-based cortical speech-envelope tracking. The relative root-mean-square (RMS) intensity is a valuable cue for the decomposition of speech into distinct characteristic segments. To improve auditory attention decoding (AAD) performance, this work proposed a novel segmented AAD approach to decode target speech envelopes from different RMS-level-based speech segments. Approach. Speech was decomposed into higher- and lower-RMS-level speech segments with a threshold of -10 dB relative RMS level. A support vector machine classifier was designed to identify higher- and lower-RMS-level speech segments, using clean target and mixed speech as reference signals, based on corresponding EEG signals recorded when subjects listened to target auditory streams in competing two-speaker auditory scenes. Segmented computational models were developed with the classification results of higher- and lower-RMS-level speech segments. Speech envelopes were reconstructed based on segmented decoding models for either higher- or lower-RMS-level speech segments. AAD accuracies were calculated according to the correlations between actual and reconstructed speech envelopes. The performance of the proposed segmented AAD computational model was compared to those of traditional AAD methods with unified decoding functions. Main results. Higher- and lower-RMS-level speech segments in continuous sentences could be identified robustly, with classification accuracies that approximated or exceeded 80% based on corresponding EEG signals at 6 dB, 3 dB, 0 dB, -3 dB and -6 dB signal-to-mask ratios (SMRs). Compared with unified AAD decoding methods, the proposed segmented AAD approach achieved more accurate results in the reconstruction of target speech envelopes and in the detection of attentional directions. Moreover, the proposed segmented decoding method had higher information transfer rates (ITRs) and shorter minimum expected switch times compared with the unified decoder. Significance. This study revealed that EEG signals may be used to classify higher- and lower-RMS-level-based speech segments across a wide range of SMR conditions (from 6 dB to -6 dB). A novel finding was that the specific information in different RMS-level-based speech segments facilitated EEG-based decoding of auditory attention. The significantly improved AAD accuracies and ITRs of the segmented decoding method suggest that this proposed computational model may be an effective method for the application of neuro-controlled brain-computer interfaces in complex auditory scenes.
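A minimal sketch of the relative-RMS segmentation step described above, using the stated -10 dB threshold; the frame length is an assumption for illustration.

```python
import numpy as np

def higher_rms_mask(speech, fs, frame_ms=16, thresh_db=-10.0):
    """Boolean mask over frames: True where frame RMS >= thresh_db re overall RMS."""
    frame = int(fs * frame_ms / 1000)
    n_frames = len(speech) // frame
    frames = speech[:n_frames * frame].reshape(n_frames, frame)
    frame_rms = np.sqrt(np.mean(frames ** 2, axis=1))
    global_rms = np.sqrt(np.mean(speech ** 2))
    rel_db = 20 * np.log10(frame_rms / global_rms + 1e-12)  # relative RMS level
    return rel_db >= thresh_db
```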
Affiliation(s)
- Lei Wang
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
- Ed X Wu
- Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, People's Republic of China
- Fei Chen
- Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, People's Republic of China
8
Johnson JCS, Marshall CR, Weil RS, Bamiou DE, Hardy CJD, Warren JD. Hearing and dementia: from ears to brain. Brain 2021;144:391-401. [PMID: 33351095; PMCID: PMC7940169; DOI: 10.1093/brain/awaa429]
Abstract
The association between hearing impairment and dementia has emerged as a major public health challenge, with significant opportunities for earlier diagnosis, treatment and prevention. However, the nature of this association has not been defined. We hear with our brains, particularly within the complex soundscapes of everyday life: neurodegenerative pathologies target the auditory brain, and are therefore predicted to damage hearing function early and profoundly. Here we present evidence for this proposition, based on structural and functional features of auditory brain organization that confer vulnerability to neurodegeneration, the extensive, reciprocal interplay between 'peripheral' and 'central' hearing dysfunction, and recently characterized auditory signatures of canonical neurodegenerative dementias (Alzheimer's disease, Lewy body disease and frontotemporal dementia). Moving beyond any simple dichotomy of ear and brain, we argue for a reappraisal of the role of auditory cognitive dysfunction and the critical coupling of brain to peripheral organs of hearing in the dementias. We call for a clinical assessment of real-world hearing in these diseases that moves beyond pure tone perception to the development of novel auditory 'cognitive stress tests' and proximity markers for the early diagnosis of dementia and management strategies that harness retained auditory plasticity.
Affiliation(s)
- Jeremy C S Johnson
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Charles R Marshall
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London, London, UK
- Rimona S Weil
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Movement Disorders Centre, Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, University College London, London, UK
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, University College London, London, UK
- Doris-Eva Bamiou
- UCL Ear Institute and UCL/UCLH Biomedical Research Centre, National Institute for Health Research, University College London, London, UK
- Chris J D Hardy
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
- Jason D Warren
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, University College London, London, UK
9
Holmes E, Zeidman P, Friston KJ, Griffiths TD. Difficulties with Speech-in-Noise Perception Related to Fundamental Grouping Processes in Auditory Cortex. Cereb Cortex 2020;31:1582-1596. [PMID: 33136138; PMCID: PMC7869094; DOI: 10.1093/cercor/bhaa311]
Abstract
In our everyday lives, we are often required to follow a conversation when background noise is present (“speech-in-noise” [SPIN] perception). SPIN perception varies widely—and people who are worse at SPIN perception are also worse at fundamental auditory grouping, as assessed by figure-ground tasks. Here, we examined the cortical processes that link difficulties with SPIN perception to difficulties with figure-ground perception using functional magnetic resonance imaging. We found strong evidence that the earliest stages of the auditory cortical hierarchy (left core and belt areas) are similarly disinhibited when SPIN and figure-ground tasks are more difficult (i.e., at target-to-masker ratios corresponding to 60% rather than 90% performance)—consistent with increased cortical gain at lower levels of the auditory hierarchy. Overall, our results reveal a common neural substrate for these basic (figure-ground) and naturally relevant (SPIN) tasks—which provides a common computational basis for the link between SPIN perception and fundamental auditory grouping.
Affiliation(s)
- Emma Holmes
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, UCL, London WC1N 3AR, UK
- Peter Zeidman
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, UCL, London WC1N 3AR, UK
- Karl J Friston
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, UCL, London WC1N 3AR, UK
- Timothy D Griffiths
- Wellcome Centre for Human Neuroimaging, UCL Queen Square Institute of Neurology, UCL, London WC1N 3AR, UK
- Biosciences Institute, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
10
Gupta S, Bee MA. Treefrogs exploit temporal coherence to form perceptual objects of communication signals. Biol Lett 2020;16:20200573. [PMID: 32961090; PMCID: PMC7532704; DOI: 10.1098/rsbl.2020.0573]
Abstract
For many animals, navigating their environment requires an ability to organize continuous streams of sensory input into discrete 'perceptual objects' that correspond to physical entities in visual and auditory scenes. The human visual and auditory systems follow several Gestalt laws of perceptual organization to bind constituent features into coherent perceptual objects. A largely unexplored question is whether nonhuman animals follow similar Gestalt laws in perceiving behaviourally relevant stimuli, such as communication signals. We used females of Cope's grey treefrog (Hyla chrysoscelis) to test the hypothesis that temporal coherence-a powerful Gestalt principle in human auditory scene analysis-promotes perceptual binding in forming auditory objects of species-typical vocalizations. According to the principle of temporal coherence, sound elements that start and stop at the same time or that modulate coherently over time are likely to become bound together into the same auditory object. We found that the natural temporal coherence between two spectral components of advertisement calls promotes their perceptual binding into auditory objects of advertisement calls. Our findings confirm the broad ecological validity of temporal coherence as a Gestalt law of auditory perceptual organization guiding the formation of biologically relevant perceptual objects in animal behaviour.
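The temporal-coherence cue can be illustrated by correlating the amplitude envelopes of two spectral components: coherently modulated components yield a correlation near one. The signal parameters below are toy assumptions loosely modeled on a two-peak call.

```python
import numpy as np
from scipy.signal import hilbert

fs = 8_000
t = np.arange(fs) / fs                                # 1 s of signal
am = 0.5 * (1 + np.sin(2 * np.pi * 45 * t))           # shared 45 Hz amplitude modulation
low = am * np.sin(2 * np.pi * 1250 * t)               # two spectral components
high = am * np.sin(2 * np.pi * 2500 * t)

e_low, e_high = np.abs(hilbert(low)), np.abs(hilbert(high))
r = np.corrcoef(e_low, e_high)[0, 1]
print(f"envelope correlation = {r:.2f}")              # ~1 -> cue to bind the components
```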
Affiliation(s)
- Saumya Gupta
- Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN 55108, USA
- Mark A. Bee
- Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN 55108, USA
- Graduate Program in Neuroscience, University of Minnesota, Minneapolis, MN 55455, USA
11
Schiavi C, Finzi A, Cellini M. Steady-State Pattern Electroretinogram and Frequency Doubling Technology in Adult Dyslexic Readers. Clin Ophthalmol 2019;13:2451-2459. [PMID: 31849443; PMCID: PMC6912011; DOI: 10.2147/opth.s229898]
Abstract
Purpose: Dyslexia is a reading disorder with a neurological deficit of the magnocellular pathway. The aim of our study was to evaluate the functionality of the magnocellular-Y (M-Y) retinal ganglion cells in adult dyslexic subjects using steady-state pattern electroretinogram and frequency doubling perimetry. Methods: Ten patients with dyslexia (7 females and 3 males), mean age 28.7 ± 5.9 years, and 10 subjects without dyslexia (6 females and 4 males), mean age 27.8 ± 4.1 years, were enrolled in the study and underwent both steady-state pattern-electroretinogram examination and frequency doubling perimetry. Results: There was a significant difference in the amplitude of the steady-state pattern electroretinogram of the dyslexic group and the healthy controls (0.610±0.110 μV vs 1.250±0.296 μV; p=0.0001). Furthermore, in the dyslexic group we found a significant difference between the right eye and the left eye (0.671±0.11 μV vs 0.559±0.15 μV; p=0.001). With frequency doubling perimetry, the pattern standard deviation index increased in dyslexic eyes compared to healthy controls (4.40±0.81 dB vs 2.99±0.35 dB; p=0.0001) and in the left eye versus the right eye of the dyslexic group (4.43±1.10 dB vs 3.66±0.96 dB; p=0.031). There was a correlation between the reduction in the wave amplitude of the pattern electroretinogram and the simultaneous increase in the pattern standard deviation values (r=0.80; p=0.001). This correlation was also found to be present in the left eye (r=0.93; p<0.001) and the right eye (r=0.81; p=0.005) of dyslexic subjects. Conclusion: Our study shows that there was an alteration of the activity of M-Y retinal ganglion cells, especially in the left eye. It confirms that in dyslexia there is a deficit of visual attention with damage not only to the magnocellular-dorsal pathway but also to the M-Y retinal ganglion cells.
Affiliation(s)
- Costantino Schiavi
- Department of Experimental, Diagnostic, and Specialty Medicine, Ophthalmology Service, University of Bologna, Bologna 40138, Italy
- Alessandro Finzi
- Department of Experimental, Diagnostic, and Specialty Medicine, Ophthalmology Service, University of Bologna, Bologna 40138, Italy
- Mauro Cellini
- Department of Experimental, Diagnostic, and Specialty Medicine, Ophthalmology Service, University of Bologna, Bologna 40138, Italy
12
Coffey EBJ, Arseneau-Bruneau I, Zhang X, Zatorre RJ. The Music-In-Noise Task (MINT): A Tool for Dissecting Complex Auditory Perception. Front Neurosci 2019;13:199. [PMID: 30930734; PMCID: PMC6427094; DOI: 10.3389/fnins.2019.00199]
Abstract
The ability to segregate target sounds in noisy backgrounds is relevant both to neuroscience and to clinical applications. Recent research suggests that hearing-in-noise (HIN) problems are solved using combinations of sub-skills that are applied according to task demand and information availability. While evidence is accumulating for a musician advantage in HIN, the exact nature of the reported training effect is not fully understood. Existing HIN tests focus on tasks requiring understanding of speech in the presence of competing sound. Because visual, spatial and predictive cues are not systematically considered in these tasks, few tools exist to investigate the most relevant components of cognitive processes involved in stream segregation. We present the Music-In-Noise Task (MINT) as a flexible tool to expand HIN measures beyond speech perception, and for addressing research questions pertaining to the relative contributions of HIN sub-skills, inter-individual differences in their use, and their neural correlates. The MINT uses a match-mismatch trial design: in four conditions (Baseline, Rhythm, Spatial, and Visual) subjects first hear a short instrumental musical excerpt embedded in an informational masker of "multi-music" noise, followed by either a matching or scrambled repetition of the target musical excerpt presented in silence; the four conditions differ according to the presence or absence of additional cues. In a fifth condition (Prediction), subjects hear the excerpt in silence as a target first, which helps to anticipate incoming information when the target is embedded in masking sound. Data from samples of young adults show that the MINT has good reliability and internal consistency, and demonstrate selective benefits of musicianship in the Prediction, Rhythm, and Visual subtasks. We also report a performance benefit of multilingualism that is separable from that of musicianship. Average MINT scores were correlated with scores on a sentence-in-noise perception task, but only accounted for a relatively small percentage of the variance, indicating that the MINT is sensitive to additional factors and can provide a complement and extension of speech-based tests for studying stream segregation. A customizable version of the MINT is made available for use and extension by the scientific community.
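Match-mismatch data of this kind are often summarized with a sensitivity index d' computed from hit and false-alarm rates; the sketch below is a standard signal-detection summary with toy counts, not the MINT's documented scoring.

```python
from scipy.stats import norm

def d_prime(hits, misses, fas, crs):
    """d' from hit/false-alarm counts, with a log-linear correction for 0/1 rates."""
    h = (hits + 0.5) / (hits + misses + 1.0)   # corrected hit rate
    f = (fas + 0.5) / (fas + crs + 1.0)        # corrected false-alarm rate
    return norm.ppf(h) - norm.ppf(f)

print(d_prime(hits=42, misses=8, fas=12, crs=38))   # roughly 1.7 with these toy counts
```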
Affiliation(s)
- Emily B. J. Coffey
- Department of Psychology, Concordia University, Montreal, QC, Canada
- Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Centre for Research on Brain, Language and Music (CRBLM), Montreal, QC, Canada
- Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), Montreal, QC, Canada
- Isabelle Arseneau-Bruneau
- Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Centre for Research on Brain, Language and Music (CRBLM), Montreal, QC, Canada
- Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), Montreal, QC, Canada
- Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- Xiaochen Zhang
- Department of Biomedical Engineering, School of Medicine, Tsinghua University, Beijing, China
- Robert J. Zatorre
- Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Centre for Research on Brain, Language and Music (CRBLM), Montreal, QC, Canada
- Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT), Montreal, QC, Canada
- Montreal Neurological Institute, McGill University, Montreal, QC, Canada
13
Auditory Figure-Ground Segregation Is Impaired by High Visual Load. J Neurosci 2018;39:1699-1708. [PMID: 30541915; PMCID: PMC6391559; DOI: 10.1523/jneurosci.2518-18.2018]
Abstract
Figure-ground segregation is fundamental to listening in complex acoustic environments. An ongoing debate pertains to whether segregation requires attention or is "automatic" and preattentive. In this magnetoencephalography study, we tested a prediction derived from load theory of attention (e.g., Lavie, 1995) that segregation requires attention but can benefit from the automatic allocation of any "leftover" capacity under low load. Complex auditory scenes were modeled with stochastic figure-ground stimuli (Teki et al., 2013), which occasionally contained repeated frequency component "figures." Naive human participants (both sexes) passively listened to these signals while performing a visual attention task of either low or high load. While clear figure-related neural responses were observed under conditions of low load, high visual load substantially reduced the neural response to the figure in auditory cortex (planum temporale, Heschl's gyrus). We conclude that fundamental figure-ground segregation in hearing is not automatic but draws on resources that are shared across vision and audition. SIGNIFICANCE STATEMENT: This work resolves a long-standing question of whether figure-ground segregation, a fundamental process of auditory scene analysis, requires attention or is underpinned by automatic, encapsulated computations. Task-irrelevant sounds were presented during performance of a visual search task. We revealed a clear magnetoencephalography neural signature of figure-ground segregation in conditions of low visual load, which was substantially reduced in conditions of high visual load. This demonstrates that, although attention does not need to be actively allocated to sound for auditory segregation to occur, segregation depends on shared computational resources across vision and hearing. The findings further highlight that visual load can impair the computational capacity of the auditory system, even when it does not simply dampen auditory responses as a whole.
14
Gómez-Álvarez M, Gourévitch B, Felix RA, Nyberg T, Hernández-Montiel HL, Magnusson AK. Temporal information in tones, broadband noise, and natural vocalizations is conveyed by differential spiking responses in the superior paraolivary nucleus. Eur J Neurosci 2018;48:2030-2049. [PMID: 30019495; DOI: 10.1111/ejn.14073]
Abstract
Communication sounds across all mammals consist of multiple frequencies repeated in sequence. The onset and offset of vocalizations are potentially important cues for recognizing distinct units, such as phonemes and syllables, which are needed to perceive meaningful communication. The superior paraolivary nucleus (SPON) in the auditory brainstem has been implicated in the processing of rhythmic sounds. Here, we compared how best frequency tones (BFTs), broadband noise (BBN), and natural mouse calls elicit onset and offset spiking in the mouse SPON. The results demonstrate that onset spiking typically occurs in response to BBN, but not BFT stimulation, while spiking at the sound offset occurs for both stimulus types. This effect of stimulus bandwidth on spiking is consistent with two of the established inputs to the SPON from the octopus cells (onset spiking) and medial nucleus of the trapezoid body (offset spiking). Natural mouse calls elicit two main spiking peaks. The first spiking peak, which is weak or absent with BFT stimulation, occurs most consistently during the call envelope, while the second spiking peak occurs at the call offset. This suggests that the combined spiking activity in the SPON elicited by vocalizations reflects the entire envelope, that is, the coarse amplitude waveform. Since the output from the SPON is purely inhibitory, it is speculated that, at the level of the inferior colliculus, the broadly tuned first peak may improve the signal-to-noise ratio of the subsequent, more call frequency-specific peak. Thus, the SPON may provide a dual inhibition mechanism for tracking phonetic boundaries in social-vocal communication.
Affiliation(s)
- Marcelo Gómez-Álvarez
- Unit of Audiology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Boris Gourévitch
- Unité de Génétique et Physiologie de l'Audition, INSERM, Institut Pasteur, Sorbonne Université, Paris, France
- CNRS, Paris, France
- Tobias Nyberg
- Division of Neuronic Engineering, Department of Biomedical Engineering and Health Systems, KTH Royal Institute of Technology, Stockholm, Sweden
- Hebert L Hernández-Montiel
- Laboratorio de Neurobiología y Bioingeniería Celular, Clínica del Sistema Nervioso, Universidad Autónoma de Querétaro, Santiago de Querétaro, México
- Anna K Magnusson
- Unit of Audiology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
15
Popham S, Boebinger D, Ellis DPW, Kawahara H, McDermott JH. Inharmonic speech reveals the role of harmonicity in the cocktail party problem. Nat Commun 2018;9:2122. [PMID: 29844313; PMCID: PMC5974276; DOI: 10.1038/s41467-018-04551-8]
Abstract
The "cocktail party problem" requires us to discern individual sound sources from mixtures of sources. The brain must use knowledge of natural sound regularities for this purpose. One much-discussed regularity is the tendency for frequencies to be harmonically related (integer multiples of a fundamental frequency). To test the role of harmonicity in real-world sound segregation, we developed speech analysis/synthesis tools to perturb the carrier frequencies of speech, disrupting harmonic frequency relations while maintaining the spectrotemporal envelope that determines phonemic content. We find that violations of harmonicity cause individual frequencies of speech to segregate from each other, impair the intelligibility of concurrent utterances despite leaving intelligibility of single utterances intact, and cause listeners to lose track of target talkers. However, additional segregation deficits result from replacing harmonic frequencies with noise (simulating whispering), suggesting additional grouping cues enabled by voiced speech excitation. Our results demonstrate acoustic grouping cues in real-world sound segregation.
Affiliation(s)
- Sara Popham
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA
- Helen Wills Neuroscience Institute, UC Berkeley, Berkeley, CA, 94720, USA
- Dana Boebinger
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA
- Program in Speech and Hearing Sciences, Harvard University, Cambridge, MA, 02138, USA
- Josh H McDermott
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA
- Program in Speech and Hearing Sciences, Harvard University, Cambridge, MA, 02138, USA
16
Felix RA, Gourévitch B, Portfors CV. Subcortical pathways: Towards a better understanding of auditory disorders. Hear Res 2018;362:48-60. [PMID: 29395615; PMCID: PMC5911198; DOI: 10.1016/j.heares.2018.01.008]
Abstract
Hearing loss is a significant problem that affects at least 15% of the population. This percentage, however, is likely significantly higher because of a variety of auditory disorders that are not identifiable through traditional tests of peripheral hearing ability. In these disorders, individuals have difficulty understanding speech, particularly in noisy environments, even though the sounds are loud enough to hear. The underlying mechanisms leading to such deficits are not well understood. To enable the development of suitable treatments to alleviate or prevent such disorders, the affected processing pathways must be identified. Historically, mechanisms underlying speech processing have been thought to be a property of the auditory cortex and thus the study of auditory disorders has largely focused on cortical impairments and/or cognitive processes. As we review here, however, there is strong evidence to suggest that, in fact, deficits in subcortical pathways play a significant role in auditory disorders. In this review, we highlight the role of the auditory brainstem and midbrain in processing complex sounds and discuss how deficits in these regions may contribute to auditory dysfunction. We discuss current research with animal models of human hearing and then consider human studies that implicate impairments in subcortical processing that may contribute to auditory disorders.
Affiliation(s)
- Richard A Felix
- School of Biological Sciences and Integrative Physiology and Neuroscience, Washington State University, Vancouver, WA, USA
- Boris Gourévitch
- Unité de Génétique et Physiologie de l'Audition, UMRS 1120 INSERM, Institut Pasteur, Université Pierre et Marie Curie, F-75015, Paris, France
- CNRS, France
- Christine V Portfors
- School of Biological Sciences and Integrative Physiology and Neuroscience, Washington State University, Vancouver, WA, USA
17
Paavilainen P, Kaukinen C, Koskinen O, Kylmälä J, Rehn L. Mismatch negativity (MMN) elicited by abstract regularity violations in two concurrent auditory streams. Heliyon 2018;4:e00608. [PMID: 29862369; PMCID: PMC5968198; DOI: 10.1016/j.heliyon.2018.e00608]
Abstract
The study investigated whether violations of abstract regularities in two parallel auditory stimulus streams can elicit the MMN (mismatch negativity) event-related potential. Tone pairs from a low (220–392 Hz) and a high (1319–2349 Hz) stream were delivered in an alternating order at either a fast or a slow pace. With the slow pace, the pairs were perceptually heard as a single stream obeying an alternating low pair-high pair pattern, whereas with the fast pace, an experience of two separate auditory streams, low and high, emerged. Both streams contained standard and deviant pairs. The standard pairs were either ascending in within-pair pitch direction in both streams, or ascending in one stream and descending in the other. The direction of the deviant pairs was opposite to that of the same-stream standard pairs. The participant's task was either to ignore the auditory stimuli or to detect the deviant pairs in the designated stream. The deviant pairs elicited an MMN both when the directions of the standard pairs in the two streams were the same and when they were opposite. The MMN was present irrespective of the pace of stimulation. The results indicate that the preattentive brain mechanisms, reflected by the MMN, can extract abstract regularities from two concurrent streams even when the regularities are opposite in the two streams, and independently of whether there is perceptually only one stimulus stream or two segregated streams. These results demonstrate the brain's remarkable ability to model various regularities embedded in the auditory environment and to update the models when the regularities are violated. The observed phenomena can be related to several aspects of auditory information processing, e.g., music and speech perception and different forms of attention.
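A minimal sketch of how an MMN is quantified from such recordings: average epochs separately for standards and deviants and take the deviant-minus-standard difference wave (the epoch format and labels here are illustrative assumptions).

```python
import numpy as np

def mmn_difference(epochs, labels):
    """epochs: (n_trials, n_times) EEG; labels: 'std' or 'dev' per trial."""
    labels = np.asarray(labels)
    erp_std = epochs[labels == "std"].mean(axis=0)
    erp_dev = epochs[labels == "dev"].mean(axis=0)
    return erp_dev - erp_std   # MMN: frontocentral negativity ~100-250 ms post-deviance
```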
Affiliation(s)
- Petri Paavilainen
- Department of Psychology and Logopedics, 00014, University of Helsinki, Finland
- Cognitive Brain Research Unit, 00014, University of Helsinki, Finland
- Crista Kaukinen
- Department of Psychology and Logopedics, 00014, University of Helsinki, Finland
- Oskari Koskinen
- Department of Psychology and Logopedics, 00014, University of Helsinki, Finland
- Julia Kylmälä
- Cognitive Science, 00014, University of Helsinki, Finland
- Leila Rehn
- Department of Psychology and Logopedics, 00014, University of Helsinki, Finland
18
Abstract
The cocktail party problem requires listeners to infer individual sound sources from mixtures of sound. The problem can be solved only by leveraging regularities in natural sound sources, but little is known about how such regularities are internalized. We explored whether listeners learn source "schemas"-the abstract structure shared by different occurrences of the same type of sound source-and use them to infer sources from mixtures. We measured the ability of listeners to segregate mixtures of time-varying sources. In each experiment a subset of trials contained schema-based sources generated from a common template by transformations (transposition and time dilation) that introduced acoustic variation but preserved abstract structure. Across several tasks and classes of sound sources, schema-based sources consistently aided source separation, in some cases producing rapid improvements in performance over the first few exposures to a schema. Learning persisted across blocks that did not contain the learned schema, and listeners were able to learn and use multiple schemas simultaneously. No learning was evident when schemas were presented in the task-irrelevant (i.e., distractor) source. However, learning from task-relevant stimuli showed signs of being implicit, in that listeners were no more likely to report that sources recurred in experiments containing schema-based sources than in control experiments containing no schema-based sources. The results implicate a mechanism for rapidly internalizing abstract sound structure, facilitating accurate perceptual organization of sound sources that recur in the environment.
19
Disbergen NR, Valente G, Formisano E, Zatorre RJ. Assessing Top-Down and Bottom-Up Contributions to Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2018;12:121. [PMID: 29563861; PMCID: PMC5845899; DOI: 10.3389/fnins.2018.00121]
Abstract
Polyphonic music listening well exemplifies processes typically involved in daily auditory scene analysis situations, relying on an interactive interplay between bottom-up and top-down processes. Most studies investigating scene analysis have used elementary auditory scenes; however, real-world scene analysis is far more complex. In particular, music, contrary to most other natural auditory scenes, can be perceived by either integrating or, under attentive control, segregating sound streams, often carried by different instruments. One of the prominent bottom-up cues contributing to multi-instrument music perception is their timbre difference. In this work, we introduce and validate a novel paradigm designed to investigate, within naturalistic musical auditory scenes, attentive modulation as well as its interaction with bottom-up processes. Two psychophysical experiments are described, employing custom-composed two-voice polyphonic music pieces within a framework implementing a behavioral performance metric to validate listener instructions requiring either integration or segregation of scene elements. In Experiment 1, the listeners' locus of attention was switched between individual instruments or the aggregate (i.e., both instruments together), via a task requiring the detection of temporal modulations (i.e., triplets) incorporated within or across instruments. Subjects responded post-stimulus whether triplets were present in the to-be-attended instrument(s). Experiment 2 introduced the bottom-up manipulation by adding a three-level morphing of instrument timbre distance to the attentional framework. The task was designed to be used within neuroimaging paradigms; Experiment 2 was additionally validated behaviorally in the functional Magnetic Resonance Imaging (fMRI) environment. Experiment 1 subjects (N = 29, non-musicians) completed the task at high levels of accuracy, showing no group differences between any experimental conditions. Nineteen listeners also participated in Experiment 2, showing a main effect of instrument timbre distance, even though within-attention-condition timbre-distance contrasts did not demonstrate any timbre effect. Correlation of overall scores with morph-distance effects, computed by subtracting the largest from the smallest timbre-distance scores, showed an influence of general task difficulty on the timbre-distance effect. Comparison of laboratory and fMRI data showed scanner noise had no adverse effect on task performance. These experimental paradigms enable the study of both bottom-up and top-down contributions to auditory stream segregation and integration within psychophysical and neuroimaging experiments.
Affiliation(s)
- Niels R Disbergen
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Center (MBIC), Maastricht, Netherlands
- Giancarlo Valente
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Center (MBIC), Maastricht, Netherlands
- Elia Formisano
- Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands
- Maastricht Brain Imaging Center (MBIC), Maastricht, Netherlands
- Robert J Zatorre
- Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada
- International Laboratory for Brain Music and Sound Research (BRAMS), Montreal, QC, Canada
20
Felix II RA, Gourévitch B, Gómez-Álvarez M, Leijon SCM, Saldaña E, Magnusson AK. Octopus Cells in the Posteroventral Cochlear Nucleus Provide the Main Excitatory Input to the Superior Paraolivary Nucleus. Front Neural Circuits 2017;11:37. [PMID: 28620283; PMCID: PMC5449481; DOI: 10.3389/fncir.2017.00037]
Abstract
Auditory streaming enables perception and interpretation of complex acoustic environments that contain competing sound sources. At early stages of central processing, sounds are segregated into separate streams representing attributes that later merge into acoustic objects. Streaming of temporal cues is critical for perceiving vocal communication, such as human speech, but our understanding of circuits that underlie this process is lacking, particularly at subcortical levels. The superior paraolivary nucleus (SPON), a prominent group of inhibitory neurons in the mammalian brainstem, has been implicated in processing temporal information needed for the segmentation of ongoing complex sounds into discrete events. The SPON requires temporally precise and robust excitatory input(s) to convey information about the steep rise in sound amplitude that marks the onset of voiced sound elements. Unfortunately, the sources of excitation to the SPON and the impact of these inputs on the behavior of SPON neurons have yet to be resolved. Using anatomical tract tracing and immunohistochemistry, we identified octopus cells in the contralateral cochlear nucleus (CN) as the primary source of excitatory input to the SPON. Cluster analysis of miniature excitatory events also indicated that the majority of SPON neurons receive one type of excitatory input. Precise octopus cell-driven onset spiking coupled with transient offset spiking make SPON responses well-suited to signal transitions in sound energy contained in vocalizations. Targets of octopus cell projections, including the SPON, are strongly implicated in the processing of temporal sound features, which suggests a common pathway that conveys information critical for perception of complex natural sounds.
Collapse
Affiliation(s)
- Richard A Felix II
- Unit of Audiology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
| | - Boris Gourévitch
- Institut Pasteur, Unité de Génétique et Physiologie de l'Audition, Paris, France
- Institut National de la Santé et de la Recherche Médicale, UMRS 1120, Paris, France
- Université Pierre et Marie Curie, Paris, France
| | - Marcelo Gómez-Álvarez
- Unit of Audiology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
- Neuroscience Institute of Castilla y León (INCyL), Universidad de Salamanca, Salamanca, Spain
- Institute of Biomedical Research of Salamanca (IBSAL), Salamanca, Spain
| | - Sara C M Leijon
- Unit of Audiology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
| | - Enrique Saldaña
- Neuroscience Institute of Castilla y León (INCyL), Universidad de Salamanca, Salamanca, Spain
- Institute of Biomedical Research of Salamanca (IBSAL), Salamanca, Spain
| | - Anna K Magnusson
- Unit of Audiology, Department of Clinical Science, Intervention and Technology, Karolinska Institutet, Stockholm, Sweden
| |
Collapse
|
21
|
Ni R, Bender DA, Shanechi AM, Gamble JR, Barbour DL. Contextual effects of noise on vocalization encoding in primary auditory cortex. J Neurophysiol 2016; 117:713-727. [PMID: 27881720 DOI: 10.1152/jn.00476.2016] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2016] [Accepted: 11/17/2016] [Indexed: 11/22/2022] Open
Abstract
Robust auditory perception plays a pivotal role in processing behaviorally relevant sounds, particularly amid distractions from the environment. The neuronal coding enabling this ability, however, is still not well understood. In this study, we recorded single-unit activity from the primary auditory cortex (A1) of awake marmoset monkeys (Callithrix jacchus) while delivering conspecific vocalizations degraded by two different background noises: broadband white noise and vocalization babble. Noise effects on neural representation of target vocalizations were quantified by measuring the responses' similarity to those elicited by natural vocalizations as a function of signal-to-noise ratio. A clustering approach was used to describe the range of response profiles by reducing the population responses to a summary of four response classes (robust, balanced, insensitive, and brittle) under both noise conditions. This clustering approach revealed that, on average, approximately two-thirds of the neurons change their response class when encountering different noises. Therefore, the distortion induced by one particular masking background in single-unit responses is not necessarily predictable from that induced by another, suggesting the low likelihood of a unique group of noise-invariant neurons across different background conditions in A1. Regarding the influence of noise on neural activity, the brittle response group showed additional spiking activity both within and between phrases of vocalizations relative to clean vocalizations, whereas the other groups generally showed spiking activity suppression within phrases, and the alteration between phrases was noise dependent. Overall, the variable single-unit responses, yet consistent response types, imply that primate A1 performs scene analysis through the collective activity of multiple neurons. NEW & NOTEWORTHY The understanding of where and how auditory scene analysis is accomplished is of broad interest to neuroscientists. In this paper, we systematically investigated neuronal coding of multiple vocalizations degraded by two distinct noises at various signal-to-noise ratios in nonhuman primates. In the process, we uncovered heterogeneity of single-unit representations for different auditory scenes yet homogeneity of responses across the population.
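A minimal sketch of the clustering step described above, assuming each unit is summarized by its response similarity at several signal-to-noise ratios; the simulated data and the choice of k-means are illustrative stand-ins, not the authors' exact pipeline.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # Simulated stand-in data: one row per A1 unit, one column per
    # signal-to-noise ratio; each value is a similarity (e.g., correlation)
    # between noisy-vocalization and clean-vocalization responses.
    n_units, n_snrs = 120, 6
    similarity = rng.uniform(-0.2, 1.0, size=(n_units, n_snrs))

    # Partition units into four response classes, mirroring the four
    # profiles (robust, balanced, insensitive, brittle) named above.
    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(similarity)

    for k in range(4):
        members = similarity[kmeans.labels_ == k]
        print(f"class {k}: n={len(members)}, mean profile={members.mean(axis=0).round(2)}")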
Collapse
Affiliation(s)
- Ruiye Ni
- Laboratory of Sensory Neuroscience and Neuroengineering, Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri
| | - David A Bender
- Laboratory of Sensory Neuroscience and Neuroengineering, Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri
| | - Amirali M Shanechi
- Laboratory of Sensory Neuroscience and Neuroengineering, Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri
| | - Jeffrey R Gamble
- Laboratory of Sensory Neuroscience and Neuroengineering, Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri
| | - Dennis L Barbour
- Laboratory of Sensory Neuroscience and Neuroengineering, Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri
| |
Collapse
|
22
|
Mehta AH, Yasin I, Oxenham AJ, Shamma S. Neural correlates of attention and streaming in a perceptually multistable auditory illusion. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:2225. [PMID: 27794350 PMCID: PMC5849028 DOI: 10.1121/1.4963902] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/10/2016] [Revised: 09/12/2016] [Accepted: 09/20/2016] [Indexed: 06/06/2023]
Abstract
In a complex acoustic environment, acoustic cues and attention interact in the formation of streams within the auditory scene. In this study, a variant of the "octave illusion" [Deutsch (1974). Nature 251, 307-309] was used to investigate the neural correlates of auditory streaming, and to elucidate the effects of attention on the interaction between sequential and concurrent sound segregation in humans. By directing subjects' attention to different frequencies and ears, it was possible to elicit several different illusory percepts with the identical stimulus. The first experiment tested the hypothesis that the illusion depends on the ability of listeners to perceptually stream the target tones from within the alternating sound sequences. In the second experiment, concurrent psychophysical measures and electroencephalography recordings provided neural correlates of the various percepts elicited by the multistable stimulus. The results show that the perception and neural correlates of the auditory illusion can be manipulated robustly by attentional focus and that the illusion is constrained in much the same way as auditory stream segregation, suggesting common underlying mechanisms.
Collapse
Affiliation(s)
- Anahita H Mehta
- Ear Institute, University College London, 332 Gray's Inn Road, London WC1X 8EE, United Kingdom
| | - Ifat Yasin
- Department of Computer Science, University College London, 66-72 Gower Street, London WC1E 6BT, United Kingdom
| | - Andrew J Oxenham
- Department of Psychology, University of Minnesota, 75 East River Parkway, Minneapolis, Minnesota 55455, USA
| | - Shihab Shamma
- Institute for Systems Research, 2203 A.V. Williams Building, University of Maryland, College Park, Maryland 20742, USA
| |
Collapse
|
23
|
Teki S, Barascud N, Picard S, Payne C, Griffiths TD, Chait M. Neural Correlates of Auditory Figure-Ground Segregation Based on Temporal Coherence. Cereb Cortex 2016; 26:3669-80. [PMID: 27325682 PMCID: PMC5004755 DOI: 10.1093/cercor/bhw173] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
To make sense of natural acoustic environments, listeners must parse complex mixtures of sounds that vary in frequency, space, and time. Emerging work suggests that, in addition to the well-studied spectral cues for segregation, sensitivity to temporal coherence, the coincidence of sound elements in and across time, is also critical for the perceptual organization of acoustic scenes. Here, we examine pre-attentive, stimulus-driven neural processes underlying auditory figure-ground segregation using stimuli that capture the challenges of listening in complex scenes where segregation cannot be achieved based on spectral cues alone. Signals ("stochastic figure-ground": SFG) comprised a sequence of brief broadband chords containing random pure tone components that vary from one chord to another. Occasional tone repetitions across chords are perceived as "figures" popping out of a stochastic "ground." Magnetoencephalography (MEG) measurement in naïve, distracted human subjects revealed robust evoked responses, commencing about 150 ms after figure onset, that reflect the emergence of the "figure" from the randomly varying "ground." Neural sources underlying this bottom-up-driven figure-ground segregation were localized to the planum temporale and the intraparietal sulcus, demonstrating that the latter area, outside the "classic" auditory system, is also involved in the early stages of auditory scene analysis.
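A rough sketch of how an SFG stimulus of this kind can be synthesized; the chord counts, frequency pool, and figure parameters below are illustrative assumptions rather than the values used in the study.

    import numpy as np

    fs = 44100
    chord_dur = 0.05                 # brief broadband chords
    n_chords, n_bg = 40, 10          # illustrative counts
    rng = np.random.default_rng(1)

    # Assumed log-spaced pool from which chord components are drawn.
    pool = np.logspace(np.log10(180), np.log10(7000), 120)
    figure = rng.choice(pool, size=4, replace=False)   # components that will repeat

    t = np.arange(int(fs * chord_dur)) / fs
    chords = []
    for i in range(n_chords):
        freqs = list(rng.choice(pool, size=n_bg, replace=False))
        if 20 <= i < 30:             # repeating the same components across chords
            freqs += list(figure)    # makes a "figure" pop out of the "ground"
        chords.append(sum(np.sin(2 * np.pi * f * t) for f in freqs) / len(freqs))
    stimulus = np.concatenate(chords)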
Collapse
Affiliation(s)
- Sundeep Teki
- Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK
- Auditory Cognition Group, Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
- Current address: Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3QX, UK
| | - Nicolas Barascud
- Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK
- Ear Institute, University College London, London WC1X 8EE, UK
| | - Samuel Picard
- Ear Institute, University College London, London WC1X 8EE, UK
| | | | - Timothy D. Griffiths
- Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK
- Auditory Cognition Group, Institute of Neuroscience, Newcastle University, Newcastle upon Tyne NE2 4HH, UK
| | - Maria Chait
- Ear Institute, University College London, London WC1X 8EE, UK
| |
Collapse
|
24
|
Yamagishi S, Otsuka S, Furukawa S, Kashino M. Subcortical correlates of auditory perceptual organization in humans. Hear Res 2016; 339:104-11. [PMID: 27371867 DOI: 10.1016/j.heares.2016.06.016] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/16/2016] [Revised: 06/22/2016] [Accepted: 06/27/2016] [Indexed: 11/25/2022]
Abstract
To make sense of complex auditory scenes, the auditory system sequentially organizes auditory components into perceptual objects or streams. In the conventional view of this process, the cortex plays a major role in perceptual organization, and subcortical mechanisms merely provide the cortex with acoustical features. Here, we show that the neural activities of the brainstem are linked to perceptual organization, which alternates spontaneously for human listeners without any stimulus change. The stimulus used in the experiment was an unchanging sequence of repeated triplet tones, which can be interpreted as either one or two streams. Listeners were instructed to report the perceptual states whenever they experienced perceptual switching between one and two streams throughout the stimulus presentation. Simultaneously, we recorded event-related potentials with scalp electrodes. We measured the frequency-following response (FFR), which is considered to originate from the brainstem. We also assessed thalamo-cortical activity through the middle-latency response (MLR). The results demonstrate that the FFR and MLR varied with the state of auditory stream perception. In addition, we found that the MLR change precedes the FFR change with perceptual switching from a one-stream to a two-stream percept. This suggests that there are top-down influences on brainstem activity from the thalamo-cortical pathway. These findings are consistent with the idea of a distributed, hierarchical neural network for perceptual organization and suggest that the network extends to the brainstem level.
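The repeated-triplet stimulus has the classic ABA_ form; a minimal synthesis sketch follows, with durations and frequencies chosen for illustration rather than taken from the study.

    import numpy as np

    fs = 44100
    tone_dur, gap_dur = 0.05, 0.05           # illustrative timing
    f_a, f_b = 400.0, 504.0                  # ~4-semitone separation (assumed)

    def tone(freq):
        t = np.arange(int(fs * tone_dur)) / fs
        y = np.sin(2 * np.pi * freq * t)
        ramp = np.linspace(0.0, 1.0, int(0.005 * fs))  # 5-ms on/off ramps
        y[:ramp.size] *= ramp
        y[-ramp.size:] *= ramp[::-1]
        return y

    gap = np.zeros(int(fs * gap_dur))
    # ABA_ triplets are heard either as one galloping stream or as two
    # streams (A-A-A... and B---B---); perception alternates spontaneously.
    triplet = np.concatenate([tone(f_a), gap, tone(f_b), gap, tone(f_a), gap, gap])
    sequence = np.tile(triplet, 60)          # a long, physically unchanging sequence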
Collapse
Affiliation(s)
- Shimpei Yamagishi
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, Kanagawa, 226-8503, Japan.
| | - Sho Otsuka
- NTT Communication Science Laboratories, NTT Corporation, 3-1 Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198, Japan.
| | - Shigeto Furukawa
- NTT Communication Science Laboratories, NTT Corporation, 3-1 Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198, Japan.
| | - Makio Kashino
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama, Kanagawa, 226-8503, Japan; NTT Communication Science Laboratories, NTT Corporation, 3-1 Morinosato Wakamiya, Atsugi, Kanagawa, 243-0198, Japan.
| |
Collapse
|
25
|
Theta oscillations accompanying concurrent auditory stream segregation. Int J Psychophysiol 2016; 106:141-51. [PMID: 27170058 DOI: 10.1016/j.ijpsycho.2016.05.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2015] [Revised: 04/25/2016] [Accepted: 05/06/2016] [Indexed: 11/21/2022]
Abstract
The ability to isolate a single sound source among concurrent sources is crucial for veridical auditory perception. The present study investigated the event-related oscillations evoked by complex tones that could be perceived as a single sound, and by tonal complexes with cues promoting the perception of two concurrent sounds through inharmonicity, onset asynchrony, and/or a perceived source-location difference of the component tones. In separate task conditions, participants performed a visual change detection task (visual control), watched a silent movie (passive listening) or reported for each tone whether they perceived one or two concurrent sounds (active listening). In two time windows, the amplitude of theta oscillation was modulated by the presence vs. absence of the cues: 60-350 ms/6-8 Hz (early) and 350-450 ms/4-8 Hz (late). The early response appeared in both the passive and the active listening conditions; it did not closely match the task performance; and it had a fronto-central scalp distribution. The late response was only elicited in the active listening condition; it closely matched the task performance; and it had a centro-parietal scalp distribution. The neural processes reflected by these responses are probably involved in the processing of concurrent sound segregation cues, in sound categorization, and in response preparation and monitoring. The current results are compatible with the notion that theta oscillations mediate some of the processes involved in concurrent sound segregation.
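Two of the segregation cues named above (inharmonicity and onset asynchrony) can be sketched in a few lines; the base frequency, mistuning, and asynchrony are illustrative assumptions, and the location cue would additionally require stereo presentation.

    import numpy as np

    fs = 44100
    dur = 0.4
    f0, n_harm = 200.0, 10
    mistune = 1.08          # inharmonicity cue: 8% mistuning of harmonic 3
    asynchrony = 0.15       # onset-asynchrony cue: component leads by 150 ms

    t = np.arange(int(fs * dur)) / fs
    base = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(1, n_harm + 1) if k != 3)

    lead = int(fs * asynchrony)
    t_odd = np.arange(int(fs * dur) + lead) / fs
    odd = np.sin(2 * np.pi * 3 * f0 * mistune * t_odd)

    # The mistuned component starts early and the rest of the complex joins
    # later; both cues promote hearing it as a second concurrent sound.
    stimulus = np.concatenate([odd[:lead], odd[lead:] + base])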
Collapse
|
26
|
Abstract
The hearing of turtles is poorly understood compared with that of other reptiles. Although the mechanism of transduction of sound into a neural signal via hair cells has been described in detail, the rest of the auditory system is largely a black box. What is known is that turtles have higher hearing thresholds than other reptiles, with best frequencies around 500 Hz. They also have lower underwater hearing thresholds than those in air, owing to resonance of the middle ear cavity. Further studies demonstrated that all families of turtles and tortoises share a common middle ear cavity morphology, with scaling best suited to underwater hearing. This supports an aquatic origin of the group. Because turtles hear best under water, it is important to examine their vulnerability to anthropogenic noise. However, the lack of basic data makes such experiments difficult because only a few species of turtles have published audiograms. There are also almost no behavioral data available (understandably, owing to training difficulties). Finally, few studies show what kinds of sounds are behaviorally relevant. One notable paper revealed that the Australian snake-necked turtle (Chelodina oblonga) has a vocal repertoire in air, at the interface, and under water. Findings like these suggest that there is more to the turtle aquatic auditory scene than previously thought.
Collapse
|
27
|
Riecke L, Sack AT, Schroeder CE. Endogenous Delta/Theta Sound-Brain Phase Entrainment Accelerates the Buildup of Auditory Streaming. Curr Biol 2015; 25:3196-201. [PMID: 26628008 DOI: 10.1016/j.cub.2015.10.045] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2015] [Revised: 10/01/2015] [Accepted: 10/19/2015] [Indexed: 11/30/2022]
Abstract
In many natural listening situations, meaningful sounds (e.g., speech) fluctuate in slow rhythms among other sounds. When a slow rhythmic auditory stream is selectively attended, endogenous delta (1-4 Hz) oscillations in auditory cortex may shift their timing so that higher-excitability neuronal phases become aligned with salient events in that stream [1, 2]. As a consequence of this stream-brain phase entrainment [3], these events are processed and perceived more readily than temporally non-overlapping events [4-11], essentially enhancing the neural segregation between the attended stream and temporally noncoherent streams [12]. Stream-brain phase entrainment is robust to acoustic interference [13-20] provided that target stream-evoked rhythmic activity can be segregated from noncoherent activity evoked by other sounds [21], a process that usually builds up over time [22-27]. However, it has remained unclear whether stream-brain phase entrainment functionally contributes to this buildup of rhythmic streams or whether it is merely an epiphenomenon of it. Here, we addressed this issue directly by experimentally manipulating endogenous stream-brain phase entrainment in human auditory cortex with non-invasive transcranial alternating current stimulation (TACS) [28-30]. We assessed the consequences of these manipulations on the perceptual buildup of the target stream (the time required to recognize its presence in a noisy background), using behavioral measures in 20 healthy listeners performing a naturalistic listening task. Experimentally induced cyclic 4-Hz variations in stream-brain phase entrainment reliably caused a cyclic 4-Hz pattern in perceptual buildup time. Our findings demonstrate that strong endogenous delta/theta stream-brain phase entrainment accelerates the perceptual emergence of task-relevant rhythmic streams in noisy environments.
Collapse
Affiliation(s)
- Lars Riecke
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 Maastricht, the Netherlands.
| | - Alexander T Sack
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 Maastricht, the Netherlands
| | - Charles E Schroeder
- Cognitive Neuroscience and Schizophrenia Program, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY 10962, USA; Departments of Neurosurgery and Psychiatry, Columbia University College of Physicians and Surgeons, New York, NY 10032-2695, USA
| |
Collapse
|
28
|
Krause MB. Pay Attention!: Sluggish Multisensory Attentional Shifting as a Core Deficit in Developmental Dyslexia. DYSLEXIA (CHICHESTER, ENGLAND) 2015; 21:285-303. [PMID: 26338085 DOI: 10.1002/dys.1505] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/12/2014] [Revised: 04/29/2015] [Accepted: 08/06/2015] [Indexed: 06/05/2023]
Abstract
The aim of this review is to provide a background on the neurocognitive aspects of the reading process and review neuroscientific studies of individuals with developmental dyslexia, which provide evidence for amodal processing deficits. Hari, Renvall, and Tanskanen (2001) propose amodal sluggish attentional shifting (SAS) as a causal factor for temporal processing deficits in dyslexia. Undergirding this theory is the notion that when dyslexics are faced with rapid sequences of stimuli, their automatic attentional systems fail to disengage efficiently, which leads to difficulty when moving from one item to the next (Lallier et al., ). This results in atypical perception of rapid stimulus sequences. Until recently, the SAS theory, particularly the examination of amodal attentional deficits, was studied solely through the use of behavioural measures (Facoetti et al., ; Facoetti, Lorusso, Cattaneo, Galli, & Molteni, ). This paper examines evidence within the literature that provides a basis for further exploration of amodal SAS as an underlying deficit in developmental dyslexia.
Collapse
Affiliation(s)
- Margaret B Krause
- University of South Florida, 4202 E Fowler Ave, Tampa, FL, 33620, USA
| |
Collapse
|
29
|
Zhou W, Xia Z, Bi Y, Shu H. Altered connectivity of the dorsal and ventral visual regions in dyslexic children: a resting-state fMRI study. Front Hum Neurosci 2015; 9:495. [PMID: 26441595 PMCID: PMC4564758 DOI: 10.3389/fnhum.2015.00495] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2015] [Accepted: 08/27/2015] [Indexed: 01/23/2023] Open
Abstract
While there is emerging evidence from behavioral studies that visual attention skills are impaired in dyslexia, the corresponding neural mechanism (i.e., deficits in the dorsal visual region) needs further investigation. We used resting-state fMRI to explore the functional connectivity (FC) patterns of the left intraparietal sulcus (IPS) and the visual word form area (VWFA) in dyslexic children (N = 21, mean age = 12) and age-matched controls (N = 26, mean age = 12). The results showed that the left IPS and the VWFA were functionally connected to each other in both groups and that both were functionally connected to the left middle frontal gyrus (MFG). Importantly, we observed significant group differences in FC between the left IPS and the left MFG and between the VWFA and the left MFG. In addition, the strengths of the identified FCs were significantly correlated with the score of fluent reading, which required obvious eye movement and visual attention processing, but not with the lexical decision score. We conclude that dyslexics have deficits in the network composed of the prefrontal, dorsal visual and ventral visual regions and may lack modulation from the left MFG to the dorsal and ventral visual regions.
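A toy sketch of the seed-based functional connectivity computation reported here; the ROI names follow the abstract, while the time series and the Fisher z step are illustrative placeholders.

    import numpy as np

    rng = np.random.default_rng(2)

    # Placeholder ROI time series (time points x regions) for one subject;
    # a real analysis would extract these from preprocessed resting-state images.
    ts = rng.standard_normal((200, 3))
    ips, vwfa, mfg = ts.T

    def fc(x, y):
        # Pearson correlation, Fisher z-transformed so that connectivity
        # values can enter group comparisons (dyslexic vs. control).
        return np.arctanh(np.corrcoef(x, y)[0, 1])

    print("IPS-MFG z =", round(fc(ips, mfg), 3))
    print("VWFA-MFG z =", round(fc(vwfa, mfg), 3))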
Collapse
Affiliation(s)
- Wei Zhou
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing, China
- Beijing Key Lab of Learning and Cognition, Department of Psychology, Capital Normal University, Beijing, China
| | - Zhichao Xia
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing, China
| | - Yanchao Bi
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing, China
| | - Hua Shu
- State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing, China
- Center for Collaboration and Innovation in Brain and Learning Sciences, Beijing Normal University, Beijing, China
| |
Collapse
|
30
|
Felix RA, Magnusson AK, Berrebi AS. The superior paraolivary nucleus shapes temporal response properties of neurons in the inferior colliculus. Brain Struct Funct 2015; 220:2639-52. [PMID: 24973970 PMCID: PMC4278952 DOI: 10.1007/s00429-014-0815-8] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2013] [Accepted: 06/04/2014] [Indexed: 10/25/2022]
Abstract
The mammalian superior paraolivary nucleus (SPON) is a major source of GABAergic inhibition to neurons in the inferior colliculus (IC), a well-studied midbrain nucleus that is the site of convergence and integration for the majority of ascending auditory pathways en route to the cortex. Neurons in the SPON and IC exhibit highly precise responses to temporal sound features, which are important perceptual cues for naturally occurring sounds. To determine how inhibitory input from the SPON contributes to the encoding of temporal information in the IC, a reversible inactivation procedure was conducted to silence SPON neurons, while recording responses to amplitude-modulated tones and silent gaps between tones in the IC. The results show that SPON-derived inhibition shapes responses of onset and sustained units in the IC via different mechanisms. Onset neurons appear to be driven primarily by excitatory inputs and their responses are shaped indirectly by SPON-derived inhibition, whereas sustained neurons are heavily influenced directly by transient offset inhibition from the SPON. The findings also demonstrate that a more complete dissection of temporal processing pathways is critical for understanding how biologically important sounds are encoded by the brain.
Collapse
Affiliation(s)
- Richard A. Felix
- Department of Otolaryngology–Head and Neck Surgery and the Sensory Neuroscience Research Center, West Virginia University School of Medicine, Morgantown, West Virginia 26506 USA
| | - Anna K. Magnusson
- Center for Hearing and Communication Research, Karolinska Institutet and Department of Clinical Science, Intervention and Technology, Karolinska University Hospital, 17176 Stockholm, Sweden
| | - Albert S. Berrebi
- Department of Otolaryngology–Head and Neck Surgery and the Sensory Neuroscience Research Center, West Virginia University School of Medicine, Morgantown, West Virginia 26506 USA
| |
Collapse
|
31
|
Stream segregation in the anesthetized auditory cortex. Hear Res 2015; 328:48-58. [PMID: 26163899 PMCID: PMC4582803 DOI: 10.1016/j.heares.2015.07.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/18/2015] [Revised: 06/25/2015] [Accepted: 07/01/2015] [Indexed: 02/07/2023]
Abstract
Auditory stream segregation describes the way that sounds are perceptually segregated into groups or streams on the basis of perceptual attributes such as pitch or spectral content. For sequences of pure tones, segregation depends on the tones' proximity in frequency and time. In the auditory cortex (and elsewhere), responses to sequences of tones are dependent on stimulus conditions in a similar way to the perception of these stimuli. However, although highly dependent on stimulus conditions, perception is also clearly influenced by factors unrelated to the stimulus, such as attention. Exactly how ‘bottom-up’ sensory processes and non-sensory ‘top-down’ influences interact is still not clear. Here, we recorded responses to alternating tones (ABAB …) of varying frequency difference (FD) and rate of presentation (PR) in the auditory cortex of anesthetized guinea-pigs. These data complement previous studies, in that top-down processing resulting from conscious perception should be absent or at least considerably attenuated. Under anesthesia, the responses of cortical neurons to the tone sequences adapted rapidly, in a manner sensitive to both the FD and PR of the sequences. While the responses to tones at frequencies more distant from neuron best frequencies (BFs) decreased as the FD increased, the responses to tones near to BF increased, consistent with a release from adaptation, or forward suppression. Increases in PR resulted in reductions in responses to all tones, but the reduction was greater for tones further from BF. Although asymptotically adapted responses to tones showed behavior that was qualitatively consistent with perceptual stream segregation, responses reached asymptote within 2 s, and responses to all tones were very weak at high PRs (>12 tones per second). A signal-detection model, driven by the cortical population response, made decisions that were dependent on both FD and PR in ways consistent with perceptual stream segregation. This included showing a range of conditions over which decisions could be made either in favor of perceptual integration or segregation, depending on the model ‘decision criterion’. However, the rate of ‘build-up’ was more rapid than seen perceptually, and at high PR responses to tones were sometimes so weak as to be undetectable by the model. Under anesthesia, adaptation occurs rapidly, and at high PRs tones are generally poorly represented, which compromises the interpretation of the experiment. However, within these limitations, these results complement experiments in awake animals and humans. They generally support the hypothesis that ‘bottom-up’ sensory processing plays a major role in perceptual organization, and that processes underlying stream segregation are active in the absence of attention. Highlights: We recorded responses of cortical neurons to sequences of tones under anesthesia. Fully adapted responses correlated reasonably with perceptual stream segregation. Responses to tone sequences were weak during rapid tone presentation (>12 Hz). Adaptation under anesthesia is too rapid to account for perceptual ‘build-up’. Neural correlates of stream segregation are not reliant on top-down influences.
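A compressed sketch of a signal-detection readout of this kind; the response model and every parameter below are invented for illustration, not taken from the recordings.

    import numpy as np

    rng = np.random.default_rng(3)

    # Invented model of the asymptotic response of an A-tuned population to
    # B tones: responses shrink as frequency difference (FD, semitones) and
    # presentation rate (PR, tones/s) grow, as described above.
    def b_response(fd, pr):
        return 10.0 * np.exp(-fd / 6.0) * np.exp(-pr / 15.0)

    def p_two_streams(fd, pr, criterion=2.0, trials=2000):
        # When the B-tone response falls below the decision criterion, the
        # model reports segregation ("two streams") on that trial.
        counts = rng.poisson(b_response(fd, pr), size=trials)
        return float(np.mean(counts < criterion))

    for fd in (1, 4, 10):
        print(fd, [round(p_two_streams(fd, pr), 2) for pr in (5, 10, 20)])

Shifting the criterion moves the FD/PR boundary between "integration" and "segregation" decisions, which is the dependence on the model decision criterion noted above.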
Collapse
|
32
|
O'Sullivan JA, Shamma SA, Lalor EC. Evidence for Neural Computations of Temporal Coherence in an Auditory Scene and Their Enhancement during Active Listening. J Neurosci 2015; 35:7256-63. [PMID: 25948273 PMCID: PMC6605258 DOI: 10.1523/jneurosci.4973-14.2015] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2014] [Revised: 03/10/2015] [Accepted: 03/31/2015] [Indexed: 11/21/2022] Open
Abstract
The human brain has evolved to operate effectively in highly complex acoustic environments, segregating multiple sound sources into perceptually distinct auditory objects. A recent theory seeks to explain this ability by arguing that stream segregation occurs primarily due to the temporal coherence of the neural populations that encode the various features of an individual acoustic source. This theory has received support from both psychoacoustic and functional magnetic resonance imaging (fMRI) studies that use stimuli which model complex acoustic environments. Termed stochastic figure-ground (SFG) stimuli, they are composed of a "figure" and background that overlap in spectrotemporal space, such that the only way to segregate the figure is by computing the coherence of its frequency components over time. Here, we extend these psychoacoustic and fMRI findings by using the greater temporal resolution of electroencephalography to investigate the neural computation of temporal coherence. We present subjects with modified SFG stimuli wherein the temporal coherence of the figure is modulated stochastically over time, which allows us to use linear regression methods to extract a signature of the neural processing of this temporal coherence. We do this under both active and passive listening conditions. Our findings show an early effect of coherence during passive listening, lasting from ∼115 to 185 ms post-stimulus. When subjects are actively listening to the stimuli, these responses are larger and last longer, up to ∼265 ms. These findings provide evidence for early and preattentive neural computations of temporal coherence that are enhanced by active analysis of an auditory scene.
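The linear-regression extraction described here amounts to estimating a lagged stimulus-response mapping from the coherence time course to the EEG; below is a minimal ridge-regression sketch on simulated signals, with the sampling rate, lag range, and regularization all assumed.

    import numpy as np

    rng = np.random.default_rng(4)
    fs, n = 64, 64 * 120                  # two minutes of 64-Hz signals
    coherence = rng.standard_normal(n)    # stochastic coherence modulation
    kernel = np.hanning(16)               # stand-in neural response kernel
    eeg = np.convolve(coherence, kernel, mode="same") + rng.standard_normal(n)

    # Design matrix of lagged copies of the stimulus (np.roll wraps around,
    # which is tolerable in a sketch on long stationary signals).
    lags = np.arange(0, 20)               # 0 to ~300 ms at 64 Hz
    X = np.column_stack([np.roll(coherence, lag) for lag in lags])

    lam = 1.0                             # assumed ridge penalty
    weights = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
    print(weights.round(2))               # the peak lag estimates response latency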
Collapse
Affiliation(s)
- James A O'Sullivan
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| | - Shihab A Shamma
- Institute for Systems Research, University of Maryland, College Park, Maryland 20742
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| |
Collapse
|
33
|
Andreou LV, Griffiths TD, Chait M. Sensitivity to the temporal structure of rapid sound sequences - An MEG study. Neuroimage 2015; 110:194-204. [PMID: 25659464 PMCID: PMC4389832 DOI: 10.1016/j.neuroimage.2015.01.052] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2014] [Revised: 12/15/2014] [Accepted: 01/27/2015] [Indexed: 11/28/2022] Open
Abstract
To probe sensitivity to the time structure of ongoing sound sequences, we measured MEG responses, in human listeners, to the offset of long tone-pip sequences containing various forms of temporal regularity. If listeners learn sequence temporal properties and form expectancies about the arrival time of an upcoming tone, sequence offset should be detectable as soon as an expected tone fails to arrive. Therefore, latencies of offset responses are indicative of the extent to which the temporal pattern has been acquired. In Exp1, sequences were isochronous, with the tone inter-onset interval (IOI) set to 75, 125 or 225 ms. Exp2 comprised non-isochronous, temporally regular sequences constructed from the IOIs above. Exp3 used the same sequences as Exp2, but listeners were required to monitor them for occasional frequency deviants. Analysis of the latency of offset responses revealed that the temporal structure of (even rather simple) regular sequences is not learnt precisely when the sequences are ignored. Pattern coding, supported by a network of temporal, parietal and frontal sources, improved considerably when the signals were made behaviourally pertinent. Thus, contrary to what might be expected in the context of an 'early warning system' framework, learning of temporal structure is not automatic, but affected by the signal's behavioural relevance.
Collapse
Affiliation(s)
| | - Timothy D Griffiths
- Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK; Institute of Neuroscience, Newcastle University Medical School, Newcastle upon Tyne NE2 4HH, UK
| | - Maria Chait
- UCL Ear Institute, 332 Gray's Inn Road, London WC1X 8EE, UK.
| |
Collapse
|
34
|
Montejo N, Noreña AJ. Dynamic representation of spectral edges in guinea pig primary auditory cortex. J Neurophysiol 2015; 113:2998-3012. [PMID: 25744885 PMCID: PMC4416612 DOI: 10.1152/jn.00785.2014] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2014] [Accepted: 03/02/2015] [Indexed: 11/22/2022] Open
Abstract
The central representation of a given acoustic motif is thought to be strongly context dependent, i.e., to rely on the spectrotemporal past and present of the acoustic mixture in which it is embedded. The present study investigated the cortical representation of spectral edges (i.e., where stimulus energy changes abruptly over frequency) and its dependence on stimulus duration and depth of the spectral contrast in guinea pig. We devised a stimulus ensemble composed of random tone pips with or without an attenuated frequency band (AFB) of variable depth. Additionally, the multitone ensemble with AFB was interleaved with periods of silence or with multitone ensembles without AFB. We have shown that the representation of the frequencies near but outside the AFB is greatly enhanced, whereas the representation of frequencies near and inside the AFB is strongly suppressed. These cortical changes depend on the depth of the AFB: although they are maximal for the largest depth of the AFB, they are also statistically significant for depths as small as 10 dB. Finally, the cortical changes are quick, occurring within a few seconds of stimulus ensemble presentation with AFB, and are very labile, disappearing within a few seconds after the presentation without AFB. Overall, this study demonstrates that the representation of spectral edges is dynamically enhanced in the auditory centers. These central changes may have important functional implications, particularly in noisy environments where they could contribute to preserving the central representation of spectral edges.
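A bare-bones sketch of a multitone ensemble with an attenuated frequency band; the pip frequencies, durations, band edges, and depth are stand-ins for the parameters summarized above.

    import numpy as np

    fs = 44100
    rng = np.random.default_rng(5)

    pool = np.logspace(np.log10(500), np.log10(16000), 48)  # assumed pip pool
    afb = (2000.0, 4000.0)       # attenuated frequency band (assumed edges)
    depth_db = 30.0              # AFB depth; the study tested depths down to 10 dB

    def pip(freq, dur=0.02):
        t = np.arange(int(fs * dur)) / fs
        gain = 10.0 ** (-depth_db / 20.0) if afb[0] <= freq <= afb[1] else 1.0
        return gain * np.sin(2 * np.pi * freq * t)

    # Random tone-pip stream whose energy drops abruptly inside the AFB,
    # creating the spectral edges whose representation the study probed.
    ensemble = np.concatenate([pip(rng.choice(pool)) for _ in range(500)])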
Collapse
Affiliation(s)
- Noelia Montejo
- Laboratoire de Neurosciences Intégratives et Adaptatives, Aix Marseille Université, CNRS UMR 7260, Marseille, France
| | - Arnaud J Noreña
- Laboratoire de Neurosciences Intégratives et Adaptatives, Aix Marseille Université, CNRS UMR 7260, Marseille, France
| |
Collapse
|
35
|
Liu AS, Tsunada J, Gold JI, Cohen YE. Temporal Integration of Auditory Information Is Invariant to Temporal Grouping Cues. eNeuro 2015; 2:ENEURO.0077-14.2015. [PMID: 26464975 PMCID: PMC4596088 DOI: 10.1523/eneuro.0077-14.2015] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2014] [Revised: 03/01/2015] [Accepted: 03/30/2015] [Indexed: 11/29/2022] Open
Abstract
Auditory perception depends on the temporal structure of incoming acoustic stimuli. Here, we examined whether a temporal manipulation that affects the perceptual grouping also affects the time dependence of decisions regarding those stimuli. We designed a novel discrimination task that required human listeners to decide whether a sequence of tone bursts was increasing or decreasing in frequency. We manipulated temporal perceptual-grouping cues by changing the time interval between the tone bursts, which led to listeners hearing the sequences as a single sound for short intervals or discrete sounds for longer intervals. Despite these strong perceptual differences, this manipulation did not affect the efficiency of how auditory information was integrated over time to form a decision. Instead, the grouping manipulation affected subjects' speed-accuracy trade-offs. These results indicate that the temporal dynamics of evidence accumulation for auditory perceptual decisions can be invariant to manipulations that affect the perceptual grouping of the evidence.
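The temporal integration at issue is commonly modeled as bounded evidence accumulation; here is a toy accumulator for the frequency-direction decision, with every parameter invented.

    import numpy as np

    rng = np.random.default_rng(6)

    def decide(freq_steps, bound=3.0, noise=1.0):
        # Each tone burst adds the sign of its frequency step plus noise;
        # a choice is made when the running total reaches a bound.
        evidence = 0.0
        for i, step in enumerate(freq_steps, start=1):
            evidence += np.sign(step) + noise * rng.standard_normal()
            if abs(evidence) >= bound:
                return ("increasing" if evidence > 0 else "decreasing", i)
        return ("increasing" if evidence > 0 else "decreasing", len(freq_steps))

    # A mostly rising five-step sequence (+1/-1 = up/down frequency steps)
    print(decide([+1, +1, -1, +1, +1]))

In this framing, the study's result says the accumulation itself was unchanged by the grouping manipulation, while the bound (the speed-accuracy trade-off) shifted.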
Collapse
Affiliation(s)
| | - Joji Tsunada
- Department of Otorhinolaryngology, Perelman School of Medicine
| | - Joshua I. Gold
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
| | - Yale E. Cohen
- Department of Otorhinolaryngology, Perelman School of Medicine
- Department of Neuroscience, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania 19104
| |
Collapse
|
36
|
Smith NA, Joshi S. Neural correlates of auditory stream segregation: an analysis of onset- and change-related responses. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2014; 136:EL295-EL301. [PMID: 25324113 PMCID: PMC4223979 DOI: 10.1121/1.4896414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/28/2014] [Revised: 08/18/2014] [Accepted: 09/12/2014] [Indexed: 06/04/2023]
Abstract
The temporal order discrimination of target tone pairs is hindered by the presence of flanker tones but is improved when the flanker tones are captured by a separate stream of tones that match the flankers in frequency [Bregman and Rudnicky (1975). J. Exp. Psychol. 1, 263-267]. In an event-related potential (ERP) study with these stimuli, listeners' mismatch negativity (MMN) responses were temporally linked to the position of the changing target tones, irrespective of streaming. In contrast, N1 response latency varied as a function of the perceived grouping of flanker tones established by previous behavioral studies, providing a neurophysiological index of auditory stream segregation.
Collapse
Affiliation(s)
- Nicholas A Smith
- Perceptual Development Laboratory, Boys Town National Research Hospital, 555 North 30th Street, Omaha, Nebraska, 68131
| | - Suyash Joshi
- Perceptual Development Laboratory, Boys Town National Research Hospital, 555 North 30th Street, Omaha, Nebraska, 68131
| |
Collapse
|
37
|
Roberts B, Summers RJ, Bailey PJ. Formant-frequency variation and informational masking of speech by extraneous formants: evidence against dynamic and speech-specific acoustical constraints. J Exp Psychol Hum Percept Perform 2014; 40:1507-25. [PMID: 24842068 PMCID: PMC4120706 DOI: 10.1037/a0036629] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the two ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1-F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints.
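Both competitor manipulations are operations about the contour's geometric mean on a log-frequency scale; a small sketch with a hypothetical F2 track:

    import numpy as np

    f2 = np.array([1500.0, 1650.0, 1800.0, 1600.0, 1400.0, 1550.0])  # hypothetical F2 track (Hz)
    gm = np.exp(np.mean(np.log(f2)))       # geometric mean of the contour

    f2_half_depth = gm * (f2 / gm) ** 0.5  # variation scaled to 50% of natural depth
    f2_inverted = gm * (gm / f2)           # contour inverted about the geometric mean
    f2_constant = np.full_like(f2, gm)     # 0%-depth (constant) competitor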
Collapse
|
38
|
Choi I, Wang L, Bharadwaj H, Shinn-Cunningham B. Individual differences in attentional modulation of cortical responses correlate with selective attention performance. Hear Res 2014; 314:10-9. [PMID: 24821552 DOI: 10.1016/j.heares.2014.04.008] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/11/2013] [Revised: 04/18/2014] [Accepted: 04/23/2014] [Indexed: 11/29/2022]
Abstract
Many studies have shown that attention modulates the cortical representation of an auditory scene, emphasizing an attended source while suppressing competing sources. Yet, individual differences in the strength of this attentional modulation and their relationship with selective attention ability are poorly understood. Here, we ask whether differences in how strongly attention modulates cortical responses reflect differences in normal-hearing listeners' selective auditory attention ability. We asked listeners to attend to one of three competing melodies and identify its pitch contour while we measured cortical electroencephalographic responses. The three melodies were either from widely separated pitch ranges ("easy trials"), or from a narrow, overlapping pitch range ("hard trials"). The melodies started at slightly different times; listeners attended either the leading or lagging melody. Because of the timing of the onsets, the leading melody drew attention exogenously. In contrast, attending the lagging melody required listeners to direct top-down attention volitionally. We quantified how attention amplified auditory N1 response to the attended melody and found large individual differences in the N1 amplification, even though only correctly answered trials were used to quantify the ERP gain. Importantly, listeners with the strongest amplification of N1 response to the lagging melody in the easy trials were the best performers across other types of trials. Our results raise the possibility that individual differences in the strength of top-down gain control reflect inherent differences in the ability to control top-down attention.
Collapse
Affiliation(s)
- Inyong Choi
- Center for Computational Neuroscience and Neural Technology, Boston University, Boston, MA 02215, USA
| | - Le Wang
- Center for Computational Neuroscience and Neural Technology, Boston University, Boston, MA 02215, USA
| | - Hari Bharadwaj
- Center for Computational Neuroscience and Neural Technology, Boston University, Boston, MA 02215, USA; Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA
| | - Barbara Shinn-Cunningham
- Center for Computational Neuroscience and Neural Technology, Boston University, Boston, MA 02215, USA; Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA.
| |
Collapse
|
39
|
Bressler S, Masud S, Bharadwaj H, Shinn-Cunningham B. Bottom-up influences of voice continuity in focusing selective auditory attention. PSYCHOLOGICAL RESEARCH 2014; 78:349-60. [PMID: 24633644 DOI: 10.1007/s00426-014-0555-7] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2013] [Accepted: 02/19/2014] [Indexed: 11/29/2022]
Abstract
Selective auditory attention causes a relative enhancement of the neural representation of important information and suppression of the neural representation of distracting sound, which enables a listener to analyze and interpret information of interest. Some studies suggest that in both vision and audition, the "unit" on which attention operates is an object: an estimate of the information coming from a particular external source out in the world. In this view, which object ends up in the attentional foreground depends on the interplay of top-down, volitional attention and stimulus-driven, involuntary attention. Here, we test the idea that auditory attention is object-based by exploring whether continuity of a non-spatial feature (talker identity, a feature that helps acoustic elements bind into one perceptual object) also influences selective attention performance. In Experiment 1, we show that perceptual continuity of target talker voice helps listeners report a sequence of spoken target digits embedded in competing reversed digits spoken by different talkers. In Experiment 2, we provide evidence that this benefit of voice continuity is obligatory and automatic, as if voice continuity biases listeners by making it easier to focus on a subsequent target digit when it is perceptually linked to what was already in the attentional foreground. Our results support the idea that feature continuity enhances streaming automatically, thereby influencing the dynamic processes that allow listeners to successfully attend to objects through time in the cacophony that assails our ears in many everyday settings.
Collapse
Affiliation(s)
- Scott Bressler
- Center for Computational Neuroscience and Neural Technology, Boston University, 677 Beacon St., Boston, MA, 02421, USA
| | | | | | | |
Collapse
|
40
|
Christison-Lagay KL, Cohen YE. Behavioral correlates of auditory streaming in rhesus macaques. Hear Res 2014; 309:17-25. [PMID: 24239869 PMCID: PMC3991243 DOI: 10.1016/j.heares.2013.11.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/28/2013] [Revised: 10/30/2013] [Accepted: 11/03/2013] [Indexed: 11/24/2022]
Abstract
Perceptual representations of auditory stimuli (i.e., sounds) are derived from the auditory system's ability to segregate and group the spectral, temporal, and spatial features of auditory stimuli, a process called "auditory scene analysis". Psychophysical studies have identified several of the principles and mechanisms that underlie a listener's ability to segregate and group acoustic stimuli. One important psychophysical task that has illuminated many of these principles and mechanisms is the "streaming" task. Despite the wide use of this task to study psychophysical mechanisms of human audition, no studies have explicitly tested the streaming abilities of non-human animals using the standard methodologies employed in human-audition studies. Here, we trained rhesus macaques to participate in the streaming task using methodologies and controls similar to those presented in previous human studies. Overall, we found that the monkeys' behavioral reports were qualitatively consistent with those of human listeners, thus suggesting that this task may be a valuable tool for future neurophysiological studies.
Collapse
Affiliation(s)
| | - Yale E Cohen
- Dept. Otorhinolaryngology and Neuroscience, Perelman School of Medicine, U. Pennsylvania, Philadelphia, PA 19104, USA; Dept. Bioengineering, U. Pennsylvania, Philadelphia, PA, 19104, USA
| |
Collapse
|
41
|
Nourski KV, Steinschneider M, Oya H, Kawasaki H, Jones RD, Howard MA. Spectral organization of the human lateral superior temporal gyrus revealed by intracranial recordings. Cereb Cortex 2014; 24:340-52. [PMID: 23048019 PMCID: PMC3888366 DOI: 10.1093/cercor/bhs314] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The place of the posterolateral superior temporal (PLST) gyrus within the hierarchical organization of the human auditory cortex is unknown. Understanding how PLST processes spectral information is imperative for its functional characterization. Pure-tone stimuli were presented to subjects undergoing invasive monitoring for refractory epilepsy. Recordings were made using high-density subdural grid electrodes. Pure tones elicited robust high gamma event-related band power responses along a portion of PLST adjacent to the transverse temporal sulcus (TTS). Responses were frequency selective, though typically broadly tuned. In several subjects, mirror-image response patterns around a low-frequency center were observed, but typically, more complex and distributed patterns were seen. Frequency selectivity was greatest early in the response. Classification analysis using a sparse logistic regression algorithm yielded above-chance accuracy in all subjects. Classifier performance typically peaked at 100-150 ms after stimulus onset, was comparable for the left and right hemisphere cases, and was stable across stimulus intensities. Results demonstrate that representations of spectral information within PLST are temporally dynamic and contain sufficient information for accurate discrimination of tone frequencies. PLST adjacent to the TTS appears to be an early stage in the hierarchy of cortical auditory processing. Pure-tone response patterns may aid auditory field identification.
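The classification analysis can be sketched as follows; the data shapes and values are invented, and only the L1-penalized ("sparse") logistic regression mirrors the method named above.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(7)
    n_trials, n_sites = 200, 64
    X = rng.standard_normal((n_trials, n_sites))  # high-gamma power per grid site
    y = rng.integers(0, 8, size=n_trials)         # 8 pure-tone frequencies (assumed)

    # The L1 penalty yields sparse weights over electrodes; above-chance
    # accuracy indicates that the activity pattern discriminates frequency.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    print(cross_val_score(clf, X, y, cv=5).mean())  # chance level here is 1/8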
Collapse
Affiliation(s)
| | - Mitchell Steinschneider
- Department of Neurology
- Department of Neuroscience, Albert Einstein College of Medicine, New York, NY 10461, USA
| | | | | | - Robert D. Jones
- Department of Neurology, The University of Iowa, Iowa City, IA 52242, USA
| | | |
Collapse
|
42
|
An objective measure of auditory stream segregation based on molecular psychophysics. Atten Percept Psychophys 2014; 76:829-51. [DOI: 10.3758/s13414-013-0613-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
43
|
Abstract
The fundamental perceptual unit in hearing is the 'auditory object'. Similar to visual objects, auditory objects are the computational result of the auditory system's capacity to detect, extract, segregate and group spectrotemporal regularities in the acoustic environment; the multitude of acoustic stimuli around us together form the auditory scene. However, unlike the visual scene, resolving the component objects within the auditory scene crucially depends on their temporal structure. Neural correlates of auditory objects are found throughout the auditory system. However, neural responses do not become correlated with a listener's perceptual reports until the level of the cortex. The roles of different neural structures and the contribution of different cognitive states to the perception of auditory objects are not yet fully understood.
Collapse
|
44
|
Wang Q, Bao M, Chen L. The role of spatiotemporal and spectral cues in segregating short sound events: evidence from auditory Ternus display. Exp Brain Res 2013; 232:273-82. [PMID: 24141518 DOI: 10.1007/s00221-013-3738-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2013] [Accepted: 10/03/2013] [Indexed: 11/30/2022]
Abstract
Previous studies using auditory sequences with rapid repetition of tones revealed that spatiotemporal cues and spectral cues are important for fusing or segregating sound streams. However, the perceptual grouping was partially driven by the cognitive processing of the periodicity cues of the long sequence. Here, we investigate whether perceptual groupings (spatiotemporal grouping vs. frequency grouping) are also applicable to short auditory sequences, where auditory perceptual organization is mainly subserved by lower levels of perceptual processing. To answer that question, we conducted two experiments using an auditory Ternus display. The display was composed of three speakers (A, B and C), with each speaker consecutively emitting one sound; the sounds formed two frames (AB and BC). Experiment 1 manipulated both spatial and temporal factors. We implemented three 'within-frame intervals' (WFIs, or intervals between A and B, and between B and C), seven 'inter-frame intervals' (IFIs, or intervals between AB and BC) and two different speaker layouts (inter-distance of speakers: near or far). Experiment 2 manipulated the differentiation of frequencies between the two auditory frames, in addition to the spatiotemporal cues as in Experiment 1. Listeners were required to make a two-alternative forced choice (2AFC) to report the perception of a given Ternus display: element motion (auditory apparent motion from sound A to B to C) or group motion (auditory apparent motion from sound 'AB' to 'BC'). The results indicate that the perceptual grouping of short auditory sequences (materialized by the perceptual decisions on the auditory Ternus display) was modulated by temporal and spectral cues, with the latter contributing more to segregating auditory events. Spatial layout played a lesser role in perceptual organization. These results can be accounted for by the 'peripheral channeling' theory.
Collapse
Affiliation(s)
- Qingcui Wang
- Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing, 100190, China
| | | | | |
Collapse
|
45
|
Teki S, Chait M, Kumar S, Shamma S, Griffiths TD. Segregation of complex acoustic scenes based on temporal coherence. eLife 2013; 2:e00699. [PMID: 23898398 PMCID: PMC3721234 DOI: 10.7554/elife.00699] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2013] [Accepted: 06/16/2013] [Indexed: 11/13/2022] Open
Abstract
In contrast to the complex acoustic environments we encounter every day, most studies of auditory segregation have used relatively simple signals. Here, we synthesized a new stimulus to examine the detection of coherent patterns ('figures') from overlapping 'background' signals. In a series of experiments, we demonstrate that human listeners are remarkably sensitive to the emergence of such figures and can tolerate a variety of spectral and temporal perturbations. This robust behavior is consistent with the existence of automatic auditory segregation mechanisms that are highly sensitive to correlations across frequency and time. The observed behavior cannot be explained purely on the basis of adaptation-based models used to explain the segregation of deterministic narrowband signals. We show that the present results are consistent with the predictions of a model of auditory perceptual organization based on temporal coherence. Our data thus support a role for temporal coherence as an organizational principle underlying auditory segregation.

Even when seated in the middle of a crowded restaurant, we are still able to distinguish the speech of the person sitting opposite us from the conversations of fellow diners and a host of other background noise. While we generally perform this task almost effortlessly, it is unclear how the brain solves what is in reality a complex information-processing problem. In the 1970s, researchers began to address this question using stimuli consisting of simple tones. When subjects are played a sequence of alternating high- and low-frequency tones, they perceive them as two independent streams of sound. Similar experiments in macaque monkeys reveal that each stream activates a different area of auditory cortex, suggesting that the brain may distinguish acoustic stimuli on the basis of their frequency. However, the simple tones used in laboratory experiments bear little resemblance to the complex sounds we encounter in everyday life, which are often made up of multiple frequencies and overlap, both in frequency and in time, with other sounds in the environment. Moreover, recent experiments have shown that if a subject hears two tones simultaneously, he or she perceives them as belonging to a single stream of sound even if they have different frequencies; models that assume we distinguish stimuli from noise on the basis of frequency alone struggle to explain this observation. Now, Teki, Chait, et al. have used more complex sounds, in which frequency components of the target stimuli overlap with those of background signals, to obtain new insights into how the brain solves this problem. Subjects were extremely good at discriminating these complex target stimuli from background noise, and computational modelling confirmed that they did so via integration of both frequency and temporal information. The work of Teki, Chait, et al. thus offers the first explanation for our ability to home in on speech and other pertinent sounds, even amidst a sea of background noise.
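The core of this stimulus design lends itself to a compact illustration. The Python/NumPy sketch below builds a sequence of random multi-tone chords in which, from a given chord onward, a fixed subset of frequency components repeats from chord to chord; that temporally coherent subset is the 'figure' and the remaining components are the 'background'. This is a hedged reconstruction of the general construction described in the abstract: the chord duration, frequency pool, coherence level, and figure onset are illustrative assumptions rather than the study's parameters.

```python
import numpy as np

FS = 44100
CHORD_DUR = 0.05                      # 50-ms chords (assumed value)
rng = np.random.default_rng(0)
# Log-spaced pool of candidate component frequencies (assumed range).
POOL = np.geomspace(180.0, 7000.0, 129)

def chord(freqs, dur=CHORD_DUR, fs=FS):
    """One chord: equal-amplitude pure tones, Hann-windowed to avoid clicks."""
    t = np.arange(int(dur * fs)) / fs
    w = np.hanning(t.size)
    return w * sum(np.sin(2 * np.pi * f * t) for f in freqs) / len(freqs)

def figure_ground(n_chords=40, n_background=10, coherence=4, figure_start=20):
    """Random chords; from chord `figure_start` on, `coherence` fixed
    components (the 'figure') are added to every chord."""
    figure_freqs = rng.choice(POOL, size=coherence, replace=False)
    out = []
    for i in range(n_chords):
        freqs = list(rng.choice(POOL, size=n_background, replace=False))
        if i >= figure_start:
            freqs += list(figure_freqs)   # temporally coherent components
        out.append(chord(freqs))
    return np.concatenate(out)

stimulus = figure_ground()
```

Because the figure components lie inside the same frequency region as the background, no single frequency channel isolates the figure; only its repetition across chords, i.e. its temporal coherence, distinguishes it.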
Affiliation(s)
- Sundeep Teki
- Wellcome Trust Centre for Neuroimaging, University College London, London, United Kingdom
46
Catz N, Noreña AJ. Enhanced representation of spectral contrasts in the primary auditory cortex. Front Syst Neurosci 2013; 7:21. [PMID: 23801943 PMCID: PMC3686080 DOI: 10.3389/fnsys.2013.00021] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2013] [Accepted: 05/23/2013] [Indexed: 11/15/2022] Open
Abstract
The role of early auditory processing may be to extract elementary features from an acoustic mixture in order to organize the auditory scene. To accomplish this task, the central auditory system may rely on the fact that sensory objects are often composed of spectral edges, i.e., regions where stimulus energy changes abruptly over frequency. The processing of acoustic stimuli may therefore benefit from a mechanism enhancing the internal representation of spectral edges. While the visual system is thought to rely heavily on an analogous mechanism (enhancing spatial edges), it is still unclear whether a related process plays a significant role in audition. We investigated the cortical representation of spectral edges using acoustic stimuli composed of multi-tone pips whose time-averaged spectral envelope contained suppressed or enhanced regions. Importantly, the stimuli were designed such that neural response properties could be assessed as a function of stimulus frequency during stimulus presentation. Our results suggest that the representation of acoustic spectral edges is enhanced in the auditory cortex, and that this enhancement is sensitive to the characteristics of the spectral contrast profile, such as its depth, sharpness and width: spectral edges are maximally enhanced for sharp contrasts and large depths. Cortical activity was also suppressed at frequencies within the suppressed region; notably, the suppression of firing was larger at frequencies near the lower edge of the suppressed region than at the upper edge. Overall, the present study gives critical insight into the processing of spectral contrasts in the auditory system.
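As a rough illustration of this stimulus logic, the Python/NumPy sketch below scatters random tone pips across a log-frequency range and attenuates the pips falling inside a defined band, so that the time-averaged spectral envelope contains a suppressed region with controllable edges and depth. The pip statistics, notch boundaries, and depth are illustrative assumptions, not the study's parameters.

```python
import numpy as np

FS = 44100
rng = np.random.default_rng(1)

def pip(freq_hz, dur_s=0.02, fs=FS):
    """Short Hann-windowed tone pip."""
    t = np.arange(int(dur_s * fs)) / fs
    return np.hanning(t.size) * np.sin(2 * np.pi * freq_hz * t)

def pip_ensemble(n_pips=2000, f_lo=500.0, f_hi=16000.0,
                 notch=(2000.0, 4000.0), depth_db=20.0,
                 total_dur_s=2.0, fs=FS):
    """Random tone pips, log-uniform in frequency and uniform in time;
    pips inside `notch` are attenuated by `depth_db`, carving a
    suppressed region into the time-averaged spectral envelope."""
    out = np.zeros(int(total_dur_s * fs))
    pip_len = pip(1000.0).size
    for _ in range(n_pips):
        f = np.exp(rng.uniform(np.log(f_lo), np.log(f_hi)))
        gain = 10.0 ** (-depth_db / 20.0) if notch[0] <= f <= notch[1] else 1.0
        onset = rng.integers(0, out.size - pip_len)
        out[onset:onset + pip_len] += gain * pip(f)
    return out / np.max(np.abs(out))

stimulus = pip_ensemble()
```

Varying depth_db and the notch boundaries corresponds to the depth and width manipulations described above; modelling edge sharpness would require smoothing the gain profile across frequency rather than using a hard boundary, as done here for brevity.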
Affiliation(s)
- Nicolas Catz
- Laboratory of Adaptive and Integrative Neurobiology, Fédération de recherche 3C, UMR CNRS 7260, Université Aix-Marseille, Marseille, France
47
Micheyl C, Kreft H, Shamma S, Oxenham AJ. Temporal coherence versus harmonicity in auditory stream formation. J Acoust Soc Am 2013; 133:EL188-EL194. [PMID: 23464127 PMCID: PMC3579859 DOI: 10.1121/1.4789866] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/10/2012] [Revised: 01/02/2013] [Accepted: 01/16/2013] [Indexed: 06/01/2023]
Abstract
This study investigated the influence of temporal incoherence and inharmonicity on concurrent stream segregation, using performance-based measures. Subjects discriminated frequency shifts in a temporally regular sequence of target pure tones embedded in a constant or randomly varying multi-tone background. Depending on the condition tested, the target tones were either temporally coherent or incoherent with, and either harmonically or inharmonically related to, the background tones. The results provide further evidence that temporal incoherence facilitates stream segregation, and they suggest that deviations from harmonicity can cause similar facilitation, even when the targets and the maskers are temporally coherent.
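The four stimulus conditions implied here (coherent/incoherent crossed with harmonic/inharmonic) can be sketched compactly. In the Python/NumPy example below, a regular sequence of target tone bursts is embedded in a multi-tone background whose components are gated either synchronously with the targets (temporally coherent) or at random onsets (incoherent), and are either harmonics of a fundamental shared with the target or shifted away from it (inharmonic). The burst duration, rate, fundamental, and mistuning are illustrative assumptions, not the study's parameters.

```python
import numpy as np

FS = 44100
rng = np.random.default_rng(2)

def add_burst(freq_hz, onset_s, dur_s, out, fs=FS):
    """Add one Hann-windowed tone burst to the output buffer in place."""
    t = np.arange(int(dur_s * fs)) / fs
    seg = np.hanning(t.size) * np.sin(2 * np.pi * freq_hz * t)
    i = int(onset_s * fs)
    out[i:i + seg.size] += seg

def trial(coherent=True, harmonic=True, f0=100.0, target_harmonic=10,
          n_bursts=6, burst_dur=0.1, ioi=0.2, fs=FS):
    """Regular target-tone sequence plus a multi-tone background that is
    temporally coherent/incoherent and harmonic/inharmonic with it."""
    out = np.zeros(int(n_bursts * ioi * fs) + fs // 10)
    target_f = target_harmonic * f0
    bg_harmonics = [h for h in range(2, 16) if h != target_harmonic]
    shift = 1.0 if harmonic else 1.07   # 7% mistuning in inharmonic conditions
    for k in range(n_bursts):
        add_burst(target_f, k * ioi, burst_dur, out)   # target burst
        for h in bg_harmonics:
            onset = k * ioi if coherent else rng.uniform(
                0.0, n_bursts * ioi - burst_dur)
            add_burst(h * f0 * shift, onset, burst_dur, out)
    return out / np.max(np.abs(out))

stim = trial(coherent=False, harmonic=True)  # incoherent, harmonic condition
```

Measuring frequency-shift discrimination for the targets under each of the four combinations would mirror the performance-based logic of the study.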
Affiliation(s)
- Christophe Micheyl
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455, USA.
49
Shinn-Cunningham B, Ruggles DR, Bharadwaj H. How early aging and environment interact in everyday listening: from brainstem to behavior through modeling. Adv Exp Med Biol 2013; 787:501-10. [PMID: 23716257 PMCID: PMC4629495 DOI: 10.1007/978-1-4614-1590-9_55] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/25/2023]
Abstract
We recently showed that listeners with normal hearing thresholds vary in their ability to direct spatial attention and that this ability is related to the fidelity of temporal coding in the brainstem. Here, we recruited additional middle-aged listeners and extended our analysis of the brainstem response, measured using the frequency-following response (FFR). We found that even though age does not predict overall selective attention ability, middle-aged listeners are more susceptible than young adults to the detrimental effects of reverberant energy. We separated the overall FFR into orthogonal envelope and carrier components and used an existing model to predict which auditory channels drive each component. We find that responses in mid- to high-frequency auditory channels dominate the envelope FFR, while lower-frequency channels dominate the carrier FFR. Importantly, which component of the FFR predicts selective attention performance changes with age. We suggest that early aging degrades peripheral temporal coding in mid to high frequencies, interfering with the coding of envelope interaural time differences (ITDs). We argue that, compared to young adults, middle-aged listeners, who lack strong temporal envelope coding, have more trouble following a conversation in a reverberant room because they are forced to rely on fragile carrier ITDs that are susceptible to the degrading effects of reverberation.
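A standard way to obtain such envelope and carrier components, and likely close in spirit to what is meant here (the study's exact procedure is an assumption on our part), is to record FFRs to the stimulus in two opposite polarities and add or subtract the trial-averaged responses. A minimal sketch:

```python
import numpy as np

def decompose_ffr(ffr_pos: np.ndarray, ffr_neg: np.ndarray):
    """Split an FFR into envelope- and carrier-following components.

    ffr_pos, ffr_neg: trial-averaged responses to the original and
    polarity-inverted stimulus. Adding cancels activity that flips
    with stimulus polarity (the carrier), leaving the envelope
    response; subtracting cancels envelope-locked activity, leaving
    the carrier (temporal-fine-structure) response.
    """
    env_ffr = 0.5 * (ffr_pos + ffr_neg)
    carrier_ffr = 0.5 * (ffr_pos - ffr_neg)
    return env_ffr, carrier_ffr
```

Spectral analysis of each component, at the stimulus envelope rate and at the carrier frequency respectively, then quantifies envelope and carrier coding strength separately.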
Affiliation(s)
- Barbara Shinn-Cunningham
- Department of Biomedical Engineering, Boston University Center for Computational Neuroscience and Neural Technology, Boston, MA 02215, USA.
50
Oberfeld D, Stahn P. Sequential grouping modulates the effect of non-simultaneous masking on auditory intensity resolution. PLoS One 2012; 7:e48054. [PMID: 23110174 PMCID: PMC3480468 DOI: 10.1371/journal.pone.0048054] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2011] [Accepted: 09/26/2012] [Indexed: 11/22/2022] Open
Abstract
The presence of non-simultaneous maskers can strongly impair auditory intensity resolution relative to a condition without maskers, and it causes a complex pattern of effects that is difficult to explain on the basis of peripheral processing. We suggest that a failure of selective attention to the target tones provides a useful framework for understanding these effects. Two experiments tested the hypothesis that the sequential grouping of targets and maskers into separate auditory objects facilitates selective attention and therefore reduces the masker-induced impairment in intensity resolution. In Experiment 1, a condition favoring the processing of the maskers and the targets as two separate auditory objects, due to grouping by temporal proximity, was contrasted with the usual forward-masking setting, in which the masker and the target presented within each observation interval of the two-interval task can be expected to be grouped together. As expected, the former condition resulted in a significantly smaller masker-induced elevation of the intensity difference limens (DLs). In Experiment 2, embedding the targets in an isochronous sequence of maskers led to a significantly smaller DL elevation than control conditions that did not favor the perception of the maskers as a separate auditory stream. The observed effects of grouping are compatible with the assumption that a precise representation of target intensity is available at the decision stage, but that this information is used only suboptimally due to limitations of selective attention. The data can be explained within a framework of object-based attention. The results impose constraints on physiological models of intensity discrimination, and we discuss candidate structures for physiological correlates of the psychophysical data.
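For readers unfamiliar with intensity difference limens, the Python sketch below shows one conventional way a DL could be estimated from two-interval forced-choice data: convert proportion correct to d' and read off the level increment yielding d' = 1. The data values, and the simplifying assumption that d' grows linearly with the increment in dB, are illustrative rather than taken from the study.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical psychometric data from a two-interval (2I-2AFC) task.
increments_db = np.array([0.5, 1.0, 2.0, 4.0])   # level increments tested
p_correct = np.array([0.55, 0.65, 0.80, 0.95])   # proportion correct

# For an unbiased 2I-2AFC observer, d' = sqrt(2) * z(proportion correct).
d_prime = np.sqrt(2.0) * norm.ppf(p_correct)

# Fit d' as proportional to the increment (through the origin) and define
# the DL as the increment at which d' = 1, a common convention.
slope = np.linalg.lstsq(increments_db[:, None], d_prime, rcond=None)[0][0]
dl_db = 1.0 / slope
print(f"estimated intensity DL: {dl_db:.2f} dB")
```

Comparing DLs estimated this way across grouping conditions, with and without maskers, quantifies the masker-induced DL elevation discussed above.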
Affiliation(s)
- Daniel Oberfeld
- Department of Psychology, Section Experimental Psychology, Johannes Gutenberg-Universität Mainz, Mainz, Germany.