1. Razzaghipour A, Ashrafi M, Mohammadzadeh A. A Review of Auditory Attention: Neural Mechanisms, Theories, and Affective Disorders. Indian J Otolaryngol Head Neck Surg 2024;76:2250-2256. PMID: 38883545; PMCID: PMC11169100; DOI: 10.1007/s12070-023-04373-1.
Abstract
Attention is a fundamental aspect of human cognitive function and is crucial for essential activities such as learning, social interaction, and routine tasks. Notably, auditory attention involves complex interactions and collaboration among multiple brain networks. Recognizing impairments of auditory attention, understanding its underlying mechanisms, and identifying the brain regions it activates are essential for developing treatments and interventions for individuals with auditory attention deficits, which underscores the importance of investigating these matters. In the current study, we reviewed the full text of 53 articles on auditory attention, its mechanisms, and its networks, published between 2000 and 2023 and indexed in databases such as Science Direct, Google Scholar, ProQuest, and PubMed, searched using the keywords attention, auditory attention, auditory attention impairment, and theories of attention, and focused on articles that provided discussions within this research domain. The studies have demonstrated that auditory attention is more than a mere acoustic attribute and assumes a fundamental role in complex acoustic environments, information processing, and even speech comprehension. In the context of this study, we have reviewed and summarized the proposed theories related to attention and the brain networks involved in different forms of auditory attention. In conclusion, the integration of auditory attention assessments, behavioral observations, and an understanding of the neural mechanisms and brain regions implicated in auditory attention proves to be an effective approach for the diagnosis and treatment of attention-related disorders.
Affiliation(s)
- Amirreza Razzaghipour: Student Research Committee, Department of Audiology, Faculty of Rehabilitation, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Majid Ashrafi: Department of Audiology, Faculty of Rehabilitation, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Ali Mohammadzadeh: Department of Audiology, Faculty of Rehabilitation, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2. Ying R, Stolzberg DJ, Caras ML. Neural correlates of flexible sound perception in the auditory midbrain and thalamus. bioRxiv [Preprint] 2024:2024.04.12.589266. PMID: 38645241; PMCID: PMC11030403; DOI: 10.1101/2024.04.12.589266.
Abstract
Hearing is an active process in which listeners must detect and identify sounds, segregate and discriminate stimulus features, and extract their behavioral relevance. Adaptive changes in sound detection can emerge rapidly, during sudden shifts in acoustic or environmental context, or more slowly as a result of practice. Although we know that context- and learning-dependent changes in the spectral and temporal sensitivity of auditory cortical neurons support many aspects of flexible listening, the contribution of subcortical auditory regions to this process is less understood. Here, we recorded single- and multi-unit activity from the central nucleus of the inferior colliculus (ICC) and the ventral subdivision of the medial geniculate nucleus (MGV) of Mongolian gerbils under two different behavioral contexts: as animals performed an amplitude modulation (AM) detection task and as they were passively exposed to AM sounds. Using a signal detection framework to estimate neurometric sensitivity, we found that neural thresholds in both regions improved during task performance, and this improvement was driven by changes in firing rate rather than phase locking. We also found that ICC and MGV neurometric thresholds improved and correlated with behavioral performance as animals learned to detect small AM depths during a multi-day perceptual training paradigm. Finally, we reveal that in the MGV, but not the ICC, context-dependent enhancements in AM sensitivity grew stronger during perceptual training, mirroring prior observations in the auditory cortex. Together, our results suggest that the auditory midbrain and thalamus contribute to flexible sound processing and perception over rapid and slow timescales.
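As an illustration of the kind of signal detection analysis described here, the sketch below computes d' between firing-rate distributions and takes the neurometric threshold as the smallest AM depth reaching d' = 1. All spike counts, rate parameters, and the d' criterion are invented for illustration; this is not the authors' analysis code.

```python
import numpy as np

def neurometric_dprime(signal_rates, noise_rates):
    """d' between firing-rate distributions for an AM stimulus vs. an
    unmodulated carrier, using a pooled-variance estimate."""
    mu_s, mu_n = np.mean(signal_rates), np.mean(noise_rates)
    pooled_sd = np.sqrt(0.5 * (np.var(signal_rates) + np.var(noise_rates)))
    return (mu_s - mu_n) / pooled_sd

# Hypothetical spike counts (one value per trial) for each AM depth (dB re: 100%).
rng = np.random.default_rng(0)
unmodulated = rng.poisson(20, size=50)
depths_db = [-12, -9, -6, -3, 0]
rates_by_depth = {d: rng.poisson(20 + 2.5 * (d + 12), size=50) for d in depths_db}

# Neurometric threshold: the smallest (most negative) depth with d' >= 1.
dprimes = {d: neurometric_dprime(r, unmodulated) for d, r in rates_by_depth.items()}
threshold = min((d for d, dp in dprimes.items() if dp >= 1.0), default=None)
print(dprimes, threshold)
```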
Affiliation(s)
- Rose Ying: Neuroscience and Cognitive Science Program; Department of Biology; Center for Comparative and Evolutionary Biology of Hearing, University of Maryland, College Park, Maryland 20742
- Daniel J. Stolzberg: Department of Biology, University of Maryland, College Park, Maryland 20742
- Melissa L. Caras: Neuroscience and Cognitive Science Program; Department of Biology; Center for Comparative and Evolutionary Biology of Hearing; Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742
3. Eqlimi E, Bockstael A, Schönwiesner M, Talsma D, Botteldooren D. Time course of EEG complexity reflects attentional engagement during listening to speech in noise. Eur J Neurosci 2023;58:4043-4069. PMID: 37814423; DOI: 10.1111/ejn.16159.
Abstract
Auditory distractions are recognized to considerably challenge the quality of information encoding during speech comprehension. This study explores electroencephalography (EEG) microstate dynamics in ecologically valid, noisy settings, aiming to uncover how these auditory distractions influence the process of information encoding during speech comprehension. We examined three listening scenarios: (1) speech perception with background noise (LA), (2) focused attention on the background noise (BA), and (3) intentional disregard of the background noise (BUA). Our findings showed that microstate complexity and unpredictability increased when attention was directed towards speech compared with tasks without speech (LA > BA & BUA). Notably, the time elapsed between the recurrence of microstates increased significantly in LA compared with both BA and BUA. This suggests that coping with background noise during speech comprehension demands more sustained cognitive effort. Additionally, a two-stage time course for both microstate complexity and alpha-to-theta power ratio was observed. Specifically, in the early epochs, a lower level was observed, which gradually increased and eventually reached a steady level in the later epochs. The findings suggest that the initial stage is primarily driven by sensory processes and information gathering, while the second stage involves higher level cognitive engagement, including mnemonic binding and memory encoding.
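For readers unfamiliar with the alpha-to-theta power ratio tracked here, a minimal sketch using Welch power spectra follows. The band edges (theta 4-8 Hz, alpha 8-12 Hz) are conventional choices and may differ from the authors' exact definitions; the toy signal is invented.

```python
import numpy as np
from scipy.signal import welch

def alpha_theta_ratio(eeg, fs, alpha=(8, 12), theta=(4, 8)):
    """Alpha-to-theta power ratio for one EEG channel."""
    f, psd = welch(eeg, fs=fs, nperseg=2 * fs)
    band = lambda lo, hi: np.trapz(psd[(f >= lo) & (f < hi)], f[(f >= lo) & (f < hi)])
    return band(*alpha) / band(*theta)

fs = 250
t = np.arange(0, 10, 1 / fs)
eeg = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 6 * t)  # toy alpha + theta mix
print(alpha_theta_ratio(eeg, fs))
```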
Affiliation(s)
- Ehsan Eqlimi: WAVES Research Group, Department of Information Technology, Ghent University, Ghent, Belgium
- Annelies Bockstael: WAVES Research Group, Department of Information Technology, Ghent University, Ghent, Belgium
- Durk Talsma: Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Dick Botteldooren: WAVES Research Group, Department of Information Technology, Ghent University, Ghent, Belgium
4. Weise A, Grimm S, Rimmele JM, Schröger E. Auditory representations for long lasting sounds: Insights from event-related brain potentials and neural oscillations. Brain Lang 2023;237:105221. PMID: 36623340; DOI: 10.1016/j.bandl.2022.105221.
Abstract
The basic features of short sounds, such as frequency and intensity, including their temporal dynamics, are integrated into a unitary representation. Knowledge of how our brain processes long-lasting sounds is scarce. We review research utilizing the mismatch negativity event-related potential and neural oscillatory activity to study representations of long-lasting simple versus complex sounds, such as sinusoidal tones versus speech. There is evidence for a temporal constraint in the formation of auditory representations: auditory edges like sound onsets within long-lasting sounds open a temporal window of about 350 ms in which the sound's dynamics are integrated into a representation, while information beyond that window contributes less to that representation. This integration window segments the auditory input into short chunks. We argue that the representations established in adjacent integration windows can be concatenated into an auditory representation of a long sound, thus overcoming the temporal constraint.
Affiliation(s)
- Annekathrin Weise: Department of Psychology, Ludwig-Maximilians-University Munich, Germany; Wilhelm Wundt Institute for Psychology, Leipzig University, Germany
- Sabine Grimm: Wilhelm Wundt Institute for Psychology, Leipzig University, Germany
- Johanna Maria Rimmele: Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, Germany; Center for Language, Music and Emotion, New York University, Max Planck Institute, Department of Psychology, 6 Washington Place, New York, NY 10003, United States
- Erich Schröger: Wilhelm Wundt Institute for Psychology, Leipzig University, Germany
5. Harrison J, Archer-Boyd AW, Francombe J, Pike C, Murphy DT. The relationship between environmental context and attentional engagement in podcast listening experiences. Front Psychol 2023;13:1074320. PMID: 36726519; PMCID: PMC9885971; DOI: 10.3389/fpsyg.2022.1074320.
Abstract
Introduction: Previous research has shown that podcasts are most frequently consumed on mobile listening devices across a wide variety of environmental, situational, and social contexts. To date, no studies have investigated how an individual's environmental context might influence their attentional engagement in podcast listening experiences. Improving understanding of the contexts in which episodes of listening take place, and how they might affect listener engagement, could be highly valuable to researchers and producers working in the fields of object-based and personalized media.
Methods: An online questionnaire on listening habits and behaviors was distributed to a sample of 264 podcast listeners. An exploratory factor analysis was run to identify factors of environmental context that influence attentional engagement in podcast listening experiences. Five aspects of podcast listening engagement were also defined and measured across the sample.
Results: The exploratory factor analysis revealed five factors of environmental context, labeled: outdoors; indoors & at home; evenings; soundscape & at work; and exercise. The aspects of podcast listening engagement provided a comprehensive quantitative account of contemporary podcast listening experiences.
Discussion: The results support the hypothesis that elements of a listener's environmental context can influence their attentional engagement in podcast listening experiences. The soundscape & at work factor suggests that some listeners actively choose to consume podcasts to mask disturbing stimuli in their surrounding soundscape. Further analysis suggested that the proposed factors of environmental context were positively correlated with the measured aspects of podcast listening engagement. The results are highly pertinent to the fields of podcast studies, mobile listening experiences, and personalized media, and provide a basis for researchers seeking to explore how other forms of listening context might influence attentional engagement.
Affiliation(s)
- Jay Harrison (corresponding author): AudioLab, School of Physics, Engineering and Technology, University of York, York, United Kingdom
- Jon Francombe: BBC Research and Development, London, United Kingdom
- Chris Pike: BBC Research and Development, London, United Kingdom
- Damian T. Murphy: AudioLab, School of Physics, Engineering and Technology, University of York, York, United Kingdom
6. Shofner WP. Cochlear tuning and the peripheral representation of harmonic sounds in mammals. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2023;209:145-161. PMID: 35867137; DOI: 10.1007/s00359-022-01560-3.
Abstract
Albert Feng was a prominent comparative neurophysiologist whose research provided numerous contributions towards understanding how the spectral and temporal characteristics of vocalizations underlie sound communication in frogs and bats. The present study is dedicated to Al's memory and compares the spectral and temporal representations of stochastic, complex sounds which underlie the perception of pitch strength in humans and chinchillas. Specifically, the pitch strengths of these stochastic sounds differ between humans and chinchillas, suggesting that humans and chinchillas may be using different cues. Outputs of auditory filterbank models based on human and chinchilla cochlear tuning were examined. Excitation patterns of harmonics are enhanced in humans as compared with chinchillas. In contrast, summary correlograms are degraded in humans as compared with chinchillas. Comparing summary correlograms and excitation patterns with corresponding behavioral data on pitch strength suggests that the dominant cue for pitch strength in humans is spectral (i.e., harmonic) structure, whereas the dominant cue for chinchillas is temporal (i.e., envelope) structure. The results support arguments that the broader cochlear tuning in non-human mammals emphasizes temporal cues for pitch perception, whereas the sharper cochlear tuning in humans emphasizes spectral cues.
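A rough sketch of the excitation-pattern side of such filterbank comparisons, using roex filters with the human ERB formula of Glasberg and Moore (1990). The doubled bandwidth standing in for broader non-human tuning is purely illustrative, not the measured chinchilla data, and the paper's actual filterbank models may differ.

```python
import numpy as np

def erb_human(fc):
    """Glasberg & Moore (1990) equivalent rectangular bandwidth, in Hz."""
    return 24.7 * (4.37 * fc / 1000 + 1)

def excitation_pattern(freqs, amps, centers, erb_scale=1.0):
    """Excitation pattern (dB) of a harmonic complex through roex(p) filters.
    erb_scale > 1 broadens tuning (a crude stand-in for broader cochleae)."""
    pattern = []
    for fc in centers:
        p = 4 * fc / (erb_scale * erb_human(fc))   # roex slope parameter
        g = np.abs(freqs - fc) / fc                # normalized frequency deviation
        w = (1 + p * g) * np.exp(-p * g)           # roex(p) filter weighting
        pattern.append(np.sum(w * amps**2))        # summed power through the filter
    return 10 * np.log10(pattern)

f0 = 200.0
harmonics = f0 * np.arange(1, 21)
amps = np.ones_like(harmonics)
centers = np.linspace(100, 4000, 200)
human = excitation_pattern(harmonics, amps, centers, erb_scale=1.0)
broad = excitation_pattern(harmonics, amps, centers, erb_scale=2.0)  # assumed factor
```

With the broadened filters, the harmonic ripples in the pattern flatten out, which is the sense in which sharper human tuning enhances spectral (harmonic) cues.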
Affiliation(s)
- William P Shofner: Department of Speech, Language and Hearing Sciences, Indiana University, 2631 East Discovery Parkway, Bloomington, IN, 47408, USA
7. Wu J, Xu C, Han X, Zhou D, Zhang M, Li H, Tan KC. Progressive Tandem Learning for Pattern Recognition With Deep Spiking Neural Networks. IEEE Trans Pattern Anal Mach Intell 2022;44:7824-7840. PMID: 34546918; DOI: 10.1109/tpami.2021.3114196.
Abstract
Spiking neural networks (SNNs) have shown clear advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency, due to their event-driven nature and sparse communication. However, the training of deep SNNs is not straightforward. In this paper, we propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition, which is referred to as progressive tandem learning. By studying the equivalence between ANNs and SNNs in the discrete representation space, a primitive network conversion method is introduced that takes full advantage of spike count to approximate the activation value of ANN neurons. To compensate for the approximation errors arising from the primitive network conversion, we further introduce a layer-wise learning method with an adaptive training scheduler to fine-tune the network weights. The progressive tandem learning framework also allows hardware constraints, such as limited weight precision and fan-in connections, to be progressively imposed during training. The SNNs thus trained have demonstrated remarkable classification and regression capabilities on large-scale object recognition, image reconstruction, and speech separation tasks, while requiring at least an order of magnitude less inference time and fewer synaptic operations than other state-of-the-art SNN implementations. It therefore opens up a myriad of opportunities for pervasive mobile and embedded devices with a limited power budget.
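The spike-count approximation at the heart of such conversions can be sketched with a non-leaky integrate-and-fire neuron whose rate-coded output approximates a ReLU activation. This is a toy illustration of the rate-coding idea, not the paper's training framework; all parameters are invented.

```python
def if_spike_count(x, T=100, theta=1.0):
    """Spike count of a non-leaky integrate-and-fire neuron receiving a
    constant input x per timestep, with reset-by-subtraction."""
    v, count = 0.0, 0
    for _ in range(T):
        v += x
        if v >= theta:
            v -= theta
            count += 1
    return count

# Rate-coded approximation of a ReLU: count / T -> max(0, x) for x in [0, 1].
for x in [-0.2, 0.1, 0.35, 0.8]:
    print(x, if_spike_count(x, T=100) / 100)
```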
8. Xie Y, Ma J. How to discern external acoustic waves in a piezoelectric neuron under noise? J Biol Phys 2022;48:339-353. PMID: 35948818; PMCID: PMC9411441; DOI: 10.1007/s10867-022-09611-1.
Abstract
Biological neurons remain sensitive to external stimuli, and appropriate firing modes can be triggered to give an effective response to external chemical and physical signals. A piezoelectric neural circuit can perceive external voice and nonlinear vibration by generating an equivalent piezoelectric voltage, which produces an equivalent transmembrane current capable of inducing a variety of firing modes in neural activity. Biological neurons can receive external stimuli through multiple ion channels and synapses synchronously, but the subsequent encoding and the priority in mode selection are competitive. In particular, noisy disturbance and electromagnetic radiation make signal identification and mode selection in the firing patterns of neurons driven by multi-channel signals more difficult. In this paper, two different periodic signals accompanied by noise are used to excite the piezoelectric neural circuit, and the signal processing in the piezoelectric neuron driven by acoustic waves under noise is reproduced and explained. The physical energy of the piezoelectric neural circuit and the Hamilton energy of the neuron driven by mixed signals are calculated to explain the biophysical mechanism of the auditory neuron when external stimuli are applied. It is found that the neuron prefers to respond to the external stimulus carrying higher physical energy and to the signal that increases the Hamilton energy of the neuron; for example, a stronger input injects higher energy and is detected and responded to more sensitively. The involvement of noise helps detect the external signal through stochastic resonance, and the additive noise changes the excitability of the neuron much as an external stimulus does. The results indicate that energy controls the firing patterns and mode selection in neurons, and they provide clues for controlling neural activity by injecting appropriate energy into neurons and networks.
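The stochastic resonance effect invoked here, where noise helps a subthreshold signal cross the firing threshold, can be demonstrated with a toy threshold detector. All parameters below are invented; the paper's piezoelectric neuron uses full circuit equations rather than a simple threshold crossing.

```python
import numpy as np

def vector_strength(signal_amp=0.8, noise_sd=0.3, thresh=1.0,
                    fs=1000, dur=20.0, f=5.0, seed=1):
    """Phase locking of threshold crossings to a subthreshold 5 Hz input:
    a crude probe of stochastic resonance in a threshold detector."""
    t = np.arange(0, dur, 1 / fs)
    x = signal_amp * np.sin(2 * np.pi * f * t)
    noisy = x + np.random.default_rng(seed).normal(0, noise_sd, t.size)
    up = (noisy[1:] > thresh) & (noisy[:-1] <= thresh)   # upward crossings
    t_spk = t[1:][up]
    if t_spk.size == 0:
        return 0.0
    return np.abs(np.mean(np.exp(2j * np.pi * f * t_spk)))

for sd in [0.05, 0.2, 0.4, 1.0, 3.0]:
    print(sd, round(vector_strength(noise_sd=sd), 3))
```

Phase locking is near zero for very weak noise (no crossings), peaks at intermediate noise, and decays again as crossings become noise-driven: the signature non-monotonicity of stochastic resonance.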
Affiliation(s)
- Ying Xie: Department of Physics, Lanzhou University of Technology, Lanzhou, 730050, China
- Jun Ma: Department of Physics, Lanzhou University of Technology, Lanzhou, 730050, China; School of Science, Chongqing University of Posts and Telecommunications, Chongqing, 430065, China
9. Luberadzka J, Kayser H, Hohmann V. Making sense of periodicity glimpses in a prediction-update loop: A computational model of attentive voice tracking. J Acoust Soc Am 2022;151:712. PMID: 35232067; PMCID: PMC9088677; DOI: 10.1121/10.0009337.
Abstract
Humans are able to follow a speaker even in challenging acoustic conditions. The perceptual mechanisms underlying this ability remain unclear. A computational model of attentive voice tracking is presented, consisting of four computational blocks: (1) sparse periodicity-based auditory features (sPAF) extraction, (2) foreground-background segregation, (3) state estimation, and (4) top-down knowledge. The model connects the theories about auditory glimpses, foreground-background segregation, and Bayesian inference. It is implemented with the sPAF, sequential Monte Carlo sampling, and probabilistic voice models. The model is evaluated by comparing it with the human data obtained in the study by Woods and McDermott [Curr. Biol. 25(17), 2238-2246 (2015)], which measured the ability to track one of two competing voices with time-varying parameters [fundamental frequency (F0) and formants (F1, F2)]. Three model versions were tested, which differ in the type of information used for the segregation: version (a) uses the oracle F0, version (b) uses the estimated F0, and version (c) uses the spectral shape derived from the estimated F0 and oracle F1 and F2. Version (a) simulates optimal human performance in conditions with the largest separation between the voices, version (b) simulates the conditions in which the separation is not sufficient to follow the voices, and version (c) is closest to the human performance for moderate voice separation.
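Block (3), state estimation via sequential Monte Carlo sampling, can be illustrated with a one-dimensional bootstrap particle filter tracking a drifting F0. The data, noise levels, and random-walk dynamics below are toy assumptions; the actual model tracks F0 and formants jointly with probabilistic voice models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth: an F0 contour drifting over 200 frames.
T = 200
f0_true = 150 + np.cumsum(rng.normal(0, 1.0, T))
obs = f0_true + rng.normal(0, 8.0, T)                      # noisy periodicity estimates

# Bootstrap particle filter: predict with a random-walk voice model,
# weight by the likelihood of the current observation, then resample.
N = 500
particles = rng.normal(150, 20, N)
estimates = []
for y in obs:
    particles = particles + rng.normal(0, 1.5, N)          # prediction step
    w = np.exp(-0.5 * ((y - particles) / 8.0) ** 2)        # update step (Gaussian likelihood)
    w /= w.sum()
    particles = rng.choice(particles, size=N, p=w)         # resampling step
    estimates.append(particles.mean())
```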
Affiliation(s)
- Joanna Luberadzka: Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
- Hendrik Kayser: Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
- Volker Hohmann: Auditory Signal Processing, Department of Medical Physics and Acoustics, University of Oldenburg, Germany
10. Aldag N, Büchner A, Lenarz T, Nogueira W. Towards decoding selective attention through cochlear implant electrodes as sensors in subjects with contralateral acoustic hearing. J Neural Eng 2022;19. DOI: 10.1088/1741-2552/ac4de6.
Abstract
Objectives: Focusing attention on one speaker in a situation with multiple background speakers or noise is referred to as auditory selective attention. Decoding selective attention is an interesting line of research with respect to future brain-guided hearing aids or cochlear implants (CIs) that are designed to adaptively adjust sound processing through cortical feedback loops. This study investigates the feasibility of using the electrodes and backward telemetry of a CI to record electroencephalography (EEG).
Approach: The study population included 6 normal-hearing (NH) listeners and 5 CI users with contralateral acoustic hearing. Cortical auditory evoked potentials (CAEPs) and selective attention were recorded using state-of-the-art high-density scalp EEG and, in the case of CI users, also using two CI electrodes as sensors in combination with the backward telemetry system of these devices (iEEG).
Main results: In the selective attention paradigm with multi-channel scalp EEG, the mean decoding accuracy across subjects was 94.8% and 94.6% for NH listeners and CI users, respectively. With single-channel scalp EEG the accuracy dropped, but was above chance level in 8 to 9 out of 11 subjects, depending on the electrode montage. With the single-channel iEEG, the selective attention decoding accuracy could only be analyzed in 2 out of 5 CI users due to a loss of data in the other 3 subjects. In these 2 CI users, the selective attention decoding accuracy was above chance level.
Significance: This study shows that single-channel EEG is suitable for auditory selective attention decoding, even though it reduces the decoding quality compared with a multi-channel approach. CI-based iEEG can be used to record CAEPs and decode selective attention. However, the study also points out the need for further technical development of the CI backward telemetry regarding long-term recordings and optimal sensor positions.
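Whether a decoding accuracy is "above chance level" in a two-speaker paradigm is commonly checked against a binomial null of 50% correct; a minimal sketch (the trial counts are invented, and the paper's statistics may differ):

```python
from scipy.stats import binomtest

# Is observed accuracy above the 50% chance level of a two-speaker paradigm?
n_trials, n_correct = 60, 41
result = binomtest(n_correct, n_trials, p=0.5, alternative='greater')
print(result.pvalue)   # small p (here ~0.003) -> above chance at this trial count
```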
11. Shirakura M, Kawase T, Kanno A, Ohta J, Nakasato N, Kawashima R, Katori Y. Different contra-sound effects between noise and music stimuli seen in N1m and psychophysical responses. PLoS One 2021;16:e0261637. PMID: 34928999; PMCID: PMC8687558; DOI: 10.1371/journal.pone.0261637.
Abstract
Auditory-evoked responses can be affected by sound presented to the contralateral ear. The different effects of contralateral noise and music stimuli on the N1m response of the auditory-evoked field and on psychophysical responses were examined in 12 and 15 subjects, respectively. In the magnetoencephalographic study, the stimulus eliciting the N1m response was a tone burst of 500 ms duration at a frequency of 250 Hz, presented at a level of 70 dB; white noise and music, each high-pass filtered at 2000 Hz, were used as the contralateral sounds. The contralateral stimuli (noise or music) were presented in 10 dB steps from 80 dB down to 30 dB. Subjects were instructed to focus their attention on the left ear and to press the response button each time they heard the burst stimuli presented to the left ear. In the psychophysical study, the effects of contralateral sound presentation on the response time for detecting a probe sound (a 250 Hz tone burst presented at 70 dB) were examined for the same contralateral noise and music used in the magnetoencephalographic study. The amplitude reduction and latency delay of the N1m caused by contralateral music were significantly larger than those caused by contralateral noise in both hemispheres, even for low levels of contralateral music near the psychophysical threshold. Moreover, this larger suppressive effect of contralateral music was also observed psychophysically: the response time for detecting the probe sound was prolonged significantly more by adding contralateral music than by adding contralateral noise. Regarding the differences between contralateral music and noise, differences in saliency may be responsible for their different abilities to disturb auditory attention to the probe sound, but further investigation is required to confirm this hypothesis.
Affiliation(s)
- Masayuki Shirakura: Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Tetsuaki Kawase: Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan; Laboratory of Rehabilitative Auditory Science, Tohoku University Graduate School of Biomedical Engineering, Sendai, Miyagi, Japan; Department of Audiology, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Akitake Kanno: Department of Electromagnetic Neurophysiology, Tohoku University School of Medicine, Sendai, Miyagi, Japan; Department of Epileptology, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Jun Ohta: Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Nobukazu Nakasato: Department of Electromagnetic Neurophysiology, Tohoku University School of Medicine, Sendai, Miyagi, Japan; Department of Epileptology, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
- Ryuta Kawashima: Institute of Development, Aging and Cancer, Tohoku University, Sendai, Miyagi, Japan
- Yukio Katori: Department of Otolaryngology-Head and Neck Surgery, Tohoku University Graduate School of Medicine, Sendai, Miyagi, Japan
12. Straetmans L, Holtze B, Debener S, Jaeger M, Mirkovic B. Neural tracking to go: auditory attention decoding and saliency detection with mobile EEG. J Neural Eng 2021;18. PMID: 34902846; DOI: 10.1088/1741-2552/ac42b5.
Abstract
Objective: Neuro-steered assistive technologies have been suggested to offer a major advancement in future devices like neuro-steered hearing aids. Auditory attention decoding (AAD) methods would in that case allow for identification of an attended speaker within complex auditory environments, exclusively from neural data. Decoding the attended speaker using neural information has so far only been done in controlled laboratory settings. Yet, it is known that ever-present factors like distraction and movement are reflected in the neural signal parameters related to attention.
Approach: Thus, in the current study we applied a two-competing-speaker paradigm to investigate the performance of a commonly applied EEG-based AAD model outside of the laboratory, during leisure walking and distraction. Unique environmental sounds were added to the auditory scene and served as distractor events.
Main results: The current study shows, for the first time, that the attended speaker can be accurately decoded during natural movement. At a temporal resolution as short as 5 s and without artifact attenuation, decoding was significantly above chance level. Further, as hypothesized, we found a decrease in attention to the to-be-attended and the to-be-ignored speech streams after the occurrence of a salient event. Additionally, we demonstrate that it is possible to predict neural correlates of distraction with a computational model of auditory saliency based on acoustic features.
Conclusion: Taken together, our study shows that auditory attention tracking outside of the laboratory in ecologically valid conditions is feasible and a step towards the development of future neuro-steered hearing aids.
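A common linear AAD model of the kind referred to here reconstructs the speech envelope from time-lagged EEG with ridge regression and assigns each decision window to whichever speaker's envelope correlates best with the reconstruction. The sketch below assumes that form; the lag count and regularization strength are placeholder values, not the study's settings.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-lagged copies of each EEG channel: (T, C) -> (T, C * n_lags)."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        X[k:, k * C:(k + 1) * C] = eeg[:T - k]
    return X

def train_backward_model(eeg, env, n_lags=32, lam=1e3):
    """Ridge-regression stimulus-reconstruction decoder (generic linear AAD)."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ env)

def decode(eeg, w, env_a, env_b, n_lags=32):
    """Label a window by the better-correlated candidate speech envelope."""
    rec = lagged(eeg, n_lags) @ w
    corr = lambda a, b: np.corrcoef(a, b)[0, 1]
    return 'A' if corr(rec, env_a) > corr(rec, env_b) else 'B'
```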
Affiliation(s)
- Lisa Straetmans: Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, 26129 Oldenburg, Germany
- B. Holtze: Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, 26129 Oldenburg, Germany
- Stefan Debener: Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, 26129 Oldenburg, Germany
- Manuela Jaeger: Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, 26129 Oldenburg, Germany
- Bojana Mirkovic: Department of Psychology, Carl von Ossietzky Universität Oldenburg, Fakultät für Medizin und Gesundheitswissenschaften, Ammerländer Heerstr. 114-118, 26129 Oldenburg, Germany
13. Attentional control via synaptic gain mechanisms in auditory streaming. Brain Res 2021;1778:147720. PMID: 34785256; DOI: 10.1016/j.brainres.2021.147720.
Abstract
Attention is a crucial component in sound source segregation, allowing auditory objects of interest to be both singled out and held in focus. Our study utilizes a fundamental paradigm for sound source segregation: a sequence of interleaved tones, A and B, of different frequencies that can be heard as a single integrated stream or segregated into two streams (the auditory streaming paradigm). We focus on the irregular alternations between integrated and segregated that occur for long presentations, so-called auditory bistability. Psychoacoustic experiments demonstrate how attentional control (a listener's intention to experience integrated or segregated) biases perception in favour of different perceptual interpretations. Our data show that this is achieved by prolonging the dominance times of the attended percept and, to a lesser extent, by curtailing the dominance times of the unattended percept, an effect that remains consistent across a range of values for the difference in frequency between A and B. An existing neuromechanistic model describes the neural dynamics of perceptual competition downstream of primary auditory cortex (A1). The model allows us to propose plausible neural mechanisms for attentional control, as linked to different attentional strategies, in a direct comparison with behavioural data. A mechanism based on a percept-specific input gain best accounts for the effects of attentional control.
14. Kothinti SR, Huang N, Elhilali M. Auditory salience using natural scenes: An online study. J Acoust Soc Am 2021;150:2952. PMID: 34717500; PMCID: PMC8528551; DOI: 10.1121/10.0006750.
Abstract
Salience is the quality of a sensory signal that attracts involuntary attention in humans. While it primarily reflects conspicuous physical attributes of a scene, our understanding of the processes underlying what makes a certain object or event salient remains limited. In the vision literature, experimental results, theoretical accounts, and large amounts of eye-tracking data using rich stimuli have shed light on some of the underpinnings of visual salience in the brain. In contrast, studies of auditory salience have lagged behind due to limitations in both the experimental designs and the stimulus datasets used to probe the question of salience in complex everyday soundscapes. In this work, we deploy an online platform to study salience using a dichotic listening paradigm with natural auditory stimuli. The study validates crowd-sourcing as a reliable platform for collecting behavioral responses to auditory salience by comparing experimental outcomes to findings acquired in a controlled laboratory setting. A model-based analysis demonstrates the benefits of extending behavioral measures of salience to a broader selection of auditory scenes and larger pools of subjects. Overall, this effort extends our current knowledge of auditory salience in everyday soundscapes and highlights the limitations of low-level acoustic attributes in capturing the richness of natural soundscapes.
Affiliation(s)
- Sandeep Reddy Kothinti: Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, Maryland 21218, USA
- Nicholas Huang: Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, USA
- Mounya Elhilali: Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, Maryland 21218, USA
15. An Auditory Saliency Pooling-Based LSTM Model for Speech Intelligibility Classification. Symmetry (Basel) 2021. DOI: 10.3390/sym13091728.
Abstract
Speech intelligibility is a crucial element in oral communication that can be influenced by multiple factors, such as noise, channel characteristics, or speech disorders. In this paper, we address the task of speech intelligibility classification (SIC) in the last of these circumstances. Taking our previous work, a SIC system based on an attentional long short-term memory (LSTM) network, as a starting point, we deal with the problem of inadequate learning of the attention weights due to training data scarcity. To overcome this issue, the main contribution of this paper is a novel type of weighted pooling (WP) mechanism, called saliency pooling, in which the WP weights are not automatically learned during the training process of the network but are obtained from an external source of information: Kalinli's auditory saliency model. In this way, we intend to take advantage of the apparent symmetry between the human auditory attention mechanism and the attentional models integrated into deep learning networks. The developed systems are assessed on the UA-Speech dataset, which comprises speech uttered by subjects with several dysarthria levels. Results show that all the systems with saliency pooling significantly outperform a reference support vector machine (SVM)-based system and LSTM-based systems with mean pooling and attention pooling, suggesting that Kalinli's saliency can be successfully incorporated into the LSTM architecture as an external cue for the estimation of the speech intelligibility level.
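The saliency pooling idea can be reduced to a few lines: instead of learning attention weights, pool the recurrent network's per-frame outputs with an externally supplied saliency curve. In this sketch the saliency values are a random stand-in for the output of Kalinli's model, and the dimensions are invented.

```python
import numpy as np

def saliency_pooling(hidden_states, saliency):
    """Pool a sequence of network states (T x D) with externally supplied
    per-frame saliency weights instead of learned attention weights."""
    w = saliency / (saliency.sum() + 1e-8)     # normalize weights to sum to 1
    return w @ hidden_states                   # weighted sum over time -> (D,)

T, D = 120, 64
hidden_states = np.random.randn(T, D)          # e.g., LSTM outputs per frame
saliency = np.random.rand(T)                   # stand-in for an auditory saliency curve
utterance_embedding = saliency_pooling(hidden_states, saliency)
```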
16. AIM: A network model of attention in auditory cortex. PLoS Comput Biol 2021;17:e1009356. PMID: 34449761; PMCID: PMC8462696; DOI: 10.1371/journal.pcbi.1009356.
Abstract
Attentional modulation of cortical networks is critical for the cognitive flexibility required to process complex scenes. Current theoretical frameworks for attention are based almost exclusively on studies in visual cortex, where attentional effects are typically modest and excitatory. In contrast, attentional effects in auditory cortex can be large and suppressive. A theoretical framework for explaining attentional effects in auditory cortex is lacking, preventing a broader understanding of cortical mechanisms underlying attention. Here, we present a cortical network model of attention in primary auditory cortex (A1). A key mechanism in our network is attentional inhibitory modulation (AIM) of cortical inhibitory neurons. In this mechanism, top-down inhibitory neurons disinhibit bottom-up cortical circuits, a prominent circuit motif observed in sensory cortex. Our results reveal that the same underlying mechanisms in the AIM network can explain diverse attentional effects on both spatial and frequency tuning in A1. We find that a dominant effect of disinhibition on cortical tuning is suppressive, consistent with experimental observations. Functionally, the AIM network may play a key role in solving the cocktail party problem. We demonstrate how attention can guide the AIM network to monitor an acoustic scene, select a specific target, or switch to a different target, providing flexible outputs for solving the cocktail party problem.

Selective attention plays a key role in how we navigate our everyday lives. For example, at a cocktail party, we can attend to a friend's speech amidst other speakers, music, and background noise. In stark contrast, hundreds of millions of people with hearing impairment and other disorders find such environments overwhelming and debilitating. Understanding the mechanisms underlying selective attention may lead to breakthroughs in improving the quality of life for those negatively affected. Here, we propose a mechanistic network model of attention in primary auditory cortex based on attentional inhibitory modulation (AIM). In the AIM model, attention targets specific cortical inhibitory neurons, which then modulate local cortical circuits to emphasize a particular feature of sounds and suppress competing features. We show that the AIM model can account for experimental observations across different species and stimulus domains. We also demonstrate that the same mechanisms can enable listeners to flexibly switch between attending to specific target sounds and monitoring the environment in complex acoustic scenes, such as a cocktail party. The AIM network provides a theoretical framework which can work in tandem with new experiments to help unravel the cortical circuits underlying attention.
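The disinhibition motif at the heart of the AIM mechanism (top-down inhibitory neurons suppressing a local interneuron and thereby releasing the bottom-up excitatory circuit) can be caricatured with a three-population rate model. All weights and time constants below are invented for illustration and are not the published model.

```python
import numpy as np

def simulate_disinhibition(attention, T=2000, dt=0.1e-3, tau=10e-3):
    """Toy rate model: a top-down signal inhibits a local interneuron I,
    disinhibiting the bottom-up excitatory unit E (weights are invented)."""
    relu = lambda x: np.maximum(0.0, x)
    E = I = 0.0
    drive = 1.0                                      # bottom-up sensory drive
    for _ in range(T):
        I += dt / tau * (-I + relu(1.5 * drive - 2.0 * attention))
        E += dt / tau * (-E + relu(drive - 1.2 * I))
    return E

print(simulate_disinhibition(attention=0.0))   # ~0.0: response suppressed by I
print(simulate_disinhibition(attention=0.6))   # ~0.64: disinhibited (attended)
```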
17. Cai S, Li P, Su E, Xie L. Auditory Attention Detection via Cross-Modal Attention. Front Neurosci 2021;15:652058. PMID: 34366770; PMCID: PMC8333999; DOI: 10.3389/fnins.2021.652058.
Abstract
Humans show a remarkable perceptual ability to select the speech stream of interest among multiple competing speakers. Previous studies demonstrated that auditory attention detection (AAD) can infer which speaker is attended by analyzing a listener's electroencephalography (EEG) activity. However, previous AAD approaches perform poorly on short signal segments, so more advanced decoding strategies are needed to realize robust real-time AAD. In this study, we propose a novel approach, cross-modal attention-based AAD (CMAA), to exploit the discriminative features and the correlation between audio and EEG signals. With this mechanism, we hope to dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features, thereby detecting the auditory attention activities manifested in brain signals. We also validate the CMAA model through data visualization and comprehensive experiments on a publicly available database. Experiments show that the CMAA achieves accuracy values of 82.8%, 86.4%, and 87.6% for 1-, 2-, and 5-s decision windows under anechoic conditions, respectively; for a 2-s decision window, it achieves an average of 84.1% under real-world reverberant conditions. The proposed CMAA network not only achieves better performance than the conventional linear model but also outperforms state-of-the-art non-linear approaches. These results and the data visualization suggest that the CMAA model can dynamically adapt the interactions and fuse cross-modal information by directly attending to audio and EEG features in order to improve AAD performance.
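A generic sketch of the cross-modal attention idea, with EEG frames as queries and audio frames as keys/values in scaled dot-product attention. This is not the CMAA architecture itself; the feature dimensions are invented and both modalities are assumed to be projected to the same dimensionality.

```python
import numpy as np

def cross_modal_attention(eeg_feats, audio_feats):
    """Scaled dot-product attention: EEG frames attend to audio frames."""
    d = eeg_feats.shape[1]
    scores = eeg_feats @ audio_feats.T / np.sqrt(d)        # (T_eeg, T_audio)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # softmax over audio frames
    return weights @ audio_feats                           # attended audio per EEG frame

eeg_feats = np.random.randn(50, 16)     # 50 EEG frames, 16-dim features (toy)
audio_feats = np.random.randn(200, 16)  # 200 audio frames, same dim (toy)
fused = cross_modal_attention(eeg_feats, audio_feats)
```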
Affiliation(s)
- Longhan Xie: Shien-Ming Wu School of Intelligent Engineering, South China University of Technology, Guangzhou, China
18. Holtze B, Jaeger M, Debener S, Adiloğlu K, Mirkovic B. Are They Calling My Name? Attention Capture Is Reflected in the Neural Tracking of Attended and Ignored Speech. Front Neurosci 2021;15:643705. PMID: 33828451; PMCID: PMC8019946; DOI: 10.3389/fnins.2021.643705.
Abstract
Difficulties in selectively attending to one among several speakers have mainly been associated with the distraction caused by ignored speech. Thus, in the current study, we investigated the neural processing of ignored speech in a two-competing-speaker paradigm. For this, we recorded the participants' brain activity using electroencephalography (EEG) to track the neural representation of the attended and ignored speech envelopes. To provoke distraction, we occasionally embedded each participant's first name in the ignored speech stream. Retrospective reports as well as the presence of a P3 component in response to the name indicate that participants noticed its occurrence. As predicted, the neural representation of the ignored speech envelope increased after the name was presented therein, suggesting that the name had attracted the participant's attention. Interestingly, in contrast to our hypothesis, the neural tracking of the attended speech envelope also increased after the name occurrence. On this account, we conclude that the name might not have primarily distracted the participants, at most for a brief duration, but rather alerted them to focus on their actual task. These observations remained robust even when the sound intensity of the ignored speech stream, and thus the sound intensity of the name, was attenuated.
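Envelope-tracking analyses of this kind first reduce each speech stream to a slow amplitude envelope at the EEG sampling rate. One common recipe is sketched below; the cutoff frequency and sampling rates are typical choices, not necessarily the ones used in this study.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt, resample

def speech_envelope(audio, fs_audio, fs_eeg=250, cutoff=8.0):
    """Broadband speech envelope via the Hilbert transform, low-pass
    filtered and resampled to the EEG rate."""
    env = np.abs(hilbert(audio))                           # instantaneous amplitude
    b, a = butter(2, cutoff / (fs_audio / 2), btype='low') # keep slow fluctuations
    env = filtfilt(b, a, env)                              # zero-phase filtering
    n_out = int(len(env) * fs_eeg / fs_audio)
    return resample(env, n_out)                            # match EEG sampling rate
```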
Affiliation(s)
- Björn Holtze: Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Manuela Jaeger: Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
- Stefan Debener: Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Kamil Adiloğlu: Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; HörTech gGmbH, Oldenburg, Germany
- Bojana Mirkovic: Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
19. Williams ZJ, He JL, Cascio CJ, Woynaroski TG. A review of decreased sound tolerance in autism: Definitions, phenomenology, and potential mechanisms. Neurosci Biobehav Rev 2021;121:1-17. PMID: 33285160; PMCID: PMC7855558; DOI: 10.1016/j.neubiorev.2020.11.030.
Abstract
Atypical behavioral responses to environmental sounds are common in autistic children and adults, with 50-70% of this population exhibiting decreased sound tolerance (DST) at some point in their lives. This symptom is a source of significant distress and impairment across the lifespan, contributing to anxiety, challenging behaviors, reduced community participation, and school/workplace difficulties. However, relatively little is known about its phenomenology or neurocognitive underpinnings. The present article synthesizes a large body of literature on the phenomenology and pathophysiology of DST-related conditions to generate a comprehensive theoretical account of DST in autism. Notably, we argue against conceptualizing DST as a unified construct, suggesting that it be separated into three phenomenologically distinct conditions: hyperacusis (the perception of everyday sounds as excessively loud or painful), misophonia (an acquired aversive reaction to specific sounds), and phonophobia (a specific phobia of sound), each responsible for a portion of observed DST behaviors. We further elaborate our framework by proposing preliminary neurocognitive models of hyperacusis, misophonia, and phonophobia that incorporate neurophysiologic findings from studies of autism.
Affiliation(s)
- Zachary J Williams: Medical Scientist Training Program, Vanderbilt University School of Medicine, 221 Eskind Biomedical Library and Learning Center, 2209 Garland Ave., Nashville, TN, 37240, United States; Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, 1215 21st Avenue South, Medical Center East, Room 8310, Nashville, TN, 37232, United States; Vanderbilt Brain Institute, Vanderbilt University, 7203 Medical Research Building III, 465 21st Avenue South, Nashville, TN, 37232, United States; Frist Center for Autism and Innovation, Vanderbilt University, 2414 Highland Avenue, Suite 115, Nashville, TN, 37212, United States
- Jason L He: Department of Forensic and Neurodevelopmental Sciences, Sackler Institute for Translational Neurodevelopment, Institute of Psychiatry, Psychology and Neuroscience, King's College London, Strand Building, Strand Campus, Strand, London, WC2R 2LS, United Kingdom
- Carissa J Cascio: Vanderbilt Brain Institute, Vanderbilt University, 7203 Medical Research Building III, 465 21st Avenue South, Nashville, TN, 37232, United States; Frist Center for Autism and Innovation, Vanderbilt University, 2414 Highland Avenue, Suite 115, Nashville, TN, 37212, United States; Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, 2254 Village at Vanderbilt, 1500 21st Ave South, Nashville, TN, 37212, United States; Vanderbilt Kennedy Center, Vanderbilt University Medical Center, 110 Magnolia Cir, Nashville, TN, 37203, United States
- Tiffany G Woynaroski: Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, 1215 21st Avenue South, Medical Center East, Room 8310, Nashville, TN, 37232, United States; Vanderbilt Brain Institute, Vanderbilt University, 7203 Medical Research Building III, 465 21st Avenue South, Nashville, TN, 37232, United States; Frist Center for Autism and Innovation, Vanderbilt University, 2414 Highland Avenue, Suite 115, Nashville, TN, 37212, United States; Vanderbilt Kennedy Center, Vanderbilt University Medical Center, 110 Magnolia Cir, Nashville, TN, 37203, United States
20. Attention-Inspired Artificial Neural Networks for Speech Processing: A Systematic Review. Symmetry (Basel) 2021. DOI: 10.3390/sym13020214.
Abstract
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the human brain and have been widely applied in speech processing. The application areas of ANNs include speech recognition, speech emotion recognition, language identification, speech enhancement, and speech separation, amongst others. Likewise, given that speech processing performed by humans involves complex cognitive processes known as auditory attention, there has been a growing number of papers proposing ANNs supported by deep learning algorithms in conjunction with some mechanism to achieve symmetry with the human attention process. However, while these ANN approaches include attention, there is no categorization of the attention integrated into the deep learning algorithms and its relation to human auditory attention. Therefore, we consider it necessary to review the different attention-inspired ANN approaches to show both academic and industry experts the available models for a wide variety of applications. Based on the PRISMA methodology, we present a systematic review of the literature published since 2000 in which deep learning algorithms are applied to diverse problems related to speech processing. In this paper, 133 research works are selected and the following aspects are described: (i) the most relevant features, (ii) the ways in which attention has been implemented, (iii) their hypothetical relationship with human attention, and (iv) the evaluation metrics used. Additionally, the four publications most closely related to human attention were analyzed, and their strengths and weaknesses were determined.
21. Eqlimi E, Bockstael A, De Coensel B, Schönwiesner M, Talsma D, Botteldooren D. EEG Correlates of Learning From Speech Presented in Environmental Noise. Front Psychol 2020;11:1850. PMID: 33250798; PMCID: PMC7676901; DOI: 10.3389/fpsyg.2020.01850.
Abstract
How the human brain retains relevant vocal information while suppressing irrelevant sounds is one of the ongoing challenges in cognitive neuroscience. Knowledge of the underlying mechanisms of this ability can be used to identify whether a person is distracted while listening to a target speech, especially in a learning context. This paper investigates the neural correlates of learning from speech presented in a noisy environment, using an ecologically valid learning context and electroencephalography (EEG). To this end, the following listening tasks were performed while 64-channel EEG signals were recorded: (1) attentive listening to lectures in background sound, (2) attentive listening to the background sound presented alone, and (3) inattentive listening to the background sound. For the first task, 13 lectures of 5 min in length, embedded in different types of realistic background noise, were presented to participants who were asked to focus on the lectures. As background noise, multi-talker babble, continuous highway noise, and fluctuating traffic sounds were used. After the second task, a written exam was taken to quantify the amount of information that participants had acquired and retained from the lectures. In addition to various power spectrum-based EEG features in different frequency bands, the peak frequency and long-range temporal correlations (LRTC) of alpha-band activity were estimated. To reduce these dimensions, a principal component analysis (PCA) was applied to the different listening conditions, resulting in the feature combinations that discriminate most between listening conditions and persons. Linear mixed-effects modeling was used to explain the origin of the extracted principal components, showing their dependence on listening condition and type of background sound. Following this unsupervised step, a supervised analysis was performed to explain the link between the exam results and the EEG principal component scores, using both linear fixed- and mixed-effects modeling. The results suggest that the ability to learn from speech presented in environmental noise can be predicted better by several components over specific brain regions than by knowing the background noise type. These components were linked to deterioration in attention, speech envelope following, decreased focus during listening, cognitive prediction error, and specific inhibition mechanisms.
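The unsupervised-then-supervised pipeline described here (EEG features, then PCA, then regression against exam scores) can be sketched as follows. The data are random stand-ins, and ordinary least squares is used where the paper applied linear mixed-effects models.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: one row per recording, columns are EEG
# features (band powers, alpha peak frequency, LRTC exponents, ...).
rng = np.random.default_rng(0)
X = rng.normal(size=(39, 20))            # e.g., 13 lectures x 3 noise types
exam = rng.normal(size=39)               # stand-in for exam scores

Xz = StandardScaler().fit_transform(X)   # standardize features
scores = PCA(n_components=5).fit_transform(Xz)   # unsupervised step

model = LinearRegression().fit(scores, exam)     # supervised step
print(model.score(scores, exam))         # variance in exam results explained
```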
Affiliation(s)
- Ehsan Eqlimi: WAVES Research Group, Department of Information Technology, Ghent University, Ghent, Belgium
- Annelies Bockstael: WAVES Research Group, Department of Information Technology, Ghent University, Ghent, Belgium; École d'Orthophonie et d'Audiologie, Université de Montréal, Montreal, QC, Canada; Erasmushogeschool Brussel, Brussels, Belgium
- Bert De Coensel: WAVES Research Group, Department of Information Technology, Ghent University, Ghent, Belgium; ASAsense, Bruges, Belgium
- Marc Schönwiesner: Faculty of Biosciences, Pharmacy and Psychology, Institute of Biology, University of Leipzig, Leipzig, Germany; International Laboratory for Brain, Music and Sound Research (BRAMS), Université de Montréal, Montreal, QC, Canada
- Durk Talsma: Department of Experimental Psychology, Ghent University, Ghent, Belgium
- Dick Botteldooren: WAVES Research Group, Department of Information Technology, Ghent University, Ghent, Belgium
22. Bidirectional Attention for Text-Dependent Speaker Verification. Sensors (Basel) 2020;20:6784. PMID: 33261046; PMCID: PMC7730222; DOI: 10.3390/s20236784.
Abstract
Automatic speaker verification provides a flexible and effective way for biometric authentication. Previous deep learning-based methods have demonstrated promising results, but a few problems still require better solutions. In prior works on speaker-discriminative neural networks, the representation of the target speaker is treated as fixed when comparing against utterances from different speakers, and the joint information between enrollment and evaluation utterances is ignored. In this paper, we propose to combine CNN-based feature learning with a bidirectional attention mechanism to achieve better performance with only one enrollment utterance. The evaluation-enrollment joint information is exploited to provide interactive features through bidirectional attention. In addition, we introduce an individual cost function to identify the phonetic contents, which contributes to calculating the attention score more specifically. These interactive features are complementary to the constant ones, which are extracted from individual speakers separately and do not vary with the evaluation utterances. The proposed method achieved a competitive equal error rate of 6.26% on the internal "DAN DAN NI HAO" benchmark dataset with 1250 utterances and outperformed various baseline methods, including the traditional i-vector/PLDA, d-vector, self-attention, and sequence-to-sequence attention models.
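A minimal numpy sketch of the bidirectional attention step may help clarify the idea: frame embeddings of the enrollment and evaluation utterances attend to each other through a shared similarity matrix, producing the "interactive" features described above. Embedding dimensions, pooling, and the final scoring step are our assumptions, not the paper's architecture.

```python
# Illustrative sketch of bidirectional attention between an enrollment and an
# evaluation utterance; frame-level embeddings are assumed to come from a CNN
# front end. Names and dimensions are ours, not the paper's.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

T_e, T_v, d = 50, 60, 64                     # frames (enroll, eval), embedding dim
E = np.random.randn(T_e, d)                  # enrollment frame embeddings
V = np.random.randn(T_v, d)                  # evaluation frame embeddings

S = E @ V.T / np.sqrt(d)                     # frame-to-frame similarity scores
E_att = softmax(S, axis=0).T @ E             # eval frames attend over enrollment
V_att = softmax(S, axis=1) @ V               # enroll frames attend over evaluation

# Interactive utterance-level features: pool the mutually attended frames and
# score similarity; in the paper these complement speaker-specific features.
e_vec = np.concatenate([V_att.mean(axis=0), E.mean(axis=0)])
v_vec = np.concatenate([E_att.mean(axis=0), V.mean(axis=0)])
score = e_vec @ v_vec / (np.linalg.norm(e_vec) * np.linalg.norm(v_vec))
print(f"verification score: {score:.3f}")
```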
Collapse
|
23
|
Paying attention to speech: The role of working memory capacity and professional experience. Atten Percept Psychophys 2020; 82:3594-3605. [PMID: 32676806 DOI: 10.3758/s13414-020-02091-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Managing attention in multispeaker environments is a challenging feat that is critical for human performance. However, why some people are better than others at allocating attention appropriately remains poorly understood. Here, we investigated the contribution of two factors-working memory capacity (WMC) and professional experience-to performance on two different types of attention task: selective attention to one speaker and distributed attention among multiple concurrent speakers. We compared performance across three groups: individuals with low (n = 20) and high (n = 25) WMC, and aircraft pilots (n = 24), whose profession poses extremely high demands on both selective and distributed attention to speech. Results suggest that selective attention is highly effective, with good performance maintained under increasingly adverse conditions, whereas performance decreases substantially with the requirement to distribute attention among a larger number of speakers. Importantly, both types of attention benefit from higher WMC, suggesting reliance on some common capacity-limited resources. However, only selective attention was further improved in the pilots, pointing to its flexible and trainable nature, whereas distributed attention seems to suffer from more fixed and severe processing bottlenecks.
Collapse
|
24
|
Abstract
To ensure that listeners pay attention and do not habituate, emotionally intense vocalizations may be under evolutionary pressure to exploit processing biases in the auditory system by maximising their bottom-up salience. This "salience code" hypothesis was tested using 128 human nonverbal vocalizations representing eight emotions: amusement, anger, disgust, effort, fear, pain, pleasure, and sadness. As expected, within each emotion category salience ratings derived from pairwise comparisons strongly correlated with perceived emotion intensity. For example, while laughs as a class were less salient than screams of fear, salience scores almost perfectly explained the perceived intensity of both amusement and fear considered separately. Validating self-rated salience evaluations, high- vs. low-salience sounds caused 25% more recall errors in a short-term memory task, whereas emotion intensity had no independent effect on recall errors. Furthermore, the acoustic characteristics of salient vocalizations were similar to those previously described for non-emotional sounds (greater duration and intensity, high pitch, bright timbre, rapid modulations, and variable spectral characteristics), confirming that vocalizations were not salient merely because of their emotional content. The acoustic code in nonverbal communication is thus aligned with sensory biases, offering a general explanation for some non-arbitrary properties of human and animal high-arousal vocalizations.
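One way to derive per-stimulus salience scores from pairwise comparisons is a Bradley-Terry model; the sketch below illustrates that approach on toy data. Whether the authors used this exact scaling method is an assumption on our part.

```python
# Hedged sketch: latent salience scores from pairwise "which sound attracted
# your attention?" judgments via a Bradley-Terry model (our choice of scaling
# method; the paper's exact procedure may differ).
import numpy as np

def bradley_terry(wins, n_iter=200):
    """wins[i, j] = number of trials stimulus i was chosen over j."""
    n = wins.shape[0]
    s = np.ones(n)
    for _ in range(n_iter):                  # minorize-maximize updates
        for i in range(n):
            num = wins[i].sum()
            den = sum((wins[i, j] + wins[j, i]) / (s[i] + s[j])
                      for j in range(n) if j != i)
            s[i] = num / den if den > 0 else s[i]
        s /= s.sum()
    return s                                 # latent salience scores, sum to 1

wins = np.array([[0, 8, 9],                  # toy data: 3 vocalizations,
                 [2, 0, 6],                  # 10 trials per pair
                 [1, 4, 0]])
print(bradley_terry(wins))
```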
Collapse
Affiliation(s)
- Andrey Anikin
- Division of Cognitive Science, Lund University, Lund, Sweden
| |
Collapse
|
25
|
Bellur A, Elhilali M. Audio object classification using distributed beliefs and attention. IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING 2020; 28:729-739. [PMID: 33564695 PMCID: PMC7869589 DOI: 10.1109/taslp.2020.2966867] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
One of the unique characteristics of human hearing is its ability to recognize acoustic objects even in the presence of severe noise and distortions. In this work, we explore two mechanisms underlying this ability: 1) redundant mapping of acoustic waveforms along distributed latent representations and 2) adaptive feedback based on prior knowledge to selectively attend to targets of interest. We propose a bio-mimetic account of acoustic object classification by developing a novel distributed deep belief network, validated for the task of robust acoustic object classification using the UrbanSound database. The proposed distributed belief network (DBN) encompasses an array of independent sub-networks trained generatively to capture different abstractions of natural sounds. A supervised classifier then performs a readout of this distributed mapping. The overall architecture not only matches the state-of-the-art system for acoustic object classification but also leads to significant improvement over the baseline in mismatched noisy conditions (31.4% relative improvement in 0 dB conditions). Furthermore, we incorporate mechanisms of attentional feedback that allow the DBN to deploy local memories of sound targets, estimated at multiple views, to bias network activation when attending to a particular object. This adaptive feedback results in further improvement of object classification in unseen noise conditions (a relative improvement of 54% over the baseline in 0 dB conditions).
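As a rough illustration of the distributed-belief idea, the sketch below trains several small generative sub-networks (scikit-learn RBMs standing in for the paper's deep belief sub-networks) on different views of the input and reads out their concatenated activations with a supervised classifier. Data, view construction, and class counts are toy stand-ins.

```python
# Hedged sketch of the distributed mapping + supervised readout idea; not the
# authors' architecture. Each "view" mimics a different abstraction of the
# sound (e.g., different spectro-temporal features).
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import minmax_scale

rng = np.random.default_rng(1)
n, d = 200, 40
views = [minmax_scale(rng.normal(size=(n, d))) for _ in range(3)]
y = rng.integers(0, 10, n)          # 10 sound classes, as in UrbanSound

# Independent generatively trained sub-networks, one per view.
subnets = [BernoulliRBM(n_components=16, learning_rate=0.05,
                        n_iter=20, random_state=0).fit(v) for v in views]
H = np.hstack([net.transform(v) for net, v in zip(subnets, views)])

# Supervised readout of the distributed mapping.
readout = LogisticRegression(max_iter=1000).fit(H, y)
print("training accuracy:", readout.score(H, y))
```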
Collapse
Affiliation(s)
- Ashwin Bellur
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| |
Collapse
|
26
|
Devos P, Aletta F, Thomas P, Petrovic M, Vander Mynsbrugge T, Van de Velde D, De Vriendt P, Botteldooren D. Designing Supportive Soundscapes for Nursing Home Residents with Dementia. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 16:ijerph16244904. [PMID: 31817300 PMCID: PMC6950055 DOI: 10.3390/ijerph16244904] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2019] [Revised: 11/22/2019] [Accepted: 11/28/2019] [Indexed: 12/12/2022]
Abstract
Sound and the resulting soundscape are a major appraisal component of the living environment. Whereas environmental sounds (e.g., outdoor traffic sounds) are often perceived as negative, a soundscape (e.g., one containing natural sounds) can also have a positive effect on health and well-being. This supportive effect of a soundscape is receiving increasing attention for use in practice. This paper addresses the design of a supportive sonic environment for persons with dementia in nursing homes. Starting from a review of key mechanisms related to sonic perception, cognitive deficits, and related behavior, a framework is derived for the composition of a sonic environment for persons with dementia. The proposed framework is centered on using acoustic stimuli to influence mood, stimulate the feeling of safety, and trigger a response in a person. These stimuli are intended to be deployed as added sounds in a nursing home to improve the well-being and behavior of the residents.
Collapse
Affiliation(s)
- Paul Devos
- Department of Information Technology, Ghent University, 9052 Ghent, Belgium; (F.A.); (P.T.); (D.B.)
- Correspondence:
| | - Francesco Aletta
- Department of Information Technology, Ghent University, 9052 Ghent, Belgium; (F.A.); (P.T.); (D.B.)
- Institute for Environmental Design and Engineering, University College London, London WC1H0NN, UK
| | - Pieter Thomas
- Department of Information Technology, Ghent University, 9052 Ghent, Belgium; (F.A.); (P.T.); (D.B.)
| | - Mirko Petrovic
- Department of Internal Medicine and Paediatrics, Ghent University, 9000 Ghent, Belgium;
| | - Tara Vander Mynsbrugge
- Department of Occupational Therapy, Artevelde University College, 9000 Ghent, Belgium; (T.V.M.); (D.V.d.V.); (P.D.V.)
| | - Dominique Van de Velde
- Department of Occupational Therapy, Artevelde University College, 9000 Ghent, Belgium; (T.V.M.); (D.V.d.V.); (P.D.V.)
- Department of Occupational Therapy, Ghent University, 9000 Ghent, Belgium
| | - Patricia De Vriendt
- Department of Occupational Therapy, Artevelde University College, 9000 Ghent, Belgium; (T.V.M.); (D.V.d.V.); (P.D.V.)
- Department of Occupational Therapy, Ghent University, 9000 Ghent, Belgium
| | - Dick Botteldooren
- Department of Information Technology, Ghent University, 9052 Ghent, Belgium; (F.A.); (P.T.); (D.B.)
| |
Collapse
|
27
|
Kell AJE, McDermott JH. Invariance to background noise as a signature of non-primary auditory cortex. Nat Commun 2019; 10:3958. [PMID: 31477711 PMCID: PMC6718388 DOI: 10.1038/s41467-019-11710-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2018] [Accepted: 07/30/2019] [Indexed: 12/22/2022] Open
Abstract
Despite well-established anatomical differences between primary and non-primary auditory cortex, the associated representational transformations have remained elusive. Here we show that primary and non-primary auditory cortex are differentiated by their invariance to real-world background noise. We measured fMRI responses to natural sounds presented in isolation and in real-world noise, quantifying invariance as the correlation between the two responses for individual voxels. Non-primary areas were substantially more noise-invariant than primary areas. This primary-nonprimary difference occurred both for speech and non-speech sounds and was unaffected by a concurrent demanding visual task, suggesting that the observed invariance is not specific to speech processing and is robust to inattention. The difference was most pronounced for real-world background noise-both primary and non-primary areas were relatively robust to simple types of synthetic noise. Our results suggest a general representational transformation between auditory cortical stages, illustrating a representational consequence of hierarchical organization in the auditory system.
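The invariance measure itself is simple to state in code: for each voxel, correlate its responses to sounds presented in isolation with its responses to the same sounds in noise. The sketch below shows this computation on synthetic data (array names and sizes are ours, not the study's).

```python
# Simple sketch of the paper's invariance measure: per-voxel correlation
# between responses to sounds in isolation and the same sounds in noise.
import numpy as np

rng = np.random.default_rng(0)
n_sounds, n_voxels = 30, 500
resp_clean = rng.normal(size=(n_sounds, n_voxels))
resp_noise = 0.7 * resp_clean + 0.3 * rng.normal(size=(n_sounds, n_voxels))

def noise_invariance(a, b):
    """Pearson r across sounds, computed independently for each voxel."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a**2).sum(axis=0) * (b**2).sum(axis=0))

r = noise_invariance(resp_clean, resp_noise)
print("mean invariance:", r.mean())   # higher values ~ more noise-invariant voxels
```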
Collapse
Affiliation(s)
- Alexander J E Kell
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, 02139, USA.
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, 02139, USA.
- Zuckerman Institute of Mind, Brain, and Behavior, Columbia University, New York, NY, 10027, USA.
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, MIT, Cambridge, MA, 02139, USA.
- McGovern Institute for Brain Research, MIT, Cambridge, MA, 02139, USA.
- Center for Brains, Minds, and Machines, MIT, Cambridge, MA, 02139, USA.
- Program in Speech and Hearing Biosciences and Technology, Harvard University, Boston, MA, USA.
| |
Collapse
|
28
|
Alickovic E, Lunner T, Gustafsson F, Ljung L. A Tutorial on Auditory Attention Identification Methods. Front Neurosci 2019; 13:153. [PMID: 30941002 PMCID: PMC6434370 DOI: 10.3389/fnins.2019.00153] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2018] [Accepted: 02/11/2019] [Indexed: 01/14/2023] Open
Abstract
Auditory attention identification methods attempt to identify the sound source of a listener's interest by analyzing measurements of electrophysiological data. We present a tutorial on the numerous techniques that have been developed in recent decades, and we present an overview of current trends in multivariate correlation-based and model-based learning frameworks. The focus is on the use of linear relations between electrophysiological and audio data. The way in which these relations are computed differs. For example, canonical correlation analysis (CCA) finds a linear subset of electrophysiological data that best correlates to audio data and a similar subset of audio data that best correlates to electrophysiological data. Model-based (encoding and decoding) approaches focus on either of these two sets. We investigate the similarities and differences between these linear model philosophies. We focus on (1) correlation-based approaches (CCA), (2) encoding/decoding models based on dense estimation, and (3) (adaptive) encoding/decoding models based on sparse estimation. The specific focus is on sparsity-driven adaptive encoding models and comparing the methodology in state-of-the-art models found in the auditory literature. Furthermore, we outline the main signal processing pipeline for how to identify the attended sound source in a cocktail party environment from the raw electrophysiological data with all the necessary steps, complemented with the necessary MATLAB code and the relevant references for each step. Our main aim is to compare the methodology of the available methods, and provide numerical illustrations to some of them to get a feeling for their potential. A thorough performance comparison is outside the scope of this tutorial.
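The tutorial supplies MATLAB code; as a taste of the correlation-based branch, here is a hedged Python equivalent of the CCA step, correlating multichannel EEG with time-lagged copies of a speech envelope. Lag handling and envelope extraction are deliberately simplified.

```python
# Sketch of the CCA approach on synthetic data: find the EEG projection and
# envelope-lag combination that correlate best. Shapes and lags are assumptions.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_samples, n_channels, n_lags = 2000, 16, 10
envelope = rng.normal(size=n_samples)
eeg = rng.normal(size=(n_samples, n_channels))
eeg[:, 0] += np.convolve(envelope, np.ones(5) / 5, mode="same")  # envelope leaks into one channel

# Build lagged copies of the envelope so CCA can absorb the neural delay.
env_lagged = np.column_stack([np.roll(envelope, k) for k in range(n_lags)])

cca = CCA(n_components=1)
x_scores, y_scores = cca.fit_transform(eeg, env_lagged)
r = np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1]
print(f"first canonical correlation: {r:.2f}")   # higher for the attended stream
```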
Collapse
Affiliation(s)
- Emina Alickovic
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
| | - Thomas Lunner
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Hearing Systems, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Swedish Institute for Disability Research, Linnaeus Centre HEAD, Linkoping University, Linkoping, Sweden
| | - Fredrik Gustafsson
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
| | - Lennart Ljung
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
| |
Collapse
|
29
|
Hambrook DA, Tata MS. The effects of distractor set-size on neural tracking of attended speech. BRAIN AND LANGUAGE 2019; 190:1-9. [PMID: 30616147 DOI: 10.1016/j.bandl.2018.12.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/23/2018] [Revised: 11/19/2018] [Accepted: 12/19/2018] [Indexed: 06/09/2023]
Abstract
Attention is crucial to speech comprehension in real-world, noisy environments. Selective phase-tracking between low-frequency brain dynamics and the envelope of target speech is a proposed mechanism to reject competing distractors. Studies have supported this theory in the case of a single distractor, but have not considered how tracking is systematically affected by varying distractor set sizes. We recorded electroencephalography (EEG) during selective listening to both natural and vocoded speech as distractor set-size varied from two to six voices. Increasing set-size reduced performance and attenuated EEG tracking of target speech. Further, we found that intrusions of distractor speech into perception were not accompanied by sustained tracking of the distractor stream. Our results support the theory that tracking of speech dynamics is a mechanism for selective attention, and that the mechanism of distraction is not simple stimulus-driven capture of sustained entrainment of auditory mechanisms by the acoustics of distracting speech.
Collapse
Affiliation(s)
- Dillon A Hambrook
- The University of Lethbridge, 4401 University Drive, Lethbridge, Alberta T1K 3M4, Canada.
| | - Matthew S Tata
- The University of Lethbridge, 4401 University Drive, Lethbridge, Alberta T1K 3M4, Canada
| |
Collapse
|
30
|
Cuppone AV, Cappagli G, Gori M. Audio Feedback Associated With Body Movement Enhances Audio and Somatosensory Spatial Representation. Front Integr Neurosci 2018; 12:37. [PMID: 30233334 PMCID: PMC6131311 DOI: 10.3389/fnint.2018.00037] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 08/15/2018] [Indexed: 11/13/2022] Open
Abstract
In recent years, the positive impact of sensorimotor rehabilitation training on spatial abilities has drawn attention, e.g., with evidence that combined multimodal feedback improves responsiveness to spatial stimuli more than unimodal feedback. To date, it remains unclear to what extent spatial learning is influenced by training conditions. Here we investigated the effects of active and passive audio-motor training on spatial perception in the auditory and proprioceptive domains in 36 healthy young adults. First, to investigate the role of voluntary movements in spatial perception, we compared the effects of active vs. passive multimodal training on auditory and proprioceptive spatial localization. Second, to investigate the effectiveness of unimodal training conditions on spatial perception, we compared the impact of only proprioceptive or only auditory sensory feedback on spatial localization. Finally, to understand whether the positive effects of multimodal and unimodal training generalize to the untrained side, both dominant and non-dominant arms were tested. Results indicate that passive multimodal training (guided movement) is more beneficial than active multimodal training (active exploration), and only in the passive condition does the improvement generalize to the untrained hand. Moreover, we found that combined audio-motor training provides the strongest benefit because it significantly affects both auditory and somatosensory localization, while the effect of a single feedback modality is limited to a single domain, indicating a cross-modal influence of the two domains. Therefore, the use of multimodal feedback is more efficient in improving spatial perception. These results indicate that combined sensorimotor signals are effective in recalibrating auditory and proprioceptive spatial perception and that the beneficial effect is mainly due to the combination of auditory and proprioceptive spatial cues.
Collapse
Affiliation(s)
- Anna Vera Cuppone
- Unit for Visually Impaired People (U-VIP), Istituto Italiano di Tecnologia, Genoa, Italy
| | | | | |
Collapse
|
31
|
Lu Y, Wang M, Zhang Q, Han Y. Identification of Auditory Object-Specific Attention from Single-Trial Electroencephalogram Signals via Entropy Measures and Machine Learning. ENTROPY 2018; 20:e20050386. [PMID: 33265476 PMCID: PMC7512905 DOI: 10.3390/e20050386] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/17/2018] [Revised: 05/16/2018] [Accepted: 05/16/2018] [Indexed: 01/04/2023]
Abstract
Existing research has revealed that auditory attention can be tracked from ongoing electroencephalography (EEG) signals. The aim of this study was to investigate the identification of people's attention to a specific auditory object from single-trial EEG signals via entropy measures and machine learning. Approximate entropy (ApEn), sample entropy (SampEn), composite multiscale entropy (CmpMSE) and fuzzy entropy (FuzzyEn) were used to extract informative features of EEG signals under three kinds of auditory object-specific attention (Rest, Auditory Object1 Attention (AOA1) and Auditory Object2 Attention (AOA2)). Linear discriminant analysis and a support vector machine (SVM) were used to construct two auditory attention classifiers. The statistical results of the entropy measures indicated that there were significant differences in the values of ApEn, SampEn, CmpMSE and FuzzyEn between Rest, AOA1 and AOA2. For the SVM-based auditory attention classifier, the auditory object-specific attention of Rest, AOA1 and AOA2 could be identified from EEG signals using ApEn, SampEn, CmpMSE and FuzzyEn as features, and the identification rates were significantly different from chance level. The optimal identification was achieved by the SVM-based classifier using CmpMSE with the scale factor τ = 10. This study demonstrates a novel solution for identifying auditory object-specific attention from single-trial EEG signals without the need to access the auditory stimulus.
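To make the feature/classifier pipeline concrete, the sketch below computes sample entropy (one of the four measures used) per channel on toy single-trial EEG and trains an SVM on the resulting features. The parameters m and r are common defaults and not necessarily those of the study.

```python
# Hedged sketch of the entropy-feature + SVM pipeline on synthetic EEG.
import numpy as np
from sklearn.svm import SVC

def sample_entropy(x, m=2, r=0.2):
    x = np.asarray(x, dtype=float)
    r *= x.std()
    def count(mm):
        # Ordered template pairs within Chebyshev distance r, minus self-matches.
        templates = np.array([x[i:i + mm] for i in range(len(x) - mm)])
        d = np.max(np.abs(templates[:, None] - templates[None, :]), axis=2)
        return (d <= r).sum() - len(templates)
    b, a = count(m), count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf

rng = np.random.default_rng(0)
# Toy single-trial EEG: 60 trials x 8 channels x 128 samples, 3 attention states.
trials = rng.normal(size=(60, 8, 128))
labels = np.repeat([0, 1, 2], 20)            # Rest, AOA1, AOA2
feats = np.array([[sample_entropy(ch) for ch in tr] for tr in trials])

clf = SVC(kernel="rbf").fit(feats, labels)
print("training accuracy:", clf.score(feats, labels))
```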
Collapse
|
32
|
Miran S, Akram S, Sheikhattar A, Simon JZ, Zhang T, Babadi B. Real-Time Tracking of Selective Auditory Attention From M/EEG: A Bayesian Filtering Approach. Front Neurosci 2018; 12:262. [PMID: 29765298 PMCID: PMC5938416 DOI: 10.3389/fnins.2018.00262] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2017] [Accepted: 04/05/2018] [Indexed: 11/13/2022] Open
Abstract
Humans are able to identify and track a target speaker amid a cacophony of acoustic interference, an ability which is often referred to as the cocktail party phenomenon. Results from several decades of studying this phenomenon have culminated in recent years in various promising attempts to decode the attentional state of a listener in a competing-speaker environment from non-invasive neuroimaging recordings such as magnetoencephalography (MEG) and electroencephalography (EEG). To this end, most existing approaches compute correlation-based measures by either regressing the features of each speech stream to the M/EEG channels (the decoding approach) or vice versa (the encoding approach). To produce robust results, these procedures require multiple trials for training purposes. Also, their decoding accuracy drops significantly when operating at high temporal resolutions. Thus, they are not well-suited for emerging real-time applications such as smart hearing aid devices or brain-computer interface systems, where training data might be limited and high temporal resolutions are desired. In this paper, we close this gap by developing an algorithmic pipeline for real-time decoding of the attentional state. Our proposed framework consists of three main modules: (1) Real-time and robust estimation of encoding or decoding coefficients, achieved by sparse adaptive filtering, (2) Extracting reliable markers of the attentional state, and thereby generalizing the widely-used correlation-based measures thereof, and (3) Devising a near real-time state-space estimator that translates the noisy and variable attention markers to robust and statistically interpretable estimates of the attentional state with minimal delay. Our proposed algorithms integrate various techniques including forgetting factor-based adaptive filtering, ℓ1-regularization, forward-backward splitting algorithms, fixed-lag smoothing, and Expectation Maximization. We validate the performance of our proposed framework using comprehensive simulations as well as application to experimentally acquired M/EEG data. Our results reveal that the proposed real-time algorithms perform nearly as accurately as the existing state-of-the-art offline techniques, while providing a significant degree of adaptivity, statistical robustness, and computational savings.
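The third module is the most distinctive; the toy sketch below conveys its spirit with a two-state Bayesian forward filter that turns noisy per-window attention markers into a smoothed probability of attending one speaker. The authors' estimator is considerably richer (fixed-lag smoothing, EM-fitted parameters), so treat this only as a conceptual stand-in.

```python
# Simplified stand-in for the state-space estimator: a two-state forward
# filter over noisy attention markers. All parameters are toy assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
T = 100
true_state = np.concatenate([np.ones(50), np.zeros(50)])   # spk1 then spk2
markers = true_state + 0.8 * rng.normal(size=T)            # noisy correlation markers

p_stay = 0.95                                # attention switches are rare
lik = lambda m, s: norm.pdf(m, loc=s, scale=0.8)  # marker model per state

p = 0.5                                      # P(attending speaker 1)
estimates = []
for m in markers:
    p_pred = p_stay * p + (1 - p_stay) * (1 - p)            # transition step
    num = lik(m, 1.0) * p_pred
    den = num + lik(m, 0.0) * (1 - p_pred)
    p = num / den                                           # Bayes update
    estimates.append(p)

print(np.round(estimates[::10], 2))          # smoothed attentional state over time
```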
Collapse
Affiliation(s)
- Sina Miran
- Department of Electrical and Computer Engineering, University of Maryland College Park, MD, United States
| | | | - Alireza Sheikhattar
- Department of Electrical and Computer Engineering, University of Maryland College Park, MD, United States
| | - Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland College Park, MD, United States.,Institute for Systems Research, University of Maryland College Park, MD, United States.,Department of Biology, University of Maryland College Park, MD, United States
| | - Tao Zhang
- Starkey Hearing Technologies Eden Prairie, MN, United States
| | - Behtash Babadi
- Department of Electrical and Computer Engineering, University of Maryland College Park, MD, United States.,Institute for Systems Research, University of Maryland College Park, MD, United States
| |
Collapse
|
33
|
Haghighi M, Moghadamfalahi M, Akcakaya M, Erdogmus D. EEG-assisted Modulation of Sound Sources in the Auditory Scene. Biomed Signal Process Control 2017; 39:263-270. [PMID: 31118975 DOI: 10.1016/j.bspc.2017.08.008] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
Noninvasive EEG (electroencephalography)-based auditory attention detection could be useful for improved hearing aids in the future. This work is a novel attempt to investigate the feasibility of online modulation of sound sources via probabilistic detection of auditory attention, using a noninvasive EEG-based brain-computer interface. The proposed online system modulates the upcoming sound sources through gain adaptation, which employs probabilistic decisions (soft decisions) from a classifier trained on offline calibration data. In this work, calibration EEG data were collected in sessions where participants listened to two sound sources (one attended and one unattended). Cross-correlation coefficients between the EEG measurements and the attended and unattended sound source envelopes (estimates) are used to show differences in the sharpness and delays of neural responses to attended versus unattended sound sources. Salient features that distinguish attended sources from unattended ones in the correlation patterns were identified and later used to train an auditory attention classifier. Using this classifier, we show high offline detection performance with single-channel EEG measurements, compared to existing approaches in the literature that employ a large number of channels. In addition, using the classifier trained offline in the calibration session, we show the performance of the online sound source modulation system. We observe that the online system is able to keep the level of the attended sound source higher than that of the unattended source.
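A compact sketch of that processing chain, with our own variable names and a stand-in soft decision, might look as follows: cross-correlate EEG with each source envelope at several lags, convert the evidence into a probability, and derive per-source gains from it.

```python
# Hedged sketch of cross-correlation features + soft-decision gain adaptation;
# the classifier here is a toy softmax over peak correlations, not the paper's.
import numpy as np

def xcorr_features(eeg, env, max_lag=25):
    """Cross-correlation between one EEG channel and an envelope at several lags."""
    eeg = (eeg - eeg.mean()) / eeg.std()
    env = (env - env.mean()) / env.std()
    return np.array([np.mean(eeg[lag:] * env[:len(env) - lag])
                     for lag in range(max_lag)])

rng = np.random.default_rng(0)
fs, dur = 64, 10
env1 = np.abs(rng.normal(size=fs * dur))          # attended source envelope
env2 = np.abs(rng.normal(size=fs * dur))          # unattended source envelope
eeg = 0.5 * env1 + rng.normal(size=fs * dur)      # toy single-channel EEG

f1, f2 = xcorr_features(eeg, env1), xcorr_features(eeg, env2)
# Stand-in soft decision: relative evidence from peak correlations.
p1 = np.exp(5 * f1.max()) / (np.exp(5 * f1.max()) + np.exp(5 * f2.max()))

gain1, gain2 = p1, 1 - p1                         # gain adaptation from soft decision
print(f"P(source 1 attended) = {p1:.2f}; gains: {gain1:.2f}, {gain2:.2f}")
```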
Collapse
Affiliation(s)
| | | | - Murat Akcakaya
- University of Pittsburgh, 4200 Fifth Ave, Pittsburgh, PA 15260
| | - Deniz Erdogmus
- Northeastern University, 360 Huntington Ave, Boston, MA 02115
| |
Collapse
|
34
|
Huang N, Elhilali M. Auditory salience using natural soundscapes. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 141:2163. [PMID: 28372080 PMCID: PMC6909985 DOI: 10.1121/1.4979055] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/07/2016] [Revised: 03/09/2017] [Accepted: 03/10/2017] [Indexed: 05/26/2023]
Abstract
Salience describes the phenomenon by which an object stands out from a scene. While its underlying processes are extensively studied in vision, mechanisms of auditory salience remain largely unknown. Previous studies have used well-controlled auditory scenes to shed light on some of the acoustic attributes that drive the salience of sound events. Unfortunately, the use of constrained stimuli in addition to a lack of well-established benchmarks of salience judgments hampers the development of comprehensive theories of sensory-driven auditory attention. The present study explores auditory salience in a set of dynamic natural scenes. A behavioral measure of salience is collected by having human volunteers listen to two concurrent scenes and indicate continuously which one attracts their attention. By using natural scenes, the study takes a data-driven rather than experimenter-driven approach to exploring the parameters of auditory salience. The findings indicate that the space of auditory salience is multidimensional (spanning loudness, pitch, spectral shape, as well as other acoustic attributes), nonlinear and highly context-dependent. Importantly, the results indicate that contextual information about the entire scene over both short and long scales needs to be considered in order to properly account for perceptual judgments of salience.
Collapse
Affiliation(s)
- Nicholas Huang
- Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, Maryland 21218, USA
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, Maryland 21218, USA
| |
Collapse
|
35
|
Dykstra AR, Cariani PA, Gutschalk A. A roadmap for the study of conscious audition and its neural basis. Philos Trans R Soc Lond B Biol Sci 2017; 372:20160103. [PMID: 28044014 PMCID: PMC5206271 DOI: 10.1098/rstb.2016.0103] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/03/2016] [Indexed: 12/16/2022] Open
Abstract
How and which aspects of neural activity give rise to subjective perceptual experience-i.e. conscious perception-is a fundamental question of neuroscience. To date, the vast majority of work concerning this question has come from vision, raising the issue of generalizability of prominent resulting theories. However, recent work has begun to shed light on the neural processes subserving conscious perception in other modalities, particularly audition. Here, we outline a roadmap for the future study of conscious auditory perception and its neural basis, paying particular attention to how conscious perception emerges (and of which elements or groups of elements) in complex auditory scenes. We begin by discussing the functional role of the auditory system, particularly as it pertains to conscious perception. Next, we ask: what are the phenomena that need to be explained by a theory of conscious auditory perception? After surveying the available literature for candidate neural correlates, we end by considering the implications that such results have for a general theory of conscious perception as well as prominent outstanding questions and what approaches/techniques can best be used to address them.This article is part of the themed issue 'Auditory and visual scene analysis'.
Collapse
Affiliation(s)
- Andrew R Dykstra
- Department of Neurology, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany
| | | | - Alexander Gutschalk
- Department of Neurology, Ruprecht-Karls-Universität Heidelberg, Heidelberg, Germany
| |
Collapse
|
36
|
Veale R, Hafed ZM, Yoshida M. How is visual salience computed in the brain? Insights from behaviour, neurobiology and modelling. Philos Trans R Soc Lond B Biol Sci 2017; 372:20160113. [PMID: 28044023 PMCID: PMC5206280 DOI: 10.1098/rstb.2016.0113] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/07/2016] [Indexed: 01/07/2023] Open
Abstract
Inherent in visual scene analysis is a bottleneck associated with the need to sequentially sample locations with foveating eye movements. The concept of a 'saliency map' topographically encoding stimulus conspicuity over the visual scene has proven to be an efficient predictor of eye movements. Our work reviews insights into the neurobiological implementation of visual salience computation. We start by summarizing the role that different visual brain areas play in salience computation, whether at the level of feature analysis for bottom-up salience or at the level of goal-directed priority maps for output behaviour. We then delve into how a subcortical structure, the superior colliculus (SC), participates in salience computation. The SC represents a visual saliency map via a centre-surround inhibition mechanism in the superficial layers, which feeds into priority selection mechanisms in the deeper layers, thereby affecting saccadic and microsaccadic eye movements. Lateral interactions in the local SC circuit are particularly important for controlling active populations of neurons. This, in turn, might help explain long-range effects, such as those of peripheral cues on tiny microsaccades. Finally, we show how a combination of in vitro neurophysiology and large-scale computational modelling is able to clarify how salience computation is implemented in the local circuit of the SC.This article is part of the themed issue 'Auditory and visual scene analysis'.
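A difference-of-Gaussians filter is the standard computational idiom for such centre-surround interactions; the toy sketch below applies one to a synthetic scene to produce a saliency map. Filter widths are arbitrary choices, not fitted SC parameters.

```python
# Toy centre-surround (difference-of-Gaussians) saliency computation, loosely
# in the spirit of the mechanism attributed to superficial SC layers.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
scene = rng.normal(size=(64, 64)) * 0.1
scene[30:34, 40:44] += 1.0                     # one conspicuous item

center = gaussian_filter(scene, sigma=1)       # narrow excitatory centre
surround = gaussian_filter(scene, sigma=6)     # broad inhibitory surround
saliency = np.clip(center - surround, 0, None) # half-wave rectified DoG map

peak = np.unravel_index(saliency.argmax(), saliency.shape)
print("most salient location:", peak)          # ~ the embedded item
```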
Collapse
Affiliation(s)
- Richard Veale
- Department of System Neuroscience, National Institute for Physiological Sciences, Okazaki, Japan
| | - Ziad M Hafed
- Physiology of Active Vision Laboratory, Werner Reichardt Centre for Integrative Neuroscience, University of Tuebingen, Tuebingen, Germany
| | - Masatoshi Yoshida
- Department of System Neuroscience, National Institute for Physiological Sciences, Okazaki, Japan
- School of Life Science, The Graduate University for Advanced Studies, Hayama, Japan
| |
Collapse
|
37
|
Kondo HM, van Loon AM, Kawahara JI, Moore BCJ. Auditory and visual scene analysis: an overview. Philos Trans R Soc Lond B Biol Sci 2017; 372:rstb.2016.0099. [PMID: 28044011 DOI: 10.1098/rstb.2016.0099] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 11/03/2016] [Indexed: 01/23/2023] Open
Abstract
We perceive the world as stable and composed of discrete objects even though auditory and visual inputs are often ambiguous owing to spatial and temporal occluders and changes in the conditions of observation. This raises important questions regarding where and how 'scene analysis' is performed in the brain. Recent advances from both auditory and visual research suggest that the brain does not simply process the incoming scene properties. Rather, top-down processes such as attention, expectations and prior knowledge facilitate scene perception. Thus, scene analysis is linked not only with the extraction of stimulus features and formation and selection of perceptual objects, but also with selective attention, perceptual binding and awareness. This special issue covers novel advances in scene-analysis research obtained using a combination of psychophysics, computational modelling, neuroimaging and neurophysiology, and presents new empirical and theoretical approaches. For integrative understanding of scene analysis beyond and across sensory modalities, we provide a collection of 15 articles that enable comparison and integration of recent findings in auditory and visual scene analysis.This article is part of the themed issue 'Auditory and visual scene analysis'.
Collapse
Affiliation(s)
- Hirohito M Kondo
- Human Information Science Laboratory, NTT Communication Science Laboratories, NTT Corporation, Atsugi, Kanagawa 243-0198, Japan
| | - Anouk M van Loon
- Department of Experimental and Applied Psychology, Vrije Universiteit Amsterdam, Amsterdam 1081 BT, The Netherlands .,Institute of Brain and Behavior Amsterdam, Vrije Universiteit Amsterdam, Amsterdam 1081 BT, The Netherlands
| | - Jun-Ichiro Kawahara
- Department of Psychology, Graduate School of Letters, Hokkaido University, Sapporo 060-0810, Japan
| | - Brian C J Moore
- Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, UK
| |
Collapse
|
38
|
Cichy RM, Teng S. Resolving the neural dynamics of visual and auditory scene processing in the human brain: a methodological approach. Philos Trans R Soc Lond B Biol Sci 2017; 372:rstb.2016.0108. [PMID: 28044019 PMCID: PMC5206276 DOI: 10.1098/rstb.2016.0108] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/22/2016] [Indexed: 01/06/2023] Open
Abstract
In natural environments, visual and auditory stimulation elicit responses across a large set of brain regions in a fraction of a second, yielding representations of the multimodal scene and its properties. The rapid and complex neural dynamics underlying visual and auditory information processing pose major challenges to human cognitive neuroscience. Brain signals measured non-invasively are inherently noisy, the format of neural representations is unknown, and transformations between representations are complex and often nonlinear. Further, no single non-invasive brain measurement technique provides a spatio-temporally integrated view. In this opinion piece, we argue that progress can be made by a concerted effort based on three pillars of recent methodological development: (i) sensitive analysis techniques such as decoding and cross-classification, (ii) complex computational modelling using models such as deep neural networks, and (iii) integration across imaging methods (magnetoencephalography/electroencephalography, functional magnetic resonance imaging) and models, e.g. using representational similarity analysis. We showcase two recent efforts that have been undertaken in this spirit and provide novel results about visual and auditory scene analysis. Finally, we discuss the limits of this perspective and sketch a concrete roadmap for future research. This article is part of the themed issue ‘Auditory and visual scene analysis’.
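Of the three pillars, representational similarity analysis is the easiest to demonstrate compactly: build a representational dissimilarity matrix (RDM) per measurement space and correlate the RDMs. The sketch below does this for simulated MEG and fMRI patterns; all sizes are illustrative.

```python
# Brief RSA sketch: compare RDMs across measurement modalities.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_stimuli = 20
meg_patterns = rng.normal(size=(n_stimuli, 50))    # stimulus x sensor features
fmri_patterns = rng.normal(size=(n_stimuli, 300))  # stimulus x voxel features

rdm_meg = pdist(meg_patterns, metric="correlation")   # condensed RDMs
rdm_fmri = pdist(fmri_patterns, metric="correlation")

rho, _ = spearmanr(rdm_meg, rdm_fmri)
print(f"MEG-fMRI representational similarity: rho = {rho:.2f}")
```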
Collapse
Affiliation(s)
| | - Santani Teng
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| |
Collapse
|
39
|
Southwell R, Baumann A, Gal C, Barascud N, Friston K, Chait M. Is predictability salient? A study of attentional capture by auditory patterns. Philos Trans R Soc Lond B Biol Sci 2017; 372:rstb.2016.0105. [PMID: 28044016 PMCID: PMC5206273 DOI: 10.1098/rstb.2016.0105] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/28/2016] [Indexed: 01/08/2023] Open
Abstract
In this series of behavioural and electroencephalography (EEG) experiments, we investigate the extent to which repeating patterns of sounds capture attention. Work in the visual domain has revealed attentional capture by statistically predictable stimuli, consistent with predictive coding accounts which suggest that attention is drawn to sensory regularities. Here, stimuli comprised rapid sequences of tone pips, arranged in regular (REG) or random (RAND) patterns. EEG data demonstrate that the brain rapidly recognizes predictable patterns manifested as a rapid increase in responses to REG relative to RAND sequences. This increase is reminiscent of the increase in gain on neural responses to attended stimuli often seen in the neuroimaging literature, and thus consistent with the hypothesis that predictable sequences draw attention. To study potential attentional capture by auditory regularities, we used REG and RAND sequences in two different behavioural tasks designed to reveal effects of attentional capture by regularity. Overall, the pattern of results suggests that regularity does not capture attention. This article is part of the themed issue ‘Auditory and visual scene analysis’.
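The REG/RAND stimulus construction is easy to reproduce in outline; the sketch below generates equal-length tone-pip sequences whose frequencies either repeat a fixed cycle (REG) or are drawn at random (RAND). Pip duration, pool size, and cycle length are assumptions rather than the paper's exact values.

```python
# Sketch of REG vs RAND tone-pip sequences as described in the abstract.
import numpy as np

fs, pip_dur, n_pips = 44100, 0.05, 60
pool = np.geomspace(300, 4800, 20)               # candidate pip frequencies (Hz)
rng = np.random.default_rng(0)

def sequence(freqs):
    t = np.linspace(0, pip_dur, int(fs * pip_dur), endpoint=False)
    return np.concatenate([np.sin(2 * np.pi * f * t) for f in freqs])

cycle = rng.choice(pool, size=10, replace=False) # fixed repeating pattern
reg = sequence(np.tile(cycle, n_pips // 10))     # REG: predictable
rand = sequence(rng.choice(pool, size=n_pips))   # RAND: unpredictable
print(reg.shape, rand.shape)                     # equal-length stimuli
```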
Collapse
Affiliation(s)
- Rosy Southwell
- Ear Institute, University College London, London WC1X 8EE, UK
| | - Anna Baumann
- Ear Institute, University College London, London WC1X 8EE, UK
| | - Cécile Gal
- Ear Institute, University College London, London WC1X 8EE, UK
| | | | - Karl Friston
- Wellcome Trust Centre for Neuroimaging, University College London, London WC1N 3BG, UK
| | - Maria Chait
- Ear Institute, University College London, London WC1X 8EE, UK
| |
Collapse
|