1
Strauss DJ, Francis AL, Vibell J, Corona-Strauss FI. The role of attention in immersion: The two-competitor model. Brain Res Bull 2024; 210:110923. [PMID: 38462137 DOI: 10.1016/j.brainresbull.2024.110923]
Abstract
Currently, we face an exponentially increasing interest in immersion, especially sensory-driven immersion, mainly due to the rapid development of ideas and business models centered around a digital virtual universe as well as the increasing availability of affordable immersive technologies for education, communication, and entertainment. However, a clear definition of 'immersion', in terms of established neurocognitive concepts and measurable properties, remains elusive, slowing research on the human side of immersive interfaces. To address this problem, we propose a conceptual, taxonomic model of attention in immersion. We argue that (a) modeling immersion theoretically, as well as studying immersion experimentally, requires a detailed characterization of the role of attention in immersion, even though (b) attention, while necessary, cannot be a sufficient condition for defining immersion. Our broader goal is to characterize immersion in terms that will be compatible with established psychophysiological measures that could then, in principle, be used for the assessment and eventually the optimization of an immersive experience. We start from the perspective that immersion requires the projection of attention to an induced reality, and build on accepted taxonomies of different modes of attention for the development of our two-competitor model. The two-competitor model allows for a quantitative implementation and has an easy graphical interpretation. It helps to highlight the important link between different modes of attention and affect in studying immersion.
Affiliation(s)
- Daniel J Strauss: Systems Neuroscience & Neurotechnology Unit, Faculty of Medicine, Saarland University & School of Engineering, htw saar, Homburg/Saar, Germany
- Alexander L Francis: Speech Perception & Cognitive Effort Lab, Dept. of Speech, Language & Hearing Sciences, Purdue University, West Lafayette, IN, USA
- Jonas Vibell: Brain & Behavior Lab, Dept. of Psychology, University of Hawai'i at Manoa, Honolulu, HI, USA
- Farah I Corona-Strauss: Systems Neuroscience & Neurotechnology Unit, Faculty of Medicine, Saarland University & School of Engineering, htw saar, Homburg/Saar, Germany
2
Puschmann S, Regev M, Fakhar K, Zatorre RJ, Thiel CM. Attention-Driven Modulation of Auditory Cortex Activity during Selective Listening in a Multispeaker Setting. J Neurosci 2024; 44:e1157232023. [PMID: 38388426 PMCID: PMC11007309 DOI: 10.1523/jneurosci.1157-23.2023]
Abstract
Real-world listening settings often consist of multiple concurrent sound streams. To limit perceptual interference during selective listening, the auditory system segregates and filters the relevant sensory input. Previous work provided evidence that the auditory cortex is critically involved in this process and selectively gates attended input toward subsequent processing stages. We studied at which level of auditory cortex processing this filtering of attended information occurs using functional magnetic resonance imaging (fMRI) and a naturalistic selective listening task. Forty-five human listeners (of either sex) attended to one of two continuous speech streams, presented either concurrently or in isolation. Functional data were analyzed using an inter-subject analysis to assess stimulus-specific components of ongoing auditory cortex activity. Our results suggest that stimulus-related activity in the primary auditory cortex and the adjacent planum temporale is hardly affected by attention, whereas brain responses at higher stages of the auditory cortex processing hierarchy become progressively more selective for the attended input. Consistent with these findings, a complementary analysis of stimulus-driven functional connectivity further demonstrated that information on the to-be-ignored speech stream is shared between the primary auditory cortex and the planum temporale but largely fails to reach higher processing stages. Our findings suggest that the neural processing of ignored speech cannot be effectively suppressed at the level of early cortical processing of acoustic features but is gradually attenuated once the competing speech streams are fully segregated.
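The inter-subject analysis used here rests on a simple idea: correlate each listener's response time course with the average of the other listeners who heard the same stimulus, so that stimulus-driven activity survives while idiosyncratic activity averages out. A minimal leave-one-out sketch of that logic, with invented dimensions and synthetic data rather than the authors' fMRI pipeline:

```python
import numpy as np

rng = np.random.default_rng(6)
n_subj, n_time = 45, 300
shared = rng.standard_normal(n_time)                          # stimulus-driven component
data = shared + 2.0 * rng.standard_normal((n_subj, n_time))   # plus subject-specific noise

def isc(data):
    """Leave-one-out inter-subject correlation, one value per subject."""
    out = []
    for s in range(len(data)):
        others = np.delete(data, s, axis=0).mean(axis=0)
        out.append(np.corrcoef(data[s], others)[0, 1])
    return np.array(out)

print(f"mean ISC: {isc(data).mean():.2f}")
```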
Affiliation(s)
- Sebastian Puschmann: Biological Psychology Lab, Department of Psychology, Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany; Cluster of Excellence "Hearing4all", Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
- Mor Regev: Montreal Neurological Institute, McGill University, Montreal, Quebec H3A 2B4, Canada
- Kayson Fakhar: Institute of Computational Neuroscience, University Medical Center Eppendorf, Hamburg University, Hamburg Center of Neuroscience, Hamburg 20246, Germany
- Robert J Zatorre: Montreal Neurological Institute, McGill University, Montreal, Quebec H3A 2B4, Canada; International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, Quebec H2V 2S9, Canada
- Christiane M Thiel: Biological Psychology Lab, Department of Psychology, Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany; Cluster of Excellence "Hearing4all", Carl von Ossietzky Universität Oldenburg, 26129 Oldenburg, Germany
3
Simon A, Bech S, Loquet G, Østergaard J. Cortical linear encoding and decoding of sounds: Similarities and differences between naturalistic speech and music listening. Eur J Neurosci 2024; 59:2059-2074. [PMID: 38303522 DOI: 10.1111/ejn.16265]
Abstract
Linear models are becoming increasingly popular to investigate brain activity in response to continuous and naturalistic stimuli. In the context of auditory perception, these predictive models can be 'encoding', when stimulus features are used to reconstruct brain activity, or 'decoding', when neural features are used to reconstruct the audio stimuli. These linear models are a central component of some brain-computer interfaces that can be integrated into hearing assistive devices (e.g., hearing aids). Such advanced neurotechnologies have been widely investigated when listening to speech stimuli but rarely when listening to music. Recent attempts at neural tracking of music show that the reconstruction performances are reduced compared with speech decoding. The present study investigates the performance of stimulus reconstruction and electroencephalogram prediction (decoding and encoding models) based on the cortical entrainment of temporal variations of the audio stimuli for both music and speech listening. Three hypotheses that may explain differences between speech and music stimuli reconstruction were tested to assess the importance of the speech-specific acoustic and linguistic factors. While the results obtained with encoding models suggest different underlying cortical processing between speech and music listening, no differences were found in terms of reconstruction of the stimuli or the cortical data. The results suggest that envelope-based linear modelling can be used to study both speech and music listening, despite the differences in the underlying cortical mechanisms.
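As background for this and several entries below, a backward ('decoding') model of the kind described can be sketched in a few lines: a time-lagged ridge regression mapping multichannel EEG onto the stimulus envelope, scored by the correlation between reconstructed and actual envelopes. Everything here (lag window, regularization, sampling rate, the synthetic data) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def lagged_design(eeg, lags):
    """Design matrix of time-lagged EEG (samples x channels*lags).

    For a backward model, the envelope at time t is reconstructed from
    EEG at t+lag, since the neural response follows the stimulus.
    """
    n, ch = eeg.shape
    X = np.zeros((n, ch * len(lags)))
    for j, lag in enumerate(lags):
        shifted = np.roll(eeg, -lag, axis=0)
        if lag > 0:
            shifted[-lag:] = 0          # discard wrap-around samples
        X[:, j * ch:(j + 1) * ch] = shifted
    return X

def fit_decoder(eeg, envelope, lags, alpha=1e3):
    """Ridge-regression backward model: EEG -> stimulus envelope."""
    X = lagged_design(eeg, lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ envelope)

def reconstruction_accuracy(eeg, envelope, w, lags):
    """Pearson correlation between reconstructed and actual envelope."""
    rec = lagged_design(eeg, lags) @ w
    return np.corrcoef(rec, envelope)[0, 1]

# Toy example: 64-channel EEG at 64 Hz, decoder integrating 0-250 ms of EEG.
fs = 64
rng = np.random.default_rng(0)
eeg = rng.standard_normal((fs * 60, 64))        # 60 s of fake EEG
envelope = rng.standard_normal(fs * 60)         # fake speech/music envelope
lags = range(0, int(0.25 * fs))                 # causal lags, 0-250 ms
w = fit_decoder(eeg[:fs * 40], envelope[:fs * 40], lags)                 # train
r = reconstruction_accuracy(eeg[fs * 40:], envelope[fs * 40:], w, lags)  # test
print(f"held-out reconstruction accuracy r = {r:.3f}")
```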
Affiliation(s)
- Adèle Simon: Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark; Research Department, Bang & Olufsen A/S, Struer, Denmark
- Søren Bech: Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark; Research Department, Bang & Olufsen A/S, Struer, Denmark
- Gérard Loquet: Department of Audiology and Speech Pathology, University of Melbourne, Melbourne, Victoria, Australia
- Jan Østergaard: Artificial Intelligence and Sound, Department of Electronic Systems, Aalborg University, Aalborg, Denmark
4
Gao J, Chen H, Fang M, Ding N. Original speech and its echo are segregated and separately processed in the human brain. PLoS Biol 2024; 22:e3002498. [PMID: 38358954 PMCID: PMC10868781 DOI: 10.1371/journal.pbio.3002498]
Abstract
Speech recognition crucially relies on slow temporal modulations (<16 Hz) in speech. Recent studies, however, have demonstrated that long-delay echoes, which are common during online conferencing, can eliminate crucial temporal modulations in speech but do not affect speech intelligibility. Here, we investigated the underlying neural mechanisms. MEG experiments demonstrated that cortical activity can effectively track the temporal modulations eliminated by an echo, which cannot be fully explained by basic neural adaptation mechanisms. Furthermore, cortical responses to echoic speech can be better explained by a model that segregates speech from its echo than by a model that encodes echoic speech as a whole. The speech segregation effect was observed even when attention was diverted but would disappear when segregation cues, i.e., speech fine structure, were removed. These results strongly suggest that, through mechanisms such as stream segregation, the auditory system can build an echo-insensitive representation of the speech envelope, which can support reliable speech recognition.
Affiliation(s)
- Jiaxin Gao: Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Honghua Chen: Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Mingxuan Fang: Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China
- Nai Ding: Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, China; Nanhu Brain-computer Interface Institute, Hangzhou, China; The State Key Lab of Brain-Machine Intelligence and the MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou, China
5
Ten Oever S, Martin AE. Interdependence of "What" and "When" in the Brain. J Cogn Neurosci 2024; 36:167-186. [PMID: 37847823 DOI: 10.1162/jocn_a_02067]
Abstract
From a brain's-eye-view, when a stimulus occurs and what it is are interrelated aspects of interpreting the perceptual world. Yet in practice, the putative perceptual inferences about sensory content and timing are often dichotomized and not investigated as an integrated process. We here argue that neural temporal dynamics can influence what is perceived, and in turn, stimulus content can influence the time at which perception is achieved. This computational principle results from the highly interdependent relationship of what and when in the environment. Both brain processes and perceptual events display strong temporal variability that is not always modeled; we argue that understanding (and, minimally, modeling) this temporal variability is key for theories of how the brain generates unified and consistent neural representations, and that we ignore temporal variability in our analysis practice at the peril of both data interpretation and theory-building. Here, we review 'what' and 'when' interactions in the brain, demonstrate via simulations how temporal variability can result in misguided interpretations and conclusions, and outline how to integrate and synthesize what and when in theories and models of brain computation.
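The warning about unmodeled temporal variability is easy to demonstrate: if single-trial response latencies jitter, the trial average smears and shrinks even though the underlying response is unchanged, which is readily misread as a weaker effect. A self-contained toy simulation (waveform shape and jitter magnitude are arbitrary choices, not taken from the paper):

```python
import numpy as np

fs = 1000                                   # sampling rate (Hz)
t = np.arange(-0.2, 0.6, 1 / fs)            # peristimulus time (s)
rng = np.random.default_rng(1)

def single_trial(latency_jitter):
    """A Gaussian 'evoked response' peaking at 200 ms plus latency jitter."""
    peak = 0.2 + latency_jitter
    return np.exp(-((t - peak) ** 2) / (2 * 0.03 ** 2))

n_trials = 200
no_jitter = np.mean([single_trial(0.0) for _ in range(n_trials)], axis=0)
jittered = np.mean(
    [single_trial(rng.normal(0, 0.05)) for _ in range(n_trials)], axis=0
)

# The same underlying response, averaged with 50 ms latency jitter,
# loses much of its apparent amplitude:
print(f"peak without jitter: {no_jitter.max():.2f}")
print(f"peak with jitter:    {jittered.max():.2f}")
```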
Affiliation(s)
- Sanne Ten Oever: Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Centre for Cognitive Neuroimaging, Nijmegen, The Netherlands; Maastricht University, The Netherlands
- Andrea E Martin: Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands; Donders Centre for Cognitive Neuroimaging, Nijmegen, The Netherlands
6
Belo J, Clerc M, Schön D. The effect of familiarity on neural tracking of music stimuli is modulated by mind wandering. AIMS Neurosci 2023; 10:319-331. [PMID: 38188009 PMCID: PMC10767062 DOI: 10.3934/neuroscience.2023025]
Abstract
One way to investigate the cortical tracking of continuous auditory stimuli is to use the stimulus reconstruction approach. However, the cognitive and behavioral factors impacting this cortical representation remain largely overlooked. Two possible candidates are familiarity with the stimulus and the ability to resist internal distractions. To explore the possible impacts of these two factors on the cortical representation of natural music stimuli, forty-one participants listened to monodic natural music stimuli while we recorded their neural activity. Using the stimulus reconstruction approach and linear mixed models, we found that familiarity positively impacted the reconstruction accuracy of music stimuli and that this effect of familiarity was modulated by mind wandering.
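The statistical approach described here, reconstruction accuracies analyzed with linear mixed models, can be illustrated with statsmodels: fixed effects for familiarity and mind wandering (plus their interaction, mirroring the reported modulation) and a random intercept per participant. Column names and the simulated effect sizes below are invented for the example:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_subj, n_trials = 41, 12

# Fake long-format data: one reconstruction accuracy per subject x trial.
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_trials),
    "familiarity": rng.integers(0, 2, n_subj * n_trials),   # 0 = unfamiliar
    "mind_wandering": rng.normal(size=n_subj * n_trials),   # e.g., probe score
})
df["accuracy"] = (
    0.05 + 0.02 * df["familiarity"]
    - 0.01 * df["familiarity"] * df["mind_wandering"]
    + rng.normal(0, 0.02, len(df))
)

# Random intercept per participant; the familiarity x mind-wandering
# interaction term mirrors the modulation reported in the abstract.
model = smf.mixedlm(
    "accuracy ~ familiarity * mind_wandering", df, groups=df["subject"]
)
print(model.fit().summary())
```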
Affiliation(s)
- Joan Belo: Athena Project Team, INRIA, Université Côte d'Azur, Nice, France; Aix Marseille University, Inserm, INS, Institut de Neurosciences des Systèmes, Marseille, France
- Maureen Clerc: Athena Project Team, INRIA, Université Côte d'Azur, Nice, France
- Daniele Schön: Aix Marseille University, Inserm, INS, Institut de Neurosciences des Systèmes, Marseille, France; Institute for Language, Communication, and the Brain, Aix-en-Provence, France
7
Tan SHJ, Kalashnikova M, Di Liberto GM, Crosse MJ, Burnham D. Seeing a Talking Face Matters: Gaze Behavior and the Auditory-Visual Speech Benefit in Adults' Cortical Tracking of Infant-directed Speech. J Cogn Neurosci 2023; 35:1741-1759. [PMID: 37677057 DOI: 10.1162/jocn_a_02044]
Abstract
In face-to-face conversations, listeners gather visual speech information from a speaker's talking face that enhances their perception of the incoming auditory speech signal. This auditory-visual (AV) speech benefit is evident even in quiet environments but is stronger in situations that require greater listening effort such as when the speech signal itself deviates from listeners' expectations. One example is infant-directed speech (IDS) presented to adults. IDS has exaggerated acoustic properties that are easily discriminable from adult-directed speech (ADS). Although IDS is a speech register that adults typically use with infants, no previous neurophysiological study has directly examined whether adult listeners process IDS differently from ADS. To address this, the current study simultaneously recorded EEG and eye-tracking data from adult participants as they were presented with auditory-only (AO), visual-only, and AV recordings of IDS and ADS. Eye-tracking data were recorded because looking behavior to the speaker's eyes and mouth modulates the extent of AV speech benefit experienced. Analyses of cortical tracking accuracy revealed that cortical tracking of the speech envelope was significant in AO and AV modalities for IDS and ADS. However, the AV speech benefit [i.e., AV > (A + V)] was only present for IDS trials. Gaze behavior analyses indicated differences in looking behavior during IDS and ADS trials. Surprisingly, looking behavior to the speaker's eyes and mouth was not correlated with cortical tracking accuracy. Additional exploratory analyses indicated that attention to the whole display was negatively correlated with cortical tracking accuracy of AO and visual-only trials in IDS. Our results underscore the nuances involved in the relationship between neurophysiological AV speech benefit and looking behavior.
Affiliation(s)
- Sok Hui Jessica Tan: The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia; Science of Learning in Education Centre, Office of Education Research, National Institute of Education, Nanyang Technological University, Singapore
- Marina Kalashnikova: The Basque Center on Cognition, Brain and Language; IKERBASQUE, Basque Foundation for Science
- Giovanni M Di Liberto: ADAPT Centre, School of Computer Science and Statistics, Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
- Michael J Crosse: SEGOTIA, Galway, Ireland; Trinity Center for Biomedical Engineering, Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Denis Burnham: The MARCS Institute of Brain, Behaviour and Development, Western Sydney University, Australia
8
Orf M, Wöstmann M, Hannemann R, Obleser J. Target enhancement but not distractor suppression in auditory neural tracking during continuous speech. iScience 2023; 26:106849. [PMID: 37305701 PMCID: PMC10251127 DOI: 10.1016/j.isci.2023.106849]
Abstract
Selective attention modulates the neural tracking of speech in auditory cortical regions. It is unclear whether this attentional modulation is dominated by enhanced target tracking, or suppression of distraction. To settle this long-standing debate, we employed an augmented electroencephalography (EEG) speech-tracking paradigm with target, distractor, and neutral streams. Concurrent target speech and distractor (i.e., sometimes relevant) speech were juxtaposed with a third, never task-relevant speech stream serving as neutral baseline. Listeners had to detect short target repeats and committed more false alarms originating from the distractor than from the neutral stream. Speech tracking revealed target enhancement but no distractor suppression below the neutral baseline. Speech tracking of the target (not distractor or neutral speech) explained single-trial accuracy in repeat detection. In sum, the enhanced neural representation of target speech is specific to processes of attentional gain for behaviorally relevant target speech rather than neural suppression of distraction.
Affiliation(s)
- Martin Orf: Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
- Malte Wöstmann: Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
- Jonas Obleser: Department of Psychology, University of Lübeck, Lübeck, Germany; Center of Brain, Behavior and Metabolism (CBBM), University of Lübeck, Lübeck, Germany
9
Rosenkranz M, Cetin T, Uslar VN, Bleichner MG. Investigating the attentional focus to workplace-related soundscapes in a complex audio-visual-motor task using EEG. Front Neuroergon 2023; 3:1062227. [PMID: 38235454 PMCID: PMC10790850 DOI: 10.3389/fnrgo.2022.1062227]
Abstract
Introduction: In demanding work situations (e.g., during a surgery), the processing of complex soundscapes varies over time and can be a burden for medical personnel. Here we study, using mobile electroencephalography (EEG), how humans process workplace-related soundscapes while performing a complex audio-visual-motor task (3D Tetris). Specifically, we wanted to know how the attentional focus changes the processing of the soundscape as a whole.
Method: Participants played a game of 3D Tetris in which they had to use both hands to control falling blocks. At the same time, participants listened to a complex soundscape, similar to what is found in an operating room (i.e., the sound of machinery, people talking in the background, alarm sounds, and instructions). In this within-subject design, participants had to react to instructions (e.g., "place the next block in the upper left corner") and to sounds depending on the experimental condition, either to a specific alarm sound originating from a fixed location or to a beep sound that originated from varying locations. Attention to the alarm reflected a narrow attentional focus, as it was easy to detect and most of the soundscape could be ignored. Attention to the beep reflected a wide attentional focus, as it required the participants to monitor multiple different sound streams.
Results and discussion: Results show the robustness of the N1 and P3 event-related potential responses during this dynamic task with a complex auditory soundscape. Furthermore, we used temporal response functions to study auditory processing of the whole soundscape. This work is a step toward studying workplace-related sound processing in the operating room using mobile EEG.
Affiliation(s)
- Marc Rosenkranz: Neurophysiology of Everyday Life Group, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Timur Cetin: Pius-Hospital Oldenburg, University Hospital for Visceral Surgery, University of Oldenburg, Oldenburg, Germany
- Verena N. Uslar: Pius-Hospital Oldenburg, University Hospital for Visceral Surgery, University of Oldenburg, Oldenburg, Germany
- Martin G. Bleichner: Neurophysiology of Everyday Life Group, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
10
Andreeva IG, Ogorodnikova EA. Auditory Adaptation to Speech Signal Characteristics. J Evol Biochem Physiol 2022. [DOI: 10.1134/s0022093022050027]
11
Mai A, Serman M, Best S, Jensen NS, Foellmer J, Schroeer A, Welsch C, Strauss DJ, Corona-Strauss FI. Speech Tracking in Complex Auditory Scenes with Differentiated In- and Out-Field-Of-View Processing in Hearing Aids. Annu Int Conf IEEE Eng Med Biol Soc 2022; 2022:798-801. [PMID: 36086156 DOI: 10.1109/embc48229.2022.9870826]
Abstract
In naturalistic auditory scenes, relevant information is rarely concentrated at a single location, but rather unpredictably scattered in- and out-field-of-view (in-/out-FOV). Although the parsing of a complex auditory scene is a fairly simple job for a healthy human auditory system, the uncertainty represents a major issue in the development of effective hearing aid (HA) processing strategies. Whereas traditional omnidirectional microphones (OM) amplify the complete auditory scene without enhancing the signal-to-noise ratio (SNR) between in- and out-FOV streams, directional microphones (DM) may greatly increase SNR at the cost of preventing HA users from perceiving out-FOV information. The present study compares the conventional OM and DM HA settings to a split processing (SP) scheme differentiating between in- and out-FOV processing. We recorded electroencephalographic data of ten young, normal-hearing listeners who solved a cocktail-party scenario paradigm with continuous auditory streams and analyzed neural tracking of speech with a stimulus reconstruction (SR) approach. While results for all settings exhibited significantly higher SR accuracies for attended in-FOV than unattended out-FOV streams, there were distinct differences between settings. In-FOV SR performance was dominated by DM and SP, and out-FOV SR accuracies were significantly higher for SP compared to OM and DM. Our results demonstrate the potential of an SP approach to combine the advantages of traditional OM and DM settings without introducing significant compromises.
12
Generalizable EEG Encoding Models with Naturalistic Audiovisual Stimuli. J Neurosci 2021; 41:8946-8962. [PMID: 34503996 DOI: 10.1523/jneurosci.2891-20.2021]
Abstract
In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as "speech tracking." Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from acoustically rich, naturalistic environments with and without background noise can be generalized to more controlled stimuli. If encoding models for acoustically rich, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations of individuals who may not tolerate listening to more controlled and less engaging stimuli for long periods of time. We recorded noninvasive scalp EEG while 17 human participants (8 male/9 female) listened to speech without noise and audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled datasets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to speech in a rich acoustic background were more accurate when including both phonological and acoustic features. Our findings suggest that naturalistic audiovisual stimuli can be used to measure receptive fields that are comparable and generalizable to more controlled audio-only stimuli.

Significance statement: Understanding spoken language in natural environments requires listeners to parse acoustic and linguistic information in the presence of other distracting stimuli. However, most studies of auditory processing rely on highly controlled stimuli with no background noise, or with background noise inserted at specific times. Here, we compare models where EEG data are predicted based on a combination of acoustic, phonetic, and visual features in highly disparate stimuli: sentences from a speech corpus and speech embedded within movie trailers. We show that modeling neural responses to highly noisy, audiovisual movies can uncover tuning for acoustic and phonetic information that generalizes to simpler stimuli typically used in sensory neuroscience experiments.
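A compact sketch of the forward ('encoding') modeling this entry relies on: EEG at each channel is predicted from time-lagged stimulus features via ridge regression, and competing feature sets are compared by held-out prediction accuracy. Feature dimensions, lags, and regularization below are placeholder assumptions; published work typically uses dedicated toolboxes and cross-validated regularization:

```python
import numpy as np

def lagged(X, lags):
    """Time-lagged copies of a feature matrix X (samples x features*lags).

    For a forward model, EEG at time t depends on the stimulus at t-lag.
    """
    n, f = X.shape
    out = np.zeros((n, f * len(lags)))
    for j, lag in enumerate(lags):
        s = np.roll(X, lag, axis=0)
        if lag > 0:
            s[:lag] = 0                 # discard wrap-around samples
        out[:, j * f:(j + 1) * f] = s
    return out

def fit_trf(features, eeg, lags, alpha=1.0):
    """Ridge forward model: stimulus features -> EEG, one weight set per channel."""
    D = lagged(features, lags)
    return np.linalg.solve(D.T @ D + alpha * np.eye(D.shape[1]), D.T @ eeg)

def score(features, eeg, W, lags):
    """Mean per-channel correlation between predicted and measured EEG."""
    pred = lagged(features, lags) @ W
    return np.mean([np.corrcoef(pred[:, c], eeg[:, c])[0, 1]
                    for c in range(eeg.shape[1])])

fs = 128
rng = np.random.default_rng(3)
n = fs * 120                                    # 2 min of synthetic data
eeg = rng.standard_normal((n, 64))
envelope = rng.standard_normal((n, 1))          # acoustic feature
phonemes = rng.standard_normal((n, 19))         # placeholder phonological features
lags = range(0, int(0.4 * fs))                  # 0-400 ms

tr, te = slice(0, fs * 90), slice(fs * 90, n)   # train/test split
for name, F in [("envelope", envelope),
                ("phonemes", phonemes),
                ("both", np.hstack([envelope, phonemes]))]:
    W = fit_trf(F[tr], eeg[tr], lags)
    print(name, f"r = {score(F[te], eeg[te], W, lags):.3f}")
```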
13
Renvall H, Seol J, Tuominen R, Sorger B, Riecke L, Salmelin R. Selective auditory attention within naturalistic scenes modulates reactivity to speech sounds. Eur J Neurosci 2021; 54:7626-7641. [PMID: 34697833 PMCID: PMC9298413 DOI: 10.1111/ejn.15504]
Abstract
Rapid recognition and categorization of sounds are essential for humans and animals alike, both for understanding and reacting to our surroundings and for daily communication and social interaction. For humans, perception of speech sounds is of crucial importance. In real life, this task is complicated by the presence of a multitude of meaningful non-speech sounds. The present behavioural, magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI) study set out to address how attention to speech versus attention to natural non-speech sounds within complex auditory scenes influences cortical processing. The stimuli were superimpositions of spoken words and environmental sounds, with parametric variation of the speech-to-environmental sound intensity ratio. The participants' task was to detect a repetition in either the speech or the environmental sound. We found that, specifically when participants attended to speech within the superimposed stimuli, higher speech-to-environmental sound ratios resulted in shorter sustained MEG responses, stronger BOLD fMRI signals (especially in the left supratemporal auditory cortex), and improved behavioural performance. No such effects of speech-to-environmental sound ratio were observed when participants attended to the environmental sound part within the exact same stimuli. These findings suggest stronger saliency of speech compared with other meaningful sounds during processing of natural auditory scenes, likely linked to speech-specific top-down and bottom-up mechanisms activated during speech perception that are needed for tracking speech in real-life-like auditory environments.
Affiliation(s)
- Hanna Renvall: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto NeuroImaging, Aalto University, Espoo, Finland; BioMag Laboratory, HUS Diagnostic Center, Helsinki University Hospital, University of Helsinki and Aalto University School of Science, Helsinki, Finland
- Jaeho Seol: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto NeuroImaging, Aalto University, Espoo, Finland
- Riku Tuominen: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto NeuroImaging, Aalto University, Espoo, Finland
- Bettina Sorger: Department of Cognitive Neuroscience, Maastricht University, Maastricht, The Netherlands
- Lars Riecke: Department of Cognitive Neuroscience, Maastricht University, Maastricht, The Netherlands
- Riitta Salmelin: Department of Neuroscience and Biomedical Engineering, Aalto University, Espoo, Finland; Aalto NeuroImaging, Aalto University, Espoo, Finland
14
Hausfeld L, Disbergen NR, Valente G, Zatorre RJ, Formisano E. Modulating Cortical Instrument Representations During Auditory Stream Segregation and Integration With Polyphonic Music. Front Neurosci 2021; 15:635937. [PMID: 34630007 PMCID: PMC8498193 DOI: 10.3389/fnins.2021.635937]
Abstract
Numerous neuroimaging studies demonstrated that the auditory cortex tracks ongoing speech and that, in multi-speaker environments, tracking of the attended speaker is enhanced compared to the other irrelevant speakers. In contrast to speech, multi-instrument music can be appreciated by attending not only to its individual entities (i.e., segregation) but also to multiple instruments simultaneously (i.e., integration). We investigated the neural correlates of these two modes of music listening using electroencephalography (EEG) and sound envelope tracking. To this end, we presented uniquely composed music pieces played by two instruments, a bassoon and a cello, in combination with a previously validated music auditory scene analysis behavioral paradigm (Disbergen et al., 2018). Similar to results obtained through selective listening tasks for speech, relevant instruments could be reconstructed better than irrelevant ones during the segregation task. A delay-specific analysis showed higher reconstruction for the relevant instrument during a middle-latency window for both the bassoon and cello and during a late window for the bassoon. During the integration task, we did not observe significant attentional modulation when reconstructing the overall music envelope. Subsequent analyses indicated that this null result might be due to the heterogeneous strategies listeners employ during the integration task. Overall, our results suggest that subsequent to a common processing stage, top-down modulations consistently enhance the relevant instrument's representation during an instrument segregation task, whereas such an enhancement is not observed during an instrument integration task. These findings extend previous results from speech tracking to the tracking of multi-instrument music and, furthermore, inform current theories on polyphonic music perception.
Affiliation(s)
- Lars Hausfeld: Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Niels R Disbergen: Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Giancarlo Valente: Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands
- Robert J Zatorre: Cognitive Neuroscience Unit, Montreal Neurological Institute, McGill University, Montreal, QC, Canada; International Laboratory for Brain, Music and Sound Research (BRAMS), Montreal, QC, Canada
- Elia Formisano: Department of Cognitive Neuroscience, Maastricht University, Maastricht, Netherlands; Maastricht Brain Imaging Centre (MBIC), Maastricht University, Maastricht, Netherlands; Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, Netherlands; Brightlands Institute for Smart Society (BISS), Maastricht University, Maastricht, Netherlands
15
Keshavarzi M, Varano E, Reichenbach T. Cortical Tracking of a Background Speaker Modulates the Comprehension of a Foreground Speech Signal. J Neurosci 2021; 41:5093-5101. [PMID: 33926996 PMCID: PMC8197648 DOI: 10.1523/jneurosci.3200-20.2021]
Abstract
Understanding speech in background noise is a difficult task. The tracking of speech rhythms such as the rate of syllables and words by cortical activity has emerged as a key neural mechanism for speech-in-noise comprehension. In particular, recent investigations have used transcranial alternating current stimulation (tACS) with the envelope of a speech signal to influence the cortical speech tracking, demonstrating that this type of stimulation modulates comprehension and therefore providing evidence of a functional role of the cortical tracking in speech processing. Cortical activity has been found to track the rhythms of a background speaker as well, but the functional significance of this neural response remains unclear. Here we use a speech-comprehension task with a target speaker in the presence of a distractor voice to show that tACS with the speech envelope of the target voice as well as tACS with the envelope of the distractor speaker both modulate the comprehension of the target speech. Because the envelope of the distractor speech does not carry information about the target speech stream, the modulation of speech comprehension through tACS with this envelope provides evidence that the cortical tracking of the background speaker affects the comprehension of the foreground speech signal. The phase dependency of the resulting modulation of speech comprehension is, however, opposite to that obtained from tACS with the envelope of the target speech signal. This suggests that the cortical tracking of the ignored speech stream and that of the attended speech stream may compete for neural resources.SIGNIFICANCE STATEMENT Loud environments such as busy pubs or restaurants can make conversation difficult. However, they also allow us to eavesdrop into other conversations that occur in the background. In particular, we often notice when somebody else mentions our name, even if we have not been listening to that person. However, the neural mechanisms by which background speech is processed remain poorly understood. Here we use transcranial alternating current stimulation, a technique through which neural activity in the cerebral cortex can be influenced, to show that cortical responses to rhythms in the distractor speech modulate the comprehension of the target speaker. Our results provide evidence that the cortical tracking of background speech rhythms plays a functional role in speech processing.
Affiliation(s)
- Mahmoud Keshavarzi: Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
- Enrico Varano: Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
- Tobias Reichenbach: Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
16
Bröhl F, Kayser C. Delta/theta band EEG differentially tracks low and high frequency speech-derived envelopes. Neuroimage 2021; 233:117958. [PMID: 33744458 PMCID: PMC8204264 DOI: 10.1016/j.neuroimage.2021.117958]
Abstract
The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the speech envelope. To conveniently quantify this alignment (termed 'speech tracking'), many studies consider the broadband speech envelope, which combines acoustic fluctuations across the spectral range. Using EEG recordings, we show that using this broadband envelope can provide a distorted picture of speech encoding. We systematically investigated the encoding of spectrally-limited speech-derived envelopes carried by individual and multiple noise carriers in the human brain. Tracking in the 1 to 6 Hz EEG bands differentially reflected low (0.2-0.83 kHz) and high (2.66-8 kHz) frequency speech-derived envelopes. This was independent of the specific carrier frequency but sensitive to attentional manipulations, and may reflect the context-dependent emphasis of information from distinct spectral ranges of the speech envelope in low frequency brain activity. As low and high frequency speech envelopes relate to distinct phonemic features, our results suggest that functionally distinct processes contribute to speech tracking in the same EEG bands, and are easily confounded when considering the broadband speech envelope.
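The central manipulation here, envelopes derived from restricted spectral bands rather than the broadband signal, reduces to: band-pass the audio, take the magnitude of the analytic (Hilbert) signal, and downsample. The band edges below copy the abstract (0.2-0.83 kHz and 2.66-8 kHz); filter order, sampling rates, and the noise stand-in for speech are our assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample_poly

def band_envelope(audio, fs, f_lo, f_hi, env_fs=128):
    """Envelope of one spectral band: band-pass -> Hilbert magnitude -> downsample."""
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    band = sosfiltfilt(sos, audio)
    env = np.abs(hilbert(band))
    return resample_poly(env, env_fs, fs)

fs = 22050
rng = np.random.default_rng(4)
audio = rng.standard_normal(fs * 10)             # stand-in for a 10 s speech clip

low_env = band_envelope(audio, fs, 200, 830)     # 0.2-0.83 kHz band
high_env = band_envelope(audio, fs, 2660, 8000)  # 2.66-8 kHz band
print(low_env.shape, high_env.shape)
```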
Affiliation(s)
- Felix Bröhl: Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
- Christoph Kayser: Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
17
de Cheveigné A, Slaney M, Fuglsang SA, Hjortkjaer J. Auditory stimulus-response modeling with a match-mismatch task. J Neural Eng 2021; 18. [PMID: 33849003 DOI: 10.1088/1741-2552/abf771]
Abstract
Objective: An auditory stimulus can be related to the brain response that it evokes by a stimulus-response model fit to the data. This offers insight into perceptual processes within the brain and is also of potential use for devices such as brain-computer interfaces (BCIs). The quality of the model can be quantified by measuring the fit with a regression problem, or by applying it to a classification task and measuring its performance.
Approach: Here we focus on a match-mismatch (MM) task that entails deciding whether a segment of brain signal matches, via a model, the auditory stimulus that evoked it.
Main results: Using these metrics, we describe a range of models of increasing complexity that we compare to methods in the literature, showing state-of-the-art performance. We document in detail one particular implementation, calibrated on a publicly available database, that can serve as a robust reference to evaluate future developments.
Significance: The MM task allows stimulus-response models to be evaluated in the limit of very high model accuracy, making it an attractive alternative to the more commonly used task of auditory attention detection. The MM task does not require class labels, so it is immune to mislabeling, and it is applicable to data recorded in listening scenarios with only one sound source, thus it is cheap to obtain large quantities of training and testing data. Performance metrics from this task, associated with regression accuracy, provide complementary insights into the relation between stimulus and response, as well as information about discriminatory power directly applicable to BCI applications.
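Stripped to its core, the match-mismatch task is a binary decision: given the envelope reconstructed from an EEG segment, choose between the stimulus segment that actually evoked the response and a decoy segment drawn from elsewhere, by correlation. The toy below substitutes a noisy copy of the true envelope for a trained decoder's output; segment length and noise level are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
fs, seg = 64, 5                      # 64 Hz features, 5 s decision segments
n = fs * seg

def mm_decision(reconstruction, env_a, env_b):
    """Match-mismatch: pick the candidate envelope that correlates better
    with the envelope reconstructed from the EEG segment."""
    ra = np.corrcoef(reconstruction, env_a)[0, 1]
    rb = np.corrcoef(reconstruction, env_b)[0, 1]
    return 0 if ra > rb else 1

# Toy data: the 'reconstruction' is a noisy copy of the true envelope,
# standing in for the output of a trained stimulus-reconstruction model.
correct, trials = 0, 500
for _ in range(trials):
    true_env = rng.standard_normal(n)
    decoy_env = rng.standard_normal(n)                        # segment from elsewhere
    reconstruction = true_env + 2.0 * rng.standard_normal(n)  # low-SNR decoder output
    if mm_decision(reconstruction, true_env, decoy_env) == 0:
        correct += 1
print(f"match-mismatch accuracy: {correct / trials:.1%}")
```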
Affiliation(s)
- Alain de Cheveigné: Laboratoire des Systèmes Perceptifs, CNRS UMR 8248, Paris, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL, Paris, France; UCL Ear Institute, London, United Kingdom; Audition, DEC, ENS, 29 rue d'Ulm, 75230 Paris, France
- Malcolm Slaney: Google Research, Machine Hearing Group, Mountain View, CA, United States of America
- Søren A Fuglsang: Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Copenhagen, Denmark
- Jens Hjortkjaer: Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark; Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Copenhagen, Denmark
18
Roswandowitz C, Swanborough H, Frühholz S. Categorizing human vocal signals depends on an integrated auditory-frontal cortical network. Hum Brain Mapp 2021; 42:1503-1517. [PMID: 33615612 PMCID: PMC7927295 DOI: 10.1002/hbm.25309]
Abstract
Voice signals are relevant for auditory communication and are suggested to be processed in dedicated auditory cortex (AC) regions. While recent reports highlighted an additional role of the inferior frontal cortex (IFC), a detailed description of the integrated functioning of the AC-IFC network and its task relevance for voice processing is missing. Using neuroimaging, we tested sound categorization while human participants either focused on the higher-order vocal-sound dimension (voice task) or the feature-based intensity dimension (loudness task) while listening to the same sound material. We found differential involvement of the AC and IFC depending on the task performed and whether the voice dimension was of task relevance or not. First, when comparing neural vocal-sound processing in our task-based design with previously reported passive-listening designs, we observed highly similar cortical activations in the AC and IFC. Second, during task-based vocal-sound processing we observed voice-sensitive responses in the AC and IFC, whereas intensity processing was restricted to distinct AC regions. Third, the IFC flexibly adapted to the vocal sounds' task relevance, being active only when the voice dimension was task relevant. Fourth and finally, connectivity modeling revealed that vocal signals, independent of their task relevance, provided significant input to bilateral AC. However, only when attention was on the voice dimension did we find significant modulations of auditory-frontal connections. Our findings suggest that an integrated auditory-frontal network is essential for behaviorally relevant vocal-sound processing. The IFC seems to be an important hub of the extended voice network when representing higher-order vocal objects and guiding goal-directed behavior.
Affiliation(s)
- Claudia Roswandowitz: Department of Psychology, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Huw Swanborough: Department of Psychology, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland
- Sascha Frühholz: Department of Psychology, University of Zurich, Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and ETH Zurich, Zurich, Switzerland; Center for Integrative Human Physiology (ZIHP), University of Zurich, Zurich, Switzerland
19
Di Liberto GM, Nie J, Yeaton J, Khalighinejad B, Shamma SA, Mesgarani N. Neural representation of linguistic feature hierarchy reflects second-language proficiency. Neuroimage 2020; 227:117586. [PMID: 33346131 PMCID: PMC8527895 DOI: 10.1016/j.neuroimage.2020.117586]
Abstract
Acquiring a new language requires individuals to simultaneously and gradually learn linguistic attributes on multiple levels. Here, we investigated how this learning process changes the neural encoding of natural speech by assessing the encoding of the linguistic feature hierarchy in second-language listeners. Electroencephalography (EEG) signals were recorded from native Mandarin speakers with varied English proficiency and from native English speakers while they listened to audio-stories in English. We measured the temporal response functions (TRFs) for acoustic, phonemic, phonotactic, and semantic features in individual participants and found a main effect of proficiency on linguistic encoding. This effect of second-language proficiency was particularly prominent in the neural encoding of phonemes, showing stronger encoding of "new" phonemic contrasts (i.e., English contrasts that do not exist in Mandarin) with increasing proficiency. Overall, we found that the nonnative listeners with higher proficiency levels had a linguistic feature representation more similar to that of native listeners, which enabled the accurate decoding of language proficiency. This result advances our understanding of the cortical processing of linguistic information in second-language learners and provides an objective measure of language proficiency.
Affiliation(s)
- Giovanni M Di Liberto: Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
- Jingping Nie: Department of Electrical Engineering, Columbia University, New York, NY, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, United States
- Jeremy Yeaton: Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France; Laboratoire de Psychologie Cognitive, UMR 7290, CNRS, Aix-Marseille Université, France
- Bahar Khalighinejad: Department of Electrical Engineering, Columbia University, New York, NY, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, United States
- Shihab A Shamma: Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France; Institute for Systems Research, Electrical and Computer Engineering, University of Maryland, College Park, USA
- Nima Mesgarani: Department of Electrical Engineering, Columbia University, New York, NY, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, United States
20
Hausfeld L, Shiell M, Formisano E, Riecke L. Cortical processing of distracting speech in noisy auditory scenes depends on perceptual demand. Neuroimage 2020; 228:117670. [PMID: 33359352 DOI: 10.1016/j.neuroimage.2020.117670]
Abstract
Selective attention is essential for the processing of multi-speaker auditory scenes because they require the perceptual segregation of the relevant speech ("target") from irrelevant speech ("distractors"). For simple sounds, it has been suggested that the processing of multiple distractor sounds depends on bottom-up factors affecting task performance. However, it remains unclear whether such dependency applies to naturalistic multi-speaker auditory scenes. In this study, we tested the hypothesis that increased perceptual demand (the processing requirement posed by the scene to separate the target speech) reduces the cortical processing of distractor speech thus decreasing their perceptual segregation. Human participants were presented with auditory scenes including three speakers and asked to selectively attend to one speaker while their EEG was acquired. The perceptual demand of this selective listening task was varied by introducing an auditory cue (interaural time differences, ITDs) for segregating the target from the distractor speakers, while acoustic differences between the distractors were matched in ITD and loudness. We obtained a quantitative measure of the cortical segregation of distractor speakers by assessing the difference in how accurately speech-envelope following EEG responses could be predicted by models of averaged distractor speech versus models of individual distractor speech. In agreement with our hypothesis, results show that interaural segregation cues led to improved behavioral word-recognition performance and stronger cortical segregation of the distractor speakers. The neural effect was strongest in the δ-band and at early delays (0 - 200 ms). Our results indicate that during low perceptual demand, the human cortex represents individual distractor speech signals as more segregated. This suggests that, in addition to purely acoustical properties, the cortical processing of distractor speakers depends on factors like perceptual demand.
Affiliation(s)
- Lars Hausfeld: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands; Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
- Martha Shiell: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands; Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
- Elia Formisano: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands; Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands; Maastricht Centre for Systems Biology, 6200MD Maastricht, The Netherlands
- Lars Riecke: Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands; Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
21
Cortical voice processing is grounded in elementary sound analyses for vocalization relevant sound patterns. Prog Neurobiol 2020; 200:101982. [PMID: 33338555 DOI: 10.1016/j.pneurobio.2020.101982]
Abstract
A subregion of the auditory cortex (AC) was proposed to selectively process voices. This selectivity of the temporal voice area (TVA) and its role in processing non-voice sounds, however, have remained elusive. For a better functional description of the TVA, we investigated its neural responses both to voice and non-voice sounds, and critically also to textural sound patterns (TSPs) that share basic features with natural sounds but are perceptually very distant from voices. First, listening to these TSPs elicited activity in large subregions of the TVA, which was mainly driven by perceptual ratings of the TSPs along a voice-similarity scale. This similar TVA activity in response to TSPs might partially explain activation patterns typically observed during voice processing. Second, we reconstructed the TVA activity that is usually observed in voice processing with a linear combination of activation patterns from TSPs. An analysis of the reconstruction model weights demonstrated that the TVA similarly processes natural voice and non-voice sounds as well as TSPs along their acoustic and perceptual features. The predominant factor in reconstructing the TVA pattern from TSPs was the perceptual voice-similarity rating. Third, a multi-voxel pattern analysis confirmed that the TSPs contain sufficient sound information to explain TVA activity during voice processing. Altogether, rather than being restricted to higher-order voice processing only, the human "voice area" uses mechanisms to evaluate the perceptual and acoustic quality of non-voice sounds, and responds to the latter with a "voice-like" processing pattern when detecting some rudimentary perceptual similarity with voices.
22
Brodbeck C, Jiao A, Hong LE, Simon JZ. Neural speech restoration at the cocktail party: Auditory cortex recovers masked speech of both attended and ignored speakers. PLoS Biol 2020; 18:e3000883. [PMID: 33091003 PMCID: PMC7644085 DOI: 10.1371/journal.pbio.3000883]
Abstract
Humans are remarkably skilled at listening to one speaker out of an acoustic mixture of several speech sources. Two speakers are easily segregated, even without binaural cues, but the neural mechanisms underlying this ability are not well understood. One possibility is that early cortical processing performs a spectrotemporal decomposition of the acoustic mixture, allowing the attended speech to be reconstructed via optimally weighted recombinations that discount spectrotemporal regions where sources heavily overlap. Using human magnetoencephalography (MEG) responses to a 2-talker mixture, we show evidence for an alternative possibility, in which early, active segregation occurs even for strongly spectrotemporally overlapping regions. Early (approximately 70-millisecond) responses to nonoverlapping spectrotemporal features are seen for both talkers. When competing talkers' spectrotemporal features mask each other, the individual representations persist, but they occur with an approximately 20-millisecond delay. This suggests that the auditory cortex recovers acoustic features that are masked in the mixture, even if they occurred in the ignored speech. The existence of such noise-robust cortical representations, of features present in attended as well as ignored speech, suggests an active cortical stream segregation process, which could explain a range of behavioral effects of ignored background speech.
How do humans focus on one speaker when several are talking? MEG responses to a continuous two-talker mixture suggest that, even though listeners attend only to one of the talkers, their auditory cortex tracks acoustic features from both speakers. This occurs even when those features are locally masked by the other speaker.
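Brodbeck et al. derived their latency estimates from temporal response functions (TRFs); their paper uses a boosting algorithm, whereas the sketch below substitutes plain ridge regression, which conveys the same idea on simulated data. The sampling rate, lag window, and signals are all invented for illustration.
```python
import numpy as np

def estimate_trf(stimulus, response, lags, alpha=1.0):
    """Ridge-regression temporal response function (TRF).

    stimulus : 1-D feature time series (e.g., one talker's acoustic features)
    response : 1-D neural time series (e.g., one MEG channel)
    lags     : sample lags at which the stimulus may drive the response
    """
    # Lagged design matrix: column k is the stimulus delayed by lags[k]
    # (np.roll wraps at the edges; acceptable for a sketch, not for real data)
    X = np.column_stack([np.roll(stimulus, lag) for lag in lags])
    ridge = alpha * np.eye(len(lags))
    return np.linalg.solve(X.T @ X + ridge, X.T @ response)

# Hypothetical use: fit separate TRFs to masked vs. non-overlapping features of
# each talker and compare peak latencies (the paper reports a ~20 ms shift).
fs = 100                                   # assumed sampling rate, Hz
lags = np.arange(0, int(0.3 * fs))         # 0-300 ms lag window
rng = np.random.default_rng(1)
stim = rng.random(fs * 60)                 # simulated feature series, 60 s
resp = np.convolve(stim, np.hanning(10), mode="same") + rng.standard_normal(fs * 60)
trf = estimate_trf(stim, resp, lags)
print("peak latency (ms):", lags[np.argmax(trf)] * 1000 / fs)
```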
Collapse
Affiliation(s)
- Christian Brodbeck
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
| | - Alex Jiao
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
| | - L. Elliot Hong
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
| | - Jonathan Z. Simon
- Institute for Systems Research, University of Maryland, College Park, Maryland, United States of America
- Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland, United States of America
- Department of Biology, University of Maryland, College Park, Maryland, United States of America
| |
Collapse
|
23
|
Alickovic E, Lunner T, Wendt D, Fiedler L, Hietkamp R, Ng EHN, Graversen C. Neural Representation Enhanced for Speech and Reduced for Background Noise With a Hearing Aid Noise Reduction Scheme During a Selective Attention Task. Front Neurosci 2020; 14:846. [PMID: 33071722 PMCID: PMC7533612 DOI: 10.3389/fnins.2020.00846] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2020] [Accepted: 07/20/2020] [Indexed: 12/23/2022] Open
Abstract
Objectives: Selectively attending to a target talker while ignoring multiple interferers (competing talkers and background noise) is more difficult for hearing-impaired (HI) individuals than for normal-hearing (NH) listeners. Such tasks also become more difficult as background noise levels increase. To overcome these difficulties, hearing aids (HAs) offer noise reduction (NR) schemes. The objective of this study was to investigate the effect of NR processing (inactive, where the NR feature was switched off, vs. active, where the NR feature was switched on) on the neural representation of speech envelopes across two background noise levels [+3 dB and +8 dB signal-to-noise ratio (SNR)], using a stimulus reconstruction (SR) method.
Design: To explore how NR processing supports listeners' selective auditory attention, we recruited 22 HI participants fitted with HAs. To investigate the interplay between NR schemes, background noise, and the neural representation of the speech envelopes, we used electroencephalography (EEG). Participants were instructed to listen to a target talker in front while ignoring a competing talker, also in front, in the presence of multi-talker background babble noise.
Results: The neural representation of the attended speech envelope was enhanced by the active NR scheme at both background noise levels. With the NR scheme active, the neural representation of the attended speech envelope at the lower (+3 dB) SNR was shifted by approximately 5 dB toward that at the higher (+8 dB) SNR. The neural representation of the ignored speech envelope was modulated by the NR scheme and was mostly enhanced in the conditions with more background noise. The neural representation of the background noise was reduced by the NR scheme, significantly so in the conditions with more background noise. The neural representation of the net sum of the ignored acoustic scene (ignored talker and background babble) was not modulated by the NR scheme but was significantly reduced in the conditions with less background noise. Taken together, the active NR scheme enhanced the neural representation of both the attended and the ignored talker and reduced the neural representation of background noise, while the net sum of the ignored acoustic scene was not enhanced.
Conclusion: Altogether, our results support the hypothesis that NR schemes in HAs enhance the neural representation of speech and reduce the neural representation of background noise during a selective attention task. We contend that these results provide a neural index that could be useful for assessing the effects of HAs on auditory and cognitive processing in HI populations.
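The stimulus-reconstruction (SR) logic at the core of this design can be sketched in a few lines. The following is a minimal backward-model illustration with simulated data, not the authors' analysis: the channel count, regularization, and signals are assumptions, and a real pipeline would add time lags and cross-validation. A linear decoder maps multichannel EEG back to an envelope; comparing how well the reconstruction correlates with the attended versus the ignored envelope yields the kind of neural index discussed above.
```python
import numpy as np

def fit_decoder(eeg, envelope, alpha=1e3):
    """Fit a linear backward model (stimulus reconstruction): envelope ~ eeg @ w.

    eeg      : (n_samples, n_channels) multichannel EEG
    envelope : (n_samples,) target speech envelope
    """
    ridge = alpha * np.eye(eeg.shape[1])
    return np.linalg.solve(eeg.T @ eeg + ridge, eeg.T @ envelope)

# Hypothetical data: the attended envelope leaks into the EEG, the ignored
# envelope does not, so the decoder reconstructs the attended stream better.
rng = np.random.default_rng(2)
n, ch = 5000, 32
attended = rng.random(n)
ignored = rng.random(n)
eeg = np.outer(attended, rng.random(ch)) + rng.standard_normal((n, ch))

w = fit_decoder(eeg, attended)
recon = eeg @ w
for name, env in [("attended", attended), ("ignored", ignored)]:
    print(name, np.corrcoef(recon, env)[0, 1].round(2))
```
Comparing these two correlations across NR-on and NR-off conditions is, in spirit, how an effect like the reported ~5 dB shift would be quantified.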
Collapse
Affiliation(s)
- Emina Alickovic
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark; Department of Electrical Engineering, Linkoping University, Linköping, Sweden
| | - Thomas Lunner
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark; Department of Electrical Engineering, Linkoping University, Linköping, Sweden; Department of Health Technology, Technical University of Denmark, Lyngby, Denmark; Department of Behavioral Sciences and Learning, Linkoping University, Linköping, Sweden
| | - Dorothea Wendt
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark; Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
| | - Lorenz Fiedler
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
| | | | - Elaine Hoi Ning Ng
- Department of Behavioral Sciences and Learning, Linkoping University, Linköping, Sweden; Oticon A/S, Smørum, Denmark
| | | |
Collapse
|
24
|
Abstract
Being able to pick out particular sounds, such as speech, against a background of other sounds is one of the key tasks performed by the auditory system. Understanding how this happens is important because speech recognition in noise is particularly challenging for older listeners and for people with hearing impairments. Central to this ability is the capacity of neurons to adapt to the statistics of sounds reaching the ears, which helps to generate noise-tolerant representations of sounds in the brain. In more complex auditory scenes, such as a cocktail party, where the background noise comprises other voices, the sound features associated with each source have to be grouped together and segregated from those belonging to other sources. This depends on precise temporal coding and on the modulation of cortical response properties when attending to a particular speaker in a multi-talker environment. Furthermore, the neural processing underlying auditory scene analysis is shaped by experience over multiple timescales.
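The adaptation-to-sound-statistics idea in this passage can be made concrete with a toy gain-control model. The sketch below is invented for this summary, not a model from the review: it divides a signal by a running estimate of its own standard deviation, so that the output keeps a stable scale as the acoustic level changes, a crude stand-in for the noise-tolerant representations described above.
```python
import numpy as np

def adaptive_gain(sound, tau=0.05, fs=16000, eps=1e-6):
    """Toy contrast-gain adaptation: normalize by a leaky running estimate
    of the signal's standard deviation (time constant tau, in seconds)."""
    alpha = 1.0 / (tau * fs)          # leaky-integrator update coefficient
    out = np.empty_like(sound)
    v = 1.0                            # running estimate of signal power
    for i, x in enumerate(sound):
        v = (1 - alpha) * v + alpha * x**2
        out[i] = x / (np.sqrt(v) + eps)
    return out

# A quiet epoch followed by a loud one: after adaptation, both come out
# at roughly the same scale.
rng = np.random.default_rng(3)
quiet = 0.1 * rng.standard_normal(16000)
loud = 2.0 * rng.standard_normal(16000)
out = adaptive_gain(np.concatenate([quiet, loud]))
print(out[:16000].std().round(2), out[16000:].std().round(2))
```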
Collapse
|
25
|
Paying attention to speech: The role of working memory capacity and professional experience. Atten Percept Psychophys 2020; 82:3594-3605. [PMID: 32676806 DOI: 10.3758/s13414-020-02091-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Managing attention in multispeaker environments is a challenging feat that is critical for human performance. However, why some people are better than others at allocating attention appropriately remains largely unknown. Here, we investigated the contribution of two factors, working memory capacity (WMC) and professional experience, to performance on two different types of attention task: selective attention to one speaker and distributed attention among multiple concurrent speakers. We compared performance across three groups: individuals with low (n = 20) and high (n = 25) WMC, and aircraft pilots (n = 24), whose profession poses extremely high demands on both selective and distributed attention to speech. Results suggest that selective attention is highly effective, with good performance maintained under increasingly adverse conditions, whereas performance decreases substantially with the requirement to distribute attention among a larger number of speakers. Importantly, both types of attention benefit from higher WMC, suggesting reliance on some common capacity-limited resources. However, only selective attention was further improved in the pilots, pointing to its flexible and trainable nature, whereas distributed attention seems to suffer from more fixed and severe processing bottlenecks.
Collapse
|
26
|
Formisano E, Hausfeld L. The Dialog of Primary and Non-primary Auditory Cortex at the 'Cocktail Party'. Neuron 2020; 104:1029-1031. [PMID: 31951534 DOI: 10.1016/j.neuron.2019.11.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
In this issue of Neuron, O'Sullivan et al. (2019) measured electro-cortical responses to "cocktail party" speech mixtures in neurosurgical patients and demonstrated that the selective enhancement of attended speech is achieved through the adaptive weighting of primary auditory cortex output by non-primary auditory cortex.
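As a cartoon of the adaptive-weighting account, suppose primary auditory cortex (PAC) carries representations of both talkers and a non-primary readout re-weights them according to attentional state. Everything below (signals, gain value, function name) is hypothetical and only illustrates the weighting idea, not O'Sullivan et al.'s analysis.
```python
import numpy as np

rng = np.random.default_rng(4)
t = 2000
# PAC is assumed to track both talkers regardless of attention
pac = {"talker_A": rng.random(t), "talker_B": rng.random(t)}

def nonprimary_readout(pac, attended, gain=3.0):
    """Weighted sum of PAC streams with the attended stream up-weighted."""
    w = {k: (gain if k == attended else 1.0) for k in pac}
    z = sum(w.values())
    return sum(w[k] * v for k, v in pac.items()) / z

out = nonprimary_readout(pac, "talker_A")
for k, v in pac.items():
    # the readout correlates strongly with the attended talker only
    print(k, np.corrcoef(out, v)[0, 1].round(2))
```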
Collapse
Affiliation(s)
- Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands; Maastricht Brain Imaging Centre, 6200 MD Maastricht, The Netherlands; Maastricht Centre for Systems Biology, 6200 MD Maastricht, The Netherlands.
| | - Lars Hausfeld
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, PO Box 616, 6200 Maastricht, the Netherlands; Maastricht Brain Imaging Centre, 6200 Maastricht, the Netherlands
| |
Collapse
|
27
|
Cortical auditory responses index the contributions of different RMS-level-dependent segments to speech intelligibility. Hear Res 2019; 383:107808. [DOI: 10.1016/j.heares.2019.107808] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/23/2019] [Revised: 09/17/2019] [Accepted: 10/01/2019] [Indexed: 10/25/2022]
|
28
|
Riecke L, Snipes S, van Bree S, Kaas A, Hausfeld L. Audio-tactile enhancement of cortical speech-envelope tracking. Neuroimage 2019; 202:116134. [DOI: 10.1016/j.neuroimage.2019.116134] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2019] [Revised: 08/07/2019] [Accepted: 08/26/2019] [Indexed: 11/25/2022] Open
|
29
|
Alickovic E, Lunner T, Gustafsson F, Ljung L. A Tutorial on Auditory Attention Identification Methods. Front Neurosci 2019; 13:153. [PMID: 30941002 PMCID: PMC6434370 DOI: 10.3389/fnins.2019.00153] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2018] [Accepted: 02/11/2019] [Indexed: 01/14/2023] Open
Abstract
Auditory attention identification methods attempt to identify the sound source of a listener's interest by analyzing measurements of electrophysiological data. We present a tutorial on the numerous techniques that have been developed in recent decades, together with an overview of current trends in multivariate correlation-based and model-based learning frameworks. The focus is on the use of linear relations between electrophysiological and audio data. The way in which these relations are computed differs. For example, canonical correlation analysis (CCA) finds a linear subset of electrophysiological data that best correlates with audio data and a similar subset of audio data that best correlates with electrophysiological data. Model-based (encoding and decoding) approaches focus on either of these two sets. We investigate the similarities and differences between these linear model philosophies, focusing on (1) correlation-based approaches (CCA), (2) encoding/decoding models based on dense estimation, and (3) (adaptive) encoding/decoding models based on sparse estimation. The specific focus is on sparsity-driven adaptive encoding models and on comparing the methodology of state-of-the-art models found in the auditory literature. Furthermore, we outline the main signal-processing pipeline for identifying the attended sound source in a cocktail-party environment from raw electrophysiological data, with all the necessary steps, complemented by the necessary MATLAB code and the relevant references for each step. Our main aim is to compare the methodology of the available methods and to provide numerical illustrations for some of them to convey their potential. A thorough performance comparison is outside the scope of this tutorial.
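The tutorial itself supplies MATLAB code for each pipeline step; as a language-neutral taste of the CCA approach it surveys, here is a minimal Python sketch using scikit-learn. The data are simulated and all dimensions are invented; time-lagged copies of the envelope stand in for the audio representation a real pipeline would use.
```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Hypothetical data: 10 min of 64-channel EEG at 64 Hz, plus the attended
# speech envelope represented by 16 time-lagged copies (roughly 0-250 ms).
rng = np.random.default_rng(5)
n_samples, n_channels, n_lags = 64 * 600, 64, 16
envelope = rng.random(n_samples)
audio = np.column_stack([np.roll(envelope, k) for k in range(n_lags)])
eeg = np.outer(envelope, rng.random(n_channels)) + rng.standard_normal((n_samples, n_channels))

# CCA finds linear projections of the EEG and the audio representation
# that correlate maximally with each other
cca = CCA(n_components=1)
eeg_c, audio_c = cca.fit_transform(eeg, audio)
print("canonical correlation:", np.corrcoef(eeg_c[:, 0], audio_c[:, 0])[0, 1].round(2))
```
In an attention-identification setting, this canonical correlation would be computed separately against each candidate talker's audio, and the talker yielding the higher correlation taken as the attended one.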
Collapse
Affiliation(s)
- Emina Alickovic
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
| | - Thomas Lunner
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
- Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
- Hearing Systems, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Swedish Institute for Disability Research, Linnaeus Centre HEAD, Linkoping University, Linkoping, Sweden
| | - Fredrik Gustafsson
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
| | - Lennart Ljung
- Department of Electrical Engineering, Linkoping University, Linkoping, Sweden
| |
Collapse
|