201
Chiang CH, Lee J, Wang C, Williams AJ, Lucas TH, Cohen YE, Viventi J. A modular high-density μECoG system on macaque vlPFC for auditory cognitive decoding. J Neural Eng 2020; 17:046008. PMID: 32498058. DOI: 10.1088/1741-2552/ab9986.
Abstract
OBJECTIVE A fundamental goal of the auditory system is to parse the auditory environment into distinct perceptual representations. Auditory perception is mediated by the ventral auditory pathway, which includes the ventrolateral prefrontal cortex (vlPFC). Because large-scale recordings of auditory signals are quite rare, the spatiotemporal resolution of the neuronal code that underlies vlPFC's contribution to auditory perception has not been fully elucidated. Therefore, we developed a modular, chronic, high-resolution, multi-electrode array system with long-term viability in order to identify the information that could be decoded from μECoG vlPFC signals. APPROACH We molded three separate μECoG arrays into one and implanted this system in a non-human primate. A custom 3D-printed titanium chamber was mounted on the left hemisphere. The molded 294-contact μECoG array was implanted subdurally over the vlPFC. μECoG activity was recorded while the monkey participated in a 'hearing-in-noise' task in which it reported hearing a 'target' vocalization from a background 'chorus' of vocalizations. We titrated task difficulty by varying the sound level of the target vocalization relative to the chorus (target-to-chorus ratio, TCr). MAIN RESULTS We decoded the TCr and the monkey's behavioral choices from the μECoG signal. We analyzed decoding accuracy as a function of the number of electrodes, spatial resolution, and time from implantation. Over a one-year period, we found significant decoding with individual electrodes, and decoding accuracy increased significantly as more electrodes were decoded simultaneously. Further, the decoding of behavioral choice was better than the decoding of TCr. Finally, because the decoding accuracy of individual electrodes varied on a day-by-day basis, electrode arrays with high channel counts ensure robust decoding in the long term. SIGNIFICANCE Our results demonstrate the utility of high-resolution, high-channel-count, chronic μECoG recording. We developed a surface electrode array that can be scaled to cover larger cortical areas without increasing the chamber footprint.
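As a rough illustration of the electrode-count analysis described above, the sketch below decodes a binary label (e.g., behavioral choice) from growing subsets of simulated electrode features. The data, trial counts, and classifier are assumptions for illustration, not the authors' pipeline.

```python
# Minimal sketch: decoding accuracy vs. number of electrodes (simulated data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_electrodes = 200, 294            # 294 contacts, as in the array
labels = rng.integers(0, 2, n_trials)        # hypothetical behavioral choice

# Simulated per-electrode features; a weak label-dependent shift on a subset
# of contacts stands in for real task-related modulation.
X = rng.standard_normal((n_trials, n_electrodes))
X[labels == 1, :20] += 0.3

for n in (1, 10, 50, 150, 294):              # grow the decoded subset
    subset = rng.choice(n_electrodes, n, replace=False)
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, subset], labels, cv=5).mean()
    print(f"{n:3d} electrodes: accuracy {acc:.2f}")
```

With this toy setup, accuracy typically rises with the number of decoded electrodes, mirroring the qualitative result reported above.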
Affiliation(s)
- Chia-Han Chiang
- Department of Biomedical Engineering, Duke University, Durham, NC, United States of America. These authors contributed equally to this work
202
Erkens J, Schulte M, Vormann M, Herrmann CS. Lacking Effects of Envelope Transcranial Alternating Current Stimulation Indicate the Need to Revise Envelope Transcranial Alternating Current Stimulation Methods. Neurosci Insights 2020; 15:2633105520936623. PMID: 32685924. PMCID: PMC7343360. DOI: 10.1177/2633105520936623.
Abstract
In recent years, several studies have reported beneficial effects of transcranial alternating current stimulation (tACS) in experiments regarding sound and speech perception. A new development in this field is envelope-tACS: the goal of this method is to improve cortical entrainment to the speech signal by stimulating with a waveform based on the speech envelope. One challenge of this stimulation method is timing; the electrical stimulation needs to be phase-aligned with the naturally occurring cortical entrainment to the auditory stimuli. Due to individual differences in anatomy and processing speed, the optimal time-lag between presentation of sound and applying envelope-tACS varies between participants. To better investigate the effects of envelope-tACS, we performed a speech comprehension task with a larger number of time-lags than previous experiments, as well as an equal number of sham conditions. No significant difference was found between the optimal stimulation time-lag condition and the best sham condition. Further investigation of the data revealed a significant difference between the positive and negative half-cycles of the stimulation conditions but not for sham. However, we also found a significant learning effect over the course of the experiment which was of comparable size to the effects of envelope-tACS found in previous auditory tACS studies. In this article, we discuss possible explanations for why our findings did not match those of previous studies and the issues that come with researching and developing envelope-tACS.
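To make the method concrete, here is a minimal sketch of how an envelope-based stimulation waveform can be built and shifted over candidate time-lags. The sample rate, filter settings, and lag grid are assumptions for illustration, not the authors' implementation.

```python
# Sketch: envelope-tACS waveform construction and time-lag scanning.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 1000                                     # Hz, assumed sample rate
speech = np.random.randn(fs * 10)             # stand-in for a speech recording

# Broadband amplitude envelope via the Hilbert transform, low-pass filtered
# so the current waveform follows only the slow modulations.
envelope = np.abs(hilbert(speech))
b, a = butter(2, 10 / (fs / 2), btype="low")
envelope = filtfilt(b, a, envelope)
envelope = (envelope - envelope.mean()) / np.abs(envelope).max()

for lag_ms in range(-200, 201, 50):           # candidate audio-to-tACS lags
    shift = int(lag_ms * fs / 1000)
    tacs = np.roll(envelope, shift)           # positive lag: current trails audio
    # ...present audio with this tACS waveform and score comprehension...
```

Scanning such a lag grid per participant is what allows the individually optimal time-lag to be estimated.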
Affiliation(s)
- Jules Erkens
- Experimental Psychology Lab, Department of Psychology, Cluster of Excellence 'Hearing4All', European Medical School, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
- Christoph S Herrmann
- Experimental Psychology Lab, Department of Psychology, Cluster of Excellence 'Hearing4All', European Medical School, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany; Research Center Neurosensory Science, Carl von Ossietzky University of Oldenburg, Oldenburg, Germany
203
Greenlaw KM, Puschmann S, Coffey EBJ. Decoding of Envelope vs. Fundamental Frequency During Complex Auditory Stream Segregation. Neurobiology of Language 2020; 1:268-287. PMID: 37215227. PMCID: PMC10158587. DOI: 10.1162/nol_a_00013.
Abstract
Hearing-in-noise perception is a challenging task that is critical to human function, but how the brain accomplishes it is not well understood. A candidate mechanism proposes that the neural representation of an attended auditory stream is enhanced relative to background sound via a combination of bottom-up and top-down mechanisms. To date, few studies have compared neural representation and its task-related enhancement across frequency bands that carry different auditory information, such as a sound's amplitude envelope (i.e., syllabic rate or rhythm; 1-9 Hz), and the fundamental frequency of periodic stimuli (i.e., pitch; >40 Hz). Furthermore, hearing-in-noise in the real world is frequently both messier and richer than the majority of tasks used in its study. In the present study, we use continuous sound excerpts that simultaneously offer predictive, visual, and spatial cues to help listeners separate the target from four acoustically similar, concurrently presented sound streams. We show that while both lower and higher frequency information about the entire sound stream is represented in the brain's response, the to-be-attended sound stream is strongly enhanced only in the slower, lower frequency sound representations. These results are consistent with the hypothesis that attended sound representations are strengthened progressively at higher-level, later processing stages, and that the interaction of multiple brain systems can aid in this process. Our findings contribute to our understanding of auditory stream separation in difficult, naturalistic listening conditions and demonstrate that pitch and envelope information can be decoded from single-channel EEG data.
Affiliation(s)
- Keelin M. Greenlaw
- Department of Psychology, Concordia University, Montreal, QC, Canada
- International Laboratory for Brain, Music and Sound Research (BRAMS)
- The Centre for Research on Brain, Language and Music (CRBLM)
204
Song J, Martin L, Iverson P. Auditory neural tracking and lexical processing of speech in noise: Masker type, spatial location, and language experience. J Acoust Soc Am 2020; 148:253. PMID: 32752786. DOI: 10.1121/10.0001477.
Abstract
The present study investigated how single-talker and babble maskers affect auditory and lexical processing during native (L1) and non-native (L2) speech recognition. Electroencephalogram (EEG) recordings were made while L1 and L2 (Korean) English speakers listened to sentences in the presence of single-talker and babble maskers that were colocated or spatially separated from the target. The predictability of the sentences was manipulated to measure lexical-semantic processing (N400), and selective auditory processing of the target was assessed using neural tracking measures. The results demonstrate that intelligible single-talker maskers cause listeners to attend more to the semantic content of the targets (i.e., greater context-related N400 changes) than when targets are in babble, and that listeners track the acoustics of the target less accurately with single-talker maskers. L1 and L2 listeners both modulated their processing in this way, although L2 listeners had more difficulty with the materials overall (i.e., lower behavioral accuracy, less context-related N400 variation, more listening effort). The results demonstrate that auditory and lexical processing can be simultaneously assessed within a naturalistic speech listening task, and listeners can adjust lexical processing to more strongly track the meaning of a sentence in order to help ignore competing lexical content.
Affiliation(s)
- Jieun Song
- Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
- Luke Martin
- Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
- Paul Iverson
- Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
205
Santos Kawata NY, Hashimoto T, Kawashima R. Neural mechanisms underlying concurrent listening of simultaneous speech. Brain Res 2020; 1738:146821. PMID: 32259518. DOI: 10.1016/j.brainres.2020.146821.
Abstract
Can we identify what two people are saying at the same time? Although it is difficult to perfectly repeat two or more simultaneous messages, listeners can report information from both speakers. In a concurrent/divided listening task, enhanced attention and segregation of speech can be required rather than selection and suppression. However, the neural mechanisms of concurrent listening to multi-speaker speech have yet to be clarified. The present study utilized functional magnetic resonance imaging to examine the neural responses of healthy young adults listening to concurrent male and female speakers in an attempt to reveal the mechanism of concurrent listening. After practice and multiple trials testing concurrent listening, 31 participants achieved performance comparable with that of selective listening. Furthermore, compared to selective listening, concurrent listening induced greater activation in the anterior cingulate cortex, bilateral anterior insula, frontoparietal regions, and the periaqueductal gray region. In addition to the salience network for multi-speaker listening, attentional modulation and enhanced segregation of these signals could be used to achieve successful concurrent listening. These results indicate the presence of a potential mechanism by which one can listen to two voices with enhanced attention to saliency signals.
Affiliation(s)
- Natasha Yuriko Santos Kawata
- Department of Functional Brain Imaging, Institute of Development, Aging and Cancer (IDAC), Tohoku University, Japan
- Teruo Hashimoto
- Division of Developmental Cognitive Neuroscience, Institute of Development, Aging and Cancer (IDAC), Tohoku University, Japan.
- Ryuta Kawashima
- Department of Functional Brain Imaging, Institute of Development, Aging and Cancer (IDAC), Tohoku University, Japan; Division of Developmental Cognitive Neuroscience, Institute of Development, Aging and Cancer (IDAC), Tohoku University, Japan
206
Lo CY, Looi V, Thompson WF, McMahon CM. Music Training for Children With Sensorineural Hearing Loss Improves Speech-in-Noise Perception. J Speech Lang Hear Res 2020; 63:1990-2015. PMID: 32543961. DOI: 10.1044/2020_jslhr-19-00391.
Abstract
Purpose A growing body of evidence suggests that long-term music training provides benefits to auditory abilities for typical-hearing adults and children. The purpose of this study was to evaluate how music training may provide perceptual benefits (such as speech-in-noise, spectral resolution, and prosody) for children with hearing loss. Method Fourteen children aged 6-9 years with prelingual sensorineural hearing loss using bilateral cochlear implants, bilateral hearing aids, or a bimodal configuration participated in a 12-week music training program, with nine participants completing the full testing requirements of the music training. Activities included weekly group-based music therapy and take-home music apps three times a week. The design was a pseudorandomized, longitudinal study (half the cohort was wait-listed, initially serving as a passive control group prior to music training). The test battery consisted of tasks related to music perception, music appreciation, and speech perception. As a comparison, 16 age-matched children with typical hearing also completed this test battery, but without participation in the music training. Results There were no changes in any outcomes for the passive control group. After music training, perception of speech-in-noise, question/statement prosody, musical timbre, and spectral resolution improved significantly, as did measures of music appreciation. There were no benefits for emotional prosody or pitch perception. Conclusion The findings suggest that even a modest amount of music training has benefits for music and speech outcomes. These preliminary results provide further evidence that music training is a suitable complementary means of habilitation to improve the outcomes for children with hearing loss.
Affiliation(s)
- Chi Yhun Lo
- Department of Linguistics, Macquarie University, Sydney, New South Wales, Australia
- The HEARing CRC, Melbourne, Victoria, Australia
- ARC Centre of Excellence in Cognition and its Disorders, Sydney, New South Wales, Australia
- Valerie Looi
- SCIC Cochlear Implant Program-An RIDBC Service, Sydney, New South Wales, Australia
- William Forde Thompson
- ARC Centre of Excellence in Cognition and its Disorders, Sydney, New South Wales, Australia
- Department of Psychology, Macquarie University, Sydney, New South Wales, Australia
- Catherine M McMahon
- Department of Linguistics, Macquarie University, Sydney, New South Wales, Australia
- The HEARing CRC, Melbourne, Victoria, Australia
207
Jaeger M, Mirkovic B, Bleichner MG, Debener S. Decoding the Attended Speaker From EEG Using Adaptive Evaluation Intervals Captures Fluctuations in Attentional Listening. Front Neurosci 2020; 14:603. PMID: 32612507. PMCID: PMC7308709. DOI: 10.3389/fnins.2020.00603.
Abstract
Listeners differ in their ability to attend to a speech stream in the presence of a competing sound. Differences in speech intelligibility in noise cannot be fully explained by hearing ability, which suggests the involvement of additional cognitive factors. A better understanding of the temporal fluctuations in the ability to pay selective auditory attention to a desired speech stream may help in explaining these variabilities. In order to better understand the temporal dynamics of selective auditory attention, we developed an online auditory attention decoding (AAD) processing pipeline based on speech envelope tracking in the electroencephalogram (EEG). Participants had to attend to one audiobook story while a second one had to be ignored. Online AAD was applied to track the attention toward the target speech signal. Individual temporal attention profiles were computed by combining an established AAD method with an adaptive staircase procedure. The individual decoding performance over time was analyzed and linked to behavioral performance as well as subjective ratings of listening effort, motivation, and fatigue. The grand average attended speaker decoding profile derived in the online experiment indicated performance above chance level. Parameters describing the individual AAD performance in each testing block indicated that differences in decoding performance over time were closely related to behavioral performance in the selective listening task. Further, an exploratory analysis indicated that subjects with poor decoding performance reported higher listening effort and fatigue compared to good performers. Taken together, our results show that online EEG-based AAD in a complex listening situation is feasible. Adaptive attended speaker decoding profiles over time could be used as an objective measure of behavioral performance and listening effort. The developed online processing pipeline could also serve as a basis for future EEG-based near-real-time auditory neurofeedback systems.
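The core AAD step described above reduces to a correlation comparison, and the adaptive intervals to a staircase rule. The sketch below shows one plausible form of both; the shapes, the pre-trained decoder, and the staircase parameters are assumptions for illustration.

```python
# Sketch: correlation-based attended-speaker decoding plus an adaptive
# evaluation-interval staircase (illustrative, not the authors' code).
import numpy as np

def decode_interval(eeg, env_a, env_b, decoder):
    """eeg: (channels, samples); env_a/env_b: (samples,) speech envelopes;
    decoder: (channels,) backward model mapping EEG to an envelope estimate."""
    recon = decoder @ eeg                     # reconstructed envelope
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return "A" if r_a > r_b else "B"

def next_window(window_s, correct, step_s=5.0, lo=5.0, hi=60.0):
    # Staircase: shorten the evaluation interval after a correct decision,
    # lengthen it after an error, within assumed bounds.
    return max(lo, window_s - step_s) if correct else min(hi, window_s + step_s)
```

Tracking the interval length over the experiment then yields the temporal attention profile the study analyzes.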
Affiliation(s)
- Manuela Jaeger
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Fraunhofer Institute for Digital Media Technology IDMT, Division Hearing, Speech and Audio Technology, Oldenburg, Germany
- Bojana Mirkovic
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany
- Martin G Bleichner
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Neurophysiology of Everyday Life Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany
- Stefan Debener
- Neuropsychology Lab, Department of Psychology, University of Oldenburg, Oldenburg, Germany; Cluster of Excellence Hearing4all, University of Oldenburg, Oldenburg, Germany; Research Center for Neurosensory Science, University of Oldenburg, Oldenburg, Germany
208
Tian Y, Ma L. Auditory attention tracking states in a cocktail party environment can be decoded by deep convolutional neural networks. J Neural Eng 2020; 17:036013. DOI: 10.1088/1741-2552/ab92b2.
209
Makov S, Zion Golumbic E. Irrelevant Predictions: Distractor Rhythmicity Modulates Neural Encoding in Auditory Cortex. Cereb Cortex 2020; 30:5792-5805. DOI: 10.1093/cercor/bhaa153.
Abstract
Dynamic attending theory suggests that predicting the timing of upcoming sounds can assist in focusing attention toward them. However, whether similar predictive processes are also applied to background noises and assist in guiding attention “away” from potential distractors remains an open question. Here we address this question by manipulating the temporal predictability of distractor sounds in a dichotic listening selective attention task. We tested the influence of distractors’ temporal predictability on performance and on the neural encoding of sounds, by comparing the effects of Rhythmic versus Nonrhythmic distractors. Using magnetoencephalography we found that, indeed, the neural responses to both attended and distractor sounds were affected by distractors’ rhythmicity. Baseline activity preceding the onset of Rhythmic distractor sounds was enhanced relative to Nonrhythmic distractor sounds, and the sensory response to them was suppressed. Moreover, detection of nonmasked targets improved when distractors were Rhythmic, an effect accompanied by stronger lateralization of the neural responses to attended sounds to contralateral auditory cortex. These combined behavioral and neural results suggest that not only are temporal predictions formed for task-irrelevant sounds, but that these predictions bear functional significance for promoting selective attention and reducing distractibility.
Affiliation(s)
- Shiri Makov
- Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan 5290002, Israel
- Elana Zion Golumbic
- Gonda Multidisciplinary Brain Research Center, Bar-Ilan University, Ramat-Gan 5290002, Israel
210
Lu Z, Bassett DS. Invertible generalized synchronization: A putative mechanism for implicit learning in neural systems. Chaos 2020; 30:063133. PMID: 32611103. DOI: 10.1063/5.0004344.
Abstract
Regardless of the marked differences between biological and artificial neural systems, one fundamental similarity is that they are essentially dynamical systems that can learn to imitate other dynamical systems whose governing equations are unknown. The brain is able to learn the dynamic nature of the physical world via experience; analogously, artificial neural systems such as reservoir computing networks (RCNs) can learn the long-term behavior of complex dynamical systems from data. Recent work has shown that the mechanism of such learning in RCNs is invertible generalized synchronization (IGS). Yet, whether IGS is also the mechanism of learning in biological systems remains unclear. To shed light on this question, we draw inspiration from features of the human brain to propose a general and biologically feasible learning framework that utilizes IGS. To evaluate the framework's relevance, we construct several distinct neural network models as instantiations of the proposed framework. Regardless of their particularities, these neural network models can consistently learn to imitate other dynamical processes with a biologically feasible adaptation rule that modulates the strength of synapses. Further, we observe and theoretically explain the spontaneous emergence of four distinct phenomena reminiscent of cognitive functions: (i) learning multiple dynamics; (ii) switching among the imitations of multiple dynamical systems, either spontaneously or driven by external cues; (iii) filling-in missing variables from incomplete observations; and (iv) deciphering superimposed input from different dynamical systems. Collectively, our findings support the notion that biological neural networks can learn the dynamic nature of their environment through the mechanism of IGS.
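Because reservoir computing networks are the setting in which IGS was established, a minimal echo state network makes the idea concrete: a driven reservoir synchronizes to its input, and a trained readout lets it imitate the dynamics autonomously. Everything below (network size, spectral radius, ridge parameter, the Lorenz drive) is an illustrative assumption, not the authors' model.

```python
# Sketch: an echo state network learning to imitate the Lorenz system.
import numpy as np

rng = np.random.default_rng(1)

def lorenz(n, dt=0.01):
    xyz = np.array([1.0, 1.0, 1.0])
    out = np.empty((n, 3))
    for i in range(n):
        x, y, z = xyz
        xyz = xyz + dt * np.array([10*(y - x), x*(28 - z) - y, x*y - 8/3*z])
        out[i] = xyz
    return out

data = lorenz(5000)
N = 400
Win = rng.uniform(-0.5, 0.5, (N, 3))
W = rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius below 1

# Drive the reservoir; its state becomes a synchronized "echo" of the input.
states = np.zeros((len(data), N))
r = np.zeros(N)
for t in range(len(data) - 1):
    r = np.tanh(W @ r + Win @ data[t])
    states[t + 1] = r

# Ridge-regression readout mapping reservoir states back to the signal.
ridge = 1e-6
Wout = np.linalg.solve(states.T @ states + ridge * np.eye(N), states.T @ data)

# Closed loop: feed predictions back in, so the network runs autonomously.
preds = []
for _ in range(1000):
    x = Wout.T @ r
    preds.append(x)
    r = np.tanh(W @ r + Win @ x)
```

The paper's claim is that this kind of learning is explained by an invertible synchronization map between drive and reservoir state, with a biologically feasible synaptic rule playing the role of the ridge readout.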
Affiliation(s)
- Zhixin Lu
- Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
- Danielle S Bassett
- Department of Bioengineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
211
Keshavarzi M, Reichenbach T. Transcranial Alternating Current Stimulation With the Theta-Band Portion of the Temporally-Aligned Speech Envelope Improves Speech-in-Noise Comprehension. Front Hum Neurosci 2020; 14:187. PMID: 32547377. PMCID: PMC7273508. DOI: 10.3389/fnhum.2020.00187.
Abstract
Transcranial alternating current stimulation with the speech envelope can modulate the comprehension of speech in noise. The modulation stems from the theta- but not the delta-band portion of the speech envelope, and likely reflects the entrainment of neural activity in the theta frequency band, which may aid the parsing of the speech stream. The influence of the current stimulation on speech comprehension can vary with the time delay between the current waveform and the audio signal. While this effect has been investigated for current stimulation based on the entire speech envelope, it has not yet been measured when the current waveform follows the theta-band portion of the speech envelope. Here, we show that transcranial current stimulation with the speech envelope filtered in the theta frequency band improves speech comprehension as compared to a sham stimulus. The improvement occurs when there is no time delay between the current and the speech stimulus, as well as when the temporal delay is comparatively short, 90 ms. In contrast, longer delays, as well as negative delays, do not impact speech-in-noise comprehension. Moreover, we find that the improvement of speech comprehension at no or small delays of the current stimulation is consistent across participants. Our findings suggest that cortical entrainment to speech is most influenced through current stimulation that follows the speech envelope with at most a small delay. They also open a path to enhancing the perception of speech in noise, an issue that is particularly important for people with hearing impairment.
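A minimal sketch of the stimulus-construction step, extracting the theta-band portion of the speech envelope and delaying it before stimulation, is given below; the sample rate, filter order, and normalization are assumptions for illustration, not the authors' code.

```python
# Sketch: theta-band (4-8 Hz) portion of the speech envelope, delayed 90 ms.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 1000
speech = np.random.randn(fs * 10)             # stand-in for the audio signal

envelope = np.abs(hilbert(speech))            # broadband amplitude envelope
b, a = butter(4, [4 / (fs/2), 8 / (fs/2)], btype="band")
theta_env = filtfilt(b, a, envelope)          # keep only the theta-band part

delay_ms = 90                                 # one delay at which a benefit was seen
current = np.roll(theta_env, int(delay_ms * fs / 1000))
current /= np.abs(current).max()              # normalize before scaling to mA
```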
Affiliation(s)
- Mahmoud Keshavarzi
- Department of Bioengineering, Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, United Kingdom
- Tobias Reichenbach
- Department of Bioengineering, Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, United Kingdom
212
Baroni F, Morillon B, Trébuchon A, Liégeois-Chauvel C, Olasagasti I, Giraud AL. Converging intracortical signatures of two separated processing timescales in human early auditory cortex. Neuroimage 2020; 218:116882. PMID: 32439539. DOI: 10.1016/j.neuroimage.2020.116882.
Abstract
Neural oscillations in auditory cortex are argued to support parsing and representing speech constituents at their corresponding temporal scales. Yet, how incoming sensory information interacts with ongoing spontaneous brain activity, what features of the neuronal microcircuitry underlie spontaneous and stimulus-evoked spectral fingerprints, and what these fingerprints entail for stimulus encoding, remain largely open questions. We used a combination of human invasive electrophysiology, computational modeling and decoding techniques to assess the information encoding properties of brain activity and to relate them to a plausible underlying neuronal microarchitecture. We analyzed intracortical auditory EEG activity from 10 patients while they were listening to short sentences. Pre-stimulus neural activity in early auditory cortical regions often exhibited power spectra with a shoulder in the delta range and a small bump in the beta range. Speech decreased power in the beta range, and increased power in the delta-theta and gamma ranges. Using multivariate machine learning techniques, we assessed the spectral profile of information content for two aspects of speech processing: detection and discrimination. We obtained better phase than power information decoding, and a bimodal spectral profile of information content with better decoding at low (delta-theta) and high (gamma) frequencies than at intermediate (beta) frequencies. These experimental data were reproduced by a simple rate model made of two subnetworks with different timescales, each composed of coupled excitatory and inhibitory units, and connected via a negative feedback loop. Modeling and experimental results were similar in terms of pre-stimulus spectral profile (except for the iEEG beta bump), spectral modulations with speech, and spectral profile of information content. Altogether, we provide converging evidence from both univariate spectral analysis and decoding approaches for a dual timescale processing infrastructure in human auditory cortex, and show that it is consistent with the dynamics of a simple rate model.
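As a rough illustration of the phase-versus-power decoding comparison, the sketch below builds band-limited phase and power features from simulated trials and scores each with a classifier. Band edges, trial counts, and the classifier are assumptions; this is not the authors' analysis code.

```python
# Sketch: phase vs. power decoding across frequency bands (simulated data).
import numpy as np
from scipy.signal import hilbert, butter, filtfilt
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
fs, n_trials, n_samples = 500, 120, 500
trials = rng.standard_normal((n_trials, n_samples))   # stand-in iEEG trials
labels = rng.integers(0, 2, n_trials)                 # e.g., speech vs. silence

def band_features(x, lo, hi):
    b, a = butter(4, [lo / (fs/2), hi / (fs/2)], btype="band")
    analytic = hilbert(filtfilt(b, a, x, axis=-1), axis=-1)
    phase, power = np.angle(analytic), np.abs(analytic) ** 2
    # cos/sin encoding keeps phase features continuous for the classifier
    return np.hstack([np.cos(phase), np.sin(phase)]), power

for name, (lo, hi) in {"delta-theta": (1, 8), "beta": (15, 30),
                       "gamma": (30, 45)}.items():
    ph, pw = band_features(trials, lo, hi)
    acc_phase = cross_val_score(SVC(), ph, labels, cv=5).mean()
    acc_power = cross_val_score(SVC(), pw, labels, cv=5).mean()
    print(f"{name}: phase {acc_phase:.2f}, power {acc_power:.2f}")
```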
Affiliation(s)
- Fabiano Baroni
- Department of Fundamental Neuroscience, University of Geneva, Geneva, Switzerland; School of Engineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Benjamin Morillon
- Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale (INSERM), Institut de Neurosciences des Systèmes (INS), Marseille, France
- Agnès Trébuchon
- Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale (INSERM), Institut de Neurosciences des Systèmes (INS), Marseille, France; Clinical Neurophysiology and Epileptology Department, Timone Hospital, Assistance Publique Hôpitaux de Marseille, Marseille, France
- Catherine Liégeois-Chauvel
- Aix Marseille Université, Institut National de la Santé et de la Recherche Médicale (INSERM), Institut de Neurosciences des Systèmes (INS), Marseille, France; Department of Neurological Surgery, University of Pittsburgh, PA, 15213, USA
- Itsaso Olasagasti
- Department of Fundamental Neuroscience, University of Geneva, Geneva, Switzerland
- Anne-Lise Giraud
- Department of Fundamental Neuroscience, University of Geneva, Geneva, Switzerland
213
Wang Y, Zhang J, Zou J, Luo H, Ding N. Prior Knowledge Guides Speech Segregation in Human Auditory Cortex. Cereb Cortex 2020; 29:1561-1571. PMID: 29788144. DOI: 10.1093/cercor/bhy052.
Abstract
Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g. pitch) and top-down prior knowledge about sound streams. In a multi-talker environment, the brain can segregate different speakers in about 100 ms in auditory cortex. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signature of how the brain utilizes prior knowledge to segregate 2 speech streams from the same speaker, which can hardly be separated based on bottom-up acoustic cues. In a primed condition, the participants know the target speech stream in advance, while in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate an effect in bilateral superior temporal gyrus and superior temporal sulcus that is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation by mainly suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cues.
Affiliation(s)
- Yuanye Wang
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; McGovern Institute for Brain Research, Peking University, Beijing, China; Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Jianfeng Zhang
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Jiajie Zou
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Huan Luo
- School of Psychological and Cognitive Sciences, Peking University, Beijing, China; McGovern Institute for Brain Research, Peking University, Beijing, China; Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Nai Ding
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China; Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, Zhejiang, China; State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou, Zhejiang, China; Interdisciplinary Center for Social Sciences, Zhejiang University, Hangzhou, Zhejiang, China
214
Koroma M, Lacaux C, Andrillon T, Legendre G, Léger D, Kouider S. Sleepers Selectively Suppress Informative Inputs during Rapid Eye Movements. Curr Biol 2020; 30:2411-2417.e3. PMID: 32413310. DOI: 10.1016/j.cub.2020.04.047.
Abstract
Sleep leads to a disconnection from the external world. Even when sleepers regain consciousness during rapid eye movement (REM) sleep, little, if any, external information is incorporated into dream content [1-3]. While gating mechanisms might be at play to avoid interference on dreaming activity [4], a total disconnection from an ever-changing environment may prevent the sleeper from promptly responding to informative events (e.g., threat signals). In fact, a whole range of neural responses to external events turns out to be preserved during REM sleep [5-9]. Thus, it remains unclear whether external inputs are either processed or, conversely, gated during REM sleep. One way to resolve this issue is to consider the specific impact of eye movements (EMs) characterizing REM sleep. EMs are a reliable predictor of reporting a dream upon awakening [10, 11], and their absence is associated with a lower arousal threshold to external stimuli [12]. We thus hypothesized that the presence of EMs would selectively prevent the processing of informative stimuli, whereas periods of REM sleep devoid of EMs would be associated with the monitoring of external signals. By reconstructing speech in a multi-talker environment from electrophysiological responses, we show that informative speech is amplified over meaningless speech during REM sleep. Yet, at the precise timing of EMs, informative speech is, on the contrary, selectively suppressed. These results demonstrate the flexible amplification and suppression of sensory information during REM sleep and reveal the impact of EMs on the selective gating of informative stimuli during sleep.
Affiliation(s)
- Matthieu Koroma
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d'Études Cognitives, École Normale Supérieure - PSL Research University, 75005 Paris, France; Sorbonne Université, École Doctorale Cerveau Cognition Comportement, Université Pierre et Marie Curie, 9 Quai Saint Bernard, 75005 Paris, France
- Célia Lacaux
- Sorbonne Université, École Doctorale Cerveau Cognition Comportement, Université Pierre et Marie Curie, 9 Quai Saint Bernard, 75005 Paris, France; Sorbonne University, IHU@ICM, INSERM, CNRS UMR7225, 75013 Paris, France; AP-HP, Hôpital Pitié-Salpêtrière, Service des Pathologies du Sommeil, 75013 Paris, France
- Thomas Andrillon
- School of Psychological Sciences and Turner Institute for Brain and Mental Health, Monash University, Melbourne, VIC 3800, Australia
- Guillaume Legendre
- Department of Neuroscience, Faculty of Medicine, University of Geneva, 1211 Geneva, Switzerland
- Damien Léger
- Université de Paris, Paris Descartes, APHP, Hôtel Dieu, Centre du Sommeil et de la Vigilance et EA 7330 VIFASOM, 75004 Paris, France
- Sid Kouider
- Brain and Consciousness Group (ENS, EHESS, CNRS), Département d'Études Cognitives, École Normale Supérieure - PSL Research University, 75005 Paris, France.
215
Decruy L, Lesenfants D, Vanthornhout J, Francart T. Top-down modulation of neural envelope tracking: The interplay with behavioral, self-report and neural measures of listening effort. Eur J Neurosci 2020; 52:3375-3393. PMID: 32306466. DOI: 10.1111/ejn.14753.
Abstract
When listening to natural speech, our brain activity tracks the slow amplitude modulations of speech, also called the speech envelope. Moreover, recent research has demonstrated that this neural envelope tracking can be affected by top-down processes. The present study was designed to examine if neural envelope tracking is modulated by the effort that a person expends during listening. Five measures were included to quantify listening effort: two behavioral measures based on a novel dual-task paradigm, a self-report effort measure and two neural measures related to phase synchronization and alpha power. Electroencephalography responses to sentences, presented at a wide range of subject-specific signal-to-noise ratios, were recorded in thirteen young, normal-hearing adults. A comparison of the five measures revealed different effects of listening effort as a function of speech understanding. Reaction times on the primary task and self-reported effort decreased with increasing speech understanding. In contrast, reaction times on the secondary task and alpha power showed a peak-shaped behavior with highest effort at intermediate speech understanding levels. With regard to neural envelope tracking, we found that the reaction times on the secondary task and self-reported effort explained a small part of the variability in theta-band envelope tracking. Speech understanding was found to strongly modulate neural envelope tracking. More specifically, our results demonstrated a robust increase in envelope tracking with increasing speech understanding. The present study provides new insights into the relations among different effort measures and highlights the potential of neural envelope tracking to objectively measure speech understanding in young, normal-hearing adults.
Affiliation(s)
- Lien Decruy
- Department of Neurosciences Research, Group Experimental Oto-rhino-laryngology (ExpORL), KU Leuven, Leuven, Belgium
- Damien Lesenfants
- Department of Neurosciences Research, Group Experimental Oto-rhino-laryngology (ExpORL), KU Leuven, Leuven, Belgium
- Jonas Vanthornhout
- Department of Neurosciences Research, Group Experimental Oto-rhino-laryngology (ExpORL), KU Leuven, Leuven, Belgium
- Tom Francart
- Department of Neurosciences Research, Group Experimental Oto-rhino-laryngology (ExpORL), KU Leuven, Leuven, Belgium
216
Decruy L, Vanthornhout J, Francart T. Hearing impairment is associated with enhanced neural tracking of the speech envelope. Hear Res 2020; 393:107961. PMID: 32470864. DOI: 10.1016/j.heares.2020.107961.
Abstract
Elevated hearing thresholds in hearing impaired adults are usually compensated by providing amplification through a hearing aid. In spite of restoring hearing sensitivity, difficulties with understanding speech in noisy environments often remain. One main reason is that sensorineural hearing loss not only causes loss of audibility but also other deficits, including peripheral distortion and central temporal processing deficits. To investigate the neural consequences of hearing impairment that underlie speech-in-noise difficulties, we compared EEG responses to natural speech of 14 hearing impaired adults with those of 14 age-matched normal-hearing adults. We measured neural envelope tracking to sentences and a story masked by different levels of a stationary noise or competing talker. Despite their sensorineural hearing loss, hearing impaired adults showed higher neural envelope tracking of the target than the competing talker, similar to their normal-hearing peers. Furthermore, hearing impairment was related to an additional increase in neural envelope tracking of the target talker, suggesting that hearing impaired adults may have an enhanced sensitivity to envelope modulations or require a larger differential neural tracking of target versus competing talker to segregate speech from noise. Lastly, both normal-hearing and hearing impaired participants showed an increase in neural envelope tracking with increasing speech understanding. Hence, our results open avenues towards new clinical applications, such as neuro-steered prostheses as well as objective and automatic measurements of speech understanding performance.
Affiliation(s)
- Lien Decruy
- KU Leuven, Department of Neurosciences, ExpORL, Herestraat 49 Bus 721, B-3000, Leuven, Belgium.
- Jonas Vanthornhout
- KU Leuven, Department of Neurosciences, ExpORL, Herestraat 49 Bus 721, B-3000, Leuven, Belgium.
- Tom Francart
- KU Leuven, Department of Neurosciences, ExpORL, Herestraat 49 Bus 721, B-3000, Leuven, Belgium.
217
Poeppel D, Assaneo MF. Speech rhythms and their neural foundations. Nat Rev Neurosci 2020; 21:322-334. PMID: 32376899. DOI: 10.1038/s41583-020-0304-4.
Abstract
The recognition of spoken language has typically been studied by focusing on either words or their constituent elements (for example, low-level features or phonemes). More recently, the 'temporal mesoscale' of speech has been explored, specifically regularities in the envelope of the acoustic signal that correlate with syllabic information and that play a central role in production and perception processes. The temporal structure of speech at this scale is remarkably stable across languages, with a preferred range of rhythmicity of 2-8 Hz. Importantly, this rhythmicity is required by the processes underlying the construction of intelligible speech. Much current work focuses on audio-motor interactions in speech, highlighting behavioural and neural evidence that demonstrates how properties of perceptual and motor systems, and their relation, can underlie the mesoscale speech rhythms. The data invite the hypothesis that the speech motor cortex is best modelled as a neural oscillator, a conjecture that aligns well with current proposals highlighting the fundamental role of neural oscillations in perception and cognition. The findings also show motor theories (of speech) in a different light, placing new mechanistic constraints on accounts of the action-perception interface.
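The oscillator conjecture can be made concrete with a toy model: a phase oscillator with a preferred rate inside the 2-8 Hz range, driven by a periodic 'syllabic' input, locks to the input rate. All parameters below are illustrative assumptions, not a model from the review.

```python
# Sketch: a forced phase oscillator entraining to a 5 Hz syllabic rhythm.
import numpy as np

fs, dur = 1000, 10.0
t = np.arange(0, dur, 1 / fs)
f_input = 5.0                          # syllable rate of the driving input
f_natural = 4.5                        # oscillator's preferred frequency
coupling = 5.0                         # strong enough to overcome the detuning

phase = np.zeros_like(t)
for i in range(1, len(t)):
    dphi = 2*np.pi*f_natural + coupling*np.sin(2*np.pi*f_input*t[i] - phase[i-1])
    phase[i] = phase[i-1] + dphi / fs

# Once locked, the instantaneous frequency follows the input rate (~5 Hz),
# an Arnold-tongue-style entrainment regime.
inst_freq = np.diff(phase) * fs / (2 * np.pi)
print(inst_freq[-fs:].mean())
```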
Affiliation(s)
- David Poeppel
- Department of Neuroscience, Max Planck Institute, Frankfurt, Germany; Department of Psychology, New York University, New York, NY, USA
- M Florencia Assaneo
- Department of Psychology, New York University, New York, NY, USA; Instituto de Neurobiología, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
218
Synigal SR, Teoh ES, Lalor EC. Including Measures of High Gamma Power Can Improve the Decoding of Natural Speech From EEG. Front Hum Neurosci 2020; 14:130. PMID: 32410969. PMCID: PMC7200998. DOI: 10.3389/fnhum.2020.00130.
Abstract
The human auditory system is highly skilled at extracting and processing information from speech in both single-speaker and multi-speaker situations. A commonly studied speech feature is the amplitude envelope, which can also be used to determine which speaker a listener is attending to in those multi-speaker situations. Non-invasive brain imaging (electro-/magnetoencephalography [EEG/MEG]) has shown that the phase of neural activity below 16 Hz tracks the dynamics of speech, whereas invasive brain imaging (electrocorticography [ECoG]) has shown that such processing is strongly reflected in the power of high frequency neural activity (around 70-150 Hz; known as high gamma). The first aim of this study was to determine if high gamma power in scalp-recorded EEG carries useful stimulus-related information, despite its reputation for having a poor signal-to-noise ratio. Specifically, linear regression was used to investigate speech envelope and attention decoding in low frequency EEG, high gamma power EEG, and in both EEG signals combined. The second aim was to assess whether the information reflected in high gamma power EEG may be complementary to that reflected in well-established low frequency EEG indices of speech processing. Exploratory analyses were also completed to examine how low frequency and high gamma power EEG may be sensitive to different features of the speech envelope. While low frequency speech tracking was evident for almost all subjects as expected, high gamma power also showed robust speech tracking in some subjects. This same pattern was true for attention decoding using a separate group of subjects who participated in a cocktail party attention experiment. For the subjects who showed speech tracking in high gamma power EEG, the spatiotemporal characteristics of that high gamma tracking differed from those of low-frequency EEG. Furthermore, combining the two neural measures led to improved measures of speech tracking for several subjects. Our results indicated that high gamma power EEG can carry useful information regarding speech processing and attentional selection in some subjects. Combining high gamma power and low frequency EEG can improve the mapping between natural speech and the resulting neural responses.
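The combination step has a simple skeleton: band-pass the EEG for the low-frequency signal, take Hilbert power of a high-gamma band, stack both as regressors, and fit a backward model to the envelope. The sketch below uses simulated data; the sample rate, band edges, and ridge parameter are assumptions for illustration, not the authors' pipeline.

```python
# Sketch: envelope reconstruction from low-frequency EEG plus high-gamma power.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
fs, n_ch, n_s = 512, 32, 512 * 60
eeg = rng.standard_normal((n_ch, n_s))        # stand-in raw EEG
envelope = rng.standard_normal(n_s)           # stand-in speech envelope

def bandpass(x, lo, hi):
    b, a = butter(4, [lo / (fs/2), hi / (fs/2)], btype="band")
    return filtfilt(b, a, x, axis=-1)

low = bandpass(eeg, 1, 16)                    # phase-tracking range
hg = np.abs(hilbert(bandpass(eeg, 70, 140), axis=-1)) ** 2  # high-gamma power
hg = bandpass(hg, 1, 16)                      # keep its slow dynamics only

X = np.vstack([low, hg]).T                    # samples x (2 * channels)
half = n_s // 2
model = Ridge(alpha=1.0).fit(X[:half], envelope[:half])
r = np.corrcoef(model.predict(X[half:]), envelope[half:])[0, 1]
print(f"held-out reconstruction r = {r:.3f}")
```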
Affiliation(s)
- Shyanthony R. Synigal
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, United States
- Emily S. Teoh
- Trinity Centre for Biomedical Engineering, School of Engineering, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Edmund C. Lalor
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, United States
- Trinity Centre for Biomedical Engineering, School of Engineering, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Trinity College Institute of Neuroscience, Trinity College Dublin, University of Dublin, Dublin, Ireland
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, NY, United States
219
Eipert L, Klump GM. Uncertainty-based informational masking in a vowel discrimination task for young and old Mongolian gerbils. Hear Res 2020; 392:107959. PMID: 32330738. DOI: 10.1016/j.heares.2020.107959.
Abstract
Informational masking emerges with processing of complex sounds in the central auditory system and can be affected by uncertainty emerging from trial-to-trial variation of stimulus features. Uncertainty can be non-informative but confusing and thus mask otherwise salient stimulus changes, resulting in increased discrimination thresholds. With increasing age, the ability for processing of such complex sound scenes degrades. Here, 6 young and 4 old gerbils were tested behaviorally in a vowel discrimination task. Animals were trained to discriminate between sequentially presented target and reference vowels of the vowel pair /I/-/i/. Reference and target vowels were generated by shifting the three formants of the reference vowel in steps towards the formants of the target vowels. Non-informative but distracting uncertainty was introduced by random changes in location, level, fundamental frequency or all three features combined. Young gerbils tested with uncertainty for the target or target and reference vowels showed similar informational masking effects for both conditions. Young and old gerbils were tested with uncertainty for the target vowels only. When discriminating vowels without uncertainty, old gerbils showed no threshold increase in comparison with young gerbils. Introducing uncertainty, vowel discrimination thresholds increased for young and old gerbils, and thresholds increased most when all three uncertainty features were presented combined. Old gerbils were more susceptible to non-informative uncertainty and their thresholds increased more than thresholds of young gerbils. Gerbils' vowel discrimination thresholds are compared to human performance in the same task (Eipert et al., 2019).
Affiliation(s)
- Lena Eipert
- Cluster of Excellence Hearing4all, Division Animal Physiology and Behavior, Department of Neuroscience, School of Medicine and Health Sciences, University of Oldenburg, D-26111, Oldenburg, Germany
- Georg M Klump
- Cluster of Excellence Hearing4all, Division Animal Physiology and Behavior, Department of Neuroscience, School of Medicine and Health Sciences, University of Oldenburg, D-26111, Oldenburg, Germany.
220
The interplay of top-down focal attention and the cortical tracking of speech. Sci Rep 2020; 10:6922. PMID: 32332791. PMCID: PMC7181730. DOI: 10.1038/s41598-020-63587-3.
Abstract
Many active neuroimaging paradigms rely on the assumption that the participant sustains attention to a task. However, in practice, there will be momentary distractions, potentially influencing the results. We investigated the effect of focal attention, objectively quantified using a measure of brain signal entropy, on cortical tracking of the speech envelope. The latter is a measure of neural processing of naturalistic speech. We let participants listen to 44 minutes of natural speech, while their electroencephalogram was recorded, and quantified both entropy and cortical envelope tracking. Focal attention affected the later brain responses to speech, between 100 and 300 ms latency. By only taking into account periods with higher attention, the measured cortical speech tracking improved by 47%. This illustrates the impact of the participant’s active engagement in the modeling of the brain-speech response and the importance of accounting for it. Our results suggest a cortico-cortical loop that initiates during the early stages of auditory processing, then propagates through the parieto-occipital and frontal areas, and finally impacts the later-latency auditory processes in a top-down fashion. The proposed framework could be transposed to other active electrophysiological paradigms (visual, somatosensory, etc.) and help to control the impact of participants’ engagement on the results.
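The entropy-based attention index could take many forms; as one assumed stand-in, the sketch below computes a normalized spectral entropy for short EEG windows, which can then gate which periods enter the brain-speech model. This is an illustration, not the paper's specific entropy measure.

```python
# Sketch: windowed spectral entropy as a stand-in attention index.
import numpy as np
from scipy.signal import welch

def spectral_entropy(window, fs):
    f, psd = welch(window, fs=fs, nperseg=min(256, len(window)))
    p = psd / psd.sum()
    return -np.sum(p * np.log2(p + 1e-12)) / np.log2(len(p))  # normalized 0-1

fs = 256
eeg = np.random.randn(fs * 60)               # one channel, 60 s stand-in
win = fs * 2                                 # 2-s windows
entropy = [spectral_entropy(eeg[i:i + win], fs)
           for i in range(0, len(eeg) - win, win)]
# Keep only windows passing an attention criterion before fitting the
# brain-speech model, mirroring the selection step described above.
```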
221
Geravanchizadeh M, Bakhshalipour Gavgani S. Selective auditory attention detection based on effective connectivity by single-trial EEG. J Neural Eng 2020; 17:026021. PMID: 32131059. DOI: 10.1088/1741-2552/ab7c8d.
Abstract
OBJECTIVE Focusing attention on one speaker in an environment with many speakers is one of the important abilities of the human auditory system. The temporal dynamics of the attention process and how the brain precisely performs this task are yet unknown. This paper proposes a new method for selective auditory attention detection (SAAD) from single-trial EEG signals using brain effective connectivity and complex network analysis for two groups of listeners attending to the left or right ear. APPROACH Here, the connectivity matrices of all subjects obtained from the Granger causality method are used to extract different features. Then, by employing the processes of feature selection and optimization, an optimized feature set is determined for training a classifier. MAIN RESULTS Among different measures of brain connectivity (i.e. segregation, integration, and centrality), the evaluation results show that the optimized feature set obtained by the combination of the centrality measures contains the most discriminative features for the classification process. Compared with state-of-the-art attention detection approaches from the literature, the proposed SAAD method yields the best performance in terms of various measures. SIGNIFICANCE The new SAAD approach is advantageous, in the sense that the detection of attention is performed from single-trial EEG signals of each subject, without reconstructing the speech stimuli. This means that the proposed method could be employed for real-time applications such as smart hearing aid devices or brain-computer interface (BCI) systems.
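The pipeline above can be sketched in two steps: an effective-connectivity matrix from pairwise Granger causality, then graph-centrality features for a classifier. The order-1 AR models, simulated data, and strength-style centrality below are illustrative assumptions, not the authors' implementation.

```python
# Sketch: Granger-causality connectivity and centrality features (simulated).
import numpy as np

def granger_1lag(x, y):
    """GC from y to x: log variance ratio of restricted vs. full AR(1) fits."""
    X_r = x[:-1].reshape(-1, 1)                    # restricted: own past only
    X_f = np.column_stack([x[:-1], y[:-1]])        # full: add y's past
    target = x[1:]
    res_r = target - X_r @ np.linalg.lstsq(X_r, target, rcond=None)[0]
    res_f = target - X_f @ np.linalg.lstsq(X_f, target, rcond=None)[0]
    return np.log(res_r.var() / res_f.var())

rng = np.random.default_rng(4)
n_ch, n_s = 8, 2000
eeg = rng.standard_normal((n_ch, n_s))             # stand-in single trial

gc = np.zeros((n_ch, n_ch))
for i in range(n_ch):
    for j in range(n_ch):
        if i != j:
            gc[i, j] = granger_1lag(eeg[i], eeg[j])  # influence of j on i

# Strength-style centrality of the directed graph, one of the measure
# families the abstract reports as most discriminative.
out_strength = gc.sum(axis=0)    # how strongly each channel drives the rest
in_strength = gc.sum(axis=1)     # how strongly each channel is driven
features = np.concatenate([out_strength, in_strength])  # feed to a classifier
```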
222
NeuroVAD: Real-Time Voice Activity Detection from Non-Invasive Neuromagnetic Signals. Sensors 2020; 20:2248. PMID: 32316162. PMCID: PMC7218843. DOI: 10.3390/s20082248.
Abstract
Neural speech decoding-driven brain-computer interface (BCI) or speech-BCI is a novel paradigm for exploring communication restoration for locked-in (fully paralyzed but aware) patients. Speech-BCIs aim to map a direct transformation from neural signals to text or speech, which has the potential for a higher communication rate than the current BCIs. Although recent progress has demonstrated the potential of speech-BCIs from either invasive or non-invasive neural signals, the majority of the systems developed so far still assume knowing the onset and offset of the speech utterances within the continuous neural recordings. This lack of real-time voice/speech activity detection (VAD) is a current obstacle for future applications of neural speech decoding wherein BCI users can have a continuous conversation with other speakers. To address this issue, in this study, we attempted to automatically detect the voice/speech activity directly from the neural signals recorded using magnetoencephalography (MEG). First, we classified the whole segments of pre-speech, speech, and post-speech in the neural signals using a support vector machine (SVM). Second, for continuous prediction, we used a long short-term memory-recurrent neural network (LSTM-RNN) to efficiently decode the voice activity at each time point via its sequential pattern-learning mechanism. Experimental results demonstrated the possibility of real-time VAD directly from the non-invasive neural signals with about 88% accuracy.
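The continuous-prediction stage maps naturally onto a per-time-step sequence tagger. Below is a minimal PyTorch sketch of an LSTM labeling each MEG time step as speech or non-speech; the model name, sensor count, window length, and hyperparameters are assumptions for illustration.

```python
# Sketch: LSTM voice-activity tagger over MEG feature sequences (simulated).
import torch
import torch.nn as nn

class NeuroVADNet(nn.Module):                    # hypothetical model name
    def __init__(self, n_features=204, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, time, features)
        h, _ = self.lstm(x)
        return self.head(h).squeeze(-1)          # one logit per time step

model = NeuroVADNet()
x = torch.randn(8, 500, 204)                     # 8 trials, 500 steps, 204 sensors
y = torch.randint(0, 2, (8, 500)).float()        # per-step voice-activity labels

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(5):                               # a few illustrative steps
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```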
Collapse
|
223
|
Abstract
Natural sounds contain acoustic dynamics ranging from tens to hundreds of milliseconds. How does the human auditory system encode acoustic information over such wide-ranging timescales to achieve sound recognition? Previous work (Teng et al. 2017) demonstrated a temporal coding preference for the theta and gamma ranges, but it remained unclear how acoustic dynamics between these two ranges are coded. Here, we generated artificial sounds with temporal structure on timescales from ~200 to ~30 ms and investigated temporal coding on each timescale. Participants discriminated sounds with temporal structure at different timescales while undergoing magnetoencephalography recording. Although acoustic dynamics on all timescales induced considerable intertrial phase coherence, classification analyses reveal that acoustic information on all timescales is preferentially differentiated through the theta and gamma bands, but not through the alpha and beta bands; stimulus reconstruction shows that acoustic dynamics in the theta and gamma ranges are preferentially coded. We demonstrate that temporal coding in the theta and gamma bands is general across stimuli, with comparable capacity in each band. Our findings provide a novel perspective: acoustic information on all timescales is discretised into two temporal chunks for further perceptual analysis.
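The intertrial phase coherence measure used in analyses of this kind can be computed compactly; the following is an illustrative sketch, not the study's code, assuming single-channel trials and placeholder values for the sampling rate and band edges.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def itpc(trials, fs, band):
    """Inter-trial phase coherence; trials: (n_trials, n_samples)."""
    sos = butter(4, band, btype='band', fs=fs, output='sos')
    phase = np.angle(hilbert(sosfiltfilt(sos, trials, axis=1), axis=1))
    # length of the resultant phase vector across trials, per time point
    return np.abs(np.exp(1j * phase).mean(axis=0))

# itpc_theta = itpc(meg_trials, fs=600, band=(4, 8))
# itpc_gamma = itpc(meg_trials, fs=600, band=(30, 45))
```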
Collapse
Affiliation(s)
- Xiangbin Teng
- Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, 60322 Frankfurt, Germany
| | - David Poeppel
- Department of Neuroscience, Max Planck Institute for Empirical Aesthetics, 60322 Frankfurt, Germany
- Department of Psychology, New York University, New York, NY 10003, USA
| |
Collapse
|
224
|
Guo Y, Bufacchi RJ, Novembre G, Kilintari M, Moayedi M, Hu L, Iannetti GD. Ultralow-frequency neural entrainment to pain. PLoS Biol 2020; 18:e3000491. [PMID: 32282798 PMCID: PMC7179945 DOI: 10.1371/journal.pbio.3000491] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2019] [Revised: 04/23/2020] [Accepted: 03/13/2020] [Indexed: 01/08/2023] Open
Abstract
Nervous systems exploit regularities in the sensory environment to predict sensory input, adjust behavior, and thereby maximize fitness. Entrainment of neural oscillations retains the temporal regularities of sensory information, a prerequisite for prediction. Entrainment has been extensively described at the frequencies of periodic inputs most commonly present in visual and auditory landscapes (e.g., >0.5 Hz). An open question is whether neural entrainment also occurs for regularities at much longer timescales. Here, we exploited the fact that the temporal dynamics of thermal stimuli in natural environments can unfold very slowly. We show that ultralow-frequency neural oscillations preserved a long-lasting trace of sensory information through neural entrainment to periodic thermo-nociceptive input as slow as 0.1 Hz. Importantly, revealing the functional significance of this phenomenon, both the power and the phase of the entrainment predicted individual pain sensitivity. In contrast, periodic auditory input at the same ultralow frequency did not entrain ultralow-frequency oscillations. These results demonstrate that functionally significant neural entrainment can occur at temporal scales far longer than those commonly explored. The non-supramodal nature of our results suggests that ultralow-frequency entrainment might be tuned to the temporal scale of the statistical regularities characteristic of each sensory modality.
Collapse
Affiliation(s)
- Yifei Guo
- Neuroscience and Behaviour Laboratory, Istituto Italiano di Tecnologia, Rome, Italy
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, United Kingdom
| | - Rory John Bufacchi
- Neuroscience and Behaviour Laboratory, Istituto Italiano di Tecnologia, Rome, Italy
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, United Kingdom
| | - Giacomo Novembre
- Neuroscience and Behaviour Laboratory, Istituto Italiano di Tecnologia, Rome, Italy
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, United Kingdom
| | - Marina Kilintari
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, United Kingdom
| | - Massieh Moayedi
- Faculty of Dentistry, University of Toronto, Toronto, Canada
| | - Li Hu
- CAS Key Laboratory of Mental Health, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
| | - Gian Domenico Iannetti
- Neuroscience and Behaviour Laboratory, Istituto Italiano di Tecnologia, Rome, Italy
- Department of Neuroscience, Physiology and Pharmacology, University College London, London, United Kingdom
| |
Collapse
|
225
|
Streaming of Repeated Noise in Primary and Secondary Fields of Auditory Cortex. J Neurosci 2020; 40:3783-3798. [PMID: 32273487 DOI: 10.1523/jneurosci.2105-19.2020] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2019] [Revised: 02/06/2020] [Accepted: 02/11/2020] [Indexed: 11/21/2022] Open
Abstract
Statistical regularities in natural sounds facilitate the perceptual segregation of auditory sources, or streams. Repetition is one cue that drives stream segregation in humans, but the neural basis of this perceptual phenomenon remains unknown. We demonstrated a similar perceptual ability in animals by training ferrets of both sexes to detect a stream of repeating noise samples (foreground) embedded in a stream of random samples (background). During passive listening, we recorded neural activity in primary auditory cortex (A1) and secondary auditory cortex (posterior ectosylvian gyrus, PEG). We used two context-dependent encoding models to test for evidence of streaming of the repeating stimulus. The first was based on average evoked activity per noise sample and the second on the spectro-temporal receptive field. Both approaches tested whether differences in neural responses to repeating versus random stimuli were better modeled by scaling the response to both streams equally (global gain) or by separately scaling the response to the foreground versus background stream (stream-specific gain). Consistent with previous observations of adaptation, we found an overall reduction in global gain when the stimulus began to repeat. However, when we measured stream-specific changes in gain, responses to the foreground were enhanced relative to the background. This enhancement was stronger in PEG than A1. In A1, enhancement was strongest in units with low sparseness (i.e., broad sensory tuning) and with tuning selective for the repeated sample. Enhancement of responses to the foreground relative to the background provides evidence for stream segregation that emerges in A1 and is refined in PEG. SIGNIFICANCE STATEMENT To interact with the world successfully, the brain must parse behaviorally important information from a complex sensory environment. Complex mixtures of sounds often arrive at the ears simultaneously or in close succession, yet they are effortlessly segregated into distinct perceptual sources. This process breaks down in hearing-impaired individuals and speech recognition devices. By identifying the underlying neural mechanisms that facilitate perceptual segregation, we can develop strategies for ameliorating hearing loss and improving speech recognition technology in the presence of background noise. Here, we present evidence to support a hierarchical process, present in primary auditory cortex and refined in secondary auditory cortex, in which sound repetition facilitates segregation.
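The contrast between the two gain models can be illustrated with a least-squares sketch, assuming per-sample response predictions for the foreground and background streams; the variable names and fitting procedure are illustrative, not the authors' implementation.

```python
import numpy as np

def fit_gains(resp, pred_fg, pred_bg):
    """Compare a stream-specific gain model with a global gain model."""
    # stream-specific model: resp ~ g_fg * pred_fg + g_bg * pred_bg
    X2 = np.column_stack([pred_fg, pred_bg])
    g_specific, *_ = np.linalg.lstsq(X2, resp, rcond=None)
    # global model: resp ~ g * (pred_fg + pred_bg)
    X1 = (pred_fg + pred_bg)[:, None]
    g_global, *_ = np.linalg.lstsq(X1, resp, rcond=None)
    return g_specific, g_global   # compare fits, e.g. via cross-validated R^2
```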
Collapse
|
226
|
Paul BT, Uzelac M, Chan E, Dimitrijevic A. Poor early cortical differentiation of speech predicts perceptual difficulties of severely hearing-impaired listeners in multi-talker environments. Sci Rep 2020; 10:6141. [PMID: 32273536 PMCID: PMC7145807 DOI: 10.1038/s41598-020-63103-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2020] [Accepted: 03/24/2020] [Indexed: 11/23/2022] Open
Abstract
Hearing impairment disrupts the processes of selective attention that help listeners attend to one sound source over competing sounds in the environment. Hearing prostheses (hearing aids and cochlear implants, CIs) do not fully remedy these issues. In normal hearing, selective attention arises through the facilitation and suppression of neural activity that represents sound sources. However, it is unclear how hearing impairment affects these neural processes, which is key to understanding why listening difficulty remains. Here, severely impaired listeners treated with a CI, and age-matched normal-hearing controls, attended to one of two identical but spatially separated talkers while multichannel EEG was recorded. Whereas neural representations of attended and ignored speech were differentiated at early (~150 ms) cortical processing stages in controls, differentiation of talker representations only occurred later (~250 ms) in CI users. CI users, but not controls, also showed evidence of spatial suppression of the ignored talker through lateralized alpha (7-14 Hz) oscillations. However, CI users' perceptual performance was predicted only by early-stage talker differentiation. We conclude that multi-talker listening difficulty persists for impaired listeners because of deficits in the early-stage separation of cortical speech representations, despite neural evidence that they use spatial information to guide selective attention.
Collapse
Affiliation(s)
- Brandon T Paul
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada.
- Otolaryngology-Head and Neck Surgery, Sunnybrook Health Sciences Centre, Toronto, ON, M4N 3M5, Canada.
| | - Mila Uzelac
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada
| | - Emmanuel Chan
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada
| | - Andrew Dimitrijevic
- Evaluative Clinical Sciences Platform, Sunnybrook Research Institute, Toronto, ON, M4N 3M5, Canada.
- Otolaryngology-Head and Neck Surgery, Sunnybrook Health Sciences Centre, Toronto, ON, M4N 3M5, Canada.
- Faculty of Medicine, Otolaryngology-Head and Neck Surgery, University of Toronto, Toronto, ON, M5S 1A1, Canada.
| |
Collapse
|
227
|
Vanheusden FJ, Kegler M, Ireland K, Georga C, Simpson DM, Reichenbach T, Bell SL. Hearing Aids Do Not Alter Cortical Entrainment to Speech at Audible Levels in Mild-to-Moderately Hearing-Impaired Subjects. Front Hum Neurosci 2020; 14:109. [PMID: 32317951 PMCID: PMC7147120 DOI: 10.3389/fnhum.2020.00109] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2019] [Accepted: 03/11/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND Cortical entrainment to speech correlates with speech intelligibility and with attention to a speech stream in noisy environments. However, there is a lack of data on whether cortical entrainment can help in evaluating hearing aid fittings for subjects with mild to moderate hearing loss. One particular problem is that hearing aids may alter the speech stimulus during (pre-)processing steps, which might alter cortical entrainment to the speech. Here, the effect of hearing aid processing on cortical entrainment to running speech was investigated in hearing-impaired subjects. METHODOLOGY Seventeen native English-speaking subjects with mild-to-moderate hearing loss participated in the study. Hearing function and hearing aid fitting were evaluated using standard clinical procedures. Participants then listened to a 25-min audiobook under aided and unaided conditions at 70 dBA sound pressure level (SPL) in quiet. EEG data were collected using a 32-channel system. Cortical entrainment to speech was evaluated using decoders that reconstruct the speech envelope from the EEG data. Null decoders, obtained from the EEG and the time-reversed speech envelope, were used to assess chance-level reconstruction. Entrainment in the delta (1-4 Hz) and theta (4-8 Hz) bands, as well as in wideband (1-20 Hz) EEG data, was investigated. RESULTS Significant cortical responses could be detected for all but one subject in all three frequency bands under both aided and unaided conditions. However, no significant differences were found between the two conditions in the number of responses detected or in the strength of cortical entrainment. The results show that the relatively small change in speech input introduced by the hearing aid was not sufficient to elicit a detectable change in cortical entrainment. CONCLUSION For subjects with mild to moderate hearing loss, cortical entrainment to speech in quiet at an audible level is not affected by hearing aids. These results clear the path for exploring the use of cortical entrainment to running speech for evaluating hearing aid fittings at lower speech intensities (which could be inaudible when unaided) or in speech-in-noise conditions.
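The decoder analysis described here follows a common backward-model recipe. Below is a minimal illustrative sketch, not the study's code, using ridge regression over time-lagged EEG and a null decoder trained on the time-reversed envelope; the sampling rate, lag range, ridge parameter, and dummy data are all assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
eeg = rng.standard_normal((12800, 32))   # dummy: 100 s of 32-channel EEG at 128 Hz
envelope = rng.standard_normal(12800)    # dummy speech envelope, same length
split = 9600                             # train/test split point

def lagged(x, lags):
    # stack time-lagged copies of each channel (circular shift; fine for a sketch)
    return np.column_stack([np.roll(x, L, axis=0) for L in lags])

X = lagged(eeg, range(0, 32))            # lags 0-250 ms at 128 Hz
dec = Ridge(alpha=1e3).fit(X[:split], envelope[:split])
r, _ = pearsonr(dec.predict(X[split:]), envelope[split:])

# null decoder: identical procedure on the time-reversed envelope
env0 = envelope[::-1]
dec0 = Ridge(alpha=1e3).fit(X[:split], env0[:split])
r0, _ = pearsonr(dec0.predict(X[split:]), env0[split:])
print(f"decoder r = {r:.3f}, null r = {r0:.3f}")
```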
Collapse
Affiliation(s)
- Frederique J. Vanheusden
- Department of Engineering, School of Science and Technology, Nottingham Trent University, Nottingham, United Kingdom
- Institute of Sound and Vibration Research, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, United Kingdom
| | - Mikolaj Kegler
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, United Kingdom
| | - Katie Ireland
- Audiology Department, Royal Berkshire NHS Foundation Trust, Reading, United Kingdom
| | - Constantina Georga
- Audiology Department, Royal Berkshire NHS Foundation Trust, Reading, United Kingdom
| | - David M. Simpson
- Institute of Sound and Vibration Research, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, United Kingdom
| | - Tobias Reichenbach
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, United Kingdom
| | - Steven L. Bell
- Institute of Sound and Vibration Research, Faculty of Engineering and Physical Sciences, University of Southampton, Southampton, United Kingdom
| |
Collapse
|
228
|
Bosker HR, Sjerps MJ, Reinisch E. Temporal contrast effects in human speech perception are immune to selective attention. Sci Rep 2020; 10:5607. [PMID: 32221376 PMCID: PMC7101381 DOI: 10.1038/s41598-020-62613-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2018] [Accepted: 03/16/2020] [Indexed: 11/09/2022] Open
Abstract
Two fundamental properties of perception are selective attention and perceptual contrast, but how these two processes interact remains unknown. Does an attended stimulus history exert a larger contrastive influence on the perception of a following target than unattended stimuli do? Dutch listeners categorized target sounds with a reduced prefix "ge-" marking tense (e.g., ambiguous between gegaan-gaan "gone-go"). In 'single talker' Experiments 1-2, participants perceived the reduced syllable (reporting gegaan) when the target was heard after a fast sentence, but not after a slow sentence (reporting gaan). In 'selective attention' Experiments 3-5, participants listened to two simultaneous sentences from two different talkers, followed by the same target sounds, with instructions to attend to only one of the two talkers. Critically, the speech rates of attended and unattended talkers were found to influence target perception equally, even when participants could watch the attended talker speak. In fact, target perception in 'selective attention' Experiments 3-5 did not differ from that of participants who were explicitly instructed to divide their attention equally across the two talkers (Experiment 6). This suggests that contrast effects of speech rate are immune to selective attention, largely operating prior to attentional stream segregation in the auditory processing hierarchy.
Collapse
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands.
| | - Matthias J Sjerps
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Eva Reinisch
- Institute of Phonetics and Speech Processing, Ludwig Maximilian University Munich, Munich, Germany
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
| |
Collapse
|
229
|
Huang N, Elhilali M. Push-pull competition between bottom-up and top-down auditory attention to natural soundscapes. eLife 2020; 9:52984. [PMID: 32196457 PMCID: PMC7083598 DOI: 10.7554/elife.52984] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2019] [Accepted: 02/13/2020] [Indexed: 12/17/2022] Open
Abstract
In everyday social environments, demands on attentional resources dynamically shift to balance our attention to targets of interest while alerting us to important objects in our surroundings. The current study uses electroencephalography to explore how the push-pull interaction between top-down and bottom-up attention manifests itself in dynamic auditory scenes. Using natural soundscapes as distractors while subjects attend to a controlled rhythmic sound sequence, we find that salient events in the background scenes significantly suppress phase-locking and gamma responses to the attended sequence, countering the enhancement effects observed for attended targets. In line with the hypothesis of limited attentional resources, the modulation of neural activity by bottom-up attention is graded by the degree of salience of ambient events. The study also provides insights into the interplay between endogenous and exogenous attention during natural soundscapes, with both forms of attention engaging a common fronto-parietal network at different time lags.
Collapse
Affiliation(s)
- Nicholas Huang
- Laboratory for Computational Audio Perception, Department of Electrical Engineering, Johns Hopkins University, Baltimore, United States
| | - Mounya Elhilali
- Laboratory for Computational Audio Perception, Department of Electrical Engineering, Johns Hopkins University, Baltimore, United States
| |
Collapse
|
230
|
Formisano E, Hausfeld L. The Dialog of Primary and Non-primary Auditory Cortex at the 'Cocktail Party'. Neuron 2020; 104:1029-1031. [PMID: 31951534 DOI: 10.1016/j.neuron.2019.11.031] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022]
Abstract
In this issue of Neuron, O'Sullivan et al. (2019) measured electro-cortical responses to "cocktail party" speech mixtures in neurosurgical patients and demonstrated that the selective enhancement of attended speech is achieved through the adaptive weighting of primary auditory cortex output by non-primary auditory cortex.
Collapse
Affiliation(s)
- Elia Formisano
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, PO Box 616, 6200 Maastricht, the Netherlands; Maastricht Brain Imaging Centre, 6200 Maastricht, the Netherlands; Maastricht Centre for Systems Biology, 6200, Maastricht, the Netherlands.
| | - Lars Hausfeld
- Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, PO Box 616, 6200 Maastricht, the Netherlands; Maastricht Brain Imaging Centre, 6200 Maastricht, the Netherlands
| |
Collapse
|
231
|
Di Liberto GM, Pelofi C, Bianco R, Patel P, Mehta AD, Herrero JL, de Cheveigné A, Shamma S, Mesgarani N. Cortical encoding of melodic expectations in human temporal cortex. eLife 2020; 9:e51784. [PMID: 32122465 PMCID: PMC7053998 DOI: 10.7554/elife.51784] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2019] [Accepted: 01/20/2020] [Indexed: 01/14/2023] Open
Abstract
Human engagement with music rests on underlying elements such as the listener's cultural background and interest in music. These factors modulate how listeners anticipate musical events, a process that induces instantaneous neural responses as the music confronts these expectations. Measuring such neural correlates would represent a direct window into high-level brain processing. Here we recorded cortical signals as participants listened to Bach melodies. We assessed the relative contributions of the acoustic versus melodic components of the music to the neural signal. Melodic features included information on pitch progressions and their tempo, which were extracted from a predictive model of musical structure based on Markov chains. We related the music to brain activity with temporal response functions, demonstrating for the first time distinct cortical encoding of pitch and note-onset expectations during naturalistic music listening. This encoding was most pronounced at response latencies up to 350 ms, and in both planum temporale and Heschl's gyrus.
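A Markov-chain expectation model of the kind described can be sketched in a few lines: transition probabilities are estimated from a corpus of note sequences and each note is scored by its surprisal. The toy corpus, the smoothing scheme, and the pitch alphabet below are illustrative assumptions.

```python
import numpy as np
from collections import Counter, defaultdict

# toy corpus of MIDI pitch sequences (illustrative, not the study's corpus)
corpus = [[60, 62, 64, 65, 67, 65, 64, 62, 60],
          [67, 65, 64, 62, 60, 62, 64, 65, 67]]

counts = defaultdict(Counter)
for melody in corpus:
    for prev, nxt in zip(melody, melody[1:]):
        counts[prev][nxt] += 1

def surprisal(prev, nxt, eps=1e-3):
    """Surprisal -log2 P(next | prev) with additive smoothing over 128 pitches."""
    total = sum(counts[prev].values())
    p = (counts[prev][nxt] + eps) / (total + eps * 128)
    return -np.log2(p)

test = [60, 62, 64, 66]   # the final step to 66 should be surprising
print([round(surprisal(a, b), 2) for a, b in zip(test, test[1:])])
```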
Collapse
Affiliation(s)
- Giovanni M Di Liberto
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
| | - Claire Pelofi
- Department of Psychology, New York University, New York, United States
- Institut de Neurosciences des Systèmes, UMR S 1106, INSERM, Aix Marseille Université, Marseille, France
| | | | - Prachi Patel
- Department of Electrical Engineering, Columbia University, New York, United States
- Mortimer B Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| | - Ashesh D Mehta
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Manhasset, United States
- Feinstein Institute of Medical Research, Northwell Health, Manhasset, United States
| | - Jose L Herrero
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Manhasset, United States
- Feinstein Institute of Medical Research, Northwell Health, Manhasset, United States
| | - Alain de Cheveigné
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
- UCL Ear Institute, London, United Kingdom
| | - Shihab Shamma
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
- Institute for Systems Research, Electrical and Computer Engineering, University of Maryland, College Park, United States
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, United States
- Mortimer B Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| |
Collapse
|
232
|
Sutojo S, van de Par S, Schoenmaker E. Contribution of binaural masking release to improved speech intelligibility for different masker types. Eur J Neurosci 2020; 51:1339-1352. [DOI: 10.1111/ejn.13980] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2017] [Revised: 04/23/2018] [Accepted: 05/18/2018] [Indexed: 11/28/2022]
Affiliation(s)
- Sarinah Sutojo
- Acoustics Group, Cluster of Excellence Hearing4all, Carl von Ossietzky University, Oldenburg, Germany
| | - Steven van de Par
- Acoustics Group, Cluster of Excellence Hearing4all, Carl von Ossietzky University, Oldenburg, Germany
| | - Esther Schoenmaker
- Acoustics Group, Cluster of Excellence Hearing4all, Carl von Ossietzky University, Oldenburg, Germany
| |
Collapse
|
233
|
Fu D, Weber C, Yang G, Kerzel M, Nan W, Barros P, Wu H, Liu X, Wermter S. What Can Computational Models Learn From Human Selective Attention? A Review From an Audiovisual Unimodal and Crossmodal Perspective. Front Integr Neurosci 2020; 14:10. [PMID: 32174816 PMCID: PMC7056875 DOI: 10.3389/fnint.2020.00010] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2019] [Accepted: 02/11/2020] [Indexed: 11/13/2022] Open
Abstract
Selective attention plays an essential role in acquiring and using information from the environment. For the past 50 years, research on selective attention has been a central topic in cognitive science. Compared with unimodal studies, crossmodal studies are more complex but necessary for solving real-world challenges, both in human experiments and in computational modeling. Although a growing number of findings on crossmodal selective attention have shed light on humans' behavioral patterns and neural underpinnings, a much better understanding is still necessary to yield the same benefit for intelligent computational agents. This article reviews studies of selective attention in unimodal visual and auditory setups and in crossmodal audiovisual setups from the multidisciplinary perspectives of psychology and cognitive neuroscience, and evaluates different ways to simulate analogous mechanisms in computational models and robotics. We discuss the gaps between these fields in this interdisciplinary review and provide insights into how psychological findings and theories can be used in artificial intelligence from different perspectives.
Collapse
Affiliation(s)
- Di Fu
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Department of Informatics, University of Hamburg, Hamburg, Germany
| | - Cornelius Weber
- Department of Informatics, University of Hamburg, Hamburg, Germany
| | - Guochun Yang
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
| | - Matthias Kerzel
- Department of Informatics, University of Hamburg, Hamburg, Germany
| | - Weizhi Nan
- Department of Psychology, Center for Brain and Cognitive Sciences, School of Education, Guangzhou University, Guangzhou, China
| | - Pablo Barros
- Department of Informatics, University of Hamburg, Hamburg, Germany
| | - Haiyan Wu
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
| | - Xun Liu
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing, China
- Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
| | - Stefan Wermter
- Department of Informatics, University of Hamburg, Hamburg, Germany
| |
Collapse
|
234
|
Effects of Sensorineural Hearing Loss on Cortical Synchronization to Competing Speech during Selective Attention. J Neurosci 2020; 40:2562-2572. [PMID: 32094201 PMCID: PMC7083526 DOI: 10.1523/jneurosci.1936-19.2020] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2019] [Revised: 01/17/2020] [Accepted: 01/30/2020] [Indexed: 11/21/2022] Open
Abstract
When selectively attending to a speech stream in multi-talker scenarios, low-frequency cortical activity is known to synchronize selectively with fluctuations in the attended speech signal. Older listeners with age-related sensorineural hearing loss (presbycusis) often struggle to understand speech in such situations, even when wearing a hearing aid. Yet, it is unclear whether a peripheral hearing loss degrades the attentional modulation of cortical speech tracking. Here, we used psychoacoustics and electroencephalography (EEG) in male and female human listeners to examine potential effects of hearing loss on EEG correlates of speech envelope synchronization in cortex. Behaviorally, older hearing-impaired (HI) listeners showed degraded speech-in-noise recognition and reduced temporal acuity compared with age-matched normal-hearing (NH) controls. During EEG recordings, we used a selective attention task with two spatially separated, simultaneous speech streams on which both NH and HI listeners showed high speech recognition performance. Low-frequency (<10 Hz) envelope-entrained EEG responses were enhanced in the HI listeners, both for the attended speech and for tone sequences modulated at slow rates (4 Hz) during passive listening. Compared with the attended speech, responses to the ignored stream were reduced in both HI and NH listeners, allowing the attended target to be classified from single-trial EEG data with similarly high accuracy in the two groups. However, despite robust attention-modulated speech entrainment, the HI listeners rated the competing speech task as more difficult. These results suggest that the speech-in-noise problems experienced by older HI listeners are not necessarily associated with degraded attentional selection. SIGNIFICANCE STATEMENT People with age-related sensorineural hearing loss often struggle to follow speech in the presence of competing talkers. It is currently unclear whether hearing impairment impairs the ability to use selective attention to suppress distracting speech when the distractor is well segregated from the target. Here, we report amplified envelope-entrained cortical EEG responses to attended speech and to simple tones modulated at speech rates (4 Hz) in listeners with age-related hearing loss. Critically, despite increased self-reported listening difficulties, cortical synchronization to speech mixtures was robustly modulated by selective attention in listeners with hearing loss. This allowed the attended talker to be classified from single-trial EEG responses with high accuracy in both older hearing-impaired listeners and age-matched normal-hearing controls.
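The single-trial classification step mentioned here is commonly implemented by correlating an EEG-reconstructed envelope with the envelopes of the two talkers; a minimal illustrative sketch follows, with all variable names assumed.

```python
import numpy as np

def classify_attention(reconstructed, env_talker1, env_talker2):
    """Label a trial by the talker whose envelope best matches the
    envelope reconstructed from the EEG (illustrative sketch)."""
    r1 = np.corrcoef(reconstructed, env_talker1)[0, 1]
    r2 = np.corrcoef(reconstructed, env_talker2)[0, 1]
    return 1 if r1 >= r2 else 2

# accuracy = mean over trials of (classify_attention(...) == attended_talker)
```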
Collapse
|
235
|
Bosker HR, Cooke M. Enhanced amplitude modulations contribute to the Lombard intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech. J Acoust Soc Am 2020; 147:721. [PMID: 32113258 DOI: 10.1121/10.0000646] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/18/2019] [Accepted: 01/10/2020] [Indexed: 06/10/2023]
Abstract
Speakers adjust their voice when talking in noise, which is known as Lombard speech. These acoustic adjustments facilitate speech comprehension in noise relative to plain speech (i.e., speech produced in quiet). However, exactly which characteristics of Lombard speech drive this intelligibility benefit in noise remains unclear. This study assessed the contribution of enhanced amplitude modulations to the Lombard speech intelligibility benefit by demonstrating that (1) native speakers of Dutch in the Nijmegen Corpus of Lombard Speech produce more pronounced amplitude modulations in noise than in quiet; (2) greater enhancement of the amplitude modulations correlates positively with intelligibility in a speech-in-noise perception experiment; and (3) transplanting the amplitude modulations from Lombard speech onto plain speech improves intelligibility, suggesting that the enhanced amplitude modulations of Lombard speech contribute to its intelligibility in noise. Results are discussed in light of recent neurobiological models of speech perception in which neural oscillators phase-lock to the amplitude modulations in speech, guiding speech processing.
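Point (3), transplanting amplitude modulations, can be sketched as follows, assuming time-aligned plain and Lombard recordings; the envelope extraction method (Hilbert magnitude plus low-pass smoothing) and the cutoff are illustrative assumptions rather than the corpus pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def envelope(x, fs, cutoff=10):
    """Smoothed amplitude envelope: Hilbert magnitude, then low-pass."""
    sos = butter(2, cutoff, fs=fs, output='sos')
    return sosfiltfilt(sos, np.abs(hilbert(x)))

def transplant(plain, lombard, fs, floor=1e-6):
    """Impose the Lombard envelope on the plain recording."""
    env_p = np.maximum(envelope(plain, fs), floor)   # avoid divide-by-zero
    env_l = envelope(lombard, fs)
    return plain * (env_l / env_p)   # plain fine structure, Lombard envelope
```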
Collapse
Affiliation(s)
- Hans Rutger Bosker
- Psychology of Language department, Max Planck Institute for Psycholinguistics, Wundtlaan 1, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
| | - Martin Cooke
- Language and Speech Laboratory, Universidad del País Vasco, calle Justo Vélez de Elorriaga 1, Vitoria, 01006, Spain
| |
Collapse
|
236
|
Donhauser PW, Baillet S. Two Distinct Neural Timescales for Predictive Speech Processing. Neuron 2020; 105:385-393.e9. [PMID: 31806493 PMCID: PMC6981026 DOI: 10.1016/j.neuron.2019.10.019] [Citation(s) in RCA: 72] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2019] [Revised: 08/21/2019] [Accepted: 10/10/2019] [Indexed: 11/29/2022]
Abstract
During speech listening, the brain could use contextual predictions to optimize sensory sampling and processing. We asked whether such predictive processing is organized dynamically into separate oscillatory timescales. We trained a neural network that uses context to predict speech at the phoneme level. Using this model, we estimated the contextual uncertainty and surprise of natural speech as factors to explain neurophysiological activity in human listeners. We show, first, that speech-related activity is hierarchically organized into two timescales: fast responses (theta: 4-10 Hz), restricted to early auditory regions, and slow responses (delta: 0.5-4 Hz), dominating in downstream auditory regions. Neural activity in these bands is selectively modulated by predictions: the gain of early theta responses varies according to the contextual uncertainty of speech, while later delta responses are selective for surprising speech inputs. We conclude that theta sensory sampling is tuned to maximize expected information gain, while delta encodes only non-redundant information.
Collapse
Affiliation(s)
- Peter W Donhauser
- McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University, Montréal, QC H3A 2B4, Canada.
| | - Sylvain Baillet
- McConnell Brain Imaging Centre, Montreal Neurological Institute, McGill University, Montréal, QC H3A 2B4, Canada.
| |
Collapse
|
237
|
Parthasarathy A, Hancock KE, Bennett K, DeGruttola V, Polley DB. Bottom-up and top-down neural signatures of disordered multi-talker speech perception in adults with normal hearing. eLife 2020; 9:e51419. [PMID: 31961322 PMCID: PMC6974362 DOI: 10.7554/elife.51419] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2019] [Accepted: 12/15/2019] [Indexed: 12/16/2022] Open
Abstract
In social settings, speech waveforms from nearby speakers mix together in our ear canals. Normally, the brain unmixes the attended speech stream from the chorus of background speakers using a combination of fast temporal processing and cognitive active-listening mechanisms. Of >100,000 patient records, ~10% of adults visited our clinic because of reduced hearing, only to learn that their hearing was clinically normal and should not cause communication difficulties. We found that multi-talker speech intelligibility thresholds varied widely among normal-hearing adults but could be predicted from neural phase-locking to frequency modulation (FM) cues measured with ear-canal EEG recordings. Combining neural temporal fine structure processing, pupil-indexed listening effort, and behavioral FM thresholds accounted for 78% of the variability in multi-talker speech intelligibility. The disordered bottom-up and top-down markers of poor multi-talker speech perception identified here could inform the design of next-generation clinical tests for hidden hearing disorders.
Collapse
Affiliation(s)
- Aravindakshan Parthasarathy
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear Infirmary, Boston, United States
- Department of Otolaryngology – Head and Neck Surgery, Harvard Medical School, Boston, United States
| | - Kenneth E Hancock
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear Infirmary, Boston, United States
- Department of Otolaryngology – Head and Neck Surgery, Harvard Medical School, Boston, United States
| | - Kara Bennett
- Bennett Statistical Consulting Inc, Ballston, United States
| | - Victor DeGruttola
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, United States
| | - Daniel B Polley
- Eaton-Peabody Laboratories, Massachusetts Eye and Ear Infirmary, Boston, United States
- Department of Otolaryngology – Head and Neck Surgery, Harvard Medical School, Boston, United States
| |
Collapse
|
238
|
Kaneshiro B, Nguyen DT, Norcia AM, Dmochowski JP, Berger J. Natural music evokes correlated EEG responses reflecting temporal structure and beat. Neuroimage 2020; 214:116559. [PMID: 31978543 DOI: 10.1016/j.neuroimage.2020.116559] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2019] [Revised: 12/23/2019] [Accepted: 01/14/2020] [Indexed: 11/17/2022] Open
Abstract
The brain activity of multiple subjects has been shown to synchronize during salient moments of natural stimuli, suggesting that the correlation of neural responses indexes a brain state operationally termed 'engagement'. While past electroencephalography (EEG) studies have considered both auditory and visual stimuli, the extent to which these results generalize to music, a temporally structured stimulus for which the brain has evolved specialized circuitry, is less understood. Here we investigated neural correlation during natural music listening by recording EEG responses from N=48 adult listeners as they heard real-world musical works, some of which were temporally disrupted through shuffling of short-term segments (measures), reversal, or randomization of the phase spectra. We measured correlation between multiple neural responses (inter-subject correlation) and between neural responses and stimulus envelope fluctuations (stimulus-response correlation) in the time and frequency domains. Stimuli retaining basic musical features, such as rhythm and melody, elicited significantly higher behavioral ratings and neural correlation than did phase-scrambled controls. However, while unedited songs were self-reported as most pleasant, time-domain correlations were highest during measure-shuffled versions. Frequency-domain measures of correlation (coherence) peaked at frequencies related to the musical beat, although the magnitudes of these spectral peaks did not explain the observed temporal correlations. Our findings show that natural music evokes significant inter-subject and stimulus-response correlations, and suggest that the neural correlates of musical 'engagement' may be distinct from those of enjoyment.
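The two correlation measures are straightforward to compute in their simplest time-domain form; the sketch below is illustrative and omits the component-extraction step typically used in studies of this kind, assuming per-subject responses of shape (n_subjects, n_samples) for one signal component.

```python
import numpy as np
from itertools import combinations

def inter_subject_correlation(resp):
    """Mean pairwise Pearson correlation across subjects' responses."""
    rs = [np.corrcoef(resp[i], resp[j])[0, 1]
          for i, j in combinations(range(len(resp)), 2)]
    return np.mean(rs)

def stimulus_response_correlation(resp, envelope):
    """Mean correlation between each subject's response and the envelope."""
    return np.mean([np.corrcoef(r, envelope)[0, 1] for r in resp])
```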
Collapse
Affiliation(s)
- Blair Kaneshiro
- Center for Computer Research in Music and Acoustics, Stanford University, Stanford, CA, USA; Center for the Study of Language and Information, Stanford University, Stanford, CA, USA; Department of Otolaryngology Head & Neck Surgery, Stanford University School of Medicine, Palo Alto, CA, USA.
| | - Duc T Nguyen
- Center for Computer Research in Music and Acoustics, Stanford University, Stanford, CA, USA; Center for the Study of Language and Information, Stanford University, Stanford, CA, USA; Department of Biomedical Engineering, City College of New York, New York, NY, USA
| | - Anthony M Norcia
- Department of Psychology, Stanford University, Stanford, CA, USA
| | - Jacek P Dmochowski
- Department of Biomedical Engineering, City College of New York, New York, NY, USA; Department of Psychology, Stanford University, Stanford, CA, USA
| | - Jonathan Berger
- Center for Computer Research in Music and Acoustics, Stanford University, Stanford, CA, USA
| |
Collapse
|
239
|
Keshavarzi M, Kegler M, Kadir S, Reichenbach T. Transcranial alternating current stimulation in the theta band but not in the delta band modulates the comprehension of naturalistic speech in noise. Neuroimage 2020; 210:116557. [PMID: 31968233 DOI: 10.1016/j.neuroimage.2020.116557] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2019] [Revised: 01/13/2020] [Accepted: 01/14/2020] [Indexed: 01/26/2023] Open
Abstract
Auditory cortical activity entrains to speech rhythms and has been proposed as a mechanism for online speech processing. In particular, neural activity in the theta frequency band (4-8 Hz) tracks the onset of syllables which may aid the parsing of a speech stream. Similarly, cortical activity in the delta band (1-4 Hz) entrains to the onset of words in natural speech and has been found to encode both syntactic as well as semantic information. Such neural entrainment to speech rhythms is not merely an epiphenomenon of other neural processes, but plays a functional role in speech processing: modulating the neural entrainment through transcranial alternating current stimulation influences the speech-related neural activity and modulates the comprehension of degraded speech. However, the distinct functional contributions of the delta- and of the theta-band entrainment to the modulation of speech comprehension have not yet been investigated. Here we use transcranial alternating current stimulation with waveforms derived from the speech envelope and filtered in the delta and theta frequency bands to alter cortical entrainment in both bands separately. We find that transcranial alternating current stimulation in the theta band but not in the delta band impacts speech comprehension. Moreover, we find that transcranial alternating current stimulation with the theta-band portion of the speech envelope can improve speech-in-noise comprehension beyond sham stimulation. Our results show a distinct contribution of the theta- but not of the delta-band stimulation to the modulation of speech comprehension. In addition, our findings open up a potential avenue of enhancing the comprehension of speech in noise.
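Deriving band-limited stimulation waveforms from the speech envelope, as described here, can be sketched as follows; the sampling rates, filter order, and band edges are illustrative assumptions, not the study's stimulation pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample_poly

def band_waveform(speech, fs_audio, band, fs_env=100):
    """Band-limited envelope waveform for envelope-based tACS (illustrative)."""
    env = np.abs(hilbert(speech))                 # broadband envelope
    env = resample_poly(env, fs_env, fs_audio)    # downsample before filtering
    sos = butter(3, band, btype='band', fs=fs_env, output='sos')
    return sosfiltfilt(sos, env)

# delta_wave = band_waveform(speech, fs_audio=16000, band=(1, 4))
# theta_wave = band_waveform(speech, fs_audio=16000, band=(4, 8))
```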
Collapse
Affiliation(s)
- Mahmoud Keshavarzi
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, SW7 2AZ, London, UK
| | - Mikolaj Kegler
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, SW7 2AZ, London, UK
| | - Shabnam Kadir
- School of Engineering and Computer Science, University of Hertfordshire, Hatfield, Hertfordshire, AL10 9AB, UK
| | - Tobias Reichenbach
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, SW7 2AZ, London, UK.
| |
Collapse
|
240
|
Presacco A, Miran S, Babadi B, Simon JZ. Real-Time Tracking of Magnetoencephalographic Neuromarkers during a Dynamic Attention-Switching Task. Annu Int Conf IEEE Eng Med Biol Soc 2020; 2019:4148-4151. [PMID: 31946783 DOI: 10.1109/embc.2019.8857953] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
In the last few years, many experiments have explored the possibility of using non-invasive techniques, such as electroencephalography (EEG) and magnetoencephalography (MEG), to identify auditory-related neuromarkers that are modulated by attention. Results from several studies in which participants listen to a story narrated by one speaker, while trying to ignore a different story narrated by a competing speaker, suggest the feasibility of extracting neuromarkers that demonstrate enhanced phase locking to the attended speech stream. These promising findings have the potential to be used in clinical applications, such as EEG-driven hearing aids. One major challenge in achieving this goal is the need for an algorithm capable of tracking these neuromarkers in real time when individuals are free to switch attention repeatedly among speakers at will. Here we present an algorithm pipeline designed to efficiently recognize changes in neural speech tracking during a dynamic attention-switching task and to use them as input to a near-real-time state-space model that translates these neuromarkers into attentional-state estimates with minimal delay. The pipeline was tested on MEG data collected from participants who were free to change the focus of their attention between two speakers at will. Results suggest the feasibility of using our algorithm pipeline to track changes of attention in near real time in a dynamic auditory scene.
Collapse
|
241
|
Bellur A, Elhilali M. Audio object classification using distributed beliefs and attention. IEEE/ACM Trans Audio Speech Lang Process 2020; 28:729-739. [PMID: 33564695 PMCID: PMC7869589 DOI: 10.1109/taslp.2020.2966867] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
One of the unique characteristics of human hearing is its ability to recognize acoustic objects even in the presence of severe noise and distortion. In this work, we explore two mechanisms underlying this ability: 1) redundant mapping of acoustic waveforms onto distributed latent representations and 2) adaptive feedback based on prior knowledge that selectively attends to targets of interest. We propose a bio-mimetic account of acoustic object classification by developing a novel distributed deep belief network, validated on the task of robust acoustic object classification using the UrbanSound database. The proposed distributed belief network (DBN) comprises an array of independent sub-networks trained generatively to capture different abstractions of natural sounds. A supervised classifier then performs a readout of this distributed mapping. The overall architecture not only matches the state-of-the-art system for acoustic object classification but yields a significant improvement over the baseline in mismatched noisy conditions (31.4% relative improvement at 0 dB). Furthermore, we incorporate attentional feedback mechanisms that allow the DBN to deploy local memories of sound targets, estimated at multiple views, to bias network activation when attending to a particular object. This adaptive feedback further improves object classification in unseen noise conditions (54% relative improvement over the baseline at 0 dB).
Collapse
Affiliation(s)
- Ashwin Bellur
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| | - Mounya Elhilali
- Department of Electrical and Computer Engineering, Laboratory for Computational Audio Perception, Johns Hopkins University
| |
Collapse
|
242
|
Das P, Brodbeck C, Simon JZ, Babadi B. Neuro-current response functions: A unified approach to MEG source analysis under the continuous stimuli paradigm. Neuroimage 2020; 211:116528. [PMID: 31945510 DOI: 10.1016/j.neuroimage.2020.116528] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2019] [Revised: 11/16/2019] [Accepted: 01/07/2020] [Indexed: 11/25/2022] Open
Abstract
Characterizing the neural dynamics underlying sensory processing is one of the central areas of investigation in systems and cognitive neuroscience. Neuroimaging techniques such as magnetoencephalography (MEG) and electroencephalography (EEG) have provided significant insights into the neural processing of continuous stimuli, such as speech, thanks to their high temporal resolution. Existing work in the context of auditory processing suggests that certain features of speech, such as the acoustic envelope, can be used as reliable linear predictors of the neural response manifested in M/EEG. The corresponding linear filters are referred to as temporal response functions (TRFs). While the functional roles of specific components of the TRF are well studied and linked to behavioral attributes such as attention, the cortical origins of the underlying neural processes are not as well understood. In this work, we address this issue by estimating a linear filter representation of cortical sources directly from neuroimaging data in the context of continuous speech processing. To this end, we introduce Neuro-Current Response Functions (NCRFs), a set of linear filters, spatially distributed throughout the cortex, that predict the cortical currents giving rise to the observed ongoing MEG (or EEG) data in response to continuous speech. NCRF estimation is cast within a Bayesian framework, which allows unification of the TRF and source estimation problems, and also facilitates the incorporation of prior information on the structural properties of the NCRFs. To generalize this analysis to M/EEG recordings that lack individual structural magnetic resonance (MR) scans, NCRFs are extended to free-orientation dipoles and a novel regularizing scheme is put forward to lessen reliance on fine-tuned coordinate co-registration. We present a fast estimation algorithm, which we refer to as the Champ-Lasso algorithm, by leveraging recent advances in optimization, and demonstrate its utility through application to simulated and experimentally recorded MEG data from auditory experiments. Our simulation studies reveal significant improvements over existing methods that typically operate in a two-stage fashion, in terms of spatial resolution, response function reconstruction, and recovery of dipole orientations. The analysis of experimentally recorded MEG data without MR scans corroborates existing findings, but also delineates the distinct cortical distribution of the underlying neural processes at high spatiotemporal resolution. In summary, we provide a principled modeling and estimation paradigm for MEG source analysis tailored to extracting the cortical origin of electrophysiological responses to continuous stimuli.
Collapse
Affiliation(s)
- Proloy Das
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, 20742, USA; Institute for Systems Research, University of Maryland, College Park, MD, 20742, USA.
| | - Christian Brodbeck
- Institute for Systems Research, University of Maryland, College Park, MD, 20742, USA.
| | - Jonathan Z Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, 20742, USA; Institute for Systems Research, University of Maryland, College Park, MD, 20742, USA; Department of Biology, University of Maryland, College Park, MD, 20742, USA.
| | - Behtash Babadi
- Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, 20742, USA; Institute for Systems Research, University of Maryland, College Park, MD, 20742, USA.
| |
Collapse
|
243
|
Weissbart H, Kandylaki KD, Reichenbach T. Cortical Tracking of Surprisal during Continuous Speech Comprehension. J Cogn Neurosci 2020; 32:155-166. [DOI: 10.1162/jocn_a_01467] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
Speech comprehension requires rapid online processing of a continuous acoustic signal to extract structure and meaning. Previous studies of sentence comprehension have found neural correlates of the predictability of a word given its context, as well as of the precision of such a prediction. However, these studies focused on single sentences and on particular words in those sentences, comparing neural responses to words of low versus high predictability and of low versus high precision. In natural speech comprehension, by contrast, a listener hears many successive words whose predictability and precision vary over a large range. Here, we show that cortical activity in different frequency bands tracks word surprisal in continuous natural speech and that this tracking is modulated by precision. We obtain these results by quantifying surprisal and precision from naturalistic speech using a deep neural network and by relating these speech features to EEG responses of human volunteers acquired during auditory story comprehension. We find significant cortical tracking of surprisal at low frequencies, including the delta band, as well as in the higher-frequency beta and gamma bands, and observe that the tracking is modulated by precision. Our results pave the way for further investigation of the neurobiology of natural speech comprehension.
Collapse
|
244
|
Das N, Vanthornhout J, Francart T, Bertrand A. Stimulus-aware spatial filtering for single-trial neural response and temporal response function estimation in high-density EEG with applications in auditory research. Neuroimage 2020; 204:116211. [PMID: 31546052 PMCID: PMC7355237 DOI: 10.1016/j.neuroimage.2019.116211] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2019] [Revised: 08/30/2019] [Accepted: 09/17/2019] [Indexed: 12/21/2022] Open
Abstract
A common problem in neural recordings is the low signal-to-noise ratio (SNR), particularly when using non-invasive techniques like magneto- or electroencephalography (M/EEG). To address this problem, experimental designs often include repeated trials, which are then averaged to improve the SNR or to infer statistics that can be used in the design of a denoising spatial filter. However, collecting enough repeated trials is often impractical and even impossible in some paradigms, while analyses on existing data sets may be hampered when these do not contain such repeated trials. Therefore, we present a data-driven method that takes advantage of the knowledge of the presented stimulus, to achieve a joint noise reduction and dimensionality reduction without the need for repeated trials. The method first estimates the stimulus-driven neural response using the given stimulus, which is then used to find a set of spatial filters that maximize the SNR based on a generalized eigenvalue decomposition. As the method is fully data-driven, the dimensionality reduction enables researchers to perform their analyses without having to rely on their knowledge of brain regions of interest, which increases accuracy and reduces the human factor in the results. In the context of neural tracking of a speech stimulus using EEG, our method resulted in more accurate short-term temporal response function (TRF) estimates, higher correlations between predicted and actual neural responses, and higher attention decoding accuracies compared to existing TRF-based decoding methods. We also provide an extensive discussion on the central role played by the generalized eigenvalue decomposition in various denoising methods in the literature, and address the conceptual similarities and differences with our proposed method.
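The core of the method, a generalized eigenvalue decomposition (GEVD) contrasting an estimate of the stimulus-driven response against the raw data, can be sketched compactly. The sketch below is illustrative, not the paper's code, and leaves abstract how the stimulus-driven estimate is obtained; all names and the number of retained components are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def gevd_filters(signal_est, eeg, n_keep=4):
    """Spatial filters maximizing SNR via a generalized eigenvalue problem.
    signal_est, eeg: arrays of shape (n_samples, n_channels)."""
    Rs = np.cov(signal_est.T)       # covariance of stimulus-driven estimate
    Rx = np.cov(eeg.T)              # covariance of the raw EEG
    vals, vecs = eigh(Rs, Rx)       # generalized eigenvalue decomposition
    order = np.argsort(vals)[::-1]  # largest SNR first
    return vecs[:, order[:n_keep]]  # (n_channels, n_keep) spatial filters

# denoised = eeg @ gevd_filters(signal_est, eeg)   # reduced, denoised data
```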
Collapse
Affiliation(s)
- Neetha Das: Dept. Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium; Dept. Neurosciences, ExpORL, KU Leuven, Herestraat 49 Bus 721, B-3000, Leuven, Belgium
- Jonas Vanthornhout: Dept. Neurosciences, ExpORL, KU Leuven, Herestraat 49 Bus 721, B-3000, Leuven, Belgium
- Tom Francart: Dept. Neurosciences, ExpORL, KU Leuven, Herestraat 49 Bus 721, B-3000, Leuven, Belgium
- Alexander Bertrand: Dept. Electrical Engineering (ESAT), Stadius Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven, Kasteelpark Arenberg 10, B-3001, Leuven, Belgium
245
Kadir S, Kaza C, Weissbart H, Reichenbach T. Modulation of Speech-in-Noise Comprehension Through Transcranial Current Stimulation With the Phase-Shifted Speech Envelope. IEEE Trans Neural Syst Rehabil Eng 2019; 28:23-31. [PMID: 31751277] [PMCID: PMC7001147] [DOI: 10.1109/tnsre.2019.2939671]
Abstract
Neural activity tracks the envelope of a speech signal at latencies from 50 ms to 300 ms. Modulating this neural tracking through transcranial alternating current stimulation influences speech comprehension. Two important variables that can affect this modulation are the latency and the phase of the stimulation with respect to the sound. While previous studies have found an influence of both variables on speech comprehension, the interaction between the two has not yet been measured. We presented 17 subjects with speech in noise coupled with simultaneous transcranial alternating current stimulation. The currents were based on the envelope of the target speech but shifted by different phases, as well as delayed by 100 ms or 250 ms. We also employed various control stimulations, and assessed the signal-to-noise ratio at which each subject understood half of the speech. We found that, at both latencies, speech comprehension is modulated by the phase of the current stimulation; however, the form of the modulation differed between the two latencies. Phase and latency of the neurostimulation accordingly have distinct influences on speech comprehension. The different effects at latencies of 100 ms and 250 ms hint at distinct neural processes for speech processing.
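As a hedged illustration of the signal-processing side, the sketch below constructs a phase-shifted, delayed envelope waveform of the kind used to drive envelope-based stimulation. The filter band, filter order, and parameter names are our assumptions, not the authors' stimulation chain.

```python
# Sketch: phase-shifted speech-envelope waveform via the analytic signal.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def shifted_envelope(audio, fs, phase_shift, delay_s=0.1, band=(1.0, 15.0)):
    env = np.abs(hilbert(audio))                          # broadband envelope
    b, a = butter(2, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    env = filtfilt(b, a, env)                             # keep slow modulations
    shifted = np.real(hilbert(env) * np.exp(1j * phase_shift))
    pad = np.zeros(int(round(delay_s * fs)))              # e.g. 100 ms or 250 ms
    return np.concatenate([pad, shifted])[: audio.size]
```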
246
Martin S, Mikutta C, Leonard MK, Hungate D, Koelsch S, Shamma S, Chang EF, Millán JDR, Knight RT, Pasley BN. Neural Encoding of Auditory Features during Music Perception and Imagery. Cereb Cortex 2019; 28:4222-4233. [PMID: 29088345] [DOI: 10.1093/cercor/bhx277]
Abstract
Despite many behavioral and neuroimaging investigations, it remains unclear how the human cortex represents spectrotemporal sound features during auditory imagery, and how this representation compares to auditory perception. To assess this, we recorded electrocorticographic signals from an epileptic patient with proficient music ability in 2 conditions. First, the participant played 2 piano pieces on an electronic piano with the sound volume of the digital keyboard on. Second, the participant replayed the same piano pieces, but without auditory feedback, and the participant was asked to imagine hearing the music in his mind. In both conditions, the sound output of the keyboard was recorded, thus allowing precise time-locking between the neural activity and the spectrotemporal content of the music imagery. This novel task design provided a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during auditory mental imagery. In both conditions, we built encoding models to predict high gamma neural activity (70-150 Hz) from the spectrogram representation of the recorded sound. We found robust spectrotemporal receptive fields during auditory imagery with substantial, but not complete overlap in frequency tuning and cortical location compared to receptive fields measured during auditory perception.
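A minimal sketch of this kind of encoding analysis follows, assuming a log-spectrogram and a high-gamma envelope on a common time axis; the sampling rate, lag range, and regularization are illustrative assumptions rather than the paper's exact settings.

```python
# Sketch: spectrotemporal encoding model via ridge regression from
# time-lagged spectrogram features to a high-gamma envelope.
import numpy as np
from sklearn.linear_model import Ridge

def lagged_design(spec, n_lags):
    """spec: (n_times, n_freqs) -> design matrix (n_times, n_freqs * n_lags)."""
    n_times, n_freqs = spec.shape
    X = np.zeros((n_times, n_freqs * n_lags))
    for lag in range(n_lags):                      # lag 0 .. n_lags-1 samples back
        X[lag:, lag * n_freqs:(lag + 1) * n_freqs] = spec[: n_times - lag]
    return X

# spec: log-spectrogram of the recorded keyboard output; hg: high-gamma
# envelope at one electrode; both resampled to, say, 100 Hz.
# X = lagged_design(spec, n_lags=40)               # ~0-400 ms of history
# strf = Ridge(alpha=1.0).fit(X[:split], hg[:split])
# r = np.corrcoef(strf.predict(X[split:]), hg[split:])[0, 1]
```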
Affiliation(s)
- Stephanie Martin: Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland; Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
- Christian Mikutta: Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA; Translational Research Center and Division of Clinical Research Support, Psychiatric Services University of Bern (UPD), University Hospital of Psychiatry, Bern, Switzerland; Department of Neurology, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
- Matthew K Leonard: Department of Neurological Surgery, Department of Physiology, and Center for Integrative Neuroscience, University of California, San Francisco, CA, USA
- Dylan Hungate: Department of Neurological Surgery, Department of Physiology, and Center for Integrative Neuroscience, University of California, San Francisco, CA, USA
- Shihab Shamma: Département d'études cognitives, École normale supérieure, PSL Research University, Paris, France; Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, MD, USA
- Edward F Chang: Department of Neurological Surgery, Department of Physiology, and Center for Integrative Neuroscience, University of California, San Francisco, CA, USA
- José Del R Millán: Defitech Chair in Brain-Machine Interface, Center for Neuroprosthetics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Robert T Knight: Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA; Department of Psychology, University of California, Berkeley, CA, USA
- Brian N Pasley: Helen Wills Neuroscience Institute, University of California, Berkeley, CA, USA
247
Shavit-Cohen K, Zion Golumbic E. The Dynamics of Attention Shifts Among Concurrent Speech in a Naturalistic Multi-speaker Virtual Environment. Front Hum Neurosci 2019; 13:386. [PMID: 31780911] [PMCID: PMC6857110] [DOI: 10.3389/fnhum.2019.00386]
Abstract
Focusing attention on one speaker against a background of other, irrelevant speech can be a challenging feat. A longstanding question in attention research is whether and how frequently individuals shift their attention towards task-irrelevant speech, arguably leading to occasional detection of words in a so-called unattended message. However, this has been difficult to gauge empirically, particularly when participants attend to continuous natural speech, due to the lack of appropriate metrics for detecting shifts in internal attention. Here we introduce a new experimental platform for studying the dynamic deployment of attention among concurrent speakers, using a unique combination of virtual reality (VR) and eye-tracking technology. We created a Virtual Café in which participants sit across from, and attend to the narrative of, a target speaker. We manipulated the number and location of distractor speakers by placing additional characters throughout the Virtual Café. By monitoring participants' eye-gaze dynamics, we studied the patterns of overt attention shifts among concurrent speakers as well as the consequences of these shifts for speech comprehension. Our results reveal important individual differences in the gaze patterns displayed during selective attention to speech. While some participants stayed fixated on the target speaker throughout the entire experiment, approximately 30% of participants frequently shifted their gaze toward distractor speakers or other locations in the environment, regardless of the severity of audiovisual distraction. Critically, performing frequent gaze shifts negatively impacted comprehension of the target speech, and participants made more mistakes when looking away from the target speaker. We also found that gaze shifts occurred primarily during gaps in the acoustic input, suggesting that momentary reductions in acoustic masking prompt attention shifts between competing speakers, in line with "glimpsing" theories of processing speech in noise. These results open a new window into understanding the dynamics of attention as they wax and wane over time, and the different listening patterns employed for dealing with the influx of sensory input in multisensory environments. Moreover, the novel approach developed here for tracking the locus of momentary attention in a naturalistic virtual-reality environment holds great promise for extending the study of human behavior and cognition and for bridging the gap between the laboratory and real life.
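As a hypothetical sketch of how such gaze data might be summarized, the snippet below counts shifts away from the target speaker and the proportion of time spent on target. It assumes a boolean array marking samples where gaze falls inside the target's area of interest; the 100 ms merge window and all names are our assumptions.

```python
# Sketch: summarizing overt gaze shifts from eye-tracking samples.
import numpy as np

def gaze_shift_stats(on_target, fs, min_gap_s=0.1):
    """on_target: (n_samples,) bool. Returns (n_shifts_away, time_on_target)."""
    on = on_target.astype(int)
    shifts_away = np.flatnonzero(np.diff(on) == -1)   # target -> elsewhere
    if shifts_away.size > 1:                          # merge blink-length gaps
        keep = np.insert(np.diff(shifts_away) > min_gap_s * fs, 0, True)
        shifts_away = shifts_away[keep]
    return shifts_away.size, on.mean()
```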
Affiliation(s)
- Elana Zion Golumbic: The Gonda Multidisciplinary Brain Research Center, Bar Ilan University, Ramat Gan, Israel
248
Fu Z, Wu X, Chen J. Congruent audiovisual speech enhances auditory attention decoding with EEG. J Neural Eng 2019; 16:066033. [PMID: 31505476] [DOI: 10.1088/1741-2552/ab4340]
Abstract
OBJECTIVE The auditory attention decoding (AAD) approach can be used to determine the identity of the attended speaker during an auditory selective attention task by analyzing electroencephalography (EEG) measurements. The AAD approach has the potential to guide the design of speech enhancement algorithms in hearing aids, i.e. to identify the speech stream of the listener's interest so that hearing-aid algorithms can amplify the target speech and attenuate other distracting sounds, which would in turn improve speech understanding and communication and reduce cognitive load. The present work investigated whether additional visual input (i.e. lipreading) enhances AAD performance for normal-hearing listeners. APPROACH In a two-talker scenario in which auditory stimuli from audiobooks narrated by two speakers were presented, multi-channel EEG signals were recorded while participants selectively attended to one speaker and ignored the other. The speakers' mouth movements were recorded during narration to provide the visual stimuli. Stimulus conditions included audio-only, visual input congruent with either (i.e. attended or unattended) speaker, and visual input incongruent with either speaker. The AAD approach was performed separately for each condition to evaluate the effect of additional visual input on AAD. MAIN RESULTS Relative to the audio-only condition, AAD performance improved with visual input only when it was congruent with the attended speech stream; the improvement was about 14 percentage points in decoding accuracy. Cortical envelope-tracking activity in both auditory and visual cortex was stronger for the congruent audiovisual condition than for the other conditions. In addition, the congruent audiovisual condition yielded more robust AAD, achieving higher accuracy than the audio-only condition with fewer channels and shorter trial durations. SIGNIFICANCE The present work complements previous studies and further demonstrates the feasibility of AAD-guided hearing-aid design for daily face-to-face conversations. It also provides guidance for designing a low-density EEG setup for the AAD approach.
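As a simplified stand-in for the decoding pipeline described above, the sketch below shows the core of correlation-based AAD with a pre-trained linear backward model: reconstruct the envelope from EEG and pick the talker whose envelope correlates best. Time lags are omitted for brevity, and all names are ours.

```python
# Sketch: correlation-based auditory attention decoding (single decision).
import numpy as np

def decode_attention(eeg, env_a, env_b, decoder):
    """eeg: (n_times, n_channels); decoder: (n_channels,) pre-trained.
    Returns 0 if talker A is decoded as attended, else 1."""
    recon = eeg @ decoder                         # reconstructed envelope
    r_a = np.corrcoef(recon, env_a)[0, 1]
    r_b = np.corrcoef(recon, env_b)[0, 1]
    return 0 if r_a >= r_b else 1
```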
Affiliation(s)
- Zhen Fu: Department of Machine Intelligence, Speech and Hearing Research Center, and Key Laboratory of Machine Perception (Ministry of Education), Peking University, Beijing 100871, People's Republic of China
249
Coffey EBJ, Nicol T, White-Schwoch T, Chandrasekaran B, Krizman J, Skoe E, Zatorre RJ, Kraus N. Evolving perspectives on the sources of the frequency-following response. Nat Commun 2019; 10:5036. [PMID: 31695046] [PMCID: PMC6834633] [DOI: 10.1038/s41467-019-13003-w]
Abstract
The auditory frequency-following response (FFR) is a non-invasive index of the fidelity of sound encoding in the brain, and is used to study the integrity, plasticity, and behavioral relevance of the neural encoding of sound. In this Perspective, we review recent evidence suggesting that, in humans, the FFR arises from multiple cortical and subcortical sources, not just subcortically as previously believed, and we illustrate how the FFR to complex sounds can enhance the wider field of auditory neuroscience. Far from being of use only to study basic auditory processes, the FFR is an uncommonly multifaceted response yielding a wealth of information, with much yet to be tapped.
Affiliation(s)
- Emily B J Coffey: Department of Psychology, Concordia University, 1455 Boulevard de Maisonneuve Ouest, Montréal, QC, H3G 1M8, Canada; International Laboratory for Brain, Music, and Sound Research (BRAMS), Montréal, QC, Canada; Centre for Research on Brain, Language and Music (CRBLM), McGill University, 3640 de la Montagne, Montréal, QC, H3G 2A8, Canada
- Trent Nicol: Auditory Neuroscience Laboratory, Department of Communication Sciences, Northwestern University, 2240 Campus Dr., Evanston, IL, 60208, USA
- Travis White-Schwoch: Auditory Neuroscience Laboratory, Department of Communication Sciences, Northwestern University, 2240 Campus Dr., Evanston, IL, 60208, USA
- Bharath Chandrasekaran: Communication Sciences and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh, Forbes Tower, 3600 Atwood St, Pittsburgh, PA, 15260, USA
- Jennifer Krizman: Auditory Neuroscience Laboratory, Department of Communication Sciences, Northwestern University, 2240 Campus Dr., Evanston, IL, 60208, USA
- Erika Skoe: Department of Speech, Language, and Hearing Sciences, The Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, 2 Alethia Drive, Unit 1085, Storrs, CT, 06269, USA
- Robert J Zatorre: International Laboratory for Brain, Music, and Sound Research (BRAMS), Montréal, QC, Canada; Centre for Research on Brain, Language and Music (CRBLM), McGill University, 3640 de la Montagne, Montréal, QC, H3G 2A8, Canada; Montreal Neurological Institute, McGill University, 3801 rue Université, Montréal, QC, H3A 2B4, Canada
- Nina Kraus: Auditory Neuroscience Laboratory, Department of Communication Sciences, Northwestern University, 2240 Campus Dr., Evanston, IL, 60208, USA; Department of Neurobiology, Northwestern University, 2205 Tech Dr., Evanston, IL, 60208, USA; Department of Otolaryngology, Northwestern University, 420 E Superior St., Chicago, IL, 60611, USA
250
Cortical auditory responses index the contributions of different RMS-level-dependent segments to speech intelligibility. Hear Res 2019; 383:107808. [DOI: 10.1016/j.heares.2019.107808]