51
|
Kulasingham JP, Simon JZ. Algorithms for Estimating Time-Locked Neural Response Components in Cortical Processing of Continuous Speech. IEEE Trans Biomed Eng 2023; 70:88-96. [PMID: 35727788 PMCID: PMC9946293 DOI: 10.1109/tbme.2022.3185005] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Indexed: 11/08/2022]
Abstract
OBJECTIVE The Temporal Response Function (TRF) is a linear model of neural activity time-locked to continuous stimuli, including continuous speech. TRFs based on speech envelopes typically have distinct components that have provided remarkable insights into the cortical processing of speech. However, current methods may lead to less than reliable estimates of single-subject TRF components. Here, we compare two established methods for TRF component estimation and also propose novel algorithms that utilize prior knowledge of these components, bypassing the full TRF estimation. METHODS We compared two established algorithms, ridge and boosting, and two novel algorithms based on Subspace Pursuit (SP) and Expectation Maximization (EM), which directly estimate TRF components given plausible assumptions regarding component characteristics. Single-channel, multi-channel, and source-localized TRFs were fit on simulations and real magnetoencephalographic data. Performance metrics included model fit and component estimation accuracy. RESULTS Boosting and ridge have comparable performance in component estimation. The novel algorithms outperformed the others in simulations, but not on real data, possibly due to the plausible assumptions not actually being met. Ridge had slightly better model fits on real data compared to boosting, but also more spurious TRF activity. CONCLUSION Results indicate that both smooth (ridge) and sparse (boosting) algorithms perform comparably at TRF component estimation. The SP and EM algorithms may be accurate, but rely on assumptions of component characteristics. SIGNIFICANCE This systematic comparison establishes the suitability of widely used and novel algorithms for estimating robust TRF components, which is essential for improved subject-specific investigations into the cortical processing of speech.
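As a rough illustration of the ridge approach named in this abstract, the sketch below fits a single-channel TRF by regularized least squares on simulated data. It is a generic textbook formulation, not the authors' implementation; the function names (`lagged_design`, `ridge_trf`), the simulated stimulus, the `true_trf` kernel, and the regularization value `alpha` are all assumptions made for the example.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Build a design matrix whose column k holds the stimulus delayed by k samples."""
    n = len(stimulus)
    X = np.zeros((n, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[:n - k]
    return X

def ridge_trf(stimulus, response, n_lags, alpha):
    """Estimate the TRF by solving (X'X + alpha*I) w = X'y."""
    X = lagged_design(stimulus, n_lags)
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_lags), X.T @ response)

# Simulated data (hypothetical): white-noise stimulus convolved with a known kernel.
rng = np.random.default_rng(0)
stim = rng.standard_normal(5000)
true_trf = np.array([0.0, 0.5, 1.0, 0.3, -0.4, -0.1, 0.0, 0.0])
resp = lagged_design(stim, len(true_trf)) @ true_trf + 0.5 * rng.standard_normal(5000)

est = ridge_trf(stim, resp, n_lags=len(true_trf), alpha=1.0)
```

With a white-noise stimulus and this much data the estimate should land close to `true_trf`; with real speech envelopes, which are strongly autocorrelated, the choice of `alpha` matters far more and is usually made by cross-validation.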
|
52
|
Luo C, Gao Y, Fan J, Liu Y, Yu Y, Zhang X. Compromised word-level neural tracking in the high-gamma band for children with attention deficit hyperactivity disorder. Front Hum Neurosci 2023; 17:1174720. [PMID: 37213926 PMCID: PMC10196181 DOI: 10.3389/fnhum.2023.1174720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Accepted: 04/18/2023] [Indexed: 05/23/2023] Open
Abstract
Children with attention deficit hyperactivity disorder (ADHD) exhibit pervasive difficulties in speech perception. Given that speech processing involves both acoustic and linguistic stages, it remains unclear which stage of speech processing is impaired in children with ADHD. To investigate this issue, we measured neural tracking of speech at syllable and word levels using electroencephalography (EEG), and evaluated the relationship between neural responses and ADHD symptoms in 6-8-year-old children. Twenty-three children participated in the current study, and their ADHD symptoms were assessed with SNAP-IV questionnaires. In the experiment, the children listened to hierarchical speech sequences in which syllables and words were, respectively, repeated at 2.5 and 1.25 Hz. Using frequency domain analyses, reliable neural tracking of syllables and words was observed in both the low-frequency band (<4 Hz) and the high-gamma band (70-160 Hz). However, the neural tracking of words in the high-gamma band showed an anti-correlation with the ADHD symptom scores of the children. These results indicate that ADHD prominently impairs cortical encoding of linguistic information (e.g., words) in speech perception.
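The frequency-tagging logic described here (syllables at 2.5 Hz, words at 1.25 Hz) can be sketched as a spectral peak test: power at the tagged frequency is compared with power in neighboring bins. This is a generic illustration on simulated data, not the study's pipeline; the sampling rate, recording length, amplitudes, and the helper `peak_to_neighbors` are assumptions chosen for the example.

```python
import numpy as np

fs = 100.0        # sampling rate in Hz (assumed)
duration = 40.0   # seconds; gives 0.025 Hz resolution, so 1.25 and 2.5 Hz fall on exact bins
t = np.arange(int(fs * duration)) / fs

# Simulated response (hypothetical): syllable- and word-rate components plus noise.
rng = np.random.default_rng(1)
signal = (1.0 * np.sin(2 * np.pi * 2.5 * t)
          + 0.5 * np.sin(2 * np.pi * 1.25 * t)
          + rng.standard_normal(t.size))

spectrum = np.abs(np.fft.rfft(signal)) ** 2 / t.size
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)

def peak_to_neighbors(freq_hz, n_neighbors=4):
    """Ratio of power at the target bin to mean power in the bins on either side."""
    idx = int(np.argmin(np.abs(freqs - freq_hz)))
    neighbors = np.r_[spectrum[idx - n_neighbors:idx],
                      spectrum[idx + 1:idx + 1 + n_neighbors]]
    return spectrum[idx] / neighbors.mean()

syllable_ratio = peak_to_neighbors(2.5)
word_ratio = peak_to_neighbors(1.25)
```

A ratio well above 1 at a tagged frequency indicates tracking at that rate. Choosing a recording length that places the tagged frequencies on exact FFT bins avoids spectral leakage into the neighbors used as the noise estimate.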
Affiliation(s)
- Cheng Luo
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou, China
- Yayue Gao
- Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
- *Correspondence: Yayue Gao
- Jianing Fan
- Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
- Yang Liu
- Department of Psychology, School of Humanities and Social Sciences, Beihang University, Beijing, China
- Yonglin Yu
- Department of Rehabilitation, The Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
- Xin Zhang
- Department of Neurology, The Children’s Hospital, Zhejiang University School of Medicine, National Clinical Research Center for Child Health, Hangzhou, China
|
53
|
Simon JZ, Commuri V, Kulasingham JP. Time-locked auditory cortical responses in the high-gamma band: A window into primary auditory cortex. Front Neurosci 2022; 16:1075369. [PMID: 36570848 PMCID: PMC9773383 DOI: 10.3389/fnins.2022.1075369] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/20/2022] [Accepted: 11/24/2022] [Indexed: 12/13/2022] Open
Abstract
Primary auditory cortex is a critical stage in the human auditory pathway, a gateway between subcortical and higher-level cortical areas. Receiving the output of all subcortical processing, it sends its output on to higher-level cortex. Non-invasive physiological recordings of primary auditory cortex using electroencephalography (EEG) and magnetoencephalography (MEG), however, may not have sufficient specificity to separate responses generated in primary auditory cortex from those generated in underlying subcortical areas or neighboring cortical areas. This limitation is important for investigations of effects of top-down processing (e.g., selective-attention-based) on primary auditory cortex: higher-level areas are known to be strongly influenced by top-down processes, but subcortical areas are often assumed to perform strictly bottom-up processing. Fortunately, recent advances have made it easier to isolate the neural activity of primary auditory cortex from other areas. In this perspective, we focus on time-locked responses to stimulus features in the high gamma band (70-150 Hz) and with early cortical latency (∼40 ms), intermediate between subcortical and higher-level areas. We review recent findings from physiological studies employing either repeated simple sounds or continuous speech, obtaining either a frequency following response (FFR) or temporal response function (TRF). The potential roles of top-down processing are underscored, and comparisons with invasive intracranial EEG (iEEG) and animal model recordings are made. We argue that MEG studies employing continuous speech stimuli may offer particular benefits, in that only a few minutes of speech generates robust high gamma responses from bilateral primary auditory cortex, and without measurable interference from subcortical or higher-level areas.
Affiliation(s)
- Jonathan Z. Simon
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
- Department of Biology, University of Maryland, College Park, College Park, MD, United States
- Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Vrishab Commuri
- Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
|
54
|
Youssofzadeh V, Conant L, Stout J, Ustine C, Humphries C, Gross WL, Shah-Basak P, Mathis J, Awe E, Allen L, DeYoe EA, Carlson C, Anderson CT, Maganti R, Hermann B, Nair VA, Prabhakaran V, Meyerand B, Binder JR, Raghavan M. Late dominance of the right hemisphere during narrative comprehension. Neuroimage 2022; 264:119749. [PMID: 36379420 PMCID: PMC9772156 DOI: 10.1016/j.neuroimage.2022.119749] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/03/2022] [Revised: 10/12/2022] [Accepted: 11/11/2022] [Indexed: 11/15/2022] Open
Abstract
PET and fMRI studies suggest that auditory narrative comprehension is supported by a bilateral multilobar cortical network. The superior temporal resolution of magnetoencephalography (MEG) makes it an attractive tool to investigate the dynamics of how different neuroanatomic substrates engage during narrative comprehension. Using beta-band power changes as a marker of cortical engagement, we studied MEG responses during an auditory story comprehension task in 31 healthy adults. The protocol consisted of two runs, each interleaving 7 blocks of the story comprehension task with 15 blocks of an auditorily presented math task as a control for phonological processing, working memory, and attention processes. Sources at the cortical surface were estimated with a frequency-resolved beamformer. Beta-band power was estimated in the frequency range of 16-24 Hz over 1-sec epochs starting from 400 msec after stimulus onset until the end of a story or math problem presentation. These power estimates were compared to 1-second epochs of data before the stimulus block onset. The task-related cortical engagement was inferred from beta-band power decrements. Group-level source activations were statistically compared using non-parametric permutation testing. A story-math contrast of beta-band power changes showed greater bilateral cortical engagement within the fusiform gyrus, inferior and middle temporal gyri, parahippocampal gyrus, and left inferior frontal gyrus (IFG) during story comprehension. A math-story contrast of beta power decrements showed greater bilateral but left-lateralized engagement of the middle frontal gyrus and superior parietal lobule. 
The evolution of cortical engagement during five temporal windows across the presentation of stories showed significant involvement during the first interval of the narrative of bilateral opercular and insular regions as well as the ventral and lateral temporal cortex, extending more posteriorly on the left and medially on the right. Over time, there continued to be sustained right anterior ventral temporal engagement, with increasing involvement of the right anterior parahippocampal gyrus, STG, MTG, posterior superior temporal sulcus, inferior parietal lobule, frontal operculum, and insula, while left hemisphere engagement decreased. Our findings are consistent with prior imaging studies of narrative comprehension, but in addition, they demonstrate increasing right-lateralized engagement over the course of narratives, suggesting an important role for these right-hemispheric regions in semantic integration as well as social and pragmatic inference processing.
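The beta-band measure described in this abstract (16-24 Hz power in 1-second epochs, compared against a pre-stimulus baseline) can be illustrated with a single simulated channel. The study itself estimated sources with a frequency-resolved beamformer, which this sketch does not attempt; the sampling rate, signal amplitudes, and the helper `band_power` are assumptions for the example.

```python
import numpy as np

def band_power(epoch, fs, f_lo, f_hi):
    """Sum of FFT power over the bins falling inside [f_lo, f_hi] Hz."""
    freqs = np.fft.rfftfreq(epoch.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(epoch)) ** 2
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    return power[mask].sum()

fs = 200.0                      # sampling rate in Hz (assumed)
t = np.arange(int(fs)) / fs     # one 1-second epoch

# Simulated epochs (hypothetical): a 20 Hz rhythm that is weaker during the task.
rng = np.random.default_rng(2)
baseline = 1.0 * np.sin(2 * np.pi * 20 * t) + 0.1 * rng.standard_normal(t.size)
task = 0.5 * np.sin(2 * np.pi * 20 * t) + 0.1 * rng.standard_normal(t.size)

p_base = band_power(baseline, fs, 16.0, 24.0)
p_task = band_power(task, fs, 16.0, 24.0)

# Relative change; negative values correspond to a beta-band power decrement.
change = (p_task - p_base) / p_base
```

Halving the oscillation amplitude cuts power to roughly a quarter, so `change` comes out near -0.75 here. In the abstract's framing, such a decrement is read as greater cortical engagement.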
Affiliation(s)
- Vahab Youssofzadeh
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA. Corresponding author (V. Youssofzadeh)
- Lisa Conant
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Jeffrey Stout
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Candida Ustine
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- William L. Gross
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA; Anesthesiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Jed Mathis
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA; Radiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Elizabeth Awe
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Linda Allen
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Edgar A. DeYoe
- Radiology, Medical College of Wisconsin, Milwaukee, WI, USA
- Chad Carlson
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
- Rama Maganti
- Neurology, University of Wisconsin-Madison, Madison, WI, USA
- Bruce Hermann
- Neurology, University of Wisconsin-Madison, Madison, WI, USA
- Veena A. Nair
- Radiology, University of Wisconsin-Madison, Madison, WI, USA
- Vivek Prabhakaran
- Radiology, University of Wisconsin-Madison, Madison, WI, USA; Medical Physics, University of Wisconsin-Madison, Madison, WI, USA; Psychiatry, University of Wisconsin-Madison, Madison, WI, USA
- Beth Meyerand
- Radiology, University of Wisconsin-Madison, Madison, WI, USA; Medical Physics, University of Wisconsin-Madison, Madison, WI, USA; Biomedical Engineering, University of Wisconsin-Madison, Madison, WI, USA
- Manoj Raghavan
- Neurology, Medical College of Wisconsin, Milwaukee, WI, USA
|
55
|
Liu Y, Luo C, Zheng J, Liang J, Ding N. Working memory asymmetrically modulates auditory and linguistic processing of speech. Neuroimage 2022; 264:119698. [PMID: 36270622 DOI: 10.1016/j.neuroimage.2022.119698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 10/11/2022] [Accepted: 10/17/2022] [Indexed: 11/09/2022] Open
Abstract
Working memory load can modulate speech perception. However, since speech perception and working memory are both complex functions, it remains elusive how each component of the working memory system interacts with each speech processing stage. To investigate this issue, we concurrently measure how the working memory load modulates neural activity tracking three levels of linguistic units, i.e., syllables, phrases, and sentences, using a multiscale frequency-tagging approach. Participants engage in a sentence comprehension task and the working memory load is manipulated by asking them to memorize either auditory verbal sequences or visual patterns. It is found that verbal and visual working memory load modulate speech processing in similar manners: Higher working memory load attenuates neural activity tracking of phrases and sentences but enhances neural activity tracking of syllables. Since verbal and visual WM load similarly influence the neural responses to speech, such influences may derive from the domain-general component of WM system. More importantly, working memory load asymmetrically modulates lower-level auditory encoding and higher-level linguistic processing of speech, possibly reflecting reallocation of attention induced by mnemonic load.
Affiliation(s)
- Yiguang Liu
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China
- Cheng Luo
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China
- Jing Zheng
- Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China
- Junying Liang
- Department of Linguistics, School of International Studies, Zhejiang University, Hangzhou 310058, China
- Nai Ding
- Research Center for Applied Mathematics and Machine Intelligence, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311121, China; Key Laboratory for Biomedical Engineering of Ministry of Education, College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou 310027, China; The MOE Frontier Science Center for Brain Science & Brain-machine Integration, Zhejiang University, Hangzhou 310012, China
|
56
|
Gwilliams L, King JR, Marantz A, Poeppel D. Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nat Commun 2022; 13:6606. [PMID: 36329058 PMCID: PMC9633780 DOI: 10.1038/s41467-022-34326-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 05/08/2020] [Accepted: 10/19/2022] [Indexed: 11/06/2022] Open
Abstract
Speech consists of a continuously-varying acoustic signal. Yet human listeners experience it as sequences of discrete speech sounds, which are used to recognise discrete words. To examine how the human brain appropriately sequences the speech signal, we recorded two-hour magnetoencephalograms from 21 participants listening to short narratives. Our analyses show that the brain continuously encodes the three most recently heard speech sounds in parallel, and maintains this information long past its dissipation from the sensory input. Each speech sound representation evolves over time, jointly encoding both its phonetic features and the amount of time elapsed since onset. As a result, this dynamic neural pattern encodes both the relative order and phonetic content of the speech sequence. These representations are active earlier when phonemes are more predictable, and are sustained longer when lexical identity is uncertain. Our results show how phonetic sequences in natural speech are represented at the level of populations of neurons, providing insight into what intermediary representations exist between the sensory input and sub-lexical units. The flexibility in the dynamics of these representations paves the way for further understanding of how such sequences may be used to interface with higher order structure such as lexical identity.
Affiliation(s)
- Laura Gwilliams
- Department of Neurological Surgery, University of California, San Francisco, USA
- Department of Psychology, New York University, New York, USA
- NYU Abu Dhabi Institute, Abu Dhabi, UAE
- Jean-Remi King
- Department of Psychology, New York University, New York, USA
- École normale supérieure, PSL University, CNRS, Paris, France
- Alec Marantz
- Department of Psychology, New York University, New York, USA
- NYU Abu Dhabi Institute, Abu Dhabi, UAE
- Department of Linguistics, New York University, New York, USA
- David Poeppel
- Department of Psychology, New York University, New York, USA
- Ernst Strüngmann Institute for Neuroscience, Frankfurt, Germany
|
57
|
McMurray B, Sarrett ME, Chiu S, Black AK, Wang A, Canale R, Aslin RN. Decoding the temporal dynamics of spoken word and nonword processing from EEG. Neuroimage 2022; 260:119457. [PMID: 35842096 PMCID: PMC10875705 DOI: 10.1016/j.neuroimage.2022.119457] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/13/2022] [Revised: 07/02/2022] [Accepted: 07/06/2022] [Indexed: 11/23/2022] Open
Abstract
The efficiency of spoken word recognition is essential for real-time communication. There is consensus that this efficiency relies on an implicit process of activating multiple word candidates that compete for recognition as the acoustic signal unfolds in real-time. However, few methods capture the neural basis of this dynamic competition on a msec-by-msec basis. This is crucial for understanding the neuroscience of language, and for understanding hearing, language and cognitive disorders in people for whom current behavioral methods are not suitable. We applied machine-learning techniques to standard EEG signals to decode which word was heard on each trial and analyzed the patterns of confusion over time. Results mirrored psycholinguistic findings: Early on, the decoder was equally likely to report the target (e.g., baggage) or a similar sounding competitor (badger), but by around 500 msec, competitors were suppressed. Follow up analyses show that this is robust across EEG systems (gel and saline), with fewer channels, and with fewer trials. Results are robust within individuals and show high reliability. This suggests a powerful and simple paradigm that can assess the neural dynamics of speech decoding, with potential applications for understanding lexical development in a variety of clinical disorders.
Affiliation(s)
- Bob McMurray
- Dept. of Psychological and Brain Sciences, Dept. of Communication Sciences and Disorders, Dept. of Linguistics and Dept. of Otolaryngology, University of Iowa
- McCall E Sarrett
- Interdisciplinary Graduate Program in Neuroscience, University of Iowa
- Samantha Chiu
- Dept. of Psychological and Brain Sciences, University of Iowa
- Alexis K Black
- School of Audiology and Speech Sciences, University of British Columbia, Haskins Laboratories
- Alice Wang
- Dept. of Psychology, University of Oregon, Haskins Laboratories
- Rebecca Canale
- Dept. of Psychological Sciences, University of Connecticut, Haskins Laboratories
- Richard N Aslin
- Haskins Laboratories, Department of Psychology and Child Study Center, Yale University, Department of Psychology, University of Connecticut
|
58
|
Verschueren E, Gillis M, Decruy L, Vanthornhout J, Francart T. Speech Understanding Oppositely Affects Acoustic and Linguistic Neural Tracking in a Speech Rate Manipulation Paradigm. J Neurosci 2022; 42:7442-7453. [PMID: 36041851 PMCID: PMC9525161 DOI: 10.1523/jneurosci.0259-22.2022] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 02/04/2022] [Revised: 06/29/2022] [Accepted: 07/17/2022] [Indexed: 11/21/2022] Open
Abstract
When listening to continuous speech, the human brain can track features of the presented speech signal. It has been shown that neural tracking of acoustic features is a prerequisite for speech understanding and can predict speech understanding in controlled circumstances. However, the brain also tracks linguistic features of speech, which may be more directly related to speech understanding. We investigated acoustic and linguistic speech processing as a function of varying speech understanding by manipulating the speech rate. In this paradigm, acoustic and linguistic speech processing is affected simultaneously but in opposite directions: When the speech rate increases, more acoustic information per second is present. In contrast, the tracking of linguistic information becomes more challenging when speech is less intelligible at higher speech rates. We measured the EEG of 18 participants (4 male) who listened to speech at various speech rates. As expected and confirmed by the behavioral results, speech understanding decreased with increasing speech rate. Accordingly, linguistic neural tracking decreased with increasing speech rate, but acoustic neural tracking increased. This indicates that neural tracking of linguistic representations can capture the gradual effect of decreasing speech understanding. In addition, increased acoustic neural tracking does not necessarily imply better speech understanding. This suggests that, although more challenging to measure because of the low signal-to-noise ratio, linguistic neural tracking may be a more direct predictor of speech understanding. SIGNIFICANCE STATEMENT An increasingly popular method to investigate neural speech processing is to measure neural tracking. Although much research has been done on how the brain tracks acoustic speech features, linguistic speech features have received less attention.
In this study, we disentangled acoustic and linguistic characteristics of neural speech tracking via manipulating the speech rate. A proper way of objectively measuring auditory and language processing paves the way toward clinical applications: An objective measure of speech understanding would allow for behavioral-free evaluation of speech understanding, which would make it possible to evaluate hearing loss and adjust hearing aids based on brain responses. This objective measure would benefit populations from whom obtaining behavioral measures may be complex, such as young children or people with cognitive impairments.
Affiliation(s)
- Eline Verschueren
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Marlies Gillis
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, Maryland 20742
- Jonas Vanthornhout
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
- Tom Francart
- Research Group Experimental Oto-rhino-laryngology, Department of Neurosciences, KU Leuven-University of Leuven, Leuven, 3000, Belgium
|
59
|
Kim SG. On the encoding of natural music in computational models and human brains. Front Neurosci 2022; 16:928841. [PMID: 36203808 PMCID: PMC9531138 DOI: 10.3389/fnins.2022.928841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022] Open
Abstract
This article discusses recent developments and advances in the neuroscience of music to understand the nature of musical emotion. In particular, it highlights how system identification techniques and computational models of music have advanced our understanding of how the human brain processes the textures and structures of music and how the processed information evokes emotions. Musical models relate physical properties of stimuli to internal representations called features, and predictive models relate features to neural or behavioral responses and test their predictions against independent unseen data. The new frameworks do not require orthogonalized stimuli in controlled experiments to establish reproducible knowledge, which has opened up a new wave of naturalistic neuroscience. The current review focuses on how this trend has transformed the domain of the neuroscience of music.
|
60
|
Gillis M, Van Canneyt J, Francart T, Vanthornhout J. Neural tracking as a diagnostic tool to assess the auditory pathway. Hear Res 2022; 426:108607. [PMID: 36137861 DOI: 10.1016/j.heares.2022.108607] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/26/2021] [Revised: 08/11/2022] [Accepted: 09/12/2022] [Indexed: 11/20/2022]
Abstract
When a person listens to sound, the brain time-locks to specific aspects of the sound. This is called neural tracking and it can be investigated by analysing neural responses (e.g., measured by electroencephalography) to continuous natural speech. Measures of neural tracking allow for an objective investigation of a range of auditory and linguistic processes in the brain during natural speech perception. This approach is more ecologically valid than traditional auditory evoked responses and has great potential for research and clinical applications. This article reviews the neural tracking framework and highlights three prominent examples of neural tracking analyses: neural tracking of the fundamental frequency of the voice (f0), the speech envelope and linguistic features. Each of these analyses provides a unique point of view into the human brain's hierarchical stages of speech processing. F0-tracking assesses the encoding of fine temporal information in the early stages of the auditory pathway, i.e., from the auditory periphery up to early processing in the primary auditory cortex. Envelope tracking reflects bottom-up and top-down speech-related processes in the auditory cortex and is likely necessary but not sufficient for speech intelligibility. Linguistic feature tracking (e.g. word or phoneme surprisal) relates to neural processes more directly related to speech intelligibility. Together these analyses form a multi-faceted objective assessment of an individual's auditory and linguistic processing.
Affiliation(s)
- Marlies Gillis
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jana Van Canneyt
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Tom Francart
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
- Jonas Vanthornhout
- Experimental Oto-Rhino-Laryngology, Department of Neurosciences, Leuven Brain Institute, KU Leuven, Belgium
|
61
|
Gugnowska K, Novembre G, Kohler N, Villringer A, Keller PE, Sammler D. Endogenous sources of interbrain synchrony in duetting pianists. Cereb Cortex 2022; 32:4110-4127. [PMID: 35029645 PMCID: PMC9476614 DOI: 10.1093/cercor/bhab469] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 07/21/2021] [Revised: 11/16/2021] [Accepted: 11/17/2021] [Indexed: 11/12/2022] Open
Abstract
When people interact with each other, their brains synchronize. However, it remains unclear whether interbrain synchrony (IBS) is functionally relevant for social interaction or stems from exposure of individual brains to identical sensorimotor information. To disentangle these views, the current dual-EEG study investigated amplitude-based IBS in pianists jointly performing duets containing a silent pause followed by a tempo change. First, we manipulated the similarity of the anticipated tempo change and measured IBS during the pause, hence, capturing the alignment of purely endogenous, temporal plans without sound or movement. Notably, right posterior gamma IBS was higher when partners planned similar tempi, it predicted whether partners' tempi matched after the pause, and it was modulated only in real, not in surrogate pairs. Second, we manipulated the familiarity with the partner's actions and measured IBS during joint performance with sound. Although sensorimotor information was similar across conditions, gamma IBS was higher when partners were unfamiliar with each other's part and had to attend more closely to the sound of the performance. These combined findings demonstrate that IBS is not merely an epiphenomenon of shared sensorimotor information but can also hinge on endogenous, cognitive processes crucial for behavioral synchrony and successful social interaction.
Affiliation(s)
- Katarzyna Gugnowska
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main 60322, Germany
- Giacomo Novembre
- Neuroscience of Perception and Action Lab, Italian Institute of Technology (IIT), Rome 00161, Italy
- Natalie Kohler
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main 60322, Germany
- Arno Villringer
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Peter E Keller
- Department of Clinical Medicine, Center for Music in the Brain, Aarhus University, Aarhus 8000, Denmark
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW 2751, Australia
- Daniela Sammler
- Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main 60322, Germany
| |
Collapse
|
62
|
Stewart HJ, Cash EK, Hunter LL, Maloney T, Vannest J, Moore DR. Speech cortical activation and connectivity in typically developing children and those with listening difficulties. Neuroimage Clin 2022; 36:103172. [PMID: 36087559 PMCID: PMC9467868 DOI: 10.1016/j.nicl.2022.103172] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/30/2021] [Revised: 08/23/2022] [Accepted: 08/24/2022] [Indexed: 12/14/2022]
Abstract
Listening difficulties (LiD) in people who have normal audiometry are a widespread but poorly understood form of hearing impairment. Recent research suggests that childhood LiD are cognitive rather than auditory in origin. We examined decoding of sentences using a novel combination of behavioral testing and fMRI with 43 typically developing children and 42 age-matched (6-13 years old) children with LiD, categorized by caregiver report (ECLiPS). Both groups had clinically normal hearing. For sentence listening tasks, we found no group differences in fMRI brain cortical activation by increasingly complex speech stimuli that progressed in emphasis from phonology to intelligibility to semantics. Using resting state fMRI, we examined the temporal connectivity of cortical auditory and related speech perception networks. We found significant group differences only in cortical connections engaged when processing more complex speech stimuli. The strength of the affected connections was related to the children's performance on tests of dichotic listening, speech-in-noise, attention, memory and verbal vocabulary. Together, these results support the novel hypothesis that childhood LiD reflects difficulties in language rather than in auditory or phonological processing.
Affiliation(s)
- Hannah J Stewart
  - Division of Psychology and Language Sciences, University College London, London, UK; Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Psychology, Lancaster University, Lancaster, UK
- Erin K Cash
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Lisa L Hunter
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Thomas Maloney
  - Pediatric Neuroimaging Research Consortium, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
- Jennifer Vannest
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Communication Sciences and Disorders, University of Cincinnati, OH, USA
- David R Moore
  - Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA; Department of Otolaryngology, College of Medicine, University of Cincinnati, Cincinnati, OH, USA; Manchester Centre for Audiology and Deafness, University of Manchester, Manchester M13 9PL, UK

63
Chai X, Liu M, Huang T, Wu M, Li J, Zhao X, Yan T, Song Y, Zhang YX. Neurophysiological evidence for goal-oriented modulation of speech perception. Cereb Cortex 2022; 33:3910-3921. [PMID: 35972410 DOI: 10.1093/cercor/bhac315] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 07/20/2022] [Accepted: 07/21/2022] [Indexed: 11/14/2022]
Abstract
Speech perception depends on the dynamic interplay of bottom-up and top-down information along a hierarchically organized cortical network. Here, we test, for the first time in the human brain, whether neural processing of attended speech is dynamically modulated by task demand using a context-free discrimination paradigm. Electroencephalographic signals were recorded during 3 parallel experiments that differed only in the phonological feature of discrimination (word, vowel, and lexical tone, respectively). The event-related potentials (ERPs) revealed the task modulation of speech processing at approximately 200 ms (P2) after stimulus onset, probably influencing what phonological information to retain in memory. For the phonological comparison of sequential words, task modulation occurred later at approximately 300 ms (N3 and P3), reflecting the engagement of task-specific cognitive processes. The ERP results were consistent with the changes in delta-theta neural oscillations, suggesting the involvement of cortical tracking of speech envelopes. The study thus provides neurophysiological evidence for goal-oriented modulation of attended speech and calls for speech perception models incorporating limited memory capacity and goal-oriented optimization mechanisms.
Affiliation(s)
- Xiaoke Chai
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Min Liu
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Ting Huang
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Meiyun Wu
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Jinhong Li
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Xue Zhao
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Tingting Yan
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Yan Song
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China
- Yu-Xuan Zhang
  - State Key Laboratory of Cognitive Neuroscience and Learning, IDG/McGovern Institute for Brain Research, Beijing Normal University, Beijing 100875, China

64
Interaction of bottom-up and top-down neural mechanisms in spatial multi-talker speech perception. Curr Biol 2022; 32:3971-3986.e4. [PMID: 35973430 DOI: 10.1016/j.cub.2022.07.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/11/2022] [Revised: 06/08/2022] [Accepted: 07/19/2022] [Indexed: 11/20/2022]
Abstract
How the human auditory cortex represents spatially separated simultaneous talkers and how talkers' locations and voices modulate the neural representations of attended and unattended speech are unclear. Here, we measured the neural responses from electrodes implanted in neurosurgical patients as they performed single-talker and multi-talker speech perception tasks. We found that spatial separation between talkers caused a preferential encoding of the contralateral speech in Heschl's gyrus (HG), planum temporale (PT), and superior temporal gyrus (STG). Location and spectrotemporal features were encoded in different aspects of the neural response. Specifically, the talker's location changed the mean response level, whereas the talker's spectrotemporal features altered the variation of the response around its baseline. These components were differentially modulated by the attended talker's voice or location, which improved the population decoding of attended speech features. Attentional modulation due to the talker's voice only appeared in the auditory areas with longer latencies, but attentional modulation due to location was present throughout. Our results show that spatial multi-talker speech perception relies upon a separable pre-attentive neural representation, which could be further tuned by top-down attention to the location and voice of the talker.

65
Heilbron M, Armeni K, Schoffelen JM, Hagoort P, de Lange FP. A hierarchy of linguistic predictions during natural language comprehension. Proc Natl Acad Sci U S A 2022; 119:e2201968119. [PMID: 35921434 PMCID: PMC9371745 DOI: 10.1073/pnas.2201968119] [Citation(s) in RCA: 75] [Impact Index Per Article: 37.5] [Received: 02/11/2022] [Accepted: 06/28/2022] [Indexed: 02/05/2023]
Abstract
Understanding spoken language requires transforming ambiguous acoustic streams into a hierarchy of representations, from phonemes to meaning. It has been suggested that the brain uses prediction to guide the interpretation of incoming input. However, the role of prediction in language processing remains disputed, with disagreement about both the ubiquity and representational nature of predictions. Here, we address both issues by analyzing brain recordings of participants listening to audiobooks, and using a deep neural network (GPT-2) to precisely quantify contextual predictions. First, we establish that brain responses to words are modulated by ubiquitous predictions. Next, we disentangle model-based predictions into distinct dimensions, revealing dissociable neural signatures of predictions about syntactic category (parts of speech), phonemes, and semantics. Finally, we show that high-level (word) predictions inform low-level (phoneme) predictions, supporting hierarchical predictive processing. Together, these results underscore the ubiquity of prediction in language processing, showing that the brain spontaneously predicts upcoming language at multiple levels of abstraction.
Affiliation(s)
- Micha Heilbron
  - Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
  - Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Kristijan Armeni
  - Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
- Peter Hagoort
  - Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
  - Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Floris P. de Lange
  - Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands

66
Heilbron M, Armeni K, Schoffelen JM, Hagoort P, de Lange FP. A hierarchy of linguistic predictions during natural language comprehension. Proc Natl Acad Sci U S A 2022; 119:e2201968119. [PMID: 35921434 DOI: 10.1101/2020.12.03.410399] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 05/21/2023]
Abstract
Understanding spoken language requires transforming ambiguous acoustic streams into a hierarchy of representations, from phonemes to meaning. It has been suggested that the brain uses prediction to guide the interpretation of incoming input. However, the role of prediction in language processing remains disputed, with disagreement about both the ubiquity and representational nature of predictions. Here, we address both issues by analyzing brain recordings of participants listening to audiobooks, and using a deep neural network (GPT-2) to precisely quantify contextual predictions. First, we establish that brain responses to words are modulated by ubiquitous predictions. Next, we disentangle model-based predictions into distinct dimensions, revealing dissociable neural signatures of predictions about syntactic category (parts of speech), phonemes, and semantics. Finally, we show that high-level (word) predictions inform low-level (phoneme) predictions, supporting hierarchical predictive processing. Together, these results underscore the ubiquity of prediction in language processing, showing that the brain spontaneously predicts upcoming language at multiple levels of abstraction.
Affiliation(s)
- Micha Heilbron
  - Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
  - Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Kristijan Armeni
  - Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
- Peter Hagoort
  - Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands
  - Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands
- Floris P de Lange
  - Donders Institute, Radboud University, 6525 EN Nijmegen, The Netherlands

67
Brodbeck C, Simon JZ. Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Front Neurosci 2022; 16:828546. [PMID: 36003957 PMCID: PMC9393379 DOI: 10.3389/fnins.2022.828546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 07/08/2022] [Indexed: 11/13/2022]
Abstract
Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have pitch differing in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked both the presence of pitch and the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether or not there was simultaneously present pitch in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker's speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.
Affiliation(s)
- Christian Brodbeck
  - Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States
  - Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Jonathan Z. Simon
  - Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
  - Department of Biology, University of Maryland, College Park, College Park, MD, United States

68
Zhou D, Zhang G, Dang J, Unoki M, Liu X. Detection of Brain Network Communities During Natural Speech Comprehension From Functionally Aligned EEG Sources. Front Comput Neurosci 2022; 16:919215. [PMID: 35874316 PMCID: PMC9301328 DOI: 10.3389/fncom.2022.919215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2022] [Accepted: 06/14/2022] [Indexed: 11/30/2022]
Abstract
In recent years, electroencephalography (EEG) studies on speech comprehension have been extended from a controlled paradigm to a natural paradigm. Under the hypothesis that the brain can be approximated as a linear time-invariant system, the neural response to natural speech has been investigated extensively using temporal response functions (TRFs). However, most studies have modeled TRFs in the electrode space, which is a mixture of brain sources and thus cannot fully reveal the functional mechanism underlying speech comprehension. In this paper, we propose methods for investigating the brain networks of natural speech comprehension using TRFs on the basis of EEG source reconstruction. We first propose a functional hyper-alignment method with an additive average method to reduce EEG noise. Then, we reconstruct neural sources within the brain based on the EEG signals to estimate TRFs from speech stimuli to source areas, and then investigate the brain networks in the neural source space on the basis of the community detection method. To evaluate TRF-based brain networks, EEG data were recorded in story listening tasks with normal speech and time-reversed speech. To obtain reliable structures of brain networks, we detected TRF-based communities at multiple scales. As a result, the proposed functional hyper-alignment method could effectively reduce the noise caused by individual settings in an EEG experiment and thus improve the accuracy of source reconstruction. The detected brain networks for normal speech comprehension were clearly distinct from those for non-semantically driven (time-reversed speech) audio processing. Our result indicates that the proposed source TRFs can reflect the cognitive processing of spoken language and that the multi-scale community detection method is powerful for investigating brain networks.
Affiliation(s)
- Di Zhou
  - School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
- Gaoyan Zhang
  - College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
- Jianwu Dang
  - School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
  - College of Intelligence and Computing, Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
- Masashi Unoki
  - School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
- Xin Liu
  - School of Information Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan

69
Bai F, Meyer AS, Martin AE. Neural dynamics differentially encode phrases and sentences during spoken language comprehension. PLoS Biol 2022; 20:e3001713. [PMID: 35834569 PMCID: PMC9282610 DOI: 10.1371/journal.pbio.3001713] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/04/2022] [Accepted: 06/14/2022] [Indexed: 11/19/2022]
Abstract
Human language stands out in the natural world as a biological signal that uses a structured system to combine the meanings of small linguistic units (e.g., words) into larger constituents (e.g., phrases and sentences). However, the physical dynamics of speech (or sign) do not stand in a one-to-one relationship with the meanings listeners perceive. Instead, listeners infer meaning based on their knowledge of the language. The neural readouts of the perceptual and cognitive processes underlying these inferences are still poorly understood. In the present study, we used scalp electroencephalography (EEG) to compare the neural response to phrases (e.g., the red vase) and sentences (e.g., the vase is red), which were close in semantic meaning and had been synthesized to be physically indistinguishable. Differences in structure were well captured in the reorganization of neural phase responses in delta (approximately <2 Hz) and theta bands (approximately 2 to 7 Hz), and in power and power connectivity changes in the alpha band (approximately 7.5 to 13.5 Hz). Consistent with predictions from a computational model, sentences showed more power, more power connectivity, and more phase synchronization than phrases did. Theta-gamma phase-amplitude coupling occurred, but did not differ between the syntactic structures. Spectral-temporal response function (STRF) modeling revealed different encoding states for phrases and sentences, over and above the acoustically driven neural response. Our findings provide a comprehensive description of how the brain encodes and separates linguistic structures in the dynamics of neural responses. They imply that phase synchronization and strength of connectivity are readouts for the constituent structure of language. The results provide a novel basis for future neurophysiological research on linguistic structure representation in the brain, and, together with our simulations, support time-based binding as a mechanism of structure encoding in neural dynamics.
Affiliation(s)
- Fan Bai
  - Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Antje S. Meyer
  - Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Andrea E. Martin
  - Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands

70
Pérez A, Davis MH, Ince RAA, Zhang H, Fu Z, Lamarca M, Lambon Ralph MA, Monahan PJ. Timing of brain entrainment to the speech envelope during speaking, listening and self-listening. Cognition 2022; 224:105051. [PMID: 35219954 PMCID: PMC9112165 DOI: 10.1016/j.cognition.2022.105051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2021] [Revised: 01/24/2022] [Accepted: 01/26/2022] [Indexed: 11/17/2022]
Abstract
This study investigates the dynamics of speech envelope tracking during speech production, listening and self-listening. We use a paradigm in which participants listen to natural speech (Listening), produce natural speech (Speech Production), and listen to the playback of their own speech (Self-Listening), all while their neural activity is recorded with EEG. After time-locking EEG data collection and auditory recording and playback, we used a Gaussian copula mutual information measure to estimate the relationship between information content in the EEG and auditory signals. In the 2-10 Hz frequency range, we identified different latencies for maximal speech envelope tracking during speech production and speech perception. Maximal speech tracking takes place approximately 110 ms after auditory presentation during perception and 25 ms before vocalisation during speech production. These results describe a specific timeline for speech tracking in speakers and listeners in line with the idea of a speech chain and hence, delays in communication.
Affiliation(s)
- Alejandro Pérez
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, UK; Department of Language Studies, University of Toronto Scarborough, Canada; Department of Psychology, University of Toronto Scarborough, Canada
- Matthew H Davis
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, UK
- Robin A A Ince
  - School of Psychology and Neuroscience, University of Glasgow, UK
- Hanna Zhang
  - Department of Language Studies, University of Toronto Scarborough, Canada; Department of Linguistics, University of Toronto, Canada
- Zhanao Fu
  - Department of Language Studies, University of Toronto Scarborough, Canada; Department of Linguistics, University of Toronto, Canada
- Melanie Lamarca
  - Department of Language Studies, University of Toronto Scarborough, Canada
- Philip J Monahan
  - Department of Language Studies, University of Toronto Scarborough, Canada; Department of Psychology, University of Toronto Scarborough, Canada

71
Chalas N, Daube C, Kluger DS, Abbasi O, Nitsch R, Gross J. Multivariate analysis of speech envelope tracking reveals coupling beyond auditory cortex. Neuroimage 2022; 258:119395. [PMID: 35718023 DOI: 10.1016/j.neuroimage.2022.119395] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/08/2022] [Revised: 05/16/2022] [Accepted: 06/14/2022] [Indexed: 11/19/2022]
Abstract
The systematic alignment of low-frequency brain oscillations with the acoustic speech envelope signal is well established and has been proposed to be crucial for actively perceiving speech. Previous studies investigating speech-brain coupling in source space are restricted to univariate pairwise approaches between brain and speech signals, and therefore speech tracking information in frequency-specific communication channels might be lacking. To address this, we propose a novel multivariate framework for estimating speech-brain coupling where neural variability from source-derived activity is taken into account along with the rate of the envelope's amplitude change (derivative). We applied it to magnetoencephalographic (MEG) recordings while human participants (male and female) listened to one hour of continuous naturalistic speech, showing that a multivariate approach outperforms the corresponding univariate method in low and high frequencies across frontal, motor, and temporal areas. Systematic comparisons revealed that the gain in low frequencies (0.6 - 0.8 Hz) was related to the envelope's rate of change whereas in higher frequencies (from 0.8 to 10 Hz) it was mostly related to the increased neural variability from source-derived cortical areas. Furthermore, following a non-negative matrix factorization approach we found distinct speech-brain components across time and cortical space related to speech processing. We confirm that speech envelope tracking operates mainly in two timescales (δ and θ frequency bands) and we extend those findings showing shorter coupling delays in auditory-related components and longer delays in higher-association frontal and motor components, indicating temporal differences of speech tracking and providing implications for hierarchical stimulus-driven speech processing.
Affiliation(s)
- Nikos Chalas
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Christoph Daube
  - Centre for Cognitive Neuroimaging, University of Glasgow, Glasgow, UK
- Daniel S Kluger
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Omid Abbasi
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Robert Nitsch
  - Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Joachim Gross
  - Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany

72
Preisig BC, Riecke L, Hervais-Adelman A. Speech sound categorization: The contribution of non-auditory and auditory cortical regions. Neuroimage 2022; 258:119375. [PMID: 35700949 DOI: 10.1016/j.neuroimage.2022.119375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2022] [Revised: 05/13/2022] [Accepted: 06/10/2022] [Indexed: 11/26/2022]
Abstract
Which processes in the human brain lead to the categorical perception of speech sounds? Investigation of this question is hampered by the fact that categorical speech perception is normally confounded by acoustic differences in the stimulus. By using ambiguous sounds, however, it is possible to dissociate acoustic from perceptual stimulus representations. Twenty-seven normally hearing individuals took part in an fMRI study in which they were presented with an ambiguous syllable (intermediate between /da/ and /ga/) in one ear and with a disambiguating acoustic feature (third formant, F3) in the other ear. Multi-voxel pattern searchlight analysis was used to identify brain areas that consistently differentiated between response patterns associated with different syllable reports. By comparing responses to different stimuli with identical syllable reports and identical stimuli with different syllable reports, we disambiguated whether these regions primarily differentiated the acoustics of the stimuli or the syllable report. We found that BOLD activity patterns in left perisylvian regions (STG, SMG), left inferior frontal regions (vMC, IFG, AI), left supplementary motor cortex (SMA/pre-SMA), and right motor and somatosensory regions (M1/S1) represent listeners' syllable report irrespective of stimulus acoustics. Most of these regions are outside of what is traditionally regarded as auditory or phonological processing areas. Our results indicate that the process of speech sound categorization implicates decision-making mechanisms and auditory-motor transformations.
Affiliation(s)
- Basil C Preisig
  - Donders Institute for Brain, Cognition, and Behaviour, Radboud University, 6500 HB Nijmegen, The Netherlands; Max Planck Institute for Psycholinguistics, 6525 XD Nijmegen, The Netherlands; Department of Psychology, Neurolinguistics, University of Zurich, 8050 Zurich, Switzerland; Department of Comparative Language Science, Evolutionary Neuroscience of Language, University of Zurich, 8050 Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, 8057 Zurich, Switzerland
- Lars Riecke
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, 6229 ER Maastricht, The Netherlands
- Alexis Hervais-Adelman
  - Department of Psychology, Neurolinguistics, University of Zurich, 8050 Zurich, Switzerland; Neuroscience Center Zurich, University of Zurich and Eidgenössische Technische Hochschule Zurich, 8057 Zurich, Switzerland

73
Muncke J, Kuruvila I, Hoppe U. Prediction of Speech Intelligibility by Means of EEG Responses to Sentences in Noise. Front Neurosci 2022; 16:876421. [PMID: 35720724 PMCID: PMC9198593 DOI: 10.3389/fnins.2022.876421] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/15/2022] [Accepted: 03/13/2022] [Indexed: 11/13/2022]
Abstract
Objective Understanding speech in noisy conditions is challenging even for people with mild hearing loss, and intelligibility for an individual person is usually evaluated by using several subjective test methods. In the last few years, a method has been developed to determine a temporal response function (TRF) between the speech envelope and simultaneous electroencephalographic (EEG) measurements. By using this TRF it is possible to predict the EEG signal for any speech signal. Recent studies have suggested that the accuracy of this prediction varies with the level of noise added to the speech signal and can objectively predict individual speech intelligibility. Here we assess the variations of the TRF itself when it is calculated for measurements with different signal-to-noise ratios and apply these variations to predict speech intelligibility. Methods For 18 normal hearing subjects the individual threshold of 50% speech intelligibility was determined by using a speech in noise test. Additionally, subjects listened passively to speech material of the speech in noise test at different signal-to-noise ratios close to the individual threshold of 50% speech intelligibility while an EEG was recorded. Afterwards the shapes of the TRFs for each signal-to-noise ratio and subject were compared with the derived intelligibility. Results The strongest effect of variations in stimulus signal-to-noise ratio on the TRF shape occurred close to 100 ms after the stimulus presentation, and was located in the left central scalp region. The investigated variations in TRF morphology showed a strong correlation with speech intelligibility, and we were able to predict the individual threshold of 50% speech intelligibility with a mean deviation of less than 1.5 dB. Conclusion The intelligibility of speech in noise can be predicted by analyzing the shape of the TRF derived from different stimulus signal-to-noise ratios. Because TRFs are interpretable, in a manner similar to auditory evoked potentials, this method offers new options for clinical diagnostics.
Affiliation(s)
- Jan Muncke
- Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany
| | - Ivine Kuruvila
- Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany
- WS Audiology, Erlangen, Germany
| | - Ulrich Hoppe
- Department of Audiology, ENT-Clinic, University Hospital Erlangen, Erlangen, Germany
| |
|
74
|
Haider CL, Suess N, Hauswald A, Park H, Weisz N. Masking of the mouth area impairs reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker. Neuroimage 2022; 252:119044. [PMID: 35240298 DOI: 10.1016/j.neuroimage.2022.119044] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/16/2021] [Revised: 02/26/2022] [Accepted: 02/27/2022] [Indexed: 11/29/2022] Open
Abstract
Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is masked, speech comprehension consequently becomes more difficult. But it still remains inconclusive which levels of speech processing are affected under which circumstances by occluding the mouth area. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal hearing participants via magnetoencephalography (MEG). We additionally added a distractor speaker in half of the trials in order to create an ecologically difficult listening situation. A decoding model on the clear AV speech was trained and used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e. pitch and formant frequencies), while reconstruction of higher level features of speech segmentation (phoneme and word onsets) were especially impaired through masks in difficult listening situations. As we used surgical face masks in our study, which only show mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results, by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.
Affiliation(s)
- Chandra Leon Haider
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria.
| | - Nina Suess
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
| | - Anne Hauswald
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria
| | - Hyojin Park
- School of Psychology & Centre for Human Brain Health (CHBH), University of Birmingham, Birmingham, UK
| | - Nathan Weisz
- Centre for Cognitive Neuroscience and Department of Psychology, University of Salzburg, Austria; Neuroscience Institute, Christian Doppler University Hospital, Paracelsus Medical University, Salzburg, Austria
| |
|
75
|
Norman-Haignere SV, Feather J, Boebinger D, Brunner P, Ritaccio A, McDermott JH, Schalk G, Kanwisher N. A neural population selective for song in human auditory cortex. Curr Biol 2022; 32:1470-1484.e12. [PMID: 35196507 PMCID: PMC9092957 DOI: 10.1016/j.cub.2022.01.069] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 02/23/2021] [Revised: 10/26/2021] [Accepted: 01/24/2022] [Indexed: 12/18/2022]
Abstract
How is music represented in the brain? While neuroimaging has revealed some spatial segregation between responses to music versus other sounds, little is known about the neural code for music itself. To address this question, we developed a method to infer canonical response components of human auditory cortex using intracranial responses to natural sounds, and further used the superior coverage of fMRI to map their spatial distribution. The inferred components replicated many prior findings, including distinct neural selectivity for speech and music, but also revealed a novel component that responded nearly exclusively to music with singing. Song selectivity was not explainable by standard acoustic features, was located near speech- and music-selective responses, and was also evident in individual electrodes. These results suggest that representations of music are fractionated into subpopulations selective for different types of music, one of which is specialized for the analysis of song.
Affiliation(s)
- Sam V Norman-Haignere
- Zuckerman Institute, Columbia University, New York, NY, USA; HHMI Fellow of the Life Sciences Research Foundation, Chevy Chase, MD, USA; Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, ENS, PSL University, CNRS, Paris, France; Department of Biostatistics & Computational Biology, University of Rochester Medical Center, Rochester, NY, USA; Department of Neuroscience, University of Rochester Medical Center, Rochester, NY, USA; Department of Biomedical Engineering, University of Rochester, Rochester, NY, USA; Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA.
| | - Jenelle Feather
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA
| | - Dana Boebinger
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, MA, USA
| | - Peter Brunner
- Department of Neurology, Albany Medical College, Albany, NY, USA; National Center for Adaptive Neurotechnologies, Albany, NY, USA; Department of Neurosurgery, Washington University School of Medicine, St. Louis, MO, USA
| | - Anthony Ritaccio
- Department of Neurology, Albany Medical College, Albany, NY, USA; Department of Neurology, Mayo Clinic, Jacksonville, FL, USA
| | - Josh H McDermott
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA; Program in Speech and Hearing Biosciences and Technology, Harvard University, Cambridge, MA, USA
| | - Gerwin Schalk
- Department of Neurology, Albany Medical College, Albany, NY, USA
| | - Nancy Kanwisher
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA; Center for Brains, Minds and Machines, Cambridge, MA, USA
| |
|
76
|
Di Liberto GM, Hjortkjær J, Mesgarani N. Editorial: Neural Tracking: Closing the Gap Between Neurophysiology and Translational Medicine. Front Neurosci 2022; 16:872600. [PMID: 35368278 PMCID: PMC8966872 DOI: 10.3389/fnins.2022.872600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 02/17/2022] [Indexed: 11/25/2022] Open
Affiliation(s)
- Giovanni M. Di Liberto
- School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
- ADAPT Centre, d-real, Trinity College Institute for Neuroscience, Dublin, Ireland
- *Correspondence: Giovanni M. Di Liberto
| | - Jens Hjortkjær
- Hearing Systems Group, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
| | - Nima Mesgarani
- Electrical Engineering Department, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States
| |
|
77
|
Gillis M, Decruy L, Vanthornhout J, Francart T. Hearing loss is associated with delayed neural responses to continuous speech. Eur J Neurosci 2022; 55:1671-1690. [PMID: 35263814 DOI: 10.1111/ejn.15644] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/24/2021] [Revised: 02/21/2022] [Accepted: 02/23/2022] [Indexed: 11/28/2022]
Abstract
We investigated the impact of hearing loss on the neural processing of speech. Using a forward modeling approach, we compared the neural responses to continuous speech of 14 adults with sensorineural hearing loss with those of age-matched normal-hearing peers. Compared to their normal-hearing peers, hearing-impaired listeners had increased neural tracking and delayed neural responses to continuous speech in quiet. The latency also increased with the degree of hearing loss. As speech understanding decreased, neural tracking decreased in both populations; however, a significantly different trend was observed for the latency of the neural responses. For normal-hearing listeners, the latency increased with increasing background noise level. However, for hearing-impaired listeners, this increase was not observed. Our results support the idea that the neural response latency indicates the efficiency of neural speech processing: more or different brain regions are involved in processing speech, which causes longer communication pathways in the brain. These longer communication pathways hamper the information integration among these brain regions, reflected in longer processing times. Altogether, this suggests decreased neural speech processing efficiency in HI listeners as more time and more or different brain regions are required to process speech. Our results suggest that this reduction in neural speech processing efficiency occurs gradually as hearing deteriorates. From our results, it is apparent that sound amplification does not solve hearing loss. Even when listening to speech in silence at a comfortable loudness, hearing-impaired listeners process speech less efficiently.
Affiliation(s)
- Marlies Gillis
- KU Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| | - Lien Decruy
- Institute for Systems Research, University of Maryland, College Park, MD, USA
| | | | - Tom Francart
- KU Leuven, Department of Neurosciences, ExpORL, Leuven, Belgium
| |
|
78
|
Norman-Haignere SV, Long LK, Devinsky O, Doyle W, Irobunda I, Merricks EM, Feldstein NA, McKhann GM, Schevon CA, Flinker A, Mesgarani N. Multiscale temporal integration organizes hierarchical computation in human auditory cortex. Nat Hum Behav 2022; 6:455-469. [PMID: 35145280 PMCID: PMC8957490 DOI: 10.1038/s41562-021-01261-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 10/30/2020] [Accepted: 11/18/2021] [Indexed: 01/11/2023]
Abstract
To derive meaning from sound, the brain must integrate information across many timescales. What computations underlie multiscale integration in human auditory cortex? Evidence suggests that auditory cortex analyses sound using both generic acoustic representations (for example, spectrotemporal modulation tuning) and category-specific computations, but the timescales over which these putatively distinct computations integrate remain unclear. To answer this question, we developed a general method to estimate sensory integration windows (the time window when stimuli alter the neural response) and applied our method to intracranial recordings from neurosurgical patients. We show that human auditory cortex integrates hierarchically across diverse timescales spanning from ~50 to 400 ms. Moreover, we find that neural populations with short and long integration windows exhibit distinct functional properties: short-integration electrodes (less than ~200 ms) show prominent spectrotemporal modulation selectivity, while long-integration electrodes (greater than ~200 ms) show prominent category selectivity. These findings reveal how multiscale integration organizes auditory computation in the human brain.
Affiliation(s)
- Sam V Norman-Haignere
- Zuckerman Mind, Brain, Behavior Institute, Columbia University; HHMI Postdoctoral Fellow of the Life Sciences Research Foundation
| | - Laura K. Long
- Zuckerman Mind, Brain, Behavior Institute, Columbia University; Doctoral Program in Neurobiology and Behavior, Columbia University
| | - Orrin Devinsky
- Department of Neurology, NYU Langone Medical Center; Comprehensive Epilepsy Center, NYU Langone Medical Center
| | - Werner Doyle
- Comprehensive Epilepsy Center, NYU Langone Medical Center; Department of Neurosurgery, NYU Langone Medical Center
| | - Ifeoma Irobunda
- Department of Neurology, Columbia University Irving Medical Center
| | | | - Neil A. Feldstein
- Department of Neurological Surgery, Columbia University Irving Medical Center
| | - Guy M. McKhann
- Department of Neurological Surgery, Columbia University Irving Medical Center
| | | | - Adeen Flinker
- Department of Neurology, NYU Langone Medical Center; Comprehensive Epilepsy Center, NYU Langone Medical Center; Department of Biomedical Engineering, NYU Tandon School of Engineering
| | - Nima Mesgarani
- Zuckerman Mind, Brain, Behavior Institute, Columbia University; Doctoral Program in Neurobiology and Behavior, Columbia University; Department of Electrical Engineering, Columbia University
| |
|
79
|
Caucheteux C, King JR. Brains and algorithms partially converge in natural language processing. Commun Biol 2022; 5:134. [PMID: 35173264 PMCID: PMC8850612 DOI: 10.1038/s42003-022-03036-1] [Citation(s) in RCA: 64] [Impact Index Per Article: 32.0] [Received: 08/12/2021] [Accepted: 12/29/2021] [Indexed: 11/29/2022] Open
Abstract
Deep learning algorithms trained to predict masked words from large amounts of text have recently been shown to generate activations similar to those of the human brain. However, what drives this similarity remains unknown. Here, we systematically compare a variety of deep language models to identify the computational principles that lead them to generate brain-like representations of sentences. Specifically, we analyze the brain responses to 400 isolated sentences in a large cohort of 102 subjects, each recorded for two hours with functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). We then test where and when each of these algorithms maps onto the brain responses. Finally, we estimate how the architecture, training, and performance of these models independently account for the generation of brain-like representations. Our analyses reveal two main findings. First, the similarity between the algorithms and the brain primarily depends on their ability to predict words from context. Second, this similarity reveals the rise and maintenance of perceptual, lexical, and compositional representations within each cortical region. Overall, this study shows that modern language algorithms partially converge towards brain-like solutions, and thus delineates a promising path to unravel the foundations of natural language processing.
Affiliation(s)
- Charlotte Caucheteux
- Facebook AI Research, Paris, France.
- Université Paris-Saclay, Inria, CEA, Palaiseau, France.
| | - Jean-Rémi King
- Facebook AI Research, Paris, France.
- École normale supérieure, PSL University, CNRS, Paris, France.
| |
|
80
|
Towards real-world neuroscience using mobile EEG and augmented reality. Sci Rep 2022; 12:2291. [PMID: 35145166 PMCID: PMC8831466 DOI: 10.1038/s41598-022-06296-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/21/2021] [Accepted: 01/25/2022] [Indexed: 01/10/2023] Open
Abstract
Our visual environment impacts multiple aspects of cognition including perception, attention and memory, yet most studies traditionally remove or control the external environment. As a result, we have a limited understanding of neurocognitive processes beyond the controlled lab environment. Here, we aim to study neural processes in real-world environments, while also maintaining a degree of control over perception. To achieve this, we combined mobile EEG (mEEG) and augmented reality (AR), which allows us to place virtual objects into the real world. We validated this AR and mEEG approach using a well-characterised cognitive response: the face inversion effect. Participants viewed upright and inverted faces in three EEG tasks: (1) a lab-based computer task, (2) walking through an indoor environment while seeing face photographs, and (3) walking through an indoor environment while seeing virtual faces. We find greater low frequency EEG activity for inverted compared to upright faces in all experimental tasks, demonstrating that cognitively relevant signals can be extracted from mEEG and AR paradigms. This was established in both an epoch-based analysis aligned to face events, and a GLM-based approach that incorporates continuous EEG signals and face perception states. Together, this research helps pave the way to exploring neurocognitive processes in real-world environments while maintaining experimental control using AR.
|
81
|
Taylor C, Hall S, Manivannan S, Mundil N, Border S. The neuroanatomical consequences and pathological implications of bilingualism. J Anat 2022; 240:410-427. [PMID: 34486112 PMCID: PMC8742975 DOI: 10.1111/joa.13542] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/09/2021] [Revised: 07/26/2021] [Accepted: 08/23/2021] [Indexed: 01/17/2023] Open
Abstract
In recent years, there has been a rise in the number of people who are able to speak two or more languages. This has been paralleled by an increase in research related to bilingualism. Despite this, much of the neuroanatomical consequences and pathological implications of bilingualism are still subject to discussion. This review aims to evaluate the neuroanatomical structures related to language and to the acquisition of a second language as well as exploring how learning a second language can alter one's susceptibility to and the progression of certain cerebral pathologies. A literature search was conducted on the Medline, Embase, and Web of Science databases. A total of 137 articles regarding the neuroanatomical or pathological implications of bilingualism were included for review. Following analysis of the included papers, this review finds that bilingualism induces significant gray and white matter cerebral changes, particularly in the frontal lobes, anterior cingulate cortex, left inferior parietal lobule and subcortical areas, and that native language and acquired language largely recruit the same neuroanatomical structures with however, subtle functional and anatomical differences dependent on proficiency and age of language acquisition. There is adequate evidence to suggest that bilingualism offsets the symptoms and diagnosis of dementia, and that it is protective against both pathological and age-related cognitive decline. While many of the neuroanatomical changes are known, more remains to be elucidated and the relationship between bilingualism and other neurological pathologies remains unclear.
Affiliation(s)
- Charles Taylor
- Centre for Learning Anatomical Sciences, Faculty of Medicine, University of Southampton, Southampton, UK
| | - Samuel Hall
- Centre for Learning Anatomical Sciences, Faculty of Medicine, University of Southampton, Southampton, UK
- Department of Neurosurgery, University Hospitals Southampton NHS Foundation Trust, Southampton, UK
| | - Susruta Manivannan
- Department of Neurosurgery, University Hospitals Southampton NHS Foundation Trust, Southampton, UK
| | - Nilesh Mundil
- Department of Neurosurgery, University Hospitals Southampton NHS Foundation Trust, Southampton, UK
| | - Scott Border
- Centre for Learning Anatomical Sciences, Faculty of Medicine, University of Southampton, Southampton, UK
| |
|
82
|
Teoh ES, Ahmed F, Lalor EC. Attention Differentially Affects Acoustic and Phonetic Feature Encoding in a Multispeaker Environment. J Neurosci 2022; 42:682-691. [PMID: 34893546 PMCID: PMC8805628 DOI: 10.1523/jneurosci.1455-20.2021] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/08/2020] [Revised: 09/28/2021] [Accepted: 09/29/2021] [Indexed: 11/21/2022] Open
Abstract
Humans have the remarkable ability to selectively focus on a single talker in the midst of other competing talkers. The neural mechanisms that underlie this phenomenon remain incompletely understood. In particular, there has been longstanding debate over whether attention operates at an early or late stage in the speech processing hierarchy. One way to better understand this is to examine how attention might differentially affect neurophysiological indices of hierarchical acoustic and linguistic speech representations. In this study, we do this by using encoding models to identify neural correlates of speech processing at various levels of representation. Specifically, we recorded EEG from fourteen human subjects (nine female and five male) during a "cocktail party" attention experiment. Model comparisons based on these data revealed phonetic feature processing for attended, but not unattended speech. Furthermore, we show that attention specifically enhances isolated indices of phonetic feature processing, but that such attention effects are not apparent for isolated measures of acoustic processing. These results provide new insights into the effects of attention on different prelexical representations of speech, insights that complement recent anatomic accounts of the hierarchical encoding of attended speech. Furthermore, our findings support the notion that, for attended speech, phonetic features are processed as a distinct stage, separate from the processing of the speech acoustics. SIGNIFICANCE STATEMENT Humans are very good at paying attention to one speaker in an environment with multiple speakers. However, the details of how attended and unattended speech are processed differently by the brain are not completely clear. Here, we explore how attention affects the processing of the acoustic sounds of speech as well as the mapping of those sounds onto categorical phonetic features. We find evidence of categorical phonetic feature processing for attended, but not unattended speech. Furthermore, we find evidence that categorical phonetic feature processing is enhanced by attention, but acoustic processing is not. These findings add an important new layer in our understanding of how the human brain solves the cocktail party problem.
Affiliation(s)
- Emily S Teoh
- School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College, University of Dublin, Dublin 2, Ireland
| | - Farhin Ahmed
- Department of Neuroscience, Department of Biomedical Engineering, and Del Monte Neuroscience Institute, University of Rochester, Rochester, New York 14627
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College, University of Dublin, Dublin 2, Ireland
- Department of Neuroscience, Department of Biomedical Engineering, and Del Monte Neuroscience Institute, University of Rochester, Rochester, New York 14627
| |
|
83
|
Brodbeck C, Bhattasali S, Cruz Heredia AAL, Resnik P, Simon JZ, Lau E. Parallel processing in speech perception with local and global representations of linguistic context. eLife 2022; 11:72056. [PMID: 35060904 PMCID: PMC8830882 DOI: 10.7554/elife.72056] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 07/08/2021] [Accepted: 01/16/2022] [Indexed: 12/03/2022] Open
Abstract
Speech processing is highly incremental. It is widely accepted that human listeners continuously use the linguistic context to anticipate upcoming concepts, words, and phonemes. However, previous evidence supports two seemingly contradictory models of how a predictive context is integrated with the bottom-up sensory input: Classic psycholinguistic paradigms suggest a two-stage process, in which acoustic input initially leads to local, context-independent representations, which are then quickly integrated with contextual constraints. This contrasts with the view that the brain constructs a single coherent, unified interpretation of the input, which fully integrates available information across representational hierarchies, and thus uses contextual constraints to modulate even the earliest sensory representations. To distinguish these hypotheses, we tested magnetoencephalography responses to continuous narrative speech for signatures of local and unified predictive models. Results provide evidence that listeners employ both types of models in parallel. Two local context models uniquely predict some part of early neural responses, one based on sublexical phoneme sequences, and one based on the phonemes in the current word alone; at the same time, even early responses to phonemes also reflect a unified model that incorporates sentence-level constraints to predict upcoming phonemes. Neural source localization places the anatomical origins of the different predictive models in nonidentical parts of the superior temporal lobes bilaterally, with the right hemisphere showing a relative preference for more local models. These results suggest that speech processing recruits both local and unified predictive models in parallel, reconciling previous disparate findings. Parallel models might make the perceptual system more robust, facilitate processing of unexpected inputs, and serve a function in language acquisition.
Affiliation(s)
| | | | | | | | | | - Ellen Lau
- Department of Linguistics, University of Maryland
| |
|
84
|
Piazza EA, Nencheva ML, Lew-Williams C. The Development of Communication Across Timescales. Current Directions in Psychological Science 2021; 30:459-467. [PMID: 35177881 PMCID: PMC8849573 DOI: 10.1177/09637214211037665] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 09/12/2023]
Abstract
How do young children learn to organize the statistics of communicative input across milliseconds and months? Developmental science has made progress in understanding how infants learn patterns in language and how infant-directed speech is engineered to ease short-timescale processing, but less is known about how they link perceptual experiences across multiple levels of processing within an interaction (from syllables to stories) and across development. In this article, we propose that three domains of research - statistical summary, neural processing hierarchies, and neural coupling - will be fruitful in uncovering the dynamic exchange of information between children and adults, both in the moment and in aggregate. In particular, we discuss how the study of brain-to-brain and brain-to-behavior coupling between children and adults will further our understanding of how children's neural representations become aligned with the increasingly complex statistics of communication across timescales.
Affiliation(s)
- Elise A Piazza
- Department of Psychology, Princeton University, Princeton, NJ 08544
- Princeton Neuroscience Institute, Princeton University, Princeton, NJ 08544
- Department of Brain & Cognitive Sciences, University of Rochester, Rochester, NY 14611
| | - Mira L Nencheva
- Department of Psychology, Princeton University, Princeton, NJ 08544
| | | |
|
85
|
Jessen S, Obleser J, Tune S. Neural tracking in infants - An analytical tool for multisensory social processing in development. Dev Cogn Neurosci 2021; 52:101034. [PMID: 34781250 PMCID: PMC8593584 DOI: 10.1016/j.dcn.2021.101034] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 06/30/2021] [Revised: 10/09/2021] [Accepted: 11/07/2021] [Indexed: 11/18/2022] Open
Abstract
Humans are born into a social environment and from early on possess a range of abilities to detect and respond to social cues. In the past decade, there has been a rapidly increasing interest in investigating the neural responses underlying such early social processes under naturalistic conditions. However, the investigation of neural responses to continuous dynamic input poses the challenge of how to link neural responses back to continuous sensory input. In the present tutorial, we provide a step-by-step introduction to one approach to tackle this issue, namely the use of linear models to investigate neural tracking responses in electroencephalographic (EEG) data. While neural tracking has gained increasing popularity in adult cognitive neuroscience over the past decade, its application to infant EEG is still rare and comes with its own challenges. After introducing the concept of neural tracking, we discuss and compare the use of forward vs. backward models and individual vs. generic models using an example data set of infant EEG data. Each section comprises a theoretical introduction as well as a concrete example using MATLAB code. We argue that neural tracking provides a promising way to investigate early (social) processing in an ecologically valid setting.
Affiliation(s)
- Sarah Jessen
- Department of Neurology, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany; Center of Brain, Behavior, and Metabolism, University of Lübeck, Germany
- Jonas Obleser
- Department of Psychology, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany; Center of Brain, Behavior, and Metabolism, University of Lübeck, Germany
- Sarah Tune
- Department of Psychology, University of Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany; Center of Brain, Behavior, and Metabolism, University of Lübeck, Germany
|
86
|
Crosse MJ, Zuk NJ, Di Liberto GM, Nidiffer AR, Molholm S, Lalor EC. Linear Modeling of Neurophysiological Responses to Speech and Other Continuous Stimuli: Methodological Considerations for Applied Research. Front Neurosci 2021; 15:705621. [PMID: 34880719 PMCID: PMC8648261 DOI: 10.3389/fnins.2021.705621] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 05/05/2021] [Accepted: 09/21/2021] [Indexed: 01/01/2023] Open
Abstract
Cognitive neuroscience, in particular research on speech and language, has seen an increase in the use of linear modeling techniques for studying the processing of natural, environmental stimuli. The availability of such computational tools has prompted similar investigations in many clinical domains, facilitating the study of cognitive and sensory deficits under more naturalistic conditions. However, studying clinical (and often highly heterogeneous) cohorts introduces an added layer of complexity to such modeling procedures, potentially leading to instability of such techniques and, as a result, inconsistent findings. Here, we outline some key methodological considerations for applied research, referring to a hypothetical clinical experiment involving speech processing and worked examples of simulated electrophysiological (EEG) data. In particular, we focus on experimental design, data preprocessing, stimulus feature extraction, model design, model training and evaluation, and interpretation of model weights. Throughout the paper, we demonstrate the implementation of each step in MATLAB using the mTRF-Toolbox and discuss how to address issues that could arise in applied research. In doing so, we hope to provide better intuition on these more technical points and provide a resource for applied and clinical researchers investigating sensory and cognitive processing using ecologically rich stimuli.
Affiliation(s)
- Michael J. Crosse
- Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity Centre for Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- X, The Moonshot Factory, Mountain View, CA, United States
- Department of Pediatrics, Albert Einstein College of Medicine, New York, NY, United States
- Department of Neuroscience, Albert Einstein College of Medicine, New York, NY, United States
- Nathaniel J. Zuk
- Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity Centre for Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, United States
- Department of Neuroscience, University of Rochester, Rochester, NY, United States
- Giovanni M. Di Liberto
- Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity Centre for Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Centre for Biomedical Engineering, School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland
- School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
- Aaron R. Nidiffer
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, United States
- Department of Neuroscience, University of Rochester, Rochester, NY, United States
- Sophie Molholm
- Department of Pediatrics, Albert Einstein College of Medicine, New York, NY, United States
- Department of Neuroscience, Albert Einstein College of Medicine, New York, NY, United States
- Edmund C. Lalor
- Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity Centre for Biomedical Engineering, Trinity College Dublin, Dublin, Ireland
- Department of Biomedical Engineering, University of Rochester, Rochester, NY, United States
- Department of Neuroscience, University of Rochester, Rochester, NY, United States
|
87
|
Landemard A, Bimbard C, Demené C, Shamma S, Norman-Haignere S, Boubenec Y. Distinct higher-order representations of natural sounds in human and ferret auditory cortex. eLife 2021; 10:e65566. [PMID: 34792467 PMCID: PMC8601661 DOI: 10.7554/elife.65566] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/08/2020] [Accepted: 10/22/2021] [Indexed: 11/29/2022] Open
Abstract
Little is known about how neural representations of natural sounds differ across species. For example, speech and music play a unique role in human hearing, yet it is unclear how auditory representations of speech and music differ between humans and other animals. Using functional ultrasound imaging, we measured responses in ferrets to a set of natural and spectrotemporally matched synthetic sounds previously tested in humans. Ferrets showed similar lower-level frequency and modulation tuning to that observed in humans. But while humans showed substantially larger responses to natural vs. synthetic speech and music in non-primary regions, ferret responses to natural and synthetic sounds were closely matched throughout primary and non-primary auditory cortex, even when tested with ferret vocalizations. This finding reveals that auditory representations in humans and ferrets diverge sharply at late stages of cortical processing, potentially driven by higher-order processing demands in speech and music.
Affiliation(s)
- Agnès Landemard
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure PSL Research University, CNRS, Paris, France
- Célian Bimbard
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure PSL Research University, CNRS, Paris, France
- University College London, London, United Kingdom
- Charlie Demené
- Physics for Medicine Paris, Inserm, ESPCI Paris, PSL Research University, CNRS, Paris, France
- Shihab Shamma
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure PSL Research University, CNRS, Paris, France
- Institute for Systems Research, Department of Electrical and Computer Engineering, University of Maryland, College Park, United States
- Sam Norman-Haignere
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure PSL Research University, CNRS, Paris, France
- HHMI Postdoctoral Fellow of the Life Sciences Research Foundation, Baltimore, United States
- Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
- Yves Boubenec
- Laboratoire des Systèmes Perceptifs, Département d’Études Cognitives, École Normale Supérieure PSL Research University, CNRS, Paris, France
|
88
|
Attaheri A, Choisdealbha ÁN, Di Liberto GM, Rocha S, Brusini P, Mead N, Olawole-Scott H, Boutris P, Gibbon S, Williams I, Grey C, Flanagan S, Goswami U. Delta- and theta-band cortical tracking and phase-amplitude coupling to sung speech by infants. Neuroimage 2021; 247:118698. [PMID: 34798233 DOI: 10.1016/j.neuroimage.2021.118698] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 03/30/2021] [Revised: 10/15/2021] [Accepted: 10/30/2021] [Indexed: 01/13/2023] Open
Abstract
The amplitude envelope of speech carries crucial low-frequency acoustic information that assists linguistic decoding at multiple time scales. Neurophysiological signals are known to track the amplitude envelope of adult-directed speech (ADS), particularly in the theta-band. Acoustic analysis of infant-directed speech (IDS) has revealed significantly greater modulation energy than ADS in an amplitude-modulation (AM) band centred on ∼2 Hz. Accordingly, cortical tracking of IDS by delta-band neural signals may be key to language acquisition. Speech also contains acoustic information within its higher-frequency bands (beta, gamma). Adult EEG and MEG studies reveal an oscillatory hierarchy, whereby low-frequency (delta, theta) neural phase dynamics temporally organize the amplitude of high-frequency signals (phase-amplitude coupling, PAC). Whilst consensus is growing around the role of PAC in the matured adult brain, its role in the development of speech processing is unexplored. Here, we examined the presence and maturation of low-frequency (<12 Hz) cortical speech tracking in infants by recording EEG longitudinally from 60 participants at 4, 7 and 11 months of age as they listened to nursery rhymes. After establishing stimulus-related neural signals in delta and theta, cortical tracking at each age was assessed in the delta, theta and alpha (control) bands using a multivariate temporal response function (mTRF) method. Delta-beta, delta-gamma, theta-beta and theta-gamma phase-amplitude coupling (PAC) was also assessed. Significant delta and theta but not alpha tracking was found. Significant PAC was present at all ages, with both delta- and theta-driven coupling observed.
Affiliation(s)
- Adam Attaheri
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Áine Ní Choisdealbha
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Ecole Normale Supérieure, PSL University, France; Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity Centre for Biomedical Engineering and Trinity Institute of Neuroscience, Trinity College, The University of Dublin, Ireland; School of Electrical and Electronic Engineering and UCD Centre for Biomedical Engineering, University College Dublin, Ireland
- Sinead Rocha
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Perrine Brusini
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom; Institute of Population Health, Waterhouse Building, Block B, Brownlow Street, Liverpool L69 3GF, United Kingdom
- Natasha Mead
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Helen Olawole-Scott
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Panagiotis Boutris
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Samuel Gibbon
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Isabel Williams
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Christina Grey
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Sheila Flanagan
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
- Usha Goswami
- Department of Psychology, Centre for Neuroscience in Education, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom
|
89
|
Generalizable EEG Encoding Models with Naturalistic Audiovisual Stimuli. J Neurosci 2021; 41:8946-8962. [PMID: 34503996 DOI: 10.1523/jneurosci.2891-20.2021] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/13/2020] [Revised: 08/24/2021] [Accepted: 08/29/2021] [Indexed: 11/21/2022] Open
Abstract
In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as "speech tracking." Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from acoustically rich, naturalistic environments with and without background noise can be generalized to more controlled stimuli. If encoding models for acoustically rich, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations of individuals who may not tolerate listening to more controlled and less engaging stimuli for long periods of time. We recorded noninvasive scalp EEG while 17 human participants (8 male/9 female) listened to speech without noise and audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled datasets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to speech in a rich acoustic background were more accurate when including both phonological and acoustic features. Our findings suggest that naturalistic audiovisual stimuli can be used to measure receptive fields that are comparable and generalizable to more controlled audio-only stimuli.
SIGNIFICANCE STATEMENT Understanding spoken language in natural environments requires listeners to parse acoustic and linguistic information in the presence of other distracting stimuli. However, most studies of auditory processing rely on highly controlled stimuli with no background noise, or with background noise inserted at specific times. Here, we compare models where EEG data are predicted based on a combination of acoustic, phonetic, and visual features in highly disparate stimuli: sentences from a speech corpus and speech embedded within movie trailers. We show that modeling neural responses to highly noisy, audiovisual movies can uncover tuning for acoustic and phonetic information that generalizes to simpler stimuli typically used in sensory neuroscience experiments.
|
90
|
Lenc T, Merchant H, Keller PE, Honing H, Varlet M, Nozaradan S. Mapping between sound, brain and behaviour: four-level framework for understanding rhythm processing in humans and non-human primates. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200325. [PMID: 34420381 PMCID: PMC8380981 DOI: 10.1098/rstb.2020.0325] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Accepted: 06/14/2021] [Indexed: 12/16/2022] Open
Abstract
Humans perceive and spontaneously move to one or several levels of periodic pulses (a meter, for short) when listening to musical rhythm, even when the sensory input does not provide prominent periodic cues to their temporal location. Here, we review a multi-levelled framework for understanding how external rhythmic inputs are mapped onto internally represented metric pulses. This mapping is studied using an approach to quantify and directly compare representations of metric pulses in signals corresponding to sensory inputs, neural activity and behaviour (typically body movement). Based on this approach, recent empirical evidence can be drawn together into a conceptual framework that unpacks the phenomenon of meter into four levels. Each level highlights specific functional processes that critically enable and shape the mapping from sensory input to internal meter. We discuss the nature, constraints and neural substrates of these processes, starting with fundamental mechanisms investigated in macaque monkeys that enable basic forms of mapping between simple rhythmic stimuli and internally represented metric pulse. We propose that human evolution has gradually built a robust and flexible system upon these fundamental processes, allowing more complex levels of mapping to emerge in musical behaviours. This approach opens promising avenues to understand the many facets of rhythmic behaviours across individuals and species. This article is part of the theme issue 'Synchrony and rhythm interaction: from the brain to behavioural ecology'.
Affiliation(s)
- Tomas Lenc
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales 2751, Australia
- Institute of Neuroscience (IONS), Université Catholique de Louvain (UCL), Brussels 1200, Belgium
- Hugo Merchant
- Instituto de Neurobiologia, UNAM, Campus Juriquilla, Querétaro 76230, Mexico
- Peter E. Keller
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales 2751, Australia
- Henkjan Honing
- Amsterdam Brain and Cognition (ABC), Institute for Logic, Language and Computation (ILLC), University of Amsterdam, Amsterdam 1090 GE, The Netherlands
- Manuel Varlet
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, New South Wales 2751, Australia
- School of Psychology, Western Sydney University, Penrith, New South Wales 2751, Australia
- Sylvie Nozaradan
- Institute of Neuroscience (IONS), Université Catholique de Louvain (UCL), Brussels 1200, Belgium
|
91
|
Lowe MX, Mohsenzadeh Y, Lahner B, Charest I, Oliva A, Teng S. Cochlea to categories: The spatiotemporal dynamics of semantic auditory representations. Cogn Neuropsychol 2021; 38:468-489. [PMID: 35729704 PMCID: PMC10589059 DOI: 10.1080/02643294.2022.2085085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2021] [Revised: 03/31/2022] [Accepted: 05/25/2022] [Indexed: 10/17/2022]
Abstract
How does the auditory system categorize natural sounds? Here we apply multimodal neuroimaging to illustrate the progression from acoustic to semantically dominated representations. Combining magnetoencephalographic (MEG) and functional magnetic resonance imaging (fMRI) scans of observers listening to naturalistic sounds, we found superior temporal responses beginning ∼55 ms post-stimulus onset, spreading to extratemporal cortices by ∼100 ms. Early regions were distinguished less by onset/peak latency than by functional properties and overall temporal response profiles. Early acoustically-dominated representations trended systematically toward category dominance over time (after ∼200 ms) and space (beyond primary cortex). Semantic category representation was spatially specific: Vocalizations were preferentially distinguished in frontotemporal voice-selective regions and the fusiform; scenes and objects were distinguished in parahippocampal and medial place areas. Our results are consistent with real-world events coded via an extended auditory processing hierarchy, in which acoustic representations rapidly enter multiple streams specialized by category, including areas typically considered visual cortex.
Affiliation(s)
- Matthew X. Lowe
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Unlimited Sciences, Colorado Springs, CO
- Yalda Mohsenzadeh
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- The Brain and Mind Institute, The University of Western Ontario, London, ON, Canada
- Department of Computer Science, The University of Western Ontario, London, ON, Canada
- Benjamin Lahner
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Ian Charest
- Département de Psychologie, Université de Montréal, Montréal, Québec, Canada
- Center for Human Brain Health, University of Birmingham, UK
- Aude Oliva
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Santani Teng
- Computer Science and Artificial Intelligence Lab (CSAIL), MIT, Cambridge, MA
- Smith-Kettlewell Eye Research Institute (SKERI), San Francisco, CA
|
92
|
Kiremitçi I, Yilmaz Ö, Çelik E, Shahdloo M, Huth AG, Çukur T. Attentional Modulation of Hierarchical Speech Representations in a Multitalker Environment. Cereb Cortex 2021; 31:4986-5005. [PMID: 34115102 PMCID: PMC8491717 DOI: 10.1093/cercor/bhab136] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/03/2020] [Revised: 04/01/2021] [Accepted: 04/21/2021] [Indexed: 11/13/2022] Open
Abstract
Humans are remarkably adept in listening to a desired speaker in a crowded environment, while filtering out nontarget speakers in the background. Attention is key to solving this difficult cocktail-party task, yet a detailed characterization of attentional effects on speech representations is lacking. It remains unclear across what levels of speech features and how much attentional modulation occurs in each brain area during the cocktail-party task. To address these questions, we recorded whole-brain blood-oxygen-level-dependent (BOLD) responses while subjects either passively listened to single-speaker stories, or selectively attended to a male or a female speaker in temporally overlaid stories in separate experiments. Spectral, articulatory, and semantic models of the natural stories were constructed. Intrinsic selectivity profiles were identified via voxelwise models fit to passive listening responses. Attentional modulations were then quantified based on model predictions for attended and unattended stories in the cocktail-party task. We find that attention causes broad modulations at multiple levels of speech representations while growing stronger toward later stages of processing, and that unattended speech is represented up to the semantic level in parabelt auditory cortex. These results provide insights on attentional mechanisms that underlie the ability to selectively listen to a desired speaker in noisy multispeaker environments.
Affiliation(s)
- Ibrahim Kiremitçi
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Özgür Yilmaz
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
- Emin Çelik
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Mo Shahdloo
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Experimental Psychology, Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford OX3 9DU, UK
- Alexander G Huth
- Department of Neuroscience, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, USA
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
- Tolga Çukur
- Neuroscience Program, Sabuncu Brain Research Center, Bilkent University, Ankara TR-06800, Turkey
- National Magnetic Resonance Research Center (UMRAM), Bilkent University, Ankara TR-06800, Turkey
- Department of Electrical and Electronics Engineering, Bilkent University, Ankara TR-06800, Turkey
- Helen Wills Neuroscience Institute, University of California, Berkeley, CA 94702, USA
|
93
|
Devaraju DS, Kemp A, Eddins DA, Shrivastav R, Chandrasekaran B, Hampton Wray A. Effects of Task Demands on Neural Correlates of Acoustic and Semantic Processing in Challenging Listening Conditions. J Speech Lang Hear Res 2021; 64:3697-3706. [PMID: 34403278 DOI: 10.1044/2021_jslhr-21-00006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
Purpose Listeners shift their listening strategies between lower level acoustic information and higher level semantic information to prioritize maximum speech intelligibility in challenging listening conditions. Although increasing task demands via acoustic degradation modulates lexical-semantic processing, the neural mechanisms underlying different listening strategies are unclear. The current study examined the extent to which encoding of lower level acoustic cues is modulated by task demand and associations with lexical-semantic processes. Method Electroencephalography was acquired while participants listened to sentences in the presence of four-talker babble that contained either higher or lower probability final words. Task difficulty was modulated by time available to process responses. Cortical tracking of speech (neural correlates of acoustic temporal envelope processing) was estimated using temporal response functions. Results Task difficulty did not affect cortical tracking of the temporal envelope of speech under challenging listening conditions. Neural indices of lexical-semantic processing (N400 amplitudes) were larger with increased task difficulty. No correlations were observed between the cortical tracking of the temporal envelope of speech and lexical-semantic processes, even after controlling for the effect of individualized signal-to-noise ratios. Conclusions Cortical tracking of the temporal envelope of speech and semantic processing are differentially influenced by task difficulty. While increased task demands modulated higher level semantic processing, cortical tracking of the temporal envelope of speech may be influenced by task difficulty primarily when the demand is manipulated in terms of acoustic properties of the stimulus, consistent with an emerging perspective in speech perception.
Affiliation(s)
- Dhatri S Devaraju
- Department of Communication Science and Disorders, University of Pittsburgh, PA
- Amy Kemp
- Department of Communication Sciences and Special Education, University of Georgia, Athens
- David A Eddins
- Department of Communication Sciences & Disorders, University of South Florida, Tampa
- Amanda Hampton Wray
- Department of Communication Science and Disorders, University of Pittsburgh, PA
|
94
|
Di Liberto GM, Marion G, Shamma SA. Accurate Decoding of Imagined and Heard Melodies. Front Neurosci 2021; 15:673401. [PMID: 34421512 PMCID: PMC8375770 DOI: 10.3389/fnins.2021.673401] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/27/2021] [Accepted: 06/17/2021] [Indexed: 11/16/2022] Open
Abstract
Music perception requires the human brain to process a variety of acoustic and music-related properties. Recent research used encoding models to tease apart and study the various cortical contributors to music perception. To do so, such approaches study temporal response functions that summarise the neural activity over several minutes of data. Here we tested the possibility of assessing the neural processing of individual musical units (bars) with electroencephalography (EEG). We devised a decoding methodology based on a maximum correlation metric across EEG segments (maxCorr) and used it to decode melodies from EEG based on an experiment where professional musicians listened to and imagined four Bach melodies multiple times. We demonstrate here that accurate decoding of melodies in single subjects and at the level of individual musical units is possible, both from EEG signals recorded during listening and imagination. Furthermore, we find that greater decoding accuracies are measured for the maxCorr method than for an envelope reconstruction approach based on backward temporal response functions (bTRFenv). These results indicate that low-frequency neural signals encode information beyond note timing, especially with respect to low-frequency cortical signals below 1 Hz, which are shown to encode pitch-related information. Along with the theoretical implications of these results, we discuss the potential applications of this decoding methodology in the context of novel brain-computer interface solutions.
Affiliation(s)
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, CNRS, Paris, France; Ecole Normale Supérieure, PSL University, Paris, France; Department of Mechanical, Manufacturing and Biomedical Engineering, Trinity Centre for Biomedical Engineering, Trinity College, Trinity Institute of Neuroscience, The University of Dublin, Dublin, Ireland; Centre for Biomedical Engineering, School of Electrical and Electronic Engineering and UCD University College Dublin, Dublin, Ireland
- Guilhem Marion
- Laboratoire des Systèmes Perceptifs, CNRS, Paris, France
- Shihab A Shamma
- Laboratoire des Systèmes Perceptifs, CNRS, Paris, France; Institute for Systems Research, Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
|
95
|
Wang YC, Sohoglu E, Gilbert RA, Henson RN, Davis MH. Predictive Neural Computations Support Spoken Word Recognition: Evidence from MEG and Competitor Priming. J Neurosci 2021; 41:6919-6932. [PMID: 34210777 PMCID: PMC8360690 DOI: 10.1523/jneurosci.1685-20.2021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/24/2020] [Revised: 05/22/2021] [Accepted: 05/25/2021] [Indexed: 11/24/2022] Open
Abstract
Human listeners achieve quick and effortless speech comprehension through computations of conditional probability using Bayes' rule. However, the neural implementation of Bayesian perceptual inference remains unclear. Competitive-selection accounts (e.g., TRACE) propose that word recognition is achieved through direct inhibitory connections between units representing candidate words that share segments (e.g., hygiene and hijack share /haidʒ/). Manipulations that increase lexical uncertainty should increase neural responses associated with word recognition when words cannot be uniquely identified. In contrast, predictive-selection accounts (e.g., Predictive-Coding) propose that spoken word recognition involves comparing heard and predicted speech sounds and using prediction error to update lexical representations. Increased lexical uncertainty in words, such as hygiene and hijack, will increase prediction error and hence neural activity only at later time points when different segments are predicted. We collected MEG data from male and female listeners to test these two Bayesian mechanisms and used a competitor priming manipulation to change the prior probability of specific words. Lexical decision responses showed delayed recognition of target words (hygiene) following presentation of a neighboring prime word (hijack) several minutes earlier. However, this effect was not observed with pseudoword primes (higent) or targets (hijure). Crucially, MEG responses in the STG showed greater neural responses for word-primed words after the point at which they were uniquely identified (after /haidʒ/ in hygiene) but not before, while similar changes were again absent for pseudowords. These findings are consistent with accounts of spoken word recognition in which neural computations of prediction error play a central role.
SIGNIFICANCE STATEMENT Effective speech perception is critical to daily life and involves computations that combine speech signals with prior knowledge of spoken words (i.e., Bayesian perceptual inference). This study specifies the neural mechanisms that support spoken word recognition by testing two distinct implementations of Bayesian perceptual inference. Most established theories propose direct competition between lexical units such that inhibition of irrelevant candidates leads to selection of critical words. Our results instead support predictive-selection theories (e.g., Predictive-Coding): by comparing heard and predicted speech sounds, neural computations of prediction error can help listeners continuously update lexical probabilities, allowing for more rapid word identification.
Affiliation(s)
- Yingcan Carol Wang
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Ediz Sohoglu
  - School of Psychology, University of Sussex, Brighton, BN1 9RH, United Kingdom
- Rebecca A Gilbert
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Richard N Henson
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Matthew H Davis
  - MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
96
|
The Music of Silence: Part II: Music Listening Induces Imagery Responses. J Neurosci 2021; 41:7449-7460. [PMID: 34341154 DOI: 10.1523/jneurosci.0184-21.2021] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/22/2021] [Revised: 06/22/2021] [Accepted: 06/24/2021] [Indexed: 01/22/2023]
Abstract
During music listening, humans routinely acquire the regularities of the acoustic sequences and use them to anticipate and interpret the ongoing melody. Specifically, in line with this predictive framework, it is thought that brain responses during such listening reflect a comparison between the bottom-up sensory responses and top-down prediction signals generated by an internal model that embodies the music exposure and expectations of the listener. To attain a clear view of these predictive responses, previous work has eliminated the sensory inputs by inserting artificial silences (or sound omissions) that leave behind only the corresponding predictions of the thwarted expectations. Here, we demonstrate a new alternative approach in which we decode the predictive electroencephalography (EEG) responses to the silent intervals that are naturally interspersed within the music. We did this as participants (experiment 1, 20 participants, 10 female; experiment 2, 21 participants, 6 female) listened to or imagined Bach piano melodies. Prediction signals were quantified and assessed via a computational model of the melodic structure of the music and were shown to exhibit the same response characteristics when measured during listening or imagining. These include an inverted polarity for both silence and imagined responses relative to listening, as well as response magnitude modulations that precisely reflect the expectations of notes and silences in both listening and imagery conditions. These findings therefore provide a unifying view that links results from many previous paradigms, including omission reactions and the expectation modulation of sensory responses, all in the context of naturalistic music listening.
SIGNIFICANCE STATEMENT Music perception depends on our ability to learn and detect melodic structures. It has been suggested that our brain does so by actively predicting upcoming music notes, a process inducing instantaneous neural responses as the music confronts these expectations. Here, we studied this prediction process using EEGs recorded while participants listen to and imagine Bach melodies. Specifically, we examined neural signals during the ubiquitous musical pauses (or silent intervals) in a music stream and analyzed them in contrast to the imagery responses. We find that imagined predictive responses are routinely co-opted during ongoing music listening. These conclusions are revealed by a new paradigm using listening and imagery of naturalistic melodies.
97
|
Tune S, Alavash M, Fiedler L, Obleser J. Neural attentional-filter mechanisms of listening success in middle-aged and older individuals. Nat Commun 2021; 12:4533. [PMID: 34312388 PMCID: PMC8313676 DOI: 10.1038/s41467-021-24771-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/03/2020] [Accepted: 07/01/2021] [Indexed: 12/12/2022]
Abstract
Successful listening crucially depends on intact attentional filters that separate relevant from irrelevant information. Research into their neurobiological implementation has focused on two potential auditory filter strategies: the lateralization of alpha power and selective neural speech tracking. However, the functional interplay of the two neural filter strategies and their potency to index listening success in an ageing population remain unclear. Using electroencephalography and a dual-talker task in a representative sample of listeners (N = 155; age = 39-80 years), we here demonstrate an often-missed link from single-trial behavioural outcomes back to trial-by-trial changes in neural attentional filtering. First, we observe preserved attentional-cue-driven modulation of both neural filters across chronological age and hearing levels. Second, neural filter states vary independently of one another, demonstrating complementary neurobiological solutions of spatial selective attention. Stronger neural speech tracking, but not alpha lateralization, boosts trial-to-trial behavioural performance. Our results highlight the translational potential of neural speech tracking as an individualized neural marker of adaptive listening behaviour.
Affiliation(s)
- Sarah Tune
  - Department of Psychology, University of Lübeck, Lübeck, Germany
  - Center for Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
- Mohsen Alavash
  - Department of Psychology, University of Lübeck, Lübeck, Germany
  - Center for Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
- Lorenz Fiedler
  - Department of Psychology, University of Lübeck, Lübeck, Germany
  - Center for Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
  - Eriksholm Research Centre, Snekkersten, Denmark
- Jonas Obleser
  - Department of Psychology, University of Lübeck, Lübeck, Germany
  - Center for Brain, Behavior, and Metabolism, University of Lübeck, Lübeck, Germany
98
|
Lunner T, Alickovic E, Graversen C, Ng EHN, Wendt D, Keidser G. Three New Outcome Measures That Tap Into Cognitive Processes Required for Real-Life Communication. Ear Hear 2020; 41 Suppl 1:39S-47S. [PMID: 33105258 PMCID: PMC7676869 DOI: 10.1097/aud.0000000000000941] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 05/15/2020] [Accepted: 07/11/2020] [Indexed: 11/29/2022]
Abstract
To increase the ecological validity of outcomes from laboratory evaluations of hearing and hearing devices, it is desirable to introduce more realistic outcome measures in the laboratory. This article presents and discusses three outcome measures that have been designed to go beyond traditional speech-in-noise measures to better reflect realistic everyday challenges. The outcome measures reviewed are: the Sentence-final Word Identification and Recall (SWIR) test, which measures working memory performance while listening to speech in noise at ceiling performance; a neural tracking method that produces a quantitative measure of selective speech attention in noise; and pupillometry, which measures changes in pupil dilation to assess listening effort while listening to speech in noise. According to evaluation data, the SWIR test provides a sensitive measure in situations where speech perception performance might be unaffected. Similarly, pupil dilation has also shown sensitivity in situations where traditional speech-in-noise measures are insensitive. Changes in working memory capacity and effort mobilization were found at positive signal-to-noise ratios (SNR), that is, at SNRs that might reflect everyday situations. Using stimulus reconstruction, it has been demonstrated that neural tracking is a robust method for determining to what degree a listener is attending to a specific talker in a typical cocktail party situation. Using both established and commercially available noise reduction schemes, data have further shown that all three measures are sensitive to variation in SNR. In summary, the new outcome measures seem suitable for testing hearing and hearing devices under more realistic and demanding everyday conditions than traditional speech-in-noise tests.
Affiliation(s)
- Thomas Lunner
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
  - Department of Electrical Engineering, Division Automatic Control, Linköping University, Linköping, Sweden
  - Department of Health Technology, Hearing Systems, Technical University of Denmark, Lyngby, Denmark
- Emina Alickovic
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Electrical Engineering, Division Automatic Control, Linköping University, Linköping, Sweden
- Elaine Hoi Ning Ng
  - Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
  - Oticon A/S, Kongebakken, Denmark
- Dorothea Wendt
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Health Technology, Hearing Systems, Technical University of Denmark, Lyngby, Denmark
- Gitte Keidser
  - Eriksholm Research Centre, Oticon A/S, Snekkersten, Denmark
  - Department of Behavioural Sciences and Learning, Linnaeus Centre HEAD, Linköping University, Linköping, Sweden
99
|
O'Sullivan AE, Crosse MJ, Di Liberto GM, de Cheveigné A, Lalor EC. Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects. J Neurosci 2021; 41:4991-5003. [PMID: 33824190 PMCID: PMC8197638 DOI: 10.1523/jneurosci.0906-20.2021] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 04/18/2020] [Revised: 03/16/2021] [Accepted: 03/22/2021] [Indexed: 12/27/2022]
Abstract
Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid in recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.
SIGNIFICANCE STATEMENT During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
Affiliation(s)
- Aisling E O'Sullivan
  - School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Michael J Crosse
  - X, The Moonshot Factory, Mountain View, CA and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Giovanni M Di Liberto
  - Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- Alain de Cheveigné
  - Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
  - University College London Ear Institute, University College London, London WC1X 8EE, United Kingdom
- Edmund C Lalor
  - School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
  - Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, New York 14627
100
|
Keshavarzi M, Varano E, Reichenbach T. Cortical Tracking of a Background Speaker Modulates the Comprehension of a Foreground Speech Signal. J Neurosci 2021; 41:5093-5101. [PMID: 33926996 PMCID: PMC8197648 DOI: 10.1523/jneurosci.3200-20.2021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/23/2020] [Revised: 02/23/2021] [Accepted: 04/12/2021] [Indexed: 11/21/2022]
Abstract
Understanding speech in background noise is a difficult task. The tracking of speech rhythms such as the rate of syllables and words by cortical activity has emerged as a key neural mechanism for speech-in-noise comprehension. In particular, recent investigations have used transcranial alternating current stimulation (tACS) with the envelope of a speech signal to influence the cortical speech tracking, demonstrating that this type of stimulation modulates comprehension and therefore providing evidence of a functional role of the cortical tracking in speech processing. Cortical activity has been found to track the rhythms of a background speaker as well, but the functional significance of this neural response remains unclear. Here we use a speech-comprehension task with a target speaker in the presence of a distractor voice to show that tACS with the speech envelope of the target voice as well as tACS with the envelope of the distractor speaker both modulate the comprehension of the target speech. Because the envelope of the distractor speech does not carry information about the target speech stream, the modulation of speech comprehension through tACS with this envelope provides evidence that the cortical tracking of the background speaker affects the comprehension of the foreground speech signal. The phase dependency of the resulting modulation of speech comprehension is, however, opposite to that obtained from tACS with the envelope of the target speech signal. This suggests that the cortical tracking of the ignored speech stream and that of the attended speech stream may compete for neural resources.
SIGNIFICANCE STATEMENT Loud environments such as busy pubs or restaurants can make conversation difficult. However, they also allow us to eavesdrop on other conversations that occur in the background. In particular, we often notice when somebody else mentions our name, even if we have not been listening to that person. However, the neural mechanisms by which background speech is processed remain poorly understood. Here we use transcranial alternating current stimulation, a technique through which neural activity in the cerebral cortex can be influenced, to show that cortical responses to rhythms in the distractor speech modulate the comprehension of the target speaker. Our results provide evidence that the cortical tracking of background speech rhythms plays a functional role in speech processing.
Affiliation(s)
- Mahmoud Keshavarzi
  - Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
- Enrico Varano
  - Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England
- Tobias Reichenbach
  - Department of Bioengineering and Centre for Neurotechnology, Imperial College London, South Kensington Campus, London, SW7 2AZ, England