1
Lamekina Y, Titone L, Maess B, Meyer L. Speech Prosody Serves Temporal Prediction of Language via Contextual Entrainment. J Neurosci 2024; 44:e1041232024. [PMID: 38839302 PMCID: PMC11236583 DOI: 10.1523/jneurosci.1041-23.2024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 03/08/2024] [Accepted: 04/08/2024] [Indexed: 06/07/2024]
Abstract
Temporal prediction assists language comprehension. In a series of recent behavioral studies, we have shown that listeners specifically employ rhythmic modulations of prosody to estimate the duration of upcoming sentences, thereby speeding up comprehension. In the current human magnetoencephalography (MEG) study on participants of either sex, we show that the human brain achieves this function through a mechanism termed entrainment. Through entrainment, electrophysiological brain activity maintains and continues contextual rhythms beyond their offset. Our experiment combined exposure to repetitive prosodic contours with the subsequent presentation of visual sentences that either matched or mismatched the duration of the preceding contour. During exposure to prosodic contours, we observed MEG coherence with the contours, which was source-localized to right-hemispheric auditory areas. During the processing of the visual targets, activity at the frequency of the preceding contour was still detectable in the MEG; yet sources shifted to the (left) frontal cortex, in line with a functional inheritance of the rhythmic acoustic context for prediction. Strikingly, when the target sentence was shorter than expected from the preceding contour, an omission response appeared in the evoked potential record. We conclude that prosodic entrainment is a functional mechanism of temporal prediction in language comprehension. In general, acoustic rhythms appear to endow language for employing the brain's electrophysiological mechanisms of temporal prediction.
Affiliation(s)
- Yulia Lamekina
  - Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Lorenzo Titone
  - Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Burkhard Maess
  - Methods and Development Group Brain Networks, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Lars Meyer
  - Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
  - University Clinic Münster, Münster 48149, Germany
2
Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024; 59:394-414. [PMID: 38151889 DOI: 10.1111/ejn.16221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Revised: 11/17/2023] [Accepted: 11/22/2023] [Indexed: 12/29/2023]
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role of information transmission during communication. Speech is inherently a dynamic signal, and a recent line of research focused on neural activity following the temporal structure of speech. We review findings that characterise neural dynamics in the processing of continuous acoustics and that allow us to compare these dynamics with temporal aspects in human speech. We highlight properties and constraints that both neural and speech dynamics have, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins and summarise open questions in the field.
Affiliation(s)
- Benedikt Zoefel
  - Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France
  - Université de Toulouse III Paul Sabatier, Toulouse, France
- Anne Kösem
  - Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France
3
Inbar M, Genzer S, Perry A, Grossman E, Landau AN. Intonation Units in Spontaneous Speech Evoke a Neural Response. J Neurosci 2023; 43:8189-8200. [PMID: 37793909 PMCID: PMC10697392 DOI: 10.1523/jneurosci.0235-23.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 08/16/2023] [Accepted: 08/29/2023] [Indexed: 10/06/2023]
Abstract
Spontaneous speech is produced in chunks called intonation units (IUs). IUs are defined by a set of prosodic cues and presumably occur in all human languages. Recent work has shown that across different grammatical and sociocultural conditions IUs form rhythms of ∼1 unit per second. Linguistic theory suggests that IUs pace the flow of information in the discourse. As a result, IUs provide a promising and hitherto unexplored theoretical framework for studying the neural mechanisms of communication. In this article, we identify a neural response unique to the boundary defined by the IU. We measured the EEG of human participants (of either sex), who listened to different speakers recounting an emotional life event. We analyzed the speech stimuli linguistically and modeled the EEG response at word offset using a GLM approach. We find that the EEG response to IU-final words differs from the response to IU-nonfinal words even when equating acoustic boundary strength. Finally, we relate our findings to the body of research on rhythmic brain mechanisms in speech processing. We study the unique contribution of IUs and acoustic boundary strength in predicting delta-band EEG. This analysis suggests that IU-related neural activity, which is tightly linked to the classic Closure Positive Shift (CPS), could be a time-locked component that captures the previously characterized delta-band neural speech tracking.
SIGNIFICANCE STATEMENT: Linguistic communication is central to human experience, and its neural underpinnings are a topic of much research in recent years. Neuroscientific research has benefited from studying human behavior in naturalistic settings, an endeavor that requires explicit models of complex behavior. Usage-based linguistic theory suggests that spoken language is prosodically structured in intonation units. We reveal that the neural system is attuned to intonation units by explicitly modeling their impact on the EEG response beyond mere acoustics. To our understanding, this is the first time this is demonstrated in spontaneous speech under naturalistic conditions and under a theoretical framework that connects the prosodic chunking of speech, on the one hand, with the flow of information during communication, on the other.
Affiliation(s)
- Maya Inbar
  - Department of Linguistics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
  - Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
  - Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Shir Genzer
  - Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Anat Perry
  - Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Eitan Grossman
  - Department of Linguistics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Ayelet N Landau
  - Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
  - Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
4
Wang X, Delgado J, Marchesotti S, Kojovic N, Sperdin HF, Rihs TA, Schaer M, Giraud AL. Speech Reception in Young Children with Autism Is Selectively Indexed by a Neural Oscillation Coupling Anomaly. J Neurosci 2023; 43:6779-6795. [PMID: 37607822 PMCID: PMC10552944 DOI: 10.1523/jneurosci.0112-22.2023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Revised: 07/02/2023] [Accepted: 07/07/2023] [Indexed: 08/24/2023]
Abstract
Communication difficulties are one of the core criteria in diagnosing autism spectrum disorder (ASD), and are often characterized by speech reception difficulties, whose biological underpinnings are not yet identified. This deficit could denote atypical neuronal ensemble activity, as reflected by neural oscillations. Atypical cross-frequency oscillation coupling, in particular, could disrupt the joint tracking and prediction of dynamic acoustic stimuli, a dual process that is essential for speech comprehension. Whether such oscillatory anomalies already exist in very young children with ASD, and with what specificity they relate to individual language reception capacity is unknown. We collected neural activity data using electroencephalography (EEG) in 64 very young children with and without ASD (mean age 3; 17 females, 47 males) while they were exposed to naturalistic-continuous speech. EEG power of frequency bands typically associated with phrase-level chunking (δ, 1-3 Hz), phonemic encoding (low-γ, 25-35 Hz), and top-down control (β, 12-20 Hz) were markedly reduced in ASD relative to typically developing (TD) children. Speech neural tracking by δ and θ (4-8 Hz) oscillations was also weaker in ASD compared with TD children. After controlling gaze-pattern differences, we found that the classical θ/γ coupling was replaced by an atypical β/γ coupling in children with ASD. This anomaly was the single most specific predictor of individual speech reception difficulties in ASD children. These findings suggest that early interventions (e.g., neurostimulation) targeting the disruption of β/γ coupling and the upregulation of θ/γ coupling could improve speech processing coordination in young children with ASD and help them engage in oral interactions.
SIGNIFICANCE STATEMENT: Very young children already present marked alterations of neural oscillatory activity in response to natural speech at the time of autism spectrum disorder (ASD) diagnosis. Hierarchical processing of phonemic-range and syllabic-range information (θ/γ coupling) is disrupted in ASD children. Abnormal bottom-up (low-γ) and top-down (low-β) coordination specifically predicts speech reception deficits in very young ASD children, and no other cognitive deficit.
Affiliation(s)
- Xiaoyue Wang
  - Auditory Language Group, Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland, 1202
  - Institut Pasteur, Université Paris Cité, Hearing Institute, Paris, France, 75012
- Jaime Delgado
  - Auditory Language Group, Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland, 1202
- Silvia Marchesotti
  - Auditory Language Group, Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland, 1202
- Nada Kojovic
  - Autism Brain & Behavior Lab, Department of Psychiatry, University of Geneva, Geneva, Switzerland, 1202
- Holger Franz Sperdin
  - Autism Brain & Behavior Lab, Department of Psychiatry, University of Geneva, Geneva, Switzerland, 1202
- Tonia A Rihs
  - Functional Brain Mapping Laboratory, Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland, 1202
- Marie Schaer
  - Autism Brain & Behavior Lab, Department of Psychiatry, University of Geneva, Geneva, Switzerland, 1202
- Anne-Lise Giraud
  - Auditory Language Group, Department of Basic Neuroscience, University of Geneva, Geneva, Switzerland, 1202
  - Institut Pasteur, Université Paris Cité, Hearing Institute, Paris, France, 75012
5
Quique YM, Gnanateja GN, Dickey MW, Evans WS, Chandrasekaran B. Examining cortical tracking of the speech envelope in post-stroke aphasia. Front Hum Neurosci 2023; 17:1122480. [PMID: 37780966 PMCID: PMC10538638 DOI: 10.3389/fnhum.2023.1122480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2022] [Accepted: 08/28/2023] [Indexed: 10/03/2023]
Abstract
Introduction: People with aphasia have been shown to benefit from rhythmic elements for language production during aphasia rehabilitation. However, it is unknown whether rhythmic processing is associated with such benefits. Cortical tracking of the speech envelope (CTenv) may provide a measure of encoding of speech rhythmic properties and serve as a predictor of candidacy for rhythm-based aphasia interventions.
Methods: Electroencephalography was used to capture electrophysiological responses while Spanish speakers with aphasia (n = 9) listened to a continuous speech narrative (audiobook). The Temporal Response Function was used to estimate CTenv in the delta (associated with word- and phrase-level properties), theta (syllable-level properties), and alpha bands (attention-related properties). CTenv estimates were used to predict aphasia severity, performance in rhythmic perception and production tasks, and treatment response in a sentence-level rhythm-based intervention.
Results: CTenv in delta and theta, but not alpha, predicted aphasia severity. Neither CTenv in delta, alpha, or theta bands predicted performance in rhythmic perception or production tasks. Some evidence supported that CTenv in theta could predict sentence-level learning in aphasia, but alpha and delta did not.
Conclusion: CTenv of the syllable-level properties was relatively preserved in individuals with less language impairment. In contrast, higher encoding of word- and phrase-level properties was relatively impaired and was predictive of more severe language impairments. CTenv and treatment response to sentence-level rhythm-based interventions need to be further investigated.
Affiliation(s)
- Yina M. Quique
  - Center for Education in Health Sciences, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- G. Nike Gnanateja
  - Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI, United States
- Michael Walsh Dickey
  - VA Pittsburgh Healthcare System, Pittsburgh, PA, United States
  - Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
- Bharath Chandrasekaran
  - Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
  - Roxelyn and Richard Pepper Department of Communication Science and Disorders, School of Communication, Northwestern University, Evanston, IL, United States
6
Kovács P, Tóth B, Honbolygó F, Szalárdy O, Kohári A, Mády K, Magyari L, Winkler I. Speech prosody supports speaker selection and auditory stream segregation in a multi-talker situation. Brain Res 2023; 1805:148246. [PMID: 36657631 DOI: 10.1016/j.brainres.2023.148246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2022] [Revised: 01/06/2023] [Accepted: 01/12/2023] [Indexed: 01/19/2023]
Abstract
To process speech in a multi-talker environment, listeners need to segregate the mixture of incoming speech streams and focus their attention on one of them. Potentially, speech prosody could aid the segregation of different speakers, the selection of the desired speech stream, and detecting targets within the attended stream. For testing these issues, we recorded behavioral responses and extracted event-related potentials and functional brain networks from electroencephalographic signals recorded while participants listened to two concurrent speech streams, performing a lexical detection and a recognition memory task in parallel. Prosody manipulation was applied to the attended speech stream in one group of participants and to the ignored speech stream in another group. Naturally recorded speech stimuli were either intact, synthetically F0-flattened, or prosodically suppressed by the speaker. Results show that prosody - especially the parsing cues mediated by speech rate - facilitates stream selection, while playing a smaller role in auditory stream segmentation and target detection.
Affiliation(s)
- Petra Kovács
  - Department of Cognitive Science, Budapest University of Technology and Economics, Hungary
- Brigitta Tóth
  - Institute of Cognitive Neuroscience and Psychology, Research Center for Natural Sciences, Hungary
- Ferenc Honbolygó
  - Brain Imaging Center, Research Center for Natural Sciences, Hungary
- Orsolya Szalárdy
  - Institute of Cognitive Neuroscience and Psychology, Research Center for Natural Sciences, Hungary
  - Institute of Behavioural Sciences, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Anna Kohári
  - Research Group of Phonetics, Institute for General and Hungarian Linguistics, Hungarian Research Centre for Linguistics, Hungary
- Katalin Mády
  - Research Group of Phonetics, Institute for General and Hungarian Linguistics, Hungarian Research Centre for Linguistics, Hungary
- Lilla Magyari
  - Department of Social Studies, Faculty of Social Sciences, University of Stavanger, Stavanger, Norway
  - Norwegian Centre for Reading Education and Research, Faculty of Arts and Education, University of Stavanger, Stavanger, Norway
- István Winkler
  - Institute of Cognitive Neuroscience and Psychology, Research Center for Natural Sciences, Hungary
7
Carta S, Mangiacotti AMA, Valdes AL, Reilly RB, Franco F, Di Liberto GM. The impact of temporal synchronisation imprecision on TRF analyses. J Neurosci Methods 2023; 385:109765. [PMID: 36481165 DOI: 10.1016/j.jneumeth.2022.109765] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 11/17/2022] [Accepted: 12/02/2022] [Indexed: 12/12/2022]
Affiliation(s)
- Sara Carta
  - ADAPT Centre, Trinity College, The University of Dublin, Ireland
  - School of Computer Science and Statistics, Trinity College, The University of Dublin, Ireland
- Anthony M A Mangiacotti
  - Department of Psychology, Middlesex University, London, United Kingdom
  - FISPPA Department, University of Padova, Padova, Italy
- Alejandro Lopez Valdes
  - Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Ireland
  - Global Brain Health Institute, Trinity College, The University of Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
  - School of Engineering, Trinity College, The University of Dublin, Ireland
- Richard B Reilly
  - Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
  - School of Engineering, Trinity College, The University of Dublin, Ireland
  - School of Medicine, Trinity College, The University of Dublin, Ireland
- Fabia Franco
  - Department of Psychology, Middlesex University, London, United Kingdom
- Giovanni M Di Liberto
  - ADAPT Centre, Trinity College, The University of Dublin, Ireland
  - School of Computer Science and Statistics, Trinity College, The University of Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland
8
Desai M, Field AM, Hamilton LS. Dataset size considerations for robust acoustic and phonetic speech encoding models in EEG. Front Hum Neurosci 2023; 16:1001171. [PMID: 36741776 PMCID: PMC9895838 DOI: 10.3389/fnhum.2022.1001171] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Accepted: 12/22/2022] [Indexed: 01/21/2023]
Abstract
In many experiments that investigate auditory and speech processing in the brain using electroencephalography (EEG), the experimental paradigm is often lengthy and tedious. Typically, the experimenter errs on the side of including more data, more trials, and therefore conducting a longer task to ensure that the data are robust and effects are measurable. Recent studies used naturalistic stimuli to investigate the brain's response to individual or a combination of multiple speech features using system identification techniques, such as multivariate temporal receptive field (mTRF) analyses. The neural data collected from such experiments must be divided into a training set and a test set to fit and validate the mTRF weights. While a good strategy is clearly to collect as much data as is feasible, it is unclear how much data are needed to achieve stable results. Furthermore, it is unclear whether the specific stimulus used for mTRF fitting and the choice of feature representation affects how much data would be required for robust and generalizable results. Here, we used previously collected EEG data from our lab using sentence stimuli and movie stimuli as well as EEG data from an open-source dataset using audiobook stimuli to better understand how much data needs to be collected for naturalistic speech experiments measuring acoustic and phonetic tuning. We found that the EEG receptive field structure tested here stabilizes after collecting a training dataset of approximately 200 s of TIMIT sentences, around 600 s of movie trailers training set data, and approximately 460 s of audiobook training set data. Thus, we provide suggestions on the minimum amount of data that would be necessary for fitting mTRFs from naturalistic listening data. Our findings are motivated by highly practical concerns when working with children, patient populations, or others who may not tolerate long study sessions. 
These findings will aid future researchers who wish to study naturalistic speech processing in healthy and clinical populations while minimizing participant fatigue and retaining signal quality.
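The train/test logic described in this abstract can be illustrated with a toy example. The following is only a minimal sketch on simulated data, assuming a single stimulus feature, three time lags, and plain ridge regression; it is not the authors' mTRF pipeline, and every function name (`lagged_design`, `fit_trf`, `held_out_score`, `simulate`) is hypothetical.

```python
import random

def lagged_design(stimulus, n_lags):
    """Row t holds stimulus[t], stimulus[t-1], ... (zero-padded before t = 0)."""
    return [[stimulus[t - k] if t - k >= 0 else 0.0 for k in range(n_lags)]
            for t in range(len(stimulus))]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_trf(stimulus, response, n_lags, ridge=1.0):
    """Ridge-regularised least squares: (X'X + ridge * I) w = X'y."""
    X = lagged_design(stimulus, n_lags)
    xtx = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(n_lags)] for i in range(n_lags)]
    xty = [sum(row[i] * y for row, y in zip(X, response)) for i in range(n_lags)]
    return solve(xtx, xty)

def predict(stimulus, weights):
    """Convolve the stimulus with the lag weights."""
    X = lagged_design(stimulus, len(weights))
    return [sum(w * x for w, x in zip(weights, row)) for row in X]

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def simulate(n_samples, kernel, noise_sd, seed=0):
    """Simulated 'EEG': white-noise stimulus filtered by kernel, plus noise."""
    rng = random.Random(seed)
    stimulus = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    clean = predict(stimulus, kernel)
    return stimulus, [c + rng.gauss(0.0, noise_sd) for c in clean]

def held_out_score(stimulus, response, n_train, n_lags, ridge=1.0):
    """Fit on the first n_train samples, score on the remaining samples."""
    weights = fit_trf(stimulus[:n_train], response[:n_train], n_lags, ridge)
    predicted = predict(stimulus[n_train:], weights)
    return weights, pearson(predicted, response[n_train:])

if __name__ == "__main__":
    stim, resp = simulate(2000, [0.0, 1.0, 0.5], noise_sd=0.5)
    test_stim, test_resp = stim[1600:], resp[1600:]
    # Refit with growing amounts of training data; the test set stays fixed.
    for n_train in (50, 200, 800, 1600):
        w = fit_trf(stim[:n_train], resp[:n_train], 3)
        print(n_train, round(pearson(predict(test_stim, w), test_resp), 3))
```

Refitting with increasing amounts of training data while holding the test segment fixed is one simple way to see where held-out prediction accuracy levels off; real EEG would need many more lags, channels, and cross-validated regularisation.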
Affiliation(s)
- Maansi Desai
  - Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, United States
- Alyssa M. Field
  - Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, United States
- Liberty S. Hamilton
  - Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, United States
  - Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, United States
  - *Correspondence: Liberty S. Hamilton
9
Brodbeck C, Simon JZ. Cortical tracking of voice pitch in the presence of multiple speakers depends on selective attention. Front Neurosci 2022; 16:828546. [PMID: 36003957 PMCID: PMC9393379 DOI: 10.3389/fnins.2022.828546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 07/08/2022] [Indexed: 11/13/2022]
Abstract
Voice pitch carries linguistic and non-linguistic information. Previous studies have described cortical tracking of voice pitch in clean speech, with responses reflecting both pitch strength and pitch value. However, pitch is also a powerful cue for auditory stream segregation, especially when competing streams have pitch differing in fundamental frequency, as is the case when multiple speakers talk simultaneously. We therefore investigated how cortical speech pitch tracking is affected in the presence of a second, task-irrelevant speaker. We analyzed human magnetoencephalography (MEG) responses to continuous narrative speech, presented either as a single talker in a quiet background or as a two-talker mixture of a male and a female speaker. In clean speech, voice pitch was associated with a right-dominant response, peaking at a latency of around 100 ms, consistent with previous electroencephalography and electrocorticography results. The response tracked both the presence of pitch and the relative value of the speaker's fundamental frequency. In the two-talker mixture, the pitch of the attended speaker was tracked bilaterally, regardless of whether or not there was simultaneously present pitch in the speech of the irrelevant speaker. Pitch tracking for the irrelevant speaker was reduced: only the right hemisphere still significantly tracked pitch of the unattended speaker, and only during intervals in which no pitch was present in the attended talker's speech. Taken together, these results suggest that pitch-based segregation of multiple speakers, at least as measured by macroscopic cortical tracking, is not entirely automatic but strongly dependent on selective attention.
Affiliation(s)
- Christian Brodbeck
  - Department of Psychological Sciences, University of Connecticut, Storrs, CT, United States
  - Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
- Jonathan Z. Simon
  - Institute for Systems Research, University of Maryland, College Park, College Park, MD, United States
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, College Park, MD, United States
  - Department of Biology, University of Maryland, College Park, College Park, MD, United States
10
Gnanateja GN, Devaraju DS, Heyne M, Quique YM, Sitek KR, Tardif MC, Tessmer R, Dial HR. On the Role of Neural Oscillations Across Timescales in Speech and Music Processing. Front Comput Neurosci 2022; 16:872093. [PMID: 35814348 PMCID: PMC9260496 DOI: 10.3389/fncom.2022.872093] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/09/2022] [Accepted: 05/24/2022] [Indexed: 11/25/2022]
Abstract
This mini review is aimed at a clinician-scientist seeking to understand the role of oscillations in neural processing and their functional relevance in speech and music perception. We present an overview of neural oscillations, methods used to study them, and their functional relevance with respect to music processing, aging, hearing loss, and disorders affecting speech and language. We first review the oscillatory frequency bands and their associations with speech and music processing. Next we describe commonly used metrics for quantifying neural oscillations, briefly touching upon the still-debated mechanisms underpinning oscillatory alignment. Following this, we highlight key findings from research on neural oscillations in speech and music perception, as well as contributions of this work to our understanding of disordered perception in clinical populations. Finally, we conclude with a look toward the future of oscillatory research in speech and music perception, including promising methods and potential avenues for future work. We note that the intention of this mini review is not to systematically review all literature on cortical tracking of speech and music. Rather, we seek to provide the clinician-scientist with foundational information that can be used to evaluate and design research studies targeting the functional role of oscillations in speech and music processing in typical and clinical populations.
Affiliation(s)
- G. Nike Gnanateja
  - Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
- Dhatri S. Devaraju
  - Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
- Matthias Heyne
  - Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
- Yina M. Quique
  - Center for Education in Health Sciences, Northwestern University, Chicago, IL, United States
- Kevin R. Sitek
  - Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
- Monique C. Tardif
  - Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
- Rachel Tessmer
  - Department of Speech, Language, and Hearing Sciences, The University of Texas at Austin, Austin, TX, United States
- Heather R. Dial
  - Department of Speech, Language, and Hearing Sciences, The University of Texas at Austin, Austin, TX, United States
  - Department of Communication Sciences and Disorders, University of Houston, Houston, TX, United States
11
Bröhl F, Keitel A, Kayser C. MEG Activity in Visual and Auditory Cortices Represents Acoustic Speech-Related Information during Silent Lip Reading. eNeuro 2022; 9:ENEURO.0209-22.2022. [PMID: 35728955 PMCID: PMC9239847 DOI: 10.1523/eneuro.0209-22.2022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 06/06/2022] [Indexed: 11/21/2022]
Abstract
Speech is an intrinsically multisensory signal, and seeing the speaker's lips forms a cornerstone of communication in acoustically impoverished environments. Still, it remains unclear how the brain exploits visual speech for comprehension. Previous work debated whether lip signals are mainly processed along the auditory pathways or whether the visual system directly implements speech-related processes. To probe this, we systematically characterized dynamic representations of multiple acoustic and visual speech-derived features in source localized MEG recordings that were obtained while participants listened to speech or viewed silent speech. Using a mutual-information framework we provide a comprehensive assessment of how well temporal and occipital cortices reflect the physically presented signals and unique aspects of acoustic features that were physically absent but may be critical for comprehension. Our results demonstrate that both cortices feature a functionally specific form of multisensory restoration: during lip reading, they reflect unheard acoustic features, independent of co-existing representations of the visible lip movements. This restoration emphasizes the unheard pitch signature in occipital cortex and the speech envelope in temporal cortex and is predictive of lip-reading performance. These findings suggest that when seeing the speaker's lips, the brain engages both visual and auditory pathways to support comprehension by exploiting multisensory correspondences between lip movements and spectro-temporal acoustic cues.
Affiliation(s)
- Felix Bröhl
  - Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld 33615, Germany
- Anne Keitel
  - Psychology, University of Dundee, Dundee DD1 4HN, United Kingdom
- Christoph Kayser
  - Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Bielefeld 33615, Germany
12
Di Liberto GM, Hjortkjær J, Mesgarani N. Editorial: Neural Tracking: Closing the Gap Between Neurophysiology and Translational Medicine. Front Neurosci 2022; 16:872600. [PMID: 35368278 PMCID: PMC8966872 DOI: 10.3389/fnins.2022.872600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 02/17/2022] [Indexed: 11/25/2022]
Affiliation(s)
- Giovanni M. Di Liberto
  - School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
  - ADAPT Centre, d-real, Trinity College Institute for Neuroscience, Dublin, Ireland
  - *Correspondence: Giovanni M. Di Liberto
- Jens Hjortkjær
  - Hearing Systems Group, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark
- Nima Mesgarani
  - Electrical Engineering Department, Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY, United States
13
|
MacIntyre AD, Cai CQ, Scott SK. Pushing the envelope: Evaluating speech rhythm with different envelope extraction techniques. The Journal of the Acoustical Society of America 2022; 151:2002. [PMID: 35364952 DOI: 10.1121/10.0009844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/13/2021] [Accepted: 03/05/2022] [Indexed: 06/14/2023]
Abstract
The amplitude of the speech signal varies over time, and the speech envelope is an attempt to characterise this variation in the form of an acoustic feature. Although tacitly assumed, the similarity between the speech envelope-derived time series and that of phonetic objects (e.g., vowels) remains empirically unestablished. The current paper, therefore, evaluates several speech envelope extraction techniques, such as the Hilbert transform, by comparing different acoustic landmarks (e.g., peaks in the speech envelope) with manual phonetic annotation in a naturalistic and diverse dataset. Joint speech tasks are also introduced to determine which acoustic landmarks are most closely coordinated when voices are aligned. Finally, the acoustic landmarks are evaluated as predictors for the temporal characterisation of speaking style using classification tasks. The landmark that performed most closely to annotated vowel onsets was peaks in the first derivative of a human audition-informed envelope, consistent with converging evidence from neural and behavioural data. However, differences also emerged based on language and speaking style. Overall, the results show that both the choice of speech envelope extraction technique and the form of speech under study affect how sensitive an engineered feature is at capturing aspects of speech rhythm, such as the timing of vowels.
Affiliation(s)
- Ceci Qing Cai
  - Institute of Cognitive Neuroscience, University College London, London, WC1N 3AZ, United Kingdom
- Sophie K Scott
  - Institute of Cognitive Neuroscience, University College London, London, WC1N 3AZ, United Kingdom
14
|
Teoh ES, Ahmed F, Lalor EC. Attention Differentially Affects Acoustic and Phonetic Feature Encoding in a Multispeaker Environment. J Neurosci 2022; 42:682-691. [PMID: 34893546 PMCID: PMC8805628 DOI: 10.1523/jneurosci.1455-20.2021] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/08/2020] [Revised: 09/28/2021] [Accepted: 09/29/2021] [Indexed: 11/21/2022] Open
Abstract
Humans have the remarkable ability to selectively focus on a single talker in the midst of other competing talkers. The neural mechanisms that underlie this phenomenon remain incompletely understood. In particular, there has been longstanding debate over whether attention operates at an early or late stage in the speech processing hierarchy. One way to better understand this is to examine how attention might differentially affect neurophysiological indices of hierarchical acoustic and linguistic speech representations. In this study, we do this by using encoding models to identify neural correlates of speech processing at various levels of representation. Specifically, we recorded EEG from fourteen human subjects (nine female and five male) during a "cocktail party" attention experiment. Model comparisons based on these data revealed phonetic feature processing for attended, but not unattended speech. Furthermore, we show that attention specifically enhances isolated indices of phonetic feature processing, but that such attention effects are not apparent for isolated measures of acoustic processing. These results provide new insights into the effects of attention on different prelexical representations of speech, insights that complement recent anatomic accounts of the hierarchical encoding of attended speech. Furthermore, our findings support the notion that, for attended speech, phonetic features are processed as a distinct stage, separate from the processing of the speech acoustics.
SIGNIFICANCE STATEMENT: Humans are very good at paying attention to one speaker in an environment with multiple speakers. However, the details of how attended and unattended speech are processed differently by the brain are not completely clear. Here, we explore how attention affects the processing of the acoustic sounds of speech as well as the mapping of those sounds onto categorical phonetic features. We find evidence of categorical phonetic feature processing for attended, but not unattended speech. Furthermore, we find evidence that categorical phonetic feature processing is enhanced by attention, but acoustic processing is not. These findings add an important new layer to our understanding of how the human brain solves the cocktail party problem.
Affiliation(s)
- Emily S Teoh
  - School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College, University of Dublin, Dublin 2, Ireland
- Farhin Ahmed
  - Department of Neuroscience, Department of Biomedical Engineering, and Del Monte Neuroscience Institute, University of Rochester, Rochester, New York 14627
- Edmund C Lalor
  - School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College, University of Dublin, Dublin 2, Ireland
  - Department of Neuroscience, Department of Biomedical Engineering, and Del Monte Neuroscience Institute, University of Rochester, Rochester, New York 14627
15
|
Bachmann FL, MacDonald EN, Hjortkjær J. Neural Measures of Pitch Processing in EEG Responses to Running Speech. Front Neurosci 2022; 15:738408. [PMID: 35002597 PMCID: PMC8729880 DOI: 10.3389/fnins.2021.738408] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/08/2021] [Accepted: 11/01/2021] [Indexed: 11/13/2022] Open
Abstract
Linearized encoding models are increasingly employed to model cortical responses to running speech. Recent extensions to subcortical responses suggest clinical perspectives, potentially complementing auditory brainstem responses (ABRs) or frequency-following responses (FFRs) that are current clinical standards. However, while it is well-known that the auditory brainstem responds both to transient amplitude variations and the stimulus periodicity that gives rise to pitch, these features co-vary in running speech. Here, we discuss challenges in disentangling the features that drive the subcortical response to running speech. Cortical and subcortical electroencephalographic (EEG) responses to running speech from 19 normal-hearing listeners (12 female) were analyzed. Using forward regression models, we confirm that responses to the rectified broadband speech signal yield temporal response functions consistent with wave V of the ABR, as shown in previous work. Peak latency and amplitude of the speech-evoked brainstem response were correlated with standard click-evoked ABRs recorded at the vertex electrode (Cz). Similar responses could be obtained using the fundamental frequency (F0) of the speech signal as model predictor. However, simulations indicated that dissociating responses to temporal fine structure at the F0 from broadband amplitude variations is not possible given the high co-variance of the features and the poor signal-to-noise ratio (SNR) of subcortical EEG responses. In cortex, both simulations and data replicated previous findings indicating that envelope tracking on frontal electrodes can be dissociated from responses to slow variations in F0 (relative pitch). Yet, no association between subcortical F0-tracking and cortical responses to relative pitch could be detected. These results indicate that while subcortical speech responses are comparable to click-evoked ABRs, dissociating pitch-related processing in the auditory brainstem may be challenging with natural speech stimuli.
Affiliation(s)
- Florine L Bachmann
  - Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
- Ewen N MacDonald
  - Department of Systems Design Engineering, University of Waterloo, Waterloo, ON, Canada
- Jens Hjortkjær
  - Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Lyngby, Denmark
  - Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital - Amager and Hvidovre, Copenhagen, Denmark
16
|
Tomasello R, Grisoni L, Boux I, Sammler D, Pulvermüller F. OUP accepted manuscript. Cereb Cortex 2022; 32:4885-4901. [PMID: 35136980 PMCID: PMC9626830 DOI: 10.1093/cercor/bhab522] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2021] [Revised: 11/16/2021] [Accepted: 12/17/2021] [Indexed: 11/20/2022] Open
Abstract
During conversations, speech prosody provides important clues about the speaker’s communicative intentions. In many languages, a rising vocal pitch at the end of a sentence typically expresses a question function, whereas a falling pitch suggests a statement. Here, the neurophysiological basis of intonation and speech act understanding was investigated with high-density electroencephalography (EEG) to determine whether prosodic features are reflected at the neurophysiological level. As early as approximately 100 ms after the sentence-final word differing in prosody, questions and statements expressed with the same sentences led to different neurophysiological activity recorded in the event-related potential. Interestingly, low-pass filtered sentences and acoustically matched nonvocal musical signals failed to show any neurophysiological dissociations, thus suggesting that the physical intonation alone cannot explain this modulation. Our results show rapid neurophysiological indexes of prosodic communicative information processing that emerge only when pragmatic and lexico-semantic information are fully expressed. The early enhancement of question-related activity compared with statements was due to sources in the articulatory-motor region, which may reflect the richer action knowledge immanent to questions, namely the expectation of the partner action of answering the question. The present findings demonstrate a neurophysiological correlate of prosodic communicative information processing, which enables humans to rapidly detect and understand speaker intentions in linguistic interactions.
Affiliation(s)
- Rosario Tomasello
  - Address correspondence to Rosario Tomasello, Brain Language Laboratory, Department of Philosophy and Humanities, WE4, Freie Universität Berlin, Habelschwerdter Allee 45, 14195 Berlin, Germany
- Luigi Grisoni
  - Brain Language Laboratory, Department of Philosophy and Humanities, Freie Universität Berlin, 14195 Berlin, Germany
  - Cluster of Excellence ‘Matters of Activity. Image Space Material’, Humboldt Universität zu Berlin, 10099 Berlin, Germany
- Isabella Boux
  - Brain Language Laboratory, Department of Philosophy and Humanities, Freie Universität Berlin, 14195 Berlin, Germany
  - Berlin School of Mind and Brain, Humboldt Universität zu Berlin, 10117 Berlin, Germany
  - Einstein Center for Neurosciences, 10117 Berlin, Germany
- Daniela Sammler
  - Research Group ‘Neurocognition of Music and Language’, Max Planck Institute for Empirical Aesthetics, 60322 Frankfurt am Main, Germany
  - Department of Neuropsychology, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany
- Friedemann Pulvermüller
  - Brain Language Laboratory, Department of Philosophy and Humanities, Freie Universität Berlin, 14195 Berlin, Germany
  - Cluster of Excellence ‘Matters of Activity. Image Space Material’, Humboldt Universität zu Berlin, 10099 Berlin, Germany
  - Berlin School of Mind and Brain, Humboldt Universität zu Berlin, 10117 Berlin, Germany
  - Einstein Center for Neurosciences, 10117 Berlin, Germany
17
|
Palana J, Schwartz S, Tager-Flusberg H. Evaluating the Use of Cortical Entrainment to Measure Atypical Speech Processing: A Systematic Review. Neurosci Biobehav Rev 2021; 133:104506. [PMID: 34942267 DOI: 10.1016/j.neubiorev.2021.12.029] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/23/2020] [Revised: 12/12/2021] [Accepted: 12/18/2021] [Indexed: 11/30/2022]
Abstract
BACKGROUND: Cortical entrainment has emerged as a promising means for measuring continuous speech processing in young, neurotypical adults. However, its utility for capturing atypical speech processing has not been systematically reviewed. OBJECTIVES: Synthesize evidence regarding the merit of measuring cortical entrainment to capture atypical speech processing and recommend avenues for future research. METHOD: We systematically reviewed publications investigating entrainment to continuous speech in populations with auditory processing differences. RESULTS: In the 25 publications reviewed, most studies were conducted on older and/or hearing-impaired adults, for whom slow-wave entrainment to speech was often heightened compared to controls. Research conducted on populations with neurodevelopmental disorders, in whom slow-wave entrainment was often reduced, was less common. Across publications, findings highlighted associations between cortical entrainment and speech processing performance differences. CONCLUSIONS: Measures of cortical entrainment offer useful means of capturing speech processing differences, and future research should leverage them more extensively when studying populations with neurodevelopmental disorders.
Affiliation(s)
- Joseph Palana
  - Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA; Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Harvard Medical School, Boston Children's Hospital, 1 Autumn Street, Boston, MA, 02215, USA
- Sophie Schwartz
  - Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA
- Helen Tager-Flusberg
  - Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA
18
|
Symons AE, Dick F, Tierney AT. Dimension-selective attention and dimensional salience modulate cortical tracking of acoustic dimensions. Neuroimage 2021; 244:118544. [PMID: 34492294 DOI: 10.1016/j.neuroimage.2021.118544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2021] [Revised: 08/19/2021] [Accepted: 08/31/2021] [Indexed: 11/17/2022] Open
Abstract
Some theories of auditory categorization suggest that auditory dimensions that are strongly diagnostic for particular categories - for instance voice onset time or fundamental frequency in the case of some spoken consonants - attract attention. However, prior cognitive neuroscience research on auditory selective attention has largely focused on attention to simple auditory objects or streams, and so little is known about the neural mechanisms that underpin dimension-selective attention, or how the relative salience of variations along these dimensions might modulate neural signatures of attention. Here we investigate whether dimensional salience and dimension-selective attention modulate the cortical tracking of acoustic dimensions. In two experiments, participants listened to tone sequences varying in pitch and spectral peak frequency; these two dimensions changed at different rates. Inter-trial phase coherence (ITPC) and amplitude of the EEG signal at the frequencies tagged to pitch and spectral changes provided a measure of cortical tracking of these dimensions. In Experiment 1, tone sequences varied in the size of the pitch intervals, while the size of spectral peak intervals remained constant. Cortical tracking of pitch changes was greater for sequences with larger compared to smaller pitch intervals, with no difference in cortical tracking of spectral peak changes. In Experiment 2, participants selectively attended to either pitch or spectral peak. Cortical tracking was stronger in response to the attended compared to unattended dimension for both pitch and spectral peak. These findings suggest that attention can enhance the cortical tracking of specific acoustic dimensions rather than simply enhancing tracking of the auditory object as a whole.
Affiliation(s)
- Ashley E Symons
  - Department of Psychological Sciences, Birkbeck College, University of London UK
- Fred Dick
  - Department of Psychological Sciences, Birkbeck College, University of London UK; Division of Psychology & Language Sciences, University College London UK
- Adam T Tierney
  - Department of Psychological Sciences, Birkbeck College, University of London UK
19
|
Generalizable EEG Encoding Models with Naturalistic Audiovisual Stimuli. J Neurosci 2021; 41:8946-8962. [PMID: 34503996 DOI: 10.1523/jneurosci.2891-20.2021] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/13/2020] [Revised: 08/24/2021] [Accepted: 08/29/2021] [Indexed: 11/21/2022] Open
Abstract
In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as "speech tracking." Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from acoustically rich, naturalistic environments with and without background noise can be generalized to more controlled stimuli. If encoding models for acoustically rich, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations of individuals who may not tolerate listening to more controlled and less engaging stimuli for long periods of time. We recorded noninvasive scalp EEG while 17 human participants (8 male/9 female) listened to speech without noise and audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled datasets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to speech in a rich acoustic background were more accurate when including both phonological and acoustic features. Our findings suggest that naturalistic audiovisual stimuli can be used to measure receptive fields that are comparable and generalizable to more controlled audio-only stimuli.
SIGNIFICANCE STATEMENT: Understanding spoken language in natural environments requires listeners to parse acoustic and linguistic information in the presence of other distracting stimuli. However, most studies of auditory processing rely on highly controlled stimuli with no background noise, or with background noise inserted at specific times. Here, we compare models where EEG data are predicted based on a combination of acoustic, phonetic, and visual features in highly disparate stimuli: sentences from a speech corpus and speech embedded within movie trailers. We show that modeling neural responses to highly noisy, audiovisual movies can uncover tuning for acoustic and phonetic information that generalizes to simpler stimuli typically used in sensory neuroscience experiments.
20
|
Learning nonnative speech sounds changes local encoding in the adult human cortex. Proc Natl Acad Sci U S A 2021; 118:2101777118. [PMID: 34475209 DOI: 10.1073/pnas.2101777118] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/04/2021] [Accepted: 07/12/2021] [Indexed: 11/18/2022] Open
Abstract
Adults can learn to identify nonnative speech sounds with training, albeit with substantial variability in learning behavior. Increases in behavioral accuracy are associated with increased separability for sound representations in cortical speech areas. However, it remains unclear whether individual auditory neural populations all show the same types of changes with learning, or whether there are heterogeneous encoding patterns. Here, we used high-resolution direct neural recordings to examine local population response patterns, while native English listeners learned to recognize unfamiliar vocal pitch patterns in Mandarin Chinese tones. We found a distributed set of neural populations in bilateral superior temporal gyrus and ventrolateral frontal cortex, where the encoding of Mandarin tones changed throughout training as a function of trial-by-trial accuracy ("learning effect"), including both increases and decreases in the separability of tones. These populations were distinct from populations that showed changes as a function of exposure to the stimuli regardless of trial-by-trial accuracy. These learning effects were driven in part by more variable neural responses to repeated presentations of acoustically identical stimuli. Finally, learning effects could be predicted from speech-evoked activity even before training, suggesting that intrinsic properties of these populations make them amenable to behavior-related changes. Together, these results demonstrate that nonnative speech sound learning involves a wide array of changes in neural representations across a distributed set of brain regions.
21
|
Zuk NJ, Murphy JW, Reilly RB, Lalor EC. Envelope reconstruction of speech and music highlights stronger tracking of speech at low frequencies. PLoS Comput Biol 2021; 17:e1009358. [PMID: 34534211 PMCID: PMC8480853 DOI: 10.1371/journal.pcbi.1009358] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 02/03/2021] [Revised: 09/29/2021] [Accepted: 08/18/2021] [Indexed: 11/19/2022] Open
Abstract
The human brain tracks amplitude fluctuations of both speech and music, which reflects acoustic processing in addition to the encoding of higher-order features and one's cognitive state. Comparing neural tracking of speech and music envelopes can elucidate stimulus-general mechanisms, but direct comparisons are confounded by differences in their envelope spectra. Here, we use a novel method of frequency-constrained reconstruction of stimulus envelopes using EEG recorded during passive listening. We expected to see music reconstruction match speech in a narrow range of frequencies, but instead we found that speech was reconstructed better than music for all frequencies we examined. Additionally, models trained on all stimulus types performed as well as or better than the stimulus-specific models at higher modulation frequencies, suggesting a common neural mechanism for tracking speech and music. However, speech envelope tracking at low frequencies, below 1 Hz, was associated with increased weighting over parietal channels, which was not present for the other stimuli. Our results highlight the importance of low-frequency speech tracking and suggest an origin from speech-specific processing in the brain.
Affiliation(s)
- Nathaniel J. Zuk
  - Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
  - Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
  - Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
  - Del Monte Institute of Neuroscience, University of Rochester Medical Center, Rochester, New York, United States of America
- Jeremy W. Murphy
  - Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Richard B. Reilly
  - Department of Mechanical, Manufacturing & Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
  - Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Dublin, Ireland
  - Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
- Edmund C. Lalor
  - Department of Electronic & Electrical Engineering, Trinity College, The University of Dublin, Dublin, Ireland
  - Department of Biomedical Engineering, University of Rochester, Rochester, New York, United States of America
  - Del Monte Institute of Neuroscience, University of Rochester Medical Center, Rochester, New York, United States of America
22
|
Bröhl F, Kayser C. Delta/theta band EEG differentially tracks low and high frequency speech-derived envelopes. Neuroimage 2021; 233:117958. [PMID: 33744458 PMCID: PMC8204264 DOI: 10.1016/j.neuroimage.2021.117958] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/10/2020] [Revised: 03/08/2021] [Accepted: 03/09/2021] [Indexed: 11/01/2022] Open
Abstract
The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the speech envelope. To conveniently quantify this alignment (termed 'speech tracking') many studies consider the broadband speech envelope, which combines acoustic fluctuations across the spectral range. Using EEG recordings, we show that using this broadband envelope can provide a distorted picture on speech encoding. We systematically investigated the encoding of spectrally-limited speech-derived envelopes presented by individual and multiple noise carriers in the human brain. Tracking in the 1 to 6 Hz EEG bands differentially reflected low (0.2 - 0.83 kHz) and high (2.66 - 8 kHz) frequency speech-derived envelopes. This was independent of the specific carrier frequency but sensitive to attentional manipulations, and may reflect the context-dependent emphasis of information from distinct spectral ranges of the speech envelope in low frequency brain activity. As low and high frequency speech envelopes relate to distinct phonemic features, our results suggest that functionally distinct processes contribute to speech tracking in the same EEG bands, and are easily confounded when considering the broadband speech envelope.
Affiliation(s)
- Felix Bröhl
  - Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
- Christoph Kayser
  - Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
23
|
Llanos F, German JS, Gnanateja GN, Chandrasekaran B. The neural processing of pitch accents in continuous speech. Neuropsychologia 2021; 158:107883. [PMID: 33989647 DOI: 10.1016/j.neuropsychologia.2021.107883] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/30/2020] [Revised: 04/29/2021] [Accepted: 05/03/2021] [Indexed: 12/21/2022]
Abstract
Pitch accents are local pitch patterns that convey differences in word prominence and modulate the information structure of the discourse. Despite their importance to discourse in languages like English, neural processing of pitch accents remains understudied. The current study investigates the neural processing of pitch accents by native and non-native English speakers while they are listening to or ignoring 45 min of continuous, natural speech. Leveraging an approach used to study phonemes in natural speech, we analyzed thousands of electroencephalography (EEG) segments time-locked to pitch accents in a prosodic transcription. The optimal neural discrimination between pitch accent categories emerged at latencies between 100 and 200 ms. During these latencies, we found a strong structural alignment between neural and phonetic representations of pitch accent categories. In the same latencies, native listeners exhibited more robust processing of pitch accent contrasts than non-native listeners. However, these group differences attenuated when the speech signal was ignored. We can reliably capture the neural processing of discrete and contrastive pitch accent categories in continuous speech. Our analytic approach also captures how language-specific knowledge and selective attention influence the neural processing of pitch accent categories.
Affiliation(s)
- Fernando Llanos
  - Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA; Department of Linguistics, The University of Texas at Austin, Austin, TX, USA
- James S German
  - Aix-Marseille University, CNRS, LPL, Aix-en-Provence, France
- G Nike Gnanateja
  - Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
- Bharath Chandrasekaran
  - Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
24
|
de Cheveigné A, Slaney M, Fuglsang SA, Hjortkjaer J. Auditory stimulus-response modeling with a match-mismatch task. J Neural Eng 2021; 18. [PMID: 33849003 DOI: 10.1088/1741-2552/abf771] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/05/2020] [Accepted: 04/13/2021] [Indexed: 11/12/2022]
Abstract
Objective. An auditory stimulus can be related to the brain response that it evokes by a stimulus-response model fit to the data. This offers insight into perceptual processes within the brain and is also of potential use for devices such as brain computer interfaces (BCIs). The quality of the model can be quantified by measuring the fit with a regression problem, or by applying it to a classification task and measuring its performance. Approach. Here we focus on a match-mismatch (MM) task that entails deciding whether a segment of brain signal matches, via a model, the auditory stimulus that evoked it. Main results. Using these metrics, we describe a range of models of increasing complexity that we compare to methods in the literature, showing state-of-the-art performance. We document in detail one particular implementation, calibrated on a publicly-available database, that can serve as a robust reference to evaluate future developments. Significance. The MM task allows stimulus-response models to be evaluated in the limit of very high model accuracy, making it an attractive alternative to the more commonly used task of auditory attention detection. The MM task does not require class labels, so it is immune to mislabeling, and it is applicable to data recorded in listening scenarios with only one sound source, thus it is cheap to obtain large quantities of training and testing data. Performance metrics from this task, associated with regression accuracy, provide complementary insights into the relation between stimulus and response, as well as information about discriminatory power directly applicable to BCI applications.
Affiliation(s)
- Alain de Cheveigné
  - Laboratoire des Systèmes Perceptifs, Paris, CNRS UMR 8248, France
  - Département d'Etudes Cognitives, Ecole Normale Supérieure, Paris, PSL, France
  - UCL Ear Institute, London, United Kingdom
  - Audition, DEC, ENS, 29 rue d'Ulm, 75230 Paris, France
- Malcolm Slaney
  - Google Research, Machine Hearing Group, Mountain View, CA, United States of America
- Søren A Fuglsang
  - Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Copenhagen, Denmark
- Jens Hjortkjaer
  - Hearing Systems Section, Department of Health Technology, Technical University of Denmark, Kgs. Lyngby, Denmark
  - Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Copenhagen, Denmark
25
Dial HR, Gnanateja GN, Tessmer RS, Gorno-Tempini ML, Chandrasekaran B, Henry ML. Cortical Tracking of the Speech Envelope in Logopenic Variant Primary Progressive Aphasia. Front Hum Neurosci 2021; 14:597694. [PMID: 33488371 PMCID: PMC7815818 DOI: 10.3389/fnhum.2020.597694] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/21/2020] [Accepted: 11/19/2020] [Indexed: 11/13/2022] Open
Abstract
Logopenic variant primary progressive aphasia (lvPPA) is a neurodegenerative language disorder primarily characterized by impaired phonological processing. Sentence repetition and comprehension deficits are observed in lvPPA and linked to impaired phonological working memory, but recent evidence also implicates impaired speech perception. Currently, neural encoding of the speech envelope, which forms the scaffolding for perception, is not clearly understood in lvPPA. We leveraged recent analytical advances in electrophysiology to examine speech envelope encoding in lvPPA. We assessed cortical tracking of the speech envelope and in-task comprehension of two spoken narratives in individuals with lvPPA (n = 10) and age-matched (n = 10) controls. Despite markedly reduced narrative comprehension relative to controls, individuals with lvPPA had increased cortical tracking of the speech envelope in theta oscillations, which track low-level features (e.g., syllables), but not delta oscillations, which track speech units that unfold across a longer time scale (e.g., words, phrases, prosody). This neural signature was highly correlated across narratives. Results indicate an increased reliance on acoustic cues during speech encoding. This may reflect inefficient encoding of bottom-up speech cues, likely as a consequence of dysfunctional temporoparietal cortex.
Affiliation(s)
- Heather R. Dial
  - Aphasia Research and Treatment Lab, Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX, United States
- G. Nike Gnanateja
  - SoundBrain Lab, Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
- Rachel S. Tessmer
  - Aphasia Research and Treatment Lab, Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX, United States
- Maria Luisa Gorno-Tempini
  - Language Neurobiology Laboratory, Department of Neurology, Memory and Aging Center, University of California, San Francisco, San Francisco, CA, United States
- Bharath Chandrasekaran
  - SoundBrain Lab, Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, United States
  - Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, United States
- Maya L. Henry
  - Aphasia Research and Treatment Lab, Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, Austin, TX, United States
  - Department of Neurology, Dell Medical School, University of Texas at Austin, Austin, TX, United States
26
Hausfeld L, Shiell M, Formisano E, Riecke L. Cortical processing of distracting speech in noisy auditory scenes depends on perceptual demand. Neuroimage 2020; 228:117670. [PMID: 33359352 DOI: 10.1016/j.neuroimage.2020.117670] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/27/2020] [Revised: 12/13/2020] [Accepted: 12/14/2020] [Indexed: 11/15/2022] Open
Abstract
Selective attention is essential for the processing of multi-speaker auditory scenes because they require the perceptual segregation of the relevant speech ("target") from irrelevant speech ("distractors"). For simple sounds, it has been suggested that the processing of multiple distractor sounds depends on bottom-up factors affecting task performance. However, it remains unclear whether such dependency applies to naturalistic multi-speaker auditory scenes. In this study, we tested the hypothesis that increased perceptual demand (the processing requirement posed by the scene to separate the target speech) reduces the cortical processing of distractor speech thus decreasing their perceptual segregation. Human participants were presented with auditory scenes including three speakers and asked to selectively attend to one speaker while their EEG was acquired. The perceptual demand of this selective listening task was varied by introducing an auditory cue (interaural time differences, ITDs) for segregating the target from the distractor speakers, while acoustic differences between the distractors were matched in ITD and loudness. We obtained a quantitative measure of the cortical segregation of distractor speakers by assessing the difference in how accurately speech-envelope following EEG responses could be predicted by models of averaged distractor speech versus models of individual distractor speech. In agreement with our hypothesis, results show that interaural segregation cues led to improved behavioral word-recognition performance and stronger cortical segregation of the distractor speakers. The neural effect was strongest in the δ-band and at early delays (0 - 200 ms). Our results indicate that during low perceptual demand, the human cortex represents individual distractor speech signals as more segregated. This suggests that, in addition to purely acoustical properties, the cortical processing of distractor speakers depends on factors like perceptual demand.
Affiliation(s)
- Lars Hausfeld
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands
  - Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
- Martha Shiell
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands
  - Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
- Elia Formisano
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands
  - Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
  - Maastricht Centre for Systems Biology, 6200MD Maastricht, The Netherlands
- Lars Riecke
  - Department of Cognitive Neuroscience, Faculty of Psychology and Neuroscience, Maastricht University, P.O. Box 616, 6200MD Maastricht, The Netherlands
  - Maastricht Brain Imaging Centre, 6200MD Maastricht, The Netherlands
27
Abstract
Speech processing in the human brain is grounded in non-specific auditory processing in the general mammalian brain, but relies on human-specific adaptations for processing speech and language. For this reason, many recent neurophysiological investigations of speech processing have turned to the human brain, with an emphasis on continuous speech. Substantial progress has been made using the phenomenon of "neural speech tracking", in which neurophysiological responses time-lock to the rhythm of auditory (and other) features in continuous speech. One broad category of investigations concerns the extent to which speech tracking measures are related to speech intelligibility, which has clinical applications in addition to its scientific importance. Recent investigations have also focused on disentangling different neural processes that contribute to speech tracking. The two lines of research are closely related, since processing stages throughout auditory cortex contribute to speech comprehension, in addition to subcortical processing and higher order and attentional processes.
Affiliation(s)
- Christian Brodbeck
  - Institute for Systems Research, University of Maryland, College Park, Maryland 20742, U.S.A
- Jonathan Z. Simon
  - Institute for Systems Research, University of Maryland, College Park, Maryland 20742, U.S.A
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, U.S.A
  - Department of Biology, University of Maryland, College Park, Maryland 20742, U.S.A
28
Chen Y, Jin P, Ding N. The influence of linguistic information on cortical tracking of words. Neuropsychologia 2020; 148:107640. [DOI: 10.1016/j.neuropsychologia.2020.107640] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/01/2020] [Revised: 08/08/2020] [Accepted: 09/28/2020] [Indexed: 10/23/2022]
29
Broderick MP, Anderson AJ, Lalor EC. Semantic Context Enhances the Early Auditory Encoding of Natural Speech. J Neurosci 2019; 39:7564-7575. [PMID: 31371424 PMCID: PMC6750931 DOI: 10.1523/jneurosci.0584-19.2019] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 03/13/2019] [Revised: 07/20/2019] [Accepted: 07/29/2019] [Indexed: 01/22/2023] Open
Abstract
Speech perception involves the integration of sensory input with expectations based on the context of that speech. Much debate surrounds the issue of whether or not prior knowledge feeds back to affect early auditory encoding in the lower levels of the speech processing hierarchy, or whether perception can be best explained as a purely feedforward process. Although there has been compelling evidence on both sides of this debate, experiments involving naturalistic speech stimuli to address these questions have been lacking. Here, we use a recently introduced method for quantifying the semantic context of speech and relate it to a commonly used method for indexing low-level auditory encoding of speech. The relationship between these measures is taken to be an indication of how semantic context leading up to a word influences how its low-level acoustic and phonetic features are processed. We record EEG from human participants (both male and female) listening to continuous natural speech and find that the early cortical tracking of a word's speech envelope is enhanced by its semantic similarity to its sentential context. Using a forward modeling approach, we find that prediction accuracy of the EEG signal also shows the same effect. Furthermore, this effect shows distinct temporal patterns of correlation depending on the type of speech input representation (acoustic or phonological) used for the model, implicating a top-down propagation of information through the processing hierarchy. These results suggest a mechanism that links top-down prior information with the early cortical entrainment of words in natural, continuous speech.
SIGNIFICANCE STATEMENT During natural speech comprehension, we use semantic context when processing information about new incoming words. However, precisely how the neural processing of bottom-up sensory information is affected by top-down context-based predictions remains controversial. We address this discussion using a novel approach that indexes a word's similarity to context and how well a word's acoustic and phonetic features are processed by the brain at the time of its utterance. We relate these two measures and show that lower-level auditory tracking of speech improves for words that are more related to their preceding context. These results suggest a mechanism that links top-down prior information with bottom-up sensory processing in the context of natural, narrative speech listening.
Affiliation(s)
- Michael P Broderick
  - School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Andrew J Anderson
  - Department of Biomedical Engineering, and Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York 14627
- Edmund C Lalor
  - School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
  - Department of Biomedical Engineering, and Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York 14627
|