1. Chalas N, Meyer L, Lo CW, Park H, Kluger DS, Abbasi O, Kayser C, Nitsch R, Gross J. Dissociating prosodic from syntactic delta activity during natural speech comprehension. Curr Biol 2024;34:3537-3549.e5. PMID: 39047734. DOI: 10.1016/j.cub.2024.06.072.
Abstract
Decoding human speech requires the brain to segment the incoming acoustic signal into meaningful linguistic units, ranging from syllables and words to phrases. Integrating these linguistic constituents into a coherent percept lies at the root of compositional meaning and hence understanding. Prosodic cues, such as pauses, are important segmentation cues in natural speech, but their interplay with higher-level linguistic processing is still unknown. Here, we dissociate the neural tracking of prosodic pauses from the segmentation of multi-word chunks using magnetoencephalography (MEG). We find that manipulating the regularity of pauses disrupts slow speech-brain tracking bilaterally in auditory areas (below 2 Hz) and in turn increases left-lateralized coherence of higher-frequency auditory activity at speech onsets (around 25-45 Hz). Critically, we also find that multi-word chunks, defined as short, coherent bundles of inter-word dependencies, are processed through the rhythmic fluctuations of low-frequency activity (below 2 Hz) bilaterally and independently of prosodic cues. Importantly, low-frequency alignment at chunk onsets increases the accuracy of an encoding model in bilateral auditory and frontal areas while controlling for the effect of acoustics. Our findings provide novel insights into the neural basis of speech perception, demonstrating that both acoustic features (prosodic cues) and abstract linguistic processing at the multi-word timescale are independently underpinned by low-frequency electrophysiological brain activity in the delta frequency range.
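A minimal sketch of the kind of sub-2 Hz speech-brain coherence measure described here, assuming a common sampling rate and substituting white noise for the MEG channel and the speech waveform; this illustrates the measure only, not the authors' pipeline.

```python
# Sketch (not the authors' pipeline): delta-range coherence between one MEG
# channel and the speech amplitude envelope. Data and parameters are assumed.
import numpy as np
from scipy.signal import hilbert, coherence

fs = 200.0                                  # assumed common sampling rate (Hz)
rng = np.random.default_rng(0)
speech = rng.standard_normal(int(60 * fs))  # stand-in for a speech waveform
meg = rng.standard_normal(int(60 * fs))     # stand-in for one MEG channel

envelope = np.abs(hilbert(speech))          # broadband amplitude envelope

# Welch-based magnitude-squared coherence; 10 s windows give ~0.1 Hz
# resolution, enough to resolve the sub-2 Hz delta range examined above.
f, coh = coherence(envelope, meg, fs=fs, nperseg=int(10 * fs))
delta = (f > 0) & (f < 2.0)
print(f"peak delta coherence: {coh[delta].max():.3f} "
      f"at {f[delta][np.argmax(coh[delta])]:.2f} Hz")
```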
Affiliation(s)
- Nikos Chalas: Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany; Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Lars Meyer: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Chia-Wen Lo: Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Hyojin Park: Centre for Human Brain Health (CHBH), School of Psychology, University of Birmingham, Birmingham, UK
- Daniel S Kluger: Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
- Omid Abbasi: Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany
- Christoph Kayser: Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, 33615 Bielefeld, Germany
- Robert Nitsch: Institute for Translational Neuroscience, University of Münster, Münster, Germany
- Joachim Gross: Institute for Biomagnetism and Biosignal Analysis, University of Münster, Münster, Germany; Otto-Creutzfeldt-Center for Cognitive and Behavioral Neuroscience, University of Münster, Münster, Germany
2. te Rietmolen N, Mercier MR, Trébuchon A, Morillon B, Schön D. Speech and music recruit frequency-specific distributed and overlapping cortical networks. eLife 2024;13:RP94509. PMID: 39038076. PMCID: PMC11262799. DOI: 10.7554/elife.94509.
Abstract
To what extent do speech and music processing rely on domain-specific and domain-general neural networks? Using whole-brain intracranial EEG recordings in 18 epilepsy patients listening to natural, continuous speech or music, we investigated the presence of frequency-specific and network-level brain activity. We combined this with a statistical approach that makes a clear operational distinction between shared, preferred, and domain-selective neural responses. We show that the majority of focal and network-level neural activity is shared between speech and music processing. Our data also reveal an absence of anatomical regional selectivity. Instead, domain-selective neural responses are restricted to distributed and frequency-specific coherent oscillations, typical of spectral fingerprints. Our work highlights the importance of considering natural stimuli and brain dynamics in their full complexity to map cognitive and brain functions.
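The shared/preferred/selective distinction can be illustrated with a toy classification rule over per-trial responses in one recording site; the tests, alpha level, and simulated data below are placeholder assumptions, not the paper's statistics.

```python
# Illustrative sketch of the shared / preferred / domain-selective distinction
# as described in the abstract; thresholds and tests are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_trials = 50
speech_resp = rng.normal(1.0, 1.0, n_trials)   # per-trial response, speech
music_resp = rng.normal(0.2, 1.0, n_trials)    # per-trial response, music

def classify(speech_resp, music_resp, alpha=0.05):
    sig_s = stats.ttest_1samp(speech_resp, 0).pvalue < alpha
    sig_m = stats.ttest_1samp(music_resp, 0).pvalue < alpha
    differ = stats.ttest_ind(speech_resp, music_resp).pvalue < alpha
    if sig_s and sig_m:
        return "preferred" if differ else "shared"
    if sig_s or sig_m:
        return "domain-selective"
    return "unresponsive"

print(classify(speech_resp, music_resp))
```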
Affiliation(s)
- Noémie te Rietmolen
- Institute for Language, Communication, and the Brain, Aix-Marseille UniversityMarseilleFrance
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des SystèmesMarseilleFrance
| | - Manuel R Mercier
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des SystèmesMarseilleFrance
| | - Agnès Trébuchon
- Institute for Language, Communication, and the Brain, Aix-Marseille UniversityMarseilleFrance
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des SystèmesMarseilleFrance
- APHM, Hôpital de la Timone, Service de Neurophysiologie CliniqueMarseilleFrance
| | - Benjamin Morillon
- Institute for Language, Communication, and the Brain, Aix-Marseille UniversityMarseilleFrance
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des SystèmesMarseilleFrance
| | - Daniele Schön
- Institute for Language, Communication, and the Brain, Aix-Marseille UniversityMarseilleFrance
- Aix Marseille Université, INSERM, INS, Institut de Neurosciences des SystèmesMarseilleFrance
| |
3. Lamekina Y, Titone L, Maess B, Meyer L. Speech Prosody Serves Temporal Prediction of Language via Contextual Entrainment. J Neurosci 2024;44:e1041232024. PMID: 38839302. PMCID: PMC11236583. DOI: 10.1523/jneurosci.1041-23.2024.
Abstract
Temporal prediction assists language comprehension. In a series of recent behavioral studies, we have shown that listeners specifically employ rhythmic modulations of prosody to estimate the duration of upcoming sentences, thereby speeding up comprehension. In the current human magnetoencephalography (MEG) study on participants of either sex, we show that the human brain achieves this function through a mechanism termed entrainment. Through entrainment, electrophysiological brain activity maintains and continues contextual rhythms beyond their offset. Our experiment combined exposure to repetitive prosodic contours with the subsequent presentation of visual sentences that either matched or mismatched the duration of the preceding contour. During exposure to prosodic contours, we observed MEG coherence with the contours, which was source-localized to right-hemispheric auditory areas. During the processing of the visual targets, activity at the frequency of the preceding contour was still detectable in the MEG; yet sources shifted to the (left) frontal cortex, in line with a functional inheritance of the rhythmic acoustic context for prediction. Strikingly, when the target sentence was shorter than expected from the preceding contour, an omission response appeared in the evoked potential record. We conclude that prosodic entrainment is a functional mechanism of temporal prediction in language comprehension. In general, acoustic rhythms appear to give language access to the brain's electrophysiological mechanisms of temporal prediction.
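One way to operationalize "activity at the contour frequency persists after offset" is to compare spectral power at that frequency against neighboring bins in a post-offset window. The sketch below does this on a synthetic signal; the contour rate, window length, and data are illustrative assumptions, not the study's parameters.

```python
# Sketch: does activity at the contour frequency persist after stimulus
# offset (the entrainment signature described above)? Data are simulated.
import numpy as np

fs = 250.0
f_contour = 1.25                      # assumed prosodic contour rate (Hz)
t = np.arange(0, 4.0, 1 / fs)         # 4 s window after contour offset
rng = np.random.default_rng(2)
post = 0.3 * np.sin(2 * np.pi * f_contour * t) + rng.standard_normal(t.size)

# Power at the contour frequency vs. neighboring FFT bins
spec = np.abs(np.fft.rfft(post * np.hanning(post.size))) ** 2
freqs = np.fft.rfftfreq(post.size, 1 / fs)
target = np.argmin(np.abs(freqs - f_contour))
neighbors = np.r_[spec[target - 3:target - 1], spec[target + 2:target + 4]]
print(f"power ratio at {f_contour} Hz: {spec[target] / neighbors.mean():.2f}")
```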
Affiliation(s)
- Yulia Lamekina: Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Lorenzo Titone: Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Burkhard Maess: Methods and Development Group Brain Networks, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany
- Lars Meyer: Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig 04103, Germany; University Clinic Münster, Münster 48149, Germany
4. Lo CW, Meyer L. Chunk boundaries disrupt dependency processing in an AG: Reconciling incremental processing and discrete sampling. PLoS One 2024;19:e0305333. PMID: 38889141. PMCID: PMC11185458. DOI: 10.1371/journal.pone.0305333.
Abstract
Language is rooted in our ability to compose: We link words together, fusing their meanings. Links are not limited to neighboring words but often span intervening words. The ability to process these non-adjacent dependencies (NADs) conflicts with the brain's sampling of speech: We consume speech in chunks that are limited in time, containing only a limited number of words. It is unknown how we link together words that belong to separate chunks. Here, we report that we cannot, at least not so well. In our electroencephalography (EEG) study, 37 human listeners learned chunks and dependencies from an artificial grammar (AG) composed of syllables. Multi-syllable chunks to be learned were equal-sized, allowing us to employ a frequency-tagging approach. On top of chunks, syllable streams contained NADs that were either confined to a single chunk or crossed a chunk boundary. Frequency analyses of the EEG revealed a spectral peak at the chunk rate, showing that participants learned the chunks. NADs that crossed boundaries were associated with smaller electrophysiological responses than within-chunk NADs. This shows that NADs are processed readily when they are confined to the same chunk, but not as well when crossing a chunk boundary. Our findings help to reconcile the classical notion that language is processed incrementally with recent evidence for discrete perceptual sampling of speech. This has implications for language acquisition and processing as well as for the general view of syntax in human language.
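The frequency-tagging logic is simple enough to show in a few lines: with equal-sized chunks presented isochronously, learning shows up as a spectral peak at the chunk rate on top of the syllable-rate peak. Rates and the toy EEG signal below are illustrative assumptions, not the study's parameters.

```python
# Frequency-tagging sketch: spectral peaks at the syllable and chunk rates.
import numpy as np

fs = 100.0
syll_rate = 3.0                      # syllables per second (assumed)
chunk_rate = syll_rate / 3           # three-syllable chunks -> 1 Hz
t = np.arange(0, 120.0, 1 / fs)
rng = np.random.default_rng(3)
eeg = (np.sin(2 * np.pi * syll_rate * t)            # stimulus-driven response
       + 0.5 * np.sin(2 * np.pi * chunk_rate * t)   # chunk-level response
       + rng.standard_normal(t.size))

freqs = np.fft.rfftfreq(t.size, 1 / fs)
amp = np.abs(np.fft.rfft(eeg)) / t.size
for f0 in (chunk_rate, syll_rate):
    k = np.argmin(np.abs(freqs - f0))
    print(f"{f0:.2f} Hz amplitude: {amp[k]:.3f}")
```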
Affiliation(s)
- Chia-Wen Lo: Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Lars Meyer: Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany; University Clinic Münster, Münster, Germany
5. Zoefel B, Kösem A. Neural tracking of continuous acoustics: properties, speech-specificity and open questions. Eur J Neurosci 2024;59:394-414. PMID: 38151889. DOI: 10.1111/ejn.16221.
Abstract
Human speech is a particularly relevant acoustic stimulus for our species, due to its role in information transmission during communication. Speech is inherently a dynamic signal, and a recent line of research has focused on neural activity following the temporal structure of speech. We review findings that characterise neural dynamics in the processing of continuous acoustics and that allow us to compare these dynamics with temporal aspects of human speech. We highlight properties and constraints that both neural and speech dynamics have, suggesting that auditory neural systems are optimised to process human speech. We then discuss the speech-specificity of neural dynamics and their potential mechanistic origins and summarise open questions in the field.
Affiliation(s)
- Benedikt Zoefel: Centre de Recherche Cerveau et Cognition (CerCo), CNRS UMR 5549, Toulouse, France; Université de Toulouse III Paul Sabatier, Toulouse, France
- Anne Kösem: Lyon Neuroscience Research Center (CRNL), INSERM U1028, Bron, France
6. Barchet AV, Henry MJ, Pelofi C, Rimmele JM. Auditory-motor synchronization and perception suggest partially distinct time scales in speech and music. Commun Psychol 2024;2:2. PMID: 39242963. PMCID: PMC11332030. DOI: 10.1038/s44271-023-00053-6.
Abstract
Speech and music might involve specific cognitive rhythmic timing mechanisms related to differences in the dominant rhythmic structure. We investigate the influence of different motor effectors on rate-specific processing in both domains. A perception task and a synchronization task involving syllable and piano-tone sequences and motor effectors typically associated with speech (whispering) and music (finger-tapping) were tested at slow (~2 Hz) and fast (~4.5 Hz) rates. Although synchronization performance was generally better at slow rates, the motor effectors exhibited specific rate preferences. Finger-tapping outperformed whispering at slow but not at faster rates, with synchronization being effector-dependent at slow rates but highly correlated across effectors at faster rates. Perception of speech and music was best at different rates and was predicted by a fast general synchronization component and a slow finger-tapping component. Our data suggest partially independent rhythmic timing mechanisms for speech and music, possibly related to differential recruitment of cortical motor circuitry.
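Synchronization performance in tasks like this is often quantified as the phase consistency of motor events relative to an isochronous stimulus. The sketch below computes the standard mean-resultant-vector measure on simulated tap times; the rate and jitter are assumptions.

```python
# Sketch: auditory-motor synchronization as circular phase consistency of
# tap times within the stimulus cycle. Tap data are simulated.
import numpy as np

rate = 2.0                                    # slow condition, ~2 Hz
period = 1.0 / rate
rng = np.random.default_rng(4)
taps = np.arange(0, 30, period) + rng.normal(0, 0.03, 60)  # jittered taps

phases = 2 * np.pi * ((taps % period) / period)  # phase of each tap in cycle
plv = np.abs(np.exp(1j * phases).mean())         # 1 = perfect synchronization
print(f"synchronization consistency at {rate} Hz: {plv:.3f}")
```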
Affiliation(s)
- Alice Vivien Barchet: Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany
- Molly J Henry: Research Group 'Neural and Environmental Rhythms', Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany; Department of Psychology, Toronto Metropolitan University, Toronto, Canada
- Claire Pelofi: Music and Audio Research Laboratory, New York University, New York, NY, USA; Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA
- Johanna M Rimmele: Department of Cognitive Neuropsychology, Max Planck Institute for Empirical Aesthetics, Frankfurt am Main, Germany; Max Planck NYU Center for Language, Music, and Emotion, New York, NY, USA
7. Inbar M, Genzer S, Perry A, Grossman E, Landau AN. Intonation Units in Spontaneous Speech Evoke a Neural Response. J Neurosci 2023;43:8189-8200. PMID: 37793909. PMCID: PMC10697392. DOI: 10.1523/jneurosci.0235-23.2023.
Abstract
Spontaneous speech is produced in chunks called intonation units (IUs). IUs are defined by a set of prosodic cues and presumably occur in all human languages. Recent work has shown that across different grammatical and sociocultural conditions IUs form rhythms of ∼1 unit per second. Linguistic theory suggests that IUs pace the flow of information in the discourse. As a result, IUs provide a promising and hitherto unexplored theoretical framework for studying the neural mechanisms of communication. In this article, we identify a neural response unique to the boundary defined by the IU. We measured the EEG of human participants (of either sex) who listened to different speakers recounting an emotional life event. We analyzed the speech stimuli linguistically and modeled the EEG response at word offset using a GLM approach. We find that the EEG response to IU-final words differs from the response to IU-nonfinal words even when equating acoustic boundary strength. Finally, we relate our findings to the body of research on rhythmic brain mechanisms in speech processing, examining the unique contribution of IUs and acoustic boundary strength in predicting delta-band EEG. This analysis suggests that IU-related neural activity, which is tightly linked to the classic closure positive shift (CPS), could be a time-locked component that captures the previously characterized delta-band neural speech tracking.

Significance Statement: Linguistic communication is central to human experience, and its neural underpinnings are a topic of much research in recent years. Neuroscientific research has benefited from studying human behavior in naturalistic settings, an endeavor that requires explicit models of complex behavior. Usage-based linguistic theory suggests that spoken language is prosodically structured in intonation units. We reveal that the neural system is attuned to intonation units by explicitly modeling their impact on the EEG response beyond mere acoustics. To our understanding, this is the first time this is demonstrated in spontaneous speech under naturalistic conditions and under a theoretical framework that connects the prosodic chunking of speech, on the one hand, with the flow of information during communication, on the other.
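The word-offset GLM idea can be sketched as a time-lagged regression: impulse predictors at IU-final and IU-nonfinal word offsets, with a response kernel per predictor estimated by ridge regression. Everything below (sizes, lags, the ridge penalty, the simulated data) is an illustrative assumption, not the authors' model specification.

```python
# Sketch of a word-offset GLM: two impulse predictors, lagged design matrix,
# ridge-estimated response kernels. Data and parameters are assumed.
import numpy as np

fs, dur = 100, 120
n = fs * dur
rng = np.random.default_rng(5)
eeg = rng.standard_normal(n)                              # one EEG channel

final = np.zeros(n); nonfinal = np.zeros(n)
final[rng.choice(n - fs, 100, replace=False)] = 1         # IU-final offsets
nonfinal[rng.choice(n - fs, 300, replace=False)] = 1      # IU-nonfinal offsets

lags = np.arange(0, fs)                                   # 0-1 s of lags
X = np.column_stack([np.roll(p, lag) for p in (final, nonfinal) for lag in lags])
lam = 1e2                                                 # ridge parameter
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)
trf_final, trf_nonfinal = beta[:fs], beta[fs:]            # two response kernels
print(trf_final.shape, trf_nonfinal.shape)
```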
Affiliation(s)
- Maya Inbar: Department of Linguistics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel; Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Shir Genzer: Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Anat Perry: Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Eitan Grossman: Department of Linguistics, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
- Ayelet N Landau: Department of Psychology, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel; Department of Cognitive and Brain Sciences, Hebrew University of Jerusalem, Mount Scopus, Jerusalem 9190501, Israel
8. Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope. Neurosci Biobehav Rev 2023;147:105111. PMID: 36822385. DOI: 10.1016/j.neubiorev.2023.105111.
Abstract
The syllable is a perceptually salient unit in speech. Since both the syllable and its acoustic correlate, i.e., the speech envelope, have a preferred range of rhythmicity between 4 and 8 Hz, it is hypothesized that theta-band neural oscillations play a major role in extracting syllables based on the envelope. A literature survey, however, reveals inconsistent evidence about the relationship between the speech envelope and syllables, and the current study revisits this question by analyzing large speech corpora. It is shown that the center frequency of the speech envelope, characterized by the modulation spectrum, reliably correlates with the rate of syllables only when the analysis is pooled over minutes of speech recordings. In contrast, in the time domain, a component of the speech envelope is reliably phase-locked to syllable onsets. Based on a speaker-independent model, the timing of syllable onsets explains about 24% of the variance of the speech envelope. These results indicate that local features in the speech envelope, rather than the modulation spectrum, are a more reliable acoustic correlate of syllables.
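The two acoustic measures contrasted here are straightforward to compute: (a) the modulation spectrum of the wide-band envelope and (b) local envelope features (here, peaks as a crude proxy for syllable nuclei). The cutoffs, window lengths, and noise stand-in below are illustrative assumptions.

```python
# Sketch of the two measures: envelope modulation spectrum vs. local peaks.
import numpy as np
from scipy.signal import hilbert, welch, butter, sosfiltfilt, find_peaks

fs = 16000
rng = np.random.default_rng(6)
speech = rng.standard_normal(fs * 30)            # stand-in for 30 s of speech

env = np.abs(hilbert(speech))
sos = butter(4, 20, btype="low", fs=fs, output="sos")
env = sosfiltfilt(sos, env)                      # smooth envelope below 20 Hz

# (a) modulation spectrum: where does envelope energy concentrate?
f_mod, pxx = welch(env - env.mean(), fs=fs, nperseg=8 * fs)
band = (f_mod >= 1) & (f_mod <= 16)
print(f"modulation peak: {f_mod[band][np.argmax(pxx[band])]:.2f} Hz")

# (b) local features: envelope peaks as a proxy for syllable nuclei
peaks, _ = find_peaks(env, distance=int(0.1 * fs))
print(f"estimated syllable rate: {peaks.size / 30:.2f} /s")
```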
9. Snapiri L, Kaplan Y, Shalev N, Landau AN. Rhythmic modulation of visual discrimination is linked to individuals' spontaneous motor tempo. Eur J Neurosci 2023;57:646-656. PMID: 36512369. DOI: 10.1111/ejn.15898.
Abstract
The impact of external rhythmic structure on perception has been demonstrated across different modalities and experimental paradigms. However, recent findings emphasize substantial individual differences in rhythm-based perceptual modulation. Here, we examine the link between spontaneous rhythmic preferences, as measured through the motor system, and individual differences in rhythmic modulation of visual discrimination. As a first step, we measure individual rhythmic preferences using the spontaneous tapping task. Then we assess perceptual rhythmic modulation using a visual discrimination task in which targets can appear either in-phase or out-of-phase with a preceding rhythmic stream of visual stimuli. The tempo of the preceding stream was manipulated over different experimental blocks (0.77 Hz, 1.4 Hz, 2 Hz). We find that visual rhythmic stimulation modulates discrimination performance. The modulation is dependent on the tempo of stimulation, with maximal perceptual benefits for the slowest tempo of stimulation (0.77 Hz). Most importantly, the strength of modulation is also linked to individuals' spontaneous motor tempo. Individuals with slower spontaneous tempi show greater rhythmic modulation compared to individuals with faster spontaneous tempi. This finding suggests that different tempi affect the cognitive system with varying levels of efficiency and that self-generated rhythms impact our ability to utilize rhythmic structure in the environment for guiding perception and performance.
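The individual-differences analysis described here reduces to correlating two per-participant scores: spontaneous motor tempo and a rhythmic modulation benefit (in-phase minus out-of-phase accuracy). A minimal sketch on simulated participants, assuming a simple Pearson correlation rather than the paper's exact statistics:

```python
# Sketch: link between spontaneous tapping tempo and rhythmic modulation
# of discrimination accuracy. Participant data are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 40
tempo = rng.uniform(0.5, 3.0, n)                 # spontaneous tapping rate (Hz)
modulation = 0.10 - 0.03 * tempo + rng.normal(0, 0.03, n)  # accuracy benefit

r, p = stats.pearsonr(tempo, modulation)
print(f"slower tappers show larger modulation: r = {r:.2f}, p = {p:.3f}")
```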
Affiliation(s)
- Leah Snapiri: Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Yael Kaplan: Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel
- Nir Shalev: Department of Experimental Psychology, University of Oxford, Oxford, UK
- Ayelet N Landau: Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel; Department of Cognitive Science, The Hebrew University of Jerusalem, Jerusalem, Israel
10. Pérez-Navarro J, Lallier M, Clark C, Flanagan S, Goswami U. Local Temporal Regularities in Child-Directed Speech in Spanish. J Speech Lang Hear Res 2022;65:3776-3788. PMID: 36194778. DOI: 10.1044/2022_jslhr-22-00111.
Abstract
Purpose: To characterize the local (utterance-level) temporal regularities of child-directed speech (CDS) that might facilitate phonological development in Spanish, classically termed a syllable-timed language.
Method: Eighteen female adults addressed their 4-year-old children versus other adults spontaneously and also read aloud (CDS vs. adult-directed speech [ADS]). We compared CDS and ADS productions using a spectrotemporal model (Leong & Goswami, 2015), obtaining three temporal metrics: (a) distribution of modulation energy, (b) temporal regularity of stressed syllables, and (c) syllable rate.
Results: CDS was characterized by (a) significantly greater modulation energy in the lower frequencies (0.5-4 Hz), (b) more regular rhythmic occurrence of stressed syllables, and (c) a slower syllable rate than ADS, across both spontaneous and read conditions.
Discussion: CDS shows a robust local temporal organization (i.e., within utterances), with amplitude modulation bands aligning with the delta and theta electrophysiological frequency bands, respectively, and showing greater phase synchronization than ADS, facilitating the parsing of stress units and syllables. These temporal regularities, together with the slower production rate of CDS, might support the automatic extraction of phonological units in speech and hence children's phonological development.
Supplemental Material: https://doi.org/10.23641/asha.21210893
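Metric (a), the share of envelope modulation energy falling in the delta band, can be sketched in a few lines. The band edges follow the abstract (0.5-4 Hz); the total-band upper limit, window lengths, and noise stand-in for an utterance are assumptions, and this is not the Leong & Goswami model itself.

```python
# Sketch: delta-band share of envelope modulation energy for one utterance.
import numpy as np
from scipy.signal import hilbert, welch

def delta_energy_share(waveform, fs):
    env = np.abs(hilbert(waveform))
    f, pxx = welch(env - env.mean(), fs=fs, nperseg=4 * fs)
    total = pxx[(f >= 0.5) & (f <= 40)].sum()   # assumed total band
    delta = pxx[(f >= 0.5) & (f <= 4)].sum()    # 0.5-4 Hz, as in the abstract
    return delta / total

fs = 16000
rng = np.random.default_rng(8)
utterance = rng.standard_normal(fs * 10)        # stand-in for one utterance
print(f"delta-band share: {delta_energy_share(utterance, fs):.2f}")
```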
Affiliation(s)
- Jose Pérez-Navarro: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain; University of the Basque Country UPV/EHU, Donostia-San Sebastián, Spain
- Marie Lallier: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Catherine Clark: BCBL, Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain; University of the Basque Country UPV/EHU, Donostia-San Sebastián, Spain
- Sheila Flanagan: Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
- Usha Goswami: Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, United Kingdom
11. Modulation transfer functions for audiovisual speech. PLoS Comput Biol 2022;18:e1010273. PMID: 35852989. PMCID: PMC9295967. DOI: 10.1371/journal.pcbi.1010273.
Abstract
Temporal synchrony between facial motion and acoustic modulations is a hallmark feature of audiovisual speech. The moving face and mouth during natural speech are known to be correlated with low-frequency acoustic envelope fluctuations (below 10 Hz), but the precise rates at which envelope information is synchronized with motion in different parts of the face are less clear. Here, we used regularized canonical correlation analysis (rCCA) to learn speech envelope filters whose outputs correlate with motion in different parts of the speaker's face. We leveraged recent advances in video-based 3D facial landmark estimation, allowing us to examine statistical envelope-face correlations across a large number of speakers (∼4000). Specifically, rCCA was used to learn modulation transfer functions (MTFs) for the speech envelope that significantly predict correlation with facial motion across different speakers. The AV analysis revealed bandpass speech envelope filters at distinct temporal scales. A first set of MTFs showed peaks around 3-4 Hz and were correlated with mouth movements. A second set of MTFs captured envelope fluctuations in the 1-2 Hz range correlated with more global face and head motion. These two distinctive timescales emerged only as a property of natural AV speech statistics across many speakers. A similar analysis of fewer speakers performing a controlled speech task highlighted only the well-known temporal modulations around 4 Hz correlated with orofacial motion. The different bandpass ranges of AV correlation align notably with the average rates at which syllables (3-4 Hz) and phrases (1-2 Hz) are produced in natural speech. Whereas periodicities at the syllable rate are evident in the envelope spectrum of the speech signal itself, slower 1-2 Hz regularities thus only become prominent when considering crossmodal signal statistics. This may indicate a motor origin of temporal regularities at the timescales of syllables and phrases in natural speech.
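The core analysis is a canonical correlation between lagged speech-envelope features and facial-motion features, with the learned envelope weights playing the role of an MTF. The sketch below uses scikit-learn's plain CCA as an unregularized stand-in for the paper's rCCA; all features are simulated and all sizes are assumptions.

```python
# Sketch: CCA between lagged envelope features and facial-motion features,
# as an unregularized stand-in for the rCCA described above.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(9)
n_samples, n_lags, n_landmarks = 2000, 20, 15
envelope_lags = rng.standard_normal((n_samples, n_lags))   # lagged envelope
face_motion = envelope_lags[:, :5] @ rng.standard_normal((5, n_landmarks))
face_motion += 0.5 * rng.standard_normal((n_samples, n_landmarks))

cca = CCA(n_components=2).fit(envelope_lags, face_motion)
U, V = cca.transform(envelope_lags, face_motion)
for k in range(2):
    r = np.corrcoef(U[:, k], V[:, k])[0, 1]
    print(f"canonical correlation {k + 1}: {r:.2f}")
# The weight vector over lags (cca.x_weights_) is the analogue of a learned
# modulation transfer function in this toy setting.
```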
12. Anurova I, Vetchinnikova S, Dobrego A, Williams N, Mikusova N, Suni A, Mauranen A, Palva S. Event-related responses reflect chunk boundaries in natural speech. Neuroimage 2022;255:119203. PMID: 35413442. DOI: 10.1016/j.neuroimage.2022.119203.
Abstract
Chunking language has been proposed to be vital for comprehension, enabling the extraction of meaning from a continuous stream of speech. However, the neurocognitive mechanisms of chunking are poorly understood. Using magneto- and electroencephalography (MEEG), the present study investigated the neural correlates of chunk boundaries that listeners intuitively identified in natural speech drawn from linguistic corpora. In a behavioral experiment, subjects marked chunk boundaries in the excerpts intuitively, which revealed highly consistent boundary markings across subjects. We next recorded brain activity to investigate whether chunk boundaries with high and medium agreement rates elicit distinct evoked responses compared to non-boundaries. Pauses placed at chunk boundaries elicited a closure positive shift with sources over bilateral auditory cortices. In contrast, pauses placed within a chunk were perceived as interruptions and elicited a biphasic emitted potential with sources located in the bilateral primary and non-primary auditory areas with right-hemispheric dominance, and in the right inferior frontal cortex. Furthermore, pauses placed at stronger boundaries elicited earlier and more prominent activation over the left hemisphere, suggesting that brain responses to chunk boundaries in natural speech can be modulated by the relative strength of different linguistic cues, such as syntactic structure and prosody.
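The evoked-response comparison amounts to epoching around pause onsets at vs. within chunk boundaries, baseline-correcting, and averaging. A minimal sketch on simulated event times and a single channel; epoch windows and event counts are assumptions.

```python
# Sketch: epoch-and-average around boundary vs. within-chunk pause onsets.
import numpy as np

fs = 250
rng = np.random.default_rng(10)
eeg = rng.standard_normal(fs * 600)                      # 10 min, one channel

boundary_pauses = rng.integers(fs, eeg.size - fs, 80)    # sample indices
within_pauses = rng.integers(fs, eeg.size - fs, 80)

def evoked(sig, events, fs, tmin=-0.2, tmax=0.8):
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = np.stack([sig[e - pre:e + post] for e in events])
    epochs -= epochs[:, :pre].mean(axis=1, keepdims=True)  # baseline-correct
    return epochs.mean(axis=0)

erp_boundary = evoked(eeg, boundary_pauses, fs)
erp_within = evoked(eeg, within_pauses, fs)
print(erp_boundary.shape, erp_within.shape)              # (250,): -0.2 to 0.8 s
```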
Affiliation(s)
- Irina Anurova: Helsinki Institute of Life Sciences, Neuroscience Center, University of Helsinki, Finland; BioMag Laboratory, HUS Medical Imaging Center, Helsinki, Finland
- Nitin Williams: Helsinki Institute of Life Sciences, Neuroscience Center, University of Helsinki, Finland; Department of Languages, University of Helsinki, Finland
- Nina Mikusova: Department of Languages, University of Helsinki, Finland
- Antti Suni: Department of Languages, University of Helsinki, Finland
- Anna Mauranen: Department of Languages, University of Helsinki, Finland
- Satu Palva: Helsinki Institute of Life Sciences, Neuroscience Center, University of Helsinki, Finland; Centre for Cognitive Neuroscience, Institute of Neuroscience and Psychology, University of Glasgow, United Kingdom
13. Acoustically Driven Cortical δ Oscillations Underpin Prosodic Chunking. eNeuro 2021;8:ENEURO.0562-20.2021. PMID: 34083380. PMCID: PMC8272402. DOI: 10.1523/eneuro.0562-20.2021.
Abstract
Oscillation-based models of speech perception postulate a cortical computational principle by which decoding is performed within a window structure derived by a segmentation process. Segmentation of syllable-size chunks is realized by a θ oscillator. We provide evidence for an analogous role of a δ oscillator in the segmentation of phrase-sized chunks. We recorded magnetoencephalography (MEG) in humans while participants performed a target identification task. Random-digit strings, with phrase-long chunks of two digits, were presented at chunk rates of 1.8 or 2.6 Hz, inside or outside the δ frequency band (defined here to be 0.5–2 Hz). Strong periodicities were elicited by chunk rates inside δ in superior and middle temporal areas and in speech-motor integration areas. Periodicities were diminished or absent for chunk rates outside δ, in line with behavioral performance. Our findings show that prosodic chunking of phrase-sized acoustic segments is correlated with acoustically driven δ oscillations, expressed in anatomically specific patterns of neuronal periodicities.
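The central contrast is spectral amplitude at the chunk rate for a rate inside delta (1.8 Hz) vs. outside (2.6 Hz). The sketch below simulates a response built to echo the reported pattern (a gain parameter stands in for the presence or absence of delta tracking), so it illustrates the analysis, not the data.

```python
# Sketch: amplitude at the chunk rate, inside vs. outside the delta band.
import numpy as np

fs = 200.0
t = np.arange(0, 20.0, 1 / fs)
rng = np.random.default_rng(11)

def chunk_rate_amplitude(rate, tracks):
    gain = 0.4 if tracks else 0.05               # weaker response outside delta
    sig = gain * np.sin(2 * np.pi * rate * t) + rng.standard_normal(t.size)
    amp = np.abs(np.fft.rfft(sig)) / t.size
    freqs = np.fft.rfftfreq(t.size, 1 / fs)
    return amp[np.argmin(np.abs(freqs - rate))]

print(f"1.8 Hz (inside delta):  {chunk_rate_amplitude(1.8, True):.3f}")
print(f"2.6 Hz (outside delta): {chunk_rate_amplitude(2.6, False):.3f}")
```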
14. Biron T, Baum D, Freche D, Matalon N, Ehrmann N, Weinreb E, Biron D, Moses E. Automatic detection of prosodic boundaries in spontaneous speech. PLoS One 2021;16:e0250969. PMID: 33939754. PMCID: PMC8092678. DOI: 10.1371/journal.pone.0250969.
Abstract
Automatic speech recognition (ASR) and natural language processing (NLP) are expected to benefit from an effective, simple, and reliable method to automatically parse conversational speech. The ability to parse conversational speech depends crucially on the ability to identify boundaries between prosodic phrases. This is done naturally by the human ear, yet has proved surprisingly difficult to achieve reliably and simply in an automatic manner. Efforts to date have focused on detecting phrase boundaries using a variety of linguistic and acoustic cues. We propose a method which does not require model training and utilizes two prosodic cues that are based on ASR output. Boundaries are identified using discontinuities in speech rate (pre-boundary lengthening and phrase-initial acceleration) and silent pauses. The resulting phrases preserve syntactic validity, exhibit pitch reset, and compare well with manual tagging of prosodic boundaries. Collectively, our findings support the notion of prosodic phrases that represent coherent patterns across textual and acoustic parameters.
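The two ASR-based cues, silent pauses and pre-boundary lengthening (a local slow-down before the gap), can be sketched directly over word timings. The word times, thresholds, and decision rule below are illustrative assumptions, not the paper's tuned procedure.

```python
# Sketch: boundary detection from ASR word timings via pauses and
# pre-boundary lengthening. All values are illustrative.

# (start, end) times in seconds for consecutive words from an ASR transcript
words = [(0.0, 0.3), (0.35, 0.6), (0.65, 1.15), (1.75, 2.0), (2.05, 2.3)]

PAUSE_MIN = 0.25          # minimum silent gap to count as a boundary cue (s)
LENGTHEN_RATIO = 1.4      # pre-gap word notably longer than its predecessor

boundaries = []
for i in range(1, len(words)):
    gap = words[i][0] - words[i - 1][1]
    dur_prev = words[i - 1][1] - words[i - 1][0]
    dur_before = words[i - 2][1] - words[i - 2][0] if i >= 2 else dur_prev
    lengthened = dur_prev > LENGTHEN_RATIO * dur_before
    if gap >= PAUSE_MIN or (gap > 0.05 and lengthened):
        boundaries.append(i)  # boundary before word i

print(f"boundary before word indices: {boundaries}")
```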
Affiliation(s)
- Tirza Biron: Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Daniel Baum: Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Dominik Freche: Sagol Center for Brain and Mind, Interdisciplinary Center, Herzliya, Israel
- Nadav Matalon: Department of Linguistics, The Hebrew University, Jerusalem, Israel
- Netanel Ehrmann: Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Eyal Weinreb: Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- David Biron: Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel
- Elisha Moses: Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel