1. Lo CW, Meyer L. Chunk boundaries disrupt dependency processing in an AG: Reconciling incremental processing and discrete sampling. PLoS One 2024; 19:e0305333. PMID: 38889141; PMCID: PMC11185458; DOI: 10.1371/journal.pone.0305333.
Abstract
Language is rooted in our ability to compose: We link words together, fusing their meanings. Links are not limited to neighboring words but often span intervening words. The ability to process these non-adjacent dependencies (NADs) conflicts with the brain's sampling of speech: We consume speech in chunks that are limited in time, containing only a limited number of words. It is unknown how we link words together that belong to separate chunks. Here, we report that we cannot, at least not so well. In our electroencephalography (EEG) study, 37 human listeners learned chunks and dependencies from an artificial grammar (AG) composed of syllables. Multi-syllable chunks to be learned were equal-sized, allowing us to employ a frequency-tagging approach. On top of chunks, syllable streams contained NADs that were either confined to a single chunk or crossed a chunk boundary. Frequency analyses of the EEG revealed a spectral peak at the chunk rate, showing that participants learned the chunks. NADs that cross boundaries were associated with smaller electrophysiological responses than within-chunk NADs. This shows that NADs are processed readily when they are confined to the same chunk, but not as well when crossing a chunk boundary. Our findings help to reconcile the classical notion that language is processed incrementally with recent evidence for discrete perceptual sampling of speech. This has implications for language acquisition and processing as well as for the general view of syntax in human language.
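The frequency-tagging logic mentioned above can be illustrated with a short sketch: if syllables are presented at a fixed rate and listeners group them into equal-sized chunks, the EEG spectrum should show a peak at the chunk rate. The following is a minimal illustration on simulated data; the function name, syllable/chunk rates, and the peak-to-neighbor SNR measure are assumptions made for the example, not the study's actual parameters or pipeline.

```python
import numpy as np

def spectral_peak_snr(eeg, fs, target_hz, neighbor_bins=5):
    """Amplitude at a target frequency relative to neighboring bins.

    eeg: array of shape (n_trials, n_samples); fs: sampling rate in Hz.
    Values well above 1 indicate a spectral peak at target_hz,
    e.g. at the chunk rate in a frequency-tagging design."""
    n = eeg.shape[-1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Average the single-trial amplitude spectra across trials.
    amp = np.abs(np.fft.rfft(eeg, axis=-1)).mean(axis=0)
    k = np.argmin(np.abs(freqs - target_hz))
    neighbors = np.r_[amp[max(k - neighbor_bins, 1):k], amp[k + 1:k + 1 + neighbor_bins]]
    return amp[k] / neighbors.mean()

# Toy example: 4 Hz syllables grouped into 2-syllable chunks -> 2 Hz chunk rate.
fs, dur, n_trials = 250, 8.0, 30
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(0)
eeg = 0.5 * np.sin(2 * np.pi * 2.0 * t) + rng.normal(0, 1.0, (n_trials, t.size))
print("SNR at 2 Hz (chunk rate):", round(spectral_peak_snr(eeg, fs, 2.0), 2))
print("SNR at 3 Hz (control):   ", round(spectral_peak_snr(eeg, fs, 3.0), 2))
```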
Affiliation(s)
- Chia-Wen Lo
- Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- Lars Meyer
- Research Group Language Cycles, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
- University Clinic Münster, Münster, Germany
2. Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024; 34:bhae155. PMID: 38687241; PMCID: PMC11059272; DOI: 10.1093/cercor/bhae155.
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show that (i) the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) that for this modulation, both acoustic and phonetic information need to be incorporated. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
Affiliation(s)
- Seung-Goo Kim
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Federico De Martino
- Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
- Tobias Overath
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Duke Institute for Brain Sciences, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Center for Cognitive Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
3. Jalilpour Monesi M, Vanthornhout J, Francart T, Van Hamme H. The role of vowel and consonant onsets in neural tracking of natural speech. J Neural Eng 2024; 21:016002. PMID: 38205849; DOI: 10.1088/1741-2552/ad1784.
Abstract
Objective. To investigate how the auditory system processes natural speech, models have been created to relate the electroencephalography (EEG) signal of a person listening to speech to various representations of the speech. Mainly the speech envelope has been used, but also phonetic representations. We investigated to what degree of granularity phonetic representations can be related to the EEG signal. Approach. We used recorded EEG signals from 105 subjects while they listened to fairy tale stories. We utilized speech representations, including onset of any phone, vowel-consonant onsets, broad phonetic class (BPC) onsets, and narrow phonetic class onsets, and related them to EEG using forward modeling and match-mismatch tasks. In forward modeling, we used a linear model to predict EEG from speech representations. In the match-mismatch task, we trained a long short-term memory (LSTM)-based model to determine which of two candidate speech segments matches a given EEG segment. Main results. Our results show that vowel-consonant onsets outperform onsets of any phone in both tasks, which suggests that neural tracking of the vowel vs. consonant distinction exists in the EEG to some degree. We also observed that vowel (syllable nucleus) onsets exhibit a more consistent representation in EEG compared to syllable onsets. Significance. Finally, our findings suggest that neural tracking previously thought to be associated with BPCs might actually originate from vowel-consonant onsets rather than the differentiation between different phonetic classes.
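The forward modeling step described above (a linear model predicting EEG from a speech representation such as vowel or consonant onsets) is commonly implemented as a time-lagged regression, i.e., a temporal response function (TRF). The sketch below illustrates that idea on simulated data; all names, lag ranges, and the regularization value are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def lagged_design(stim, fs, tmin=0.0, tmax=0.4):
    """Build a time-lagged design matrix from a 1-D onset/feature vector."""
    lags = np.arange(int(tmin * fs), int(tmax * fs) + 1)
    X = np.zeros((stim.size, lags.size))
    for j, lag in enumerate(lags):
        X[lag:, j] = stim[:stim.size - lag] if lag > 0 else stim
    return X

def fit_trf(stim, eeg, fs, alpha=1.0):
    """Ridge-regression TRF: predict each EEG channel from the lagged stimulus."""
    X = lagged_design(stim, fs)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ eeg)   # shape: (n_lags, n_channels)

# Toy data: sparse vowel-onset impulses convolved with a damped response + noise.
fs, n_samp, n_chan = 100, 6000, 4
rng = np.random.default_rng(1)
onsets = (rng.random(n_samp) < 0.02).astype(float)
kernel = np.exp(-np.arange(40) / 10.0) * np.sin(np.arange(40) / 4.0)
eeg = np.stack([np.convolve(onsets, kernel)[:n_samp] for _ in range(n_chan)], axis=1)
eeg += rng.normal(0, 0.1, eeg.shape)

weights = fit_trf(onsets, eeg, fs)
pred = lagged_design(onsets, fs) @ weights
r = np.corrcoef(pred[:, 0], eeg[:, 0])[0, 1]
print("Prediction correlation (channel 0):", round(r, 3))
```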
Affiliation(s)
- Mohammad Jalilpour Monesi
- Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Tom Francart
- Department Neurosciences, ExpORL, KU Leuven, Leuven, Belgium
- Hugo Van Hamme
- Department of Electrical Engineering (ESAT), PSI, KU Leuven, Leuven, Belgium
4. Liu W, Pan X, Zhou X. The Temporal Dynamics of Stop Consonant Perception: Evidence from Context Effects. Lang Speech 2023; 66:1046-1055. PMID: 36775903; DOI: 10.1177/00238309231153355.
Abstract
Empirical evidence and theoretical models suggest that phonetic category perception involves two stages of auditory and phonetic processing. However, few studies have examined the time course of these two processing stages. Using brief stop consonant segments as context stimuli, this study examined the temporal dynamics of stop consonant perception by varying the inter-stimulus interval between context and target stimuli. The results suggest that phonetic category activation of stop consonants may emerge within the first 100 ms of processing. Furthermore, the activation of phonetic categories produced contrastive context effects on identification of the target stop continuum, whereas the auditory processing of stop consonants produced a context effect different from that caused by phonetic category activation. The findings provide further evidence for the two-stage model of speech perception and reveal the time course of auditory and phonetic processing.
Affiliation(s)
- Wenli Liu
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, China
- Xiaoguang Pan
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, China
- Xiang Zhou
- Department of Social Psychology, Zhou Enlai School of Government, Nankai University, China
5. Kurteff GL, Lester-Smith RA, Martinez A, Currens N, Holder J, Villarreal C, Mercado VR, Truong C, Huber C, Pokharel P, Hamilton LS. Speaker-induced Suppression in EEG during a Naturalistic Reading and Listening Task. J Cogn Neurosci 2023; 35:1538-1556. PMID: 37584593; DOI: 10.1162/jocn_a_02037.
Abstract
Speaking elicits a suppressed neural response when compared with listening to others' speech, a phenomenon known as speaker-induced suppression (SIS). Previous research has focused on investigating SIS at constrained levels of linguistic representation, such as the individual phoneme and word level. Here, we present scalp EEG data from a dual speech perception and production task where participants read sentences aloud and then listened to playback of themselves reading those sentences. Playback was separated into immediate repetition of the previous trial and randomized repetition of a former trial to investigate whether forward modeling of responses during passive listening suppresses the neural response. Concurrent EMG was recorded to control for movement artifact during speech production. In line with previous research, ERP analyses at the sentence level demonstrated suppression of early auditory components of the EEG for production compared with perception. To evaluate whether linguistic abstractions (in the form of phonological feature tuning) are suppressed during speech production alongside lower-level acoustic information, we fit linear encoding models that predicted scalp EEG based on phonological features, EMG activity, and task condition. We found that phonological features were encoded similarly between production and perception. However, this similarity was only observed when controlling for movement by using the EMG response as an additional regressor. Our results suggest that SIS operates at a sensory representational level and is dissociated from higher-order cognitive and linguistic processing that takes place during speech perception and production. We also detail some important considerations when analyzing EEG during continuous speech production.
6. Stephen EP, Li Y, Metzger S, Oganian Y, Chang EF. Latent neural dynamics encode temporal context in speech. Hear Res 2023; 437:108838. PMID: 37441880; PMCID: PMC11182421; DOI: 10.1016/j.heares.2023.108838.
Abstract
Direct neural recordings from human auditory cortex have demonstrated encoding for acoustic-phonetic features of consonants and vowels. Neural responses also encode distinct acoustic amplitude cues related to timing, such as those that occur at the onset of a sentence after a silent period or the onset of the vowel in each syllable. Here, we used a group reduced rank regression model to show that distributed cortical responses support a low-dimensional latent state representation of temporal context in speech. The timing cues each capture more unique variance than all other phonetic features and exhibit rotational or cyclical dynamics in latent space from activity that is widespread over the superior temporal gyrus. We propose that these spatially distributed timing signals could serve to provide temporal context for, and possibly bind across time, the concurrent processing of individual phonetic features, to compose higher-order phonological (e.g. word-level) representations.
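Reduced rank regression, as referenced above, constrains the stimulus-to-response mapping to a low-dimensional latent space. A minimal single-dataset sketch (not the authors' group model) is to fit ordinary least squares and then project the coefficients onto the top principal components of the fitted responses; the rank and toy dimensions below are illustrative assumptions.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Rank-constrained linear map Y ~ X @ B: OLS followed by projection of the
    fitted values onto their top principal components (latent response space)."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ B_ols
    _, _, Vt = np.linalg.svd(Y_hat, full_matrices=False)
    V_r = Vt[:rank].T                                  # (n_outputs, rank)
    return B_ols @ V_r @ V_r.T, X @ B_ols @ V_r        # weights, latent trajectories

# Toy example: 3 latent signals drive 20 "electrodes" from 10 stimulus features.
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 10))
latent = X @ rng.normal(size=(10, 3))
Y = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(2000, 20))

B_rrr, Z = reduced_rank_regression(X, Y, rank=3)
resid = Y - X @ B_rrr
print("Variance explained at rank 3:", round(1 - resid.var() / Y.var(), 3))
print("Latent trajectory shape:", Z.shape)
```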
Affiliation(s)
- Emily P Stephen
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, United States; Department of Mathematics and Statistics, Boston University, Boston, MA 02215, United States
- Yuanning Li
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, United States; School of Biomedical Engineering, ShanghaiTech University, Shanghai, China
- Sean Metzger
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, United States
- Yulia Oganian
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, United States; Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany
- Edward F Chang
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA 94143, United States.
7. Jeon MJ, Woo J. Effect of speech-stimulus degradation on phoneme-related potential. PLoS One 2023; 18:e0287584. PMID: 37352220; PMCID: PMC10289326; DOI: 10.1371/journal.pone.0287584.
Abstract
Auditory evoked potentials (AEPs) have been used to evaluate hearing and speech cognition. Because an AEP generates a very small voltage relative to ambient noise, a stimulus such as a tone, word, or short sentence must be presented repeatedly to compute ensemble averages over trials. However, repetitive presentation of short words and sentences creates an unnatural listening situation for the subject. Phoneme-related potentials (PRPs), which are evoked responses to typical phonemic stimuli, can instead be extracted from electroencephalography (EEG) data recorded in response to a continuous storybook. In this study, we investigated the effects of spectrally degraded speech stimuli on PRPs. EEG responses to spectrally degraded and natural storybooks were recorded from normal-hearing listeners, and the PRP components for 10 vowels and 12 consonants were extracted. The PRP responses to a vocoded (spectrally degraded) storybook showed significantly lower peak amplitudes and prolonged latencies compared with those to a natural storybook. These findings suggest that PRPs, like other AEPs, can be considered a potential tool for evaluating hearing and speech cognition. Moreover, PRPs can provide details of phonological processing and phonemic awareness that help explain poor speech intelligibility. Further investigation with hearing-impaired listeners is required prior to clinical application.
Affiliation(s)
- Min-Jae Jeon
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, Republic of Korea
- Jihwan Woo
- Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan, Republic of Korea
- Department of Biomedical Engineering, University of Ulsan, Ulsan, Republic of Korea
8. Carta S, Mangiacotti AMA, Valdes AL, Reilly RB, Franco F, Di Liberto GM. The impact of temporal synchronisation imprecision on TRF analyses. J Neurosci Methods 2023; 385:109765. PMID: 36481165; DOI: 10.1016/j.jneumeth.2022.109765.
Affiliation(s)
- Sara Carta
- ADAPT Centre, Trinity College, The University of Dublin, Ireland; School of Computer Science and Statistics, Trinity College, The University of Dublin, Ireland
- Anthony M A Mangiacotti
- Department of Psychology, Middlesex University, London, United Kingdom; FISPPA Department, University of Padova, Padova, Italy
- Alejandro Lopez Valdes
- Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Ireland; Global Brain Health Institute, Trinity College, The University of Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland; School of Engineering, Trinity College, The University of Dublin, Ireland
- Richard B Reilly
- Trinity Centre for Biomedical Engineering, Trinity College, The University of Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland; School of Engineering, Trinity College, The University of Dublin, Ireland; School of Medicine, Trinity College, The University of Dublin, Ireland
- Fabia Franco
- Department of Psychology, Middlesex University, London, United Kingdom
- Giovanni M Di Liberto
- ADAPT Centre, Trinity College, The University of Dublin, Ireland; School of Computer Science and Statistics, Trinity College, The University of Dublin, Ireland; Trinity College Institute of Neuroscience, Trinity College, The University of Dublin, Ireland.
9. Desai M, Field AM, Hamilton LS. Dataset size considerations for robust acoustic and phonetic speech encoding models in EEG. Front Hum Neurosci 2023; 16:1001171. PMID: 36741776; PMCID: PMC9895838; DOI: 10.3389/fnhum.2022.1001171.
Abstract
In many experiments that investigate auditory and speech processing in the brain using electroencephalography (EEG), the experimental paradigm is often lengthy and tedious. Typically, the experimenter errs on the side of including more data, more trials, and therefore conducting a longer task to ensure that the data are robust and effects are measurable. Recent studies used naturalistic stimuli to investigate the brain's response to individual or a combination of multiple speech features using system identification techniques, such as multivariate temporal receptive field (mTRF) analyses. The neural data collected from such experiments must be divided into a training set and a test set to fit and validate the mTRF weights. While a good strategy is clearly to collect as much data as is feasible, it is unclear how much data are needed to achieve stable results. Furthermore, it is unclear whether the specific stimulus used for mTRF fitting and the choice of feature representation affects how much data would be required for robust and generalizable results. Here, we used previously collected EEG data from our lab using sentence stimuli and movie stimuli as well as EEG data from an open-source dataset using audiobook stimuli to better understand how much data needs to be collected for naturalistic speech experiments measuring acoustic and phonetic tuning. We found that the EEG receptive field structure tested here stabilizes after collecting a training dataset of approximately 200 s of TIMIT sentences, around 600 s of movie trailers training set data, and approximately 460 s of audiobook training set data. Thus, we provide suggestions on the minimum amount of data that would be necessary for fitting mTRFs from naturalistic listening data. Our findings are motivated by highly practical concerns when working with children, patient populations, or others who may not tolerate long study sessions. These findings will aid future researchers who wish to study naturalistic speech processing in healthy and clinical populations while minimizing participant fatigue and retaining signal quality.
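One way to operationalize the "how much data is enough" question above is to refit an encoding model on progressively longer training sets and watch when held-out prediction accuracy stops improving. The sketch below does this with a simple lagged linear model on simulated data; the durations, lag count, and regularization are illustrative assumptions, not the study's protocol.

```python
import numpy as np

def lagged(x, n_lags):
    """Time-lagged design matrix (lags 0..n_lags-1) from a 1-D feature vector."""
    X = np.zeros((x.size, n_lags))
    for j in range(n_lags):
        X[j:, j] = x[:x.size - j]
    return X

rng = np.random.default_rng(3)
fs, n_lags = 100, 30
kernel = np.exp(-np.arange(n_lags) / 8.0)

def simulate(n_sec):
    """Simulate a stimulus onset vector and a noisy EEG channel driven by it."""
    stim = (rng.random(n_sec * fs) < 0.05).astype(float)
    eeg = np.convolve(stim, kernel)[:stim.size] + rng.normal(0, 0.5, stim.size)
    return stim, eeg

stim_test, eeg_test = simulate(120)            # fixed held-out set
X_test = lagged(stim_test, n_lags)

# Refit on progressively longer training sets and watch accuracy stabilize.
for train_sec in (30, 60, 120, 240, 480):
    stim_tr, eeg_tr = simulate(train_sec)
    X_tr = lagged(stim_tr, n_lags)
    w = np.linalg.solve(X_tr.T @ X_tr + np.eye(n_lags), X_tr.T @ eeg_tr)
    r = np.corrcoef(X_test @ w, eeg_test)[0, 1]
    print(f"{train_sec:4d} s of training data -> test r = {r:.3f}")
```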
Affiliation(s)
- Maansi Desai
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, United States
- Alyssa M. Field
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, United States
- Liberty S. Hamilton
- Department of Speech, Language, and Hearing Sciences, Moody College of Communication, The University of Texas at Austin, Austin, TX, United States
- Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, United States
10. Neural dynamics of phoneme sequences reveal position-invariant code for content and order. Nat Commun 2022; 13:6606. PMID: 36329058; PMCID: PMC9633780; DOI: 10.1038/s41467-022-34326-1.
Abstract
Speech consists of a continuously-varying acoustic signal. Yet human listeners experience it as sequences of discrete speech sounds, which are used to recognise discrete words. To examine how the human brain appropriately sequences the speech signal, we recorded two-hour magnetoencephalograms from 21 participants listening to short narratives. Our analyses show that the brain continuously encodes the three most recently heard speech sounds in parallel, and maintains this information long past its dissipation from the sensory input. Each speech sound representation evolves over time, jointly encoding both its phonetic features and the amount of time elapsed since onset. As a result, this dynamic neural pattern encodes both the relative order and phonetic content of the speech sequence. These representations are active earlier when phonemes are more predictable, and are sustained longer when lexical identity is uncertain. Our results show how phonetic sequences in natural speech are represented at the level of populations of neurons, providing insight into what intermediary representations exist between the sensory input and sub-lexical units. The flexibility in the dynamics of these representations paves the way for further understanding of how such sequences may be used to interface with higher order structure such as lexical identity.
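Analyses of this kind typically ask, at each latency relative to phoneme onset, whether a decoder can recover phonetic information from the neural data and how long that information remains decodable. The sketch below shows the general time-resolved decoding idea on simulated sensor data with a per-time-point logistic regression; it is a schematic stand-in, not the authors' MEG pipeline, and all names and dimensions are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n_epochs, n_channels, n_times = 200, 20, 60    # epochs time-locked to phoneme onset
labels = rng.integers(0, 2, n_epochs)          # e.g. a binary phonetic feature

# Simulated sensor data: the class difference appears only 100-400 ms after onset.
data = rng.normal(size=(n_epochs, n_channels, n_times))
pattern = rng.normal(size=n_channels)
data[:, :, 12:48] += 0.8 * labels[:, None, None] * pattern[None, :, None]

# Decode the phonetic label separately at every time point.
scores = []
for t in range(n_times):
    clf = LogisticRegression(max_iter=1000)
    scores.append(cross_val_score(clf, data[:, :, t], labels, cv=5).mean())
scores = np.array(scores)
print("Peak decoding accuracy:", round(scores.max(), 2),
      "at time index", int(scores.argmax()))
```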
11. Maggu AR. Auditory Evoked Potentials in Communication Disorders: An Overview of Past, Present, and Future. Semin Hear 2022; 43:137-148. PMID: 36313051; PMCID: PMC9605805; DOI: 10.1055/s-0042-1756160.
Abstract
This article provides a brief overview of auditory evoked potentials (AEPs) and their application in research and clinical practice within the field of communication disorders. The article begins with a historical perspective on the key scientific developments that led to the emergence of numerous types of AEPs. It then discusses the different AEP techniques in light of their clinical feasibility. Because AEPs, owing to their versatility, are used across disciplines, the article also discusses some of the research questions currently being addressed with AEP techniques in communication disorders and beyond. Finally, it summarizes the shortcomings of existing AEP techniques and offers a perspective on future directions. The article is aimed at a broad readership including (but not limited to) students, clinicians, and researchers. Overall, it may serve as a brief primer for new AEP users and, for those who already use AEPs routinely, as an overview of progress in the field along with future directions.
Affiliation(s)
- Akshay R. Maggu
- Department of Speech-Language-Hearing Sciences, Hofstra University, Hempstead, New York
12. Bai F, Meyer AS, Martin AE. Neural dynamics differentially encode phrases and sentences during spoken language comprehension. PLoS Biol 2022; 20:e3001713. PMID: 35834569; PMCID: PMC9282610; DOI: 10.1371/journal.pbio.3001713.
Abstract
Human language stands out in the natural world as a biological signal that uses a structured system to combine the meanings of small linguistic units (e.g., words) into larger constituents (e.g., phrases and sentences). However, the physical dynamics of speech (or sign) do not stand in a one-to-one relationship with the meanings listeners perceive. Instead, listeners infer meaning based on their knowledge of the language. The neural readouts of the perceptual and cognitive processes underlying these inferences are still poorly understood. In the present study, we used scalp electroencephalography (EEG) to compare the neural response to phrases (e.g., the red vase) and sentences (e.g., the vase is red), which were close in semantic meaning and had been synthesized to be physically indistinguishable. Differences in structure were well captured in the reorganization of neural phase responses in delta (approximately <2 Hz) and theta bands (approximately 2 to 7 Hz), and in power and power connectivity changes in the alpha band (approximately 7.5 to 13.5 Hz). Consistent with predictions from a computational model, sentences showed more power, more power connectivity, and more phase synchronization than phrases did. Theta–gamma phase–amplitude coupling occurred, but did not differ between the syntactic structures. Spectral–temporal response function (STRF) modeling revealed different encoding states for phrases and sentences, over and above the acoustically driven neural response. Our findings provide a comprehensive description of how the brain encodes and separates linguistic structures in the dynamics of neural responses. They imply that phase synchronization and strength of connectivity are readouts for the constituent structure of language. The results provide a novel basis for future neurophysiological research on linguistic structure representation in the brain, and, together with our simulations, support time-based binding as a mechanism of structure encoding in neural dynamics.
Affiliation(s)
- Fan Bai
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Antje S. Meyer
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
- Andrea E. Martin
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands
13. Cucu MO, Kazanina N, Houghton C. Syllable-Initial Phonemes Affect Neural Entrainment to Consonant-Vowel Syllables. Front Neurosci 2022; 16:826105. PMID: 35774556; PMCID: PMC9237462; DOI: 10.3389/fnins.2022.826105.
Abstract
Neural entrainment to speech appears to rely on syllabic features, especially those pertaining to the acoustic envelope of the stimuli. It has been proposed that the neural tracking of speech depends on the phoneme features. In the present electroencephalography experiment, we examined data from 25 participants to investigate neural entrainment to near-isochronous stimuli comprising syllables beginning with different phonemes. We measured the inter-trial phase coherence of neural responses to these stimuli and assessed the relationship between this coherence and acoustic properties of the stimuli designed to quantify their “edginess.” We found that entrainment was different across different classes of the syllable-initial phoneme and that entrainment depended on the amount of “edge” in the sound envelope. In particular, the best edge marker and predictor of entrainment was the latency of the maximum derivative of each syllable.
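Inter-trial phase coherence, the entrainment measure used above, is the length of the average unit phase vector across trials: it approaches 1 when every trial has the same phase at a given time and frequency band, and 0 when phases are random. A minimal sketch follows, using a Butterworth band-pass filter and the Hilbert transform; the band, filter order, and simulated stimulus rate are illustrative choices, not the study's settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def itc(trials, fs, band=(3.0, 5.0)):
    """Inter-trial phase coherence for band-limited data.

    trials: array (n_trials, n_samples). Returns ITC per time point in [0, 1];
    1 means identical phase across trials (strong entrainment)."""
    b, a = butter(4, band, btype="bandpass", fs=fs)
    phase = np.angle(hilbert(filtfilt(b, a, trials, axis=-1), axis=-1))
    return np.abs(np.exp(1j * phase).mean(axis=0))

# Toy example: 4 Hz syllable-rate response, phase-locked vs. phase-jittered trials.
fs, dur, n_trials = 250, 2.0, 60
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(4)
locked = np.sin(2 * np.pi * 4 * t) + rng.normal(0, 1, (n_trials, t.size))
jittered = np.stack([np.sin(2 * np.pi * 4 * t + rng.uniform(0, 2 * np.pi))
                     for _ in range(n_trials)]) + rng.normal(0, 1, (n_trials, t.size))
print("mean ITC, phase-locked trials:  ", round(itc(locked, fs).mean(), 2))
print("mean ITC, phase-jittered trials:", round(itc(jittered, fs).mean(), 2))
```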
Affiliation(s)
- M. Oana Cucu
- Department of Computer Science, University of Bristol, Bristol, United Kingdom
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
- Nina Kazanina
- School of Psychological Sciences, University of Bristol, Bristol, United Kingdom
- International Laboratory of Social Neurobiology, Institute for Cognitive Neuroscience, National Research University Higher School of Economics, HSE University, Moscow, Russia
- Conor Houghton
- Department of Computer Science, University of Bristol, Bristol, United Kingdom
14. Teoh ES, Ahmed F, Lalor EC. Attention Differentially Affects Acoustic and Phonetic Feature Encoding in a Multispeaker Environment. J Neurosci 2022; 42:682-691. PMID: 34893546; PMCID: PMC8805628; DOI: 10.1523/jneurosci.1455-20.2021.
Abstract
Humans have the remarkable ability to selectively focus on a single talker in the midst of other competing talkers. The neural mechanisms that underlie this phenomenon remain incompletely understood. In particular, there has been longstanding debate over whether attention operates at an early or late stage in the speech processing hierarchy. One way to better understand this is to examine how attention might differentially affect neurophysiological indices of hierarchical acoustic and linguistic speech representations. In this study, we do this by using encoding models to identify neural correlates of speech processing at various levels of representation. Specifically, we recorded EEG from fourteen human subjects (nine female and five male) during a "cocktail party" attention experiment. Model comparisons based on these data revealed phonetic feature processing for attended, but not unattended speech. Furthermore, we show that attention specifically enhances isolated indices of phonetic feature processing, but that such attention effects are not apparent for isolated measures of acoustic processing. These results provide new insights into the effects of attention on different prelexical representations of speech, insights that complement recent anatomic accounts of the hierarchical encoding of attended speech. Furthermore, our findings support the notion that, for attended speech, phonetic features are processed as a distinct stage, separate from the processing of the speech acoustics.
SIGNIFICANCE STATEMENT: Humans are very good at paying attention to one speaker in an environment with multiple speakers. However, the details of how attended and unattended speech are processed differently by the brain are not completely clear. Here, we explore how attention affects the processing of the acoustic sounds of speech as well as the mapping of those sounds onto categorical phonetic features. We find evidence of categorical phonetic feature processing for attended, but not unattended speech. Furthermore, we find evidence that categorical phonetic feature processing is enhanced by attention, but acoustic processing is not. These findings add an important new layer in our understanding of how the human brain solves the cocktail party problem.
Affiliation(s)
- Emily S Teoh
- School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College, University of Dublin, Dublin 2, Ireland
- Farhin Ahmed
- Department of Neuroscience, Department of Biomedical Engineering, and Del Monte Neuroscience Institute, University of Rochester, Rochester, New York 14627
- Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering, and Trinity College Institute of Neuroscience, Trinity College, University of Dublin, Dublin 2, Ireland
- Department of Neuroscience, Department of Biomedical Engineering, and Del Monte Neuroscience Institute, University of Rochester, Rochester, New York 14627
15. Monahan PJ, Schertz J, Fu Z, Pérez A. Unified Coding of Spectral and Temporal Phonetic Cues: Electrophysiological Evidence for Abstract Phonological Features. J Cogn Neurosci 2022; 34:618-638. DOI: 10.1162/jocn_a_01817.
Abstract
Spoken word recognition models and phonological theory propose that abstract features play a central role in speech processing. It remains unknown, however, whether auditory cortex encodes linguistic features in a manner beyond the phonetic properties of the speech sounds themselves. We took advantage of the fact that English phonology functionally codes stops and fricatives as voiced or voiceless with two distinct phonetic cues: Fricatives use a spectral cue, whereas stops use a temporal cue. Evidence that these cues can be grouped together would indicate the disjunctive coding of distinct phonetic cues into a functionally defined abstract phonological feature. In English, the voicing feature, which distinguishes the consonants [s] and [t] from [z] and [d], respectively, is hypothesized to be specified only for voiceless consonants (e.g., [s t]). Here, participants listened to syllables in a many-to-one oddball design, while their EEG was recorded. In one block, both voiceless stops and fricatives were the standards. In the other block, both voiced stops and fricatives were the standards. A critical design element was the presence of intercategory variation within the standards. Therefore, a many-to-one relationship, which is necessary to elicit an MMN, existed only if the stop and fricative standards were grouped together. In addition to the ERPs, event-related spectral power was also analyzed. Results showed an MMN effect in the voiceless standards block—an asymmetric MMN—in a time window consistent with processing in auditory cortex, as well as increased prestimulus beta-band oscillatory power to voiceless standards. These findings suggest that (i) there is an auditory memory trace of the standards based on the shared (voiceless) feature, which is only functionally defined; (ii) voiced consonants are underspecified; and (iii) features can serve as a basis for predictive processing. Taken together, these results point toward auditory cortex's ability to functionally code distinct phonetic cues together and suggest that abstract features can be used to parse the continuous acoustic signal.
Affiliation(s)
- Zhanao Fu
- Cambridge University, United Kingdom
- Alejandro Pérez
- University of Toronto Scarborough, Ontario, Canada
- Cambridge University, United Kingdom
16. Measuring context dependency in birdsong using artificial neural networks. PLoS Comput Biol 2021; 17:e1009707. PMID: 34962915; PMCID: PMC8746767; DOI: 10.1371/journal.pcbi.1009707.
Abstract
Context dependency is a key feature of the sequential structure of human language, which requires reference between words far apart in the produced sequence. Assessing how far back past context affects the current state provides crucial information for understanding the mechanisms of complex sequential behaviors. Birdsong serves as a representative model for studying context dependency in sequential signals produced by non-human animals, although previous estimates were upper-bounded by methodological limitations. Here, we estimated the context dependency in birdsong in a more scalable way using a modern neural-network-based language model whose accessible context length is sufficiently long. The detected context dependency exceeded the order of traditional Markovian models of birdsong but was consistent with previous experimental investigations. We also studied the relation between the assumed/auto-detected vocabulary size of birdsong (i.e., fine- vs. coarse-grained syllable classifications) and the context dependency. It turned out that the larger the assumed vocabulary (i.e., the more fine-grained the classification), the shorter the detected context dependency.
17. Palana J, Schwartz S, Tager-Flusberg H. Evaluating the Use of Cortical Entrainment to Measure Atypical Speech Processing: A Systematic Review. Neurosci Biobehav Rev 2021; 133:104506. PMID: 34942267; DOI: 10.1016/j.neubiorev.2021.12.029.
Abstract
BACKGROUND: Cortical entrainment has emerged as a promising means for measuring continuous speech processing in young, neurotypical adults. However, its utility for capturing atypical speech processing has not been systematically reviewed. OBJECTIVES: Synthesize evidence regarding the merit of measuring cortical entrainment to capture atypical speech processing and recommend avenues for future research. METHOD: We systematically reviewed publications investigating entrainment to continuous speech in populations with auditory processing differences. RESULTS: In the 25 publications reviewed, most studies were conducted on older and/or hearing-impaired adults, for whom slow-wave entrainment to speech was often heightened compared to controls. Research conducted on populations with neurodevelopmental disorders, in whom slow-wave entrainment was often reduced, was less common. Across publications, findings highlighted associations between cortical entrainment and differences in speech processing performance. CONCLUSIONS: Measures of cortical entrainment offer a useful means of capturing speech processing differences, and future research should leverage them more extensively when studying populations with neurodevelopmental disorders.
Affiliation(s)
- Joseph Palana
- Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA; Laboratories of Cognitive Neuroscience, Division of Developmental Medicine, Harvard Medical School, Boston Children's Hospital, 1 Autumn Street, Boston, MA, 02215, USA
- Sophie Schwartz
- Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA
- Helen Tager-Flusberg
- Department of Psychological and Brain Sciences, Boston University, 64 Cummington Mall, Boston, MA, 02215, USA.
18. Generalizable EEG Encoding Models with Naturalistic Audiovisual Stimuli. J Neurosci 2021; 41:8946-8962. PMID: 34503996; DOI: 10.1523/jneurosci.2891-20.2021.
Abstract
In natural conversations, listeners must attend to what others are saying while ignoring extraneous background sounds. Recent studies have used encoding models to predict electroencephalography (EEG) responses to speech in noise-free listening situations, sometimes referred to as "speech tracking." Researchers have analyzed how speech tracking changes with different types of background noise. It is unclear, however, whether neural responses from acoustically rich, naturalistic environments with and without background noise can be generalized to more controlled stimuli. If encoding models for acoustically rich, naturalistic stimuli are generalizable to other tasks, this could aid in data collection from populations of individuals who may not tolerate listening to more controlled and less engaging stimuli for long periods of time. We recorded noninvasive scalp EEG while 17 human participants (8 male/9 female) listened to speech without noise and audiovisual speech stimuli containing overlapping speakers and background sounds. We fit multivariate temporal receptive field encoding models to predict EEG responses to pitch, the acoustic envelope, phonological features, and visual cues in both stimulus conditions. Our results suggested that neural responses to naturalistic stimuli were generalizable to more controlled datasets. EEG responses to speech in isolation were predicted accurately using phonological features alone, while responses to speech in a rich acoustic background were more accurate when including both phonological and acoustic features. Our findings suggest that naturalistic audiovisual stimuli can be used to measure receptive fields that are comparable and generalizable to more controlled audio-only stimuli.
SIGNIFICANCE STATEMENT: Understanding spoken language in natural environments requires listeners to parse acoustic and linguistic information in the presence of other distracting stimuli. However, most studies of auditory processing rely on highly controlled stimuli with no background noise, or with background noise inserted at specific times. Here, we compare models where EEG data are predicted based on a combination of acoustic, phonetic, and visual features in highly disparate stimuli: sentences from a speech corpus and speech embedded within movie trailers. We show that modeling neural responses to highly noisy, audiovisual movies can uncover tuning for acoustic and phonetic information that generalizes to simpler stimuli typically used in sensory neuroscience experiments.
19. Lim SJ, Carter YD, Njoroge JM, Shinn-Cunningham BG, Perrachione TK. Talker discontinuity disrupts attention to speech: Evidence from EEG and pupillometry. Brain Lang 2021; 221:104996. PMID: 34358924; PMCID: PMC8515637; DOI: 10.1016/j.bandl.2021.104996.
Abstract
Speech is processed less efficiently from discontinuous, mixed talkers than from one consistent talker, but little is known about the neural mechanisms for processing talker variability. Here, we measured psychophysiological responses to talker variability using electroencephalography (EEG) and pupillometry while listeners performed a delayed digit-span recall task. Listeners heard and recalled seven-digit sequences with both talker (single- vs. mixed-talker digits) and temporal (0- vs. 500-ms inter-digit intervals) discontinuities. Talker discontinuity reduced serial recall accuracy. Both talker and temporal discontinuities elicited P3a-like evoked responses, while rapid processing of mixed-talker speech led to increased phasic pupil dilation. Furthermore, mixed-talker speech produced less alpha oscillatory power during working memory maintenance, but not during speech encoding. Overall, these results are consistent with an auditory attention and streaming framework in which talker discontinuity leads to involuntary, stimulus-driven attentional reorientation to novel speech sources, resulting in the processing interference classically associated with talker variability.
Affiliation(s)
- Sung-Joo Lim
- Department of Speech, Language, and Hearing Sciences, Boston University, United States.
- Yaminah D Carter
- Department of Speech, Language, and Hearing Sciences, Boston University, United States
- J Michelle Njoroge
- Department of Speech, Language, and Hearing Sciences, Boston University, United States
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, United States.
20. Learning nonnative speech sounds changes local encoding in the adult human cortex. Proc Natl Acad Sci U S A 2021; 118:2101777118. PMID: 34475209; DOI: 10.1073/pnas.2101777118.
Abstract
Adults can learn to identify nonnative speech sounds with training, albeit with substantial variability in learning behavior. Increases in behavioral accuracy are associated with increased separability for sound representations in cortical speech areas. However, it remains unclear whether individual auditory neural populations all show the same types of changes with learning, or whether there are heterogeneous encoding patterns. Here, we used high-resolution direct neural recordings to examine local population response patterns, while native English listeners learned to recognize unfamiliar vocal pitch patterns in Mandarin Chinese tones. We found a distributed set of neural populations in bilateral superior temporal gyrus and ventrolateral frontal cortex, where the encoding of Mandarin tones changed throughout training as a function of trial-by-trial accuracy ("learning effect"), including both increases and decreases in the separability of tones. These populations were distinct from populations that showed changes as a function of exposure to the stimuli regardless of trial-by-trial accuracy. These learning effects were driven in part by more variable neural responses to repeated presentations of acoustically identical stimuli. Finally, learning effects could be predicted from speech-evoked activity even before training, suggesting that intrinsic properties of these populations make them amenable to behavior-related changes. Together, these results demonstrate that nonnative speech sound learning involves a wide array of changes in neural representations across a distributed set of brain regions.
21.
Abstract
Creating invariant representations from an ever-changing speech signal is a major challenge for the human brain. Such an ability is particularly crucial for preverbal infants who must discover the phonological, lexical, and syntactic regularities of an extremely inconsistent signal in order to acquire language. Within the visual domain, an efficient neural solution to overcome variability consists of factorizing the input into a reduced set of orthogonal components. Here, we asked whether a similar decomposition strategy is used in early speech perception. Using a 256-channel electroencephalographic system, we recorded the neural responses of 3-mo-old infants to 120 natural consonant-vowel syllables with varying acoustic and phonetic profiles. Using multivariate pattern analyses, we show that syllables are factorized into distinct and orthogonal neural codes for consonants and vowels. Concerning consonants, we further demonstrate the existence of two stages of processing. A first phase is characterized by orthogonal and context-invariant neural codes for the dimensions of manner and place of articulation. Within the second stage, manner and place codes are integrated to recover the identity of the phoneme. We conclude that, despite the paucity of articulatory motor plans and speech production skills, pre-babbling infants are already equipped with a structured combinatorial code for speech analysis, which might account for the rapid pace of language acquisition during the first year.
22. O'Sullivan AE, Crosse MJ, Di Liberto GM, de Cheveigné A, Lalor EC. Neurophysiological Indices of Audiovisual Speech Processing Reveal a Hierarchy of Multisensory Integration Effects. J Neurosci 2021; 41:4991-5003. PMID: 33824190; PMCID: PMC8197638; DOI: 10.1523/jneurosci.0906-20.2021.
Abstract
Seeing a speaker's face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker's face provides temporal cues to auditory cortex, and articulatory information from the speaker's mouth can aid recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here, we sought to provide insight on these questions by examining EEG responses in humans (males and females) to natural audiovisual (AV), audio, and visual speech in quiet and in noise. We represented our speech stimuli in terms of their spectrograms and their phonetic features and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was shown to be more robust in AV speech responses than what would have been expected from the summation of the audio and visual speech responses, suggesting that multisensory integration occurs at both spectrotemporal and phonetic stages of speech processing. We also found evidence to suggest that the integration effects may change with listening conditions; however, this was an exploratory analysis and future work will be required to examine this effect using a within-subject design. These findings demonstrate that integration of audio and visual speech occurs at multiple stages along the speech processing hierarchy.
SIGNIFICANCE STATEMENT: During conversation, visual cues impact our perception of speech. Integration of auditory and visual speech is thought to occur at multiple stages of speech processing and vary flexibly depending on the listening conditions. Here, we examine audiovisual (AV) integration at two stages of speech processing using the speech spectrogram and a phonetic representation, and test how AV integration adapts to degraded listening conditions. We find significant integration at both of these stages regardless of listening conditions. These findings reveal neural indices of multisensory interactions at different stages of processing and provide support for the multistage integration framework.
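Canonical correlation analysis, used above to quantify how strongly stimulus features are encoded in the EEG, finds paired linear projections of the stimulus representation and the neural response that are maximally correlated. The sketch below shows the basic procedure with scikit-learn on simulated data; the feature dimensions, component count, and train/test split are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(5)
n_samples = 5000

# Stimulus representation (e.g. spectrogram bands or phonetic features) ...
stim = rng.normal(size=(n_samples, 8))
# ... and EEG channels, two of which partly reflect mixtures of those features.
mixing = rng.normal(size=(8, 2))
eeg = np.hstack([stim @ mixing, rng.normal(size=(n_samples, 30))])
eeg += 0.5 * rng.normal(size=eeg.shape)

# Fit on one half, evaluate canonical correlations on the held-out half.
half = n_samples // 2
cca = CCA(n_components=3).fit(stim[:half], eeg[:half])
U, V = cca.transform(stim[half:], eeg[half:])
corrs = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(U.shape[1])]
print("Held-out canonical correlations:", np.round(corrs, 2))
```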
Affiliation(s)
- Aisling E O'Sullivan
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Michael J Crosse
- X, The Moonshot Factory, Mountain View, CA and Department of Neuroscience, Albert Einstein College of Medicine, Bronx, New York 10461
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, Département d'Études Cognitives, École Normale Supérieure, Paris Sciences et Lettres University, Centre National de la Recherche Scientifique, Paris 75005, France
- University College London Ear Institute, University College London, London WC1X 8EE, United Kingdom
- Edmund C Lalor
- School of Engineering, Trinity Centre for Biomedical Engineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering and Department of Neuroscience, University of Rochester, Rochester, New York 14627
23. Bröhl F, Kayser C. Delta/theta band EEG differentially tracks low and high frequency speech-derived envelopes. Neuroimage 2021; 233:117958. PMID: 33744458; PMCID: PMC8204264; DOI: 10.1016/j.neuroimage.2021.117958.
Abstract
The representation of speech in the brain is often examined by measuring the alignment of rhythmic brain activity to the speech envelope. To conveniently quantify this alignment (termed 'speech tracking'), many studies consider the broadband speech envelope, which combines acoustic fluctuations across the spectral range. Using EEG recordings, we show that using this broadband envelope can provide a distorted picture of speech encoding. We systematically investigated the encoding of spectrally-limited speech-derived envelopes presented by individual and multiple noise carriers in the human brain. Tracking in the 1 to 6 Hz EEG bands differentially reflected low (0.2 - 0.83 kHz) and high (2.66 - 8 kHz) frequency speech-derived envelopes. This was independent of the specific carrier frequency but sensitive to attentional manipulations, and may reflect the context-dependent emphasis of information from distinct spectral ranges of the speech envelope in low frequency brain activity. As low and high frequency speech envelopes relate to distinct phonemic features, our results suggest that functionally distinct processes contribute to speech tracking in the same EEG bands, and are easily confounded when considering the broadband speech envelope.
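The spectrally-limited "speech-derived envelopes" above are amplitude envelopes computed from restricted frequency bands of the audio rather than from the broadband signal. A minimal sketch of extracting a low-band and a high-band envelope with a Butterworth band-pass filter and the Hilbert transform follows; the band edges echo the ranges quoted in the abstract, while the filter order, sampling rates, and toy stimulus are assumptions for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample

def band_envelope(audio, fs, band, env_fs=100):
    """Amplitude envelope of one spectral band, downsampled for EEG analysis."""
    b, a = butter(4, band, btype="bandpass", fs=fs)
    env = np.abs(hilbert(filtfilt(b, a, audio)))
    return resample(env, int(audio.size * env_fs / fs))

# Toy "speech": broadband noise with a shared 3 Hz amplitude modulation.
fs, dur = 16000, 5.0
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(6)
audio = (1.0 + 0.8 * np.sin(2 * np.pi * 3 * t)) * rng.normal(size=t.size)

low_env = band_envelope(audio, fs, band=(200, 830))      # 0.2-0.83 kHz band
high_env = band_envelope(audio, fs, band=(2660, 7900))   # 2.66-8 kHz band (upper edge kept below Nyquist)
print("Envelope samples at 100 Hz:", low_env.size)
print("Low/high envelope correlation:", round(np.corrcoef(low_env, high_env)[0, 1], 2))
```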
Affiliation(s)
- Felix Bröhl
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany.
- Christoph Kayser
- Department for Cognitive Neuroscience, Faculty of Biology, Bielefeld University, Universitätsstr. 25, 33615 Bielefeld, Germany
24. Llanos F, German JS, Gnanateja GN, Chandrasekaran B. The neural processing of pitch accents in continuous speech. Neuropsychologia 2021; 158:107883. PMID: 33989647; DOI: 10.1016/j.neuropsychologia.2021.107883.
Abstract
Pitch accents are local pitch patterns that convey differences in word prominence and modulate the information structure of the discourse. Despite their importance to discourse in languages like English, the neural processing of pitch accents remains understudied. The current study investigates the neural processing of pitch accents by native and non-native English speakers while they listened to or ignored 45 min of continuous, natural speech. Leveraging an approach used to study phonemes in natural speech, we analyzed thousands of electroencephalography (EEG) segments time-locked to pitch accents in a prosodic transcription. The optimal neural discrimination between pitch accent categories emerged at latencies between 100 and 200 ms. During these latencies, we found a strong structural alignment between neural and phonetic representations of pitch accent categories. In the same latencies, native listeners exhibited more robust processing of pitch accent contrasts than non-native listeners. However, these group differences attenuated when the speech signal was ignored. These results show that the neural processing of discrete, contrastive pitch accent categories can be reliably captured in continuous speech. Our analytic approach also captures how language-specific knowledge and selective attention influence the neural processing of pitch accent categories.
Collapse
Affiliation(s)
- Fernando Llanos
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA; Department of Linguistics, The University of Texas at Austin, Austin, TX, USA
| | - James S German
- Aix-Marseille University, CNRS, LPL, Aix-en-Provence, France
| | - G Nike Gnanateja
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA, USA.
| |
Collapse
|
25
|
Speech Perception with Noise Vocoding and Background Noise: An EEG and Behavioral Study. J Assoc Res Otolaryngol 2021; 22:349-363. [PMID: 33851289 DOI: 10.1007/s10162-021-00787-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2020] [Accepted: 01/26/2021] [Indexed: 10/21/2022] Open
Abstract
This study explored the physiological response of the human brain to degraded speech syllables. The degradation was introduced using noise vocoding and/or background noise. The goal was to identify physiological features of auditory-evoked potentials (AEPs) that may explain speech intelligibility. Ten human subjects with normal hearing participated in syllable-detection tasks, while their AEPs were recorded with 32-channel electroencephalography. Subjects were presented with six syllables in the form of consonant-vowel-consonant or vowel-consonant-vowel. Noise vocoding with 22 or 4 frequency channels was applied to the syllables. When examining the peak heights in the AEPs (P1, N1, and P2), vocoding alone showed no consistent effect. P1 was not consistently reduced by background noise, N1 was sometimes reduced by noise, and P2 was almost always highly reduced. Two other physiological metrics were examined: (1) classification accuracy of the syllables based on AEPs, which indicated whether AEPs were distinguishable for different syllables, and (2) cross-condition correlation of AEPs (rcc) between the clean and degraded speech, which indicated the brain's ability to extract speech-related features and suppress responses to noise. Both metrics decreased with degraded speech quality. We further tested whether the two metrics can explain cross-subject variations in behavioral performance. A significant correlation existed for rcc, as well as for classification based on early AEPs, in the fronto-central areas. Because rcc indicates similarities between clean and degraded speech, our finding suggests that high speech intelligibility may be a result of the brain's ability to ignore noise in the sound carrier and/or background.
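A minimal sketch of the cross-condition correlation (rcc) metric as described: the correlation between trial-averaged AEP waveforms for clean and degraded versions of the same syllable. Array shapes and the simulated data are placeholders.

import numpy as np

rng = np.random.default_rng(1)
n_trials, n_times = 50, 300
clean = rng.standard_normal((n_trials, n_times))     # AEP trials, clean syllable
degraded = rng.standard_normal((n_trials, n_times))  # AEP trials, vocoded / noisy syllable

def rcc(a, b):
    # Pearson correlation between the trial-averaged AEP waveforms
    return np.corrcoef(a.mean(axis=0), b.mean(axis=0))[0, 1]

print(f"rcc(clean, degraded) = {rcc(clean, degraded):.3f}")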
Collapse
|
26
|
Liberto GMD, Nie J, Yeaton J, Khalighinejad B, Shamma SA, Mesgarani N. Neural representation of linguistic feature hierarchy reflects second-language proficiency. Neuroimage 2020; 227:117586. [PMID: 33346131 PMCID: PMC8527895 DOI: 10.1016/j.neuroimage.2020.117586] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2020] [Revised: 11/15/2020] [Accepted: 11/18/2020] [Indexed: 12/03/2022] Open
Abstract
Acquiring a new language requires individuals to simultaneously and gradually learn linguistic attributes on multiple levels. Here, we investigated how this learning process changes the neural encoding of natural speech by assessing the encoding of the linguistic feature hierarchy in second-language listeners. Electroencephalography (EEG) signals were recorded from native Mandarin speakers with varied English proficiency and from native English speakers while they listened to audio-stories in English. We measured the temporal response functions (TRFs) for acoustic, phonemic, phonotactic, and semantic features in individual participants and found a main effect of proficiency on linguistic encoding. This effect of second-language proficiency was particularly prominent in the neural encoding of phonemes, showing stronger encoding of “new” phonemic contrasts (i.e., English contrasts that do not exist in Mandarin) with increasing proficiency. Overall, we found that the nonnative listeners with higher proficiency levels had a linguistic feature representation more similar to that of native listeners, which enabled the accurate decoding of language proficiency. This result advances our understanding of the cortical processing of linguistic information in second-language learners and provides an objective measure of language proficiency.
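A minimal sketch of a temporal response function (TRF) analysis of the kind referenced here: time-lagged stimulus features are regressed onto the EEG with ridge regression. The feature dimensionality, lag range, regularisation, and simulated data are illustrative assumptions; the study's actual models combined acoustic, phonemic, phonotactic, and semantic features.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
fs, n_samples, n_features = 100, 6000, 20      # e.g. envelope plus phoneme features
stim = rng.standard_normal((n_samples, n_features))   # placeholder stimulus features
eeg = rng.standard_normal(n_samples)                  # placeholder single EEG channel

lags = np.arange(0, int(0.4 * fs))             # 0-400 ms lags
X = np.zeros((n_samples, n_features * len(lags)))
for i, lag in enumerate(lags):                 # build the time-lagged design matrix
    X[lag:, i * n_features:(i + 1) * n_features] = stim[: n_samples - lag]

model = Ridge(alpha=1.0).fit(X, eeg)
trf = model.coef_.reshape(len(lags), n_features)   # lags x features TRF weights
pred = model.predict(X)                            # in-sample prediction, for illustration
print("prediction accuracy r =", np.corrcoef(pred, eeg)[0, 1].round(3))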
Collapse
Affiliation(s)
- Giovanni M Di Liberto
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France.
| | - Jingping Nie
- Department of Electrical Engineering, Columbia University, New York, NY, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, United States
| | - Jeremy Yeaton
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France; Laboratoire de Psychologie Cognitive, UMR 7290, CNRS, France; Aix-Marseille Université, France
| | - Bahar Khalighinejad
- Department of Electrical Engineering, Columbia University, New York, NY, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, United States
| | - Shihab A Shamma
- Laboratoire des systèmes perceptifs, Département d'études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France; Institute for Systems Research, Electrical and Computer Engineering, University of Maryland, College Park, USA
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, NY, USA; Mortimer B. Zuckerman Mind Brain Behavior Institute, Columbia University, New York, NY 10027, United States.
| |
Collapse
|
27
|
Dynamic Time-Locking Mechanism in the Cortical Representation of Spoken Words. eNeuro 2020; 7:ENEURO.0475-19.2020. [PMID: 32513662 PMCID: PMC7470935 DOI: 10.1523/eneuro.0475-19.2020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2019] [Revised: 05/15/2020] [Accepted: 06/01/2020] [Indexed: 11/21/2022] Open
Abstract
Human speech has a unique capacity to carry and communicate rich meanings. However, it is not known how the highly dynamic and variable perceptual signal is mapped to existing linguistic and semantic representations. In this novel approach, we used the natural acoustic variability of sounds and mapped them to magnetoencephalography (MEG) data using physiologically-inspired machine-learning models. We aimed at determining how well the models, differing in their representation of temporal information, serve to decode and reconstruct spoken words from MEG recordings in 16 healthy volunteers. We discovered that dynamic time-locking of the cortical activation to the unfolding speech input is crucial for the encoding of the acoustic-phonetic features of speech. In contrast, time-locking was not highlighted in cortical processing of non-speech environmental sounds that conveyed the same meanings as the spoken words, including human-made sounds with temporal modulation content similar to speech. The amplitude envelope of the spoken words was particularly well reconstructed based on cortical evoked responses. Our results indicate that speech is encoded cortically with especially high temporal fidelity. This speech tracking by evoked responses may partly reflect the same underlying neural mechanism as the frequently reported entrainment of the cortical oscillations to the amplitude envelope of speech. Furthermore, the phoneme content was reflected in cortical evoked responses simultaneously with the spectrotemporal features, pointing to an instantaneous transformation of the unfolding acoustic features into linguistic representations during speech processing.
Collapse
|
28
|
Getz LM, Toscano JC. The time-course of speech perception revealed by temporally-sensitive neural measures. WILEY INTERDISCIPLINARY REVIEWS. COGNITIVE SCIENCE 2020; 12:e1541. [PMID: 32767836 DOI: 10.1002/wcs.1541] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/30/2019] [Revised: 05/28/2020] [Accepted: 06/26/2020] [Indexed: 11/07/2022]
Abstract
Recent advances in cognitive neuroscience have provided a detailed picture of the early time-course of speech perception. In this review, we highlight this work, placing it within the broader context of research on the neurobiology of speech processing, and discuss how these data point us toward new models of speech perception and spoken language comprehension. We focus, in particular, on temporally-sensitive measures that allow us to directly measure early perceptual processes. Overall, the data provide support for two key principles: (a) speech perception is based on gradient representations of speech sounds and (b) speech perception is interactive and receives input from higher-level linguistic context at the earliest stages of cortical processing. Implications for models of speech processing and the neurobiology of language more broadly are discussed. This article is categorized under: Psychology > Language Psychology > Perception and Psychophysics Neuroscience > Cognition.
Collapse
Affiliation(s)
- Laura M Getz
- Department of Psychological Sciences, University of San Diego, San Diego, California, USA
| | - Joseph C Toscano
- Department of Psychological and Brain Sciences, Villanova University, Villanova, Pennsylvania, USA
| |
Collapse
|
29
|
Kaya Z, Soltanipour M, Treves A. Non-hexagonal neural dynamics in vowel space. AIMS Neurosci 2020; 7:275-298. [PMID: 32995486 PMCID: PMC7519971 DOI: 10.3934/neuroscience.2020015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/29/2020] [Accepted: 07/27/2020] [Indexed: 12/02/2022] Open
Abstract
Are the grid cells discovered in rodents relevant to human cognition? Following up on two seminal studies by others, we aimed to check whether an approximate 6-fold, grid-like symmetry shows up in the cortical activity of humans who "navigate" between vowels, given that vowel space can be approximated with a continuous trapezoidal 2D manifold, spanned by the first and second formant frequencies. We created 30 vowel trajectories in the assumedly flat central portion of the trapezoid. Each of these trajectories had a duration of 240 milliseconds, with a steady start and end point on the perimeter of a "wheel". We hypothesized that if the neural representation of this "box" is similar to that of rodent grid units, there should be an at least partial hexagonal (6-fold) symmetry in the EEG response of participants who navigate it. We did not find any dominant n-fold symmetry, however; instead, using PCAs, we found indications that the vowel representation may reflect phonetic features, as positioned on the vowel manifold. The suggestion, therefore, is that vowels are encoded in relation to their salient sensory-perceptual variables, and are not assigned to arbitrary grid-like abstract maps. Finally, we explored the relationship between the first PCA eigenvector and putative vowel attractors for native Italian speakers, who served as the subjects in our study.
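A minimal sketch of one way to test for n-fold symmetry of a response as a function of trajectory direction in formant space: regress the response amplitude on cos(nθ) and sin(nθ) and compare the variance explained across candidate symmetries. Directions and amplitudes are simulated; this is not the authors' analysis code.

import numpy as np

rng = np.random.default_rng(3)
theta = rng.uniform(0, 2 * np.pi, 30)          # direction of each vowel trajectory (rad)
amp = rng.standard_normal(30)                  # e.g. EEG response amplitude per trajectory

def nfold_fit(theta, amp, n):
    # Variance in amp explained by an n-fold sinusoidal modulation of direction
    X = np.column_stack([np.cos(n * theta), np.sin(n * theta), np.ones_like(theta)])
    beta, *_ = np.linalg.lstsq(X, amp, rcond=None)
    resid = amp - X @ beta
    return 1 - resid.var() / amp.var()

for n in (4, 5, 6, 7, 8):
    print(f"{n}-fold symmetry: R^2 = {nfold_fit(theta, amp, n):.3f}")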
Collapse
Affiliation(s)
- Zeynep Kaya
- SISSA–Cognitive Neuroscience, via Bonomea 265, 34136 Trieste, Italy
| | | | - Alessandro Treves
- SISSA–Cognitive Neuroscience, via Bonomea 265, 34136 Trieste, Italy
- NTNU–Centre for Neural Computation, Trondheim, Norway
| |
Collapse
|
30
|
Tang K, DeMille MMC, Frijters JC, Gruen JR. DCDC2 READ1 regulatory element: how temporal processing differences may shape language. Proc Biol Sci 2020; 287:20192712. [PMID: 32486976 PMCID: PMC7341942 DOI: 10.1098/rspb.2019.2712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Classic linguistic theory ascribes language change and diversity to population migrations, conquests, and geographical isolation, with the assumption that human populations have equivalent language processing abilities. We hypothesize that spectral and temporal characteristics make some consonant manners vulnerable to differences in temporal precision associated with specific population allele frequencies. To test this hypothesis, we modelled the association between RU1-1 alleles of DCDC2 and manner of articulation in 51 populations spanning five continents, adjusting for geographical proximity and genetic and linguistic relatedness. RU1-1 alleles, acting through increased expression of DCDC2, appear to increase auditory processing precision that enhances stop-consonant discrimination, favouring retention in some populations and loss by others. These findings enhance classical linguistic theories by adding a genetic dimension, which, until recently, has not been considered to be a significant catalyst for language change.
Collapse
Affiliation(s)
- Kevin Tang
- Department of Linguistics, University of Florida, Gainesville, FL 32611-5454, USA
| | - Mellissa M C DeMille
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT 06520, USA
| | - Jan C Frijters
- Child and Youth Studies, Brock University, St. Catharines, Ontario, Canada L2S 3A1
| | - Jeffrey R Gruen
- Department of Pediatrics, Yale University School of Medicine, New Haven, CT 06520, USA; Department of Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
| |
Collapse
|
31
|
Ortiz-Mantilla S, Realpe-Bonilla T, Benasich AA. Early Interactive Acoustic Experience with Non-speech Generalizes to Speech and Confers a Syllabic Processing Advantage at 9 Months. Cereb Cortex 2020; 29:1789-1801. [PMID: 30722000 PMCID: PMC6418390 DOI: 10.1093/cercor/bhz001] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2018] [Revised: 12/04/2018] [Accepted: 01/07/2019] [Indexed: 12/19/2022] Open
Abstract
During early development, the infant brain is highly plastic and sensory experiences modulate emerging cortical maps, enhancing processing efficiency as infants set up key linguistic precursors. Early interactive acoustic experience (IAE) with spectrotemporally-modulated non-speech has been shown to facilitate optimal acoustic processing and generalizes to novel non-speech sounds at 7-months-of-age. Here we demonstrate that effects of non-speech IAE endure well beyond the immediate training period and robustly generalize to speech processing. Infants who received non-speech IAE differed at 9-months-of-age from both naïve controls and those with only passive acoustic exposure, demonstrating broad modulation of oscillatory dynamics. For the standard syllable, increased high-gamma (>70 Hz) power within auditory cortices indicates that IAE fosters native speech processing, facilitating establishment of phonemic representations. The higher left beta power seen may reflect increased linking of sensory information and corresponding articulatory patterns, while bilateral decreases in theta power suggest more mature automatized speech processing, as less neuronal resources were allocated to process syllabic information. For the deviant syllable, left-lateralized gamma (<70 Hz) enhancement suggests IAE promotes phonemic-related discrimination abilities. Theta power increases in right auditory cortex, known for favoring slow-rate decoding, implies IAE facilitates the more demanding processing of the sporadic deviant syllable.
Collapse
Affiliation(s)
- Silvia Ortiz-Mantilla
- Center for Molecular & Behavioral Neuroscience, Rutgers University-Newark, 197 University Avenue, Newark, NJ, USA
| | - Teresa Realpe-Bonilla
- Center for Molecular & Behavioral Neuroscience, Rutgers University-Newark, 197 University Avenue, Newark, NJ, USA
| | - April A Benasich
- Center for Molecular & Behavioral Neuroscience, Rutgers University-Newark, 197 University Avenue, Newark, NJ, USA
| |
Collapse
|
32
|
Zuk NJ, Teoh ES, Lalor EC. EEG-based classification of natural sounds reveals specialized responses to speech and music. Neuroimage 2020; 210:116558. [DOI: 10.1016/j.neuroimage.2020.116558] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2019] [Revised: 12/23/2019] [Accepted: 01/14/2020] [Indexed: 11/30/2022] Open
|
33
|
Di Liberto GM, Pelofi C, Bianco R, Patel P, Mehta AD, Herrero JL, de Cheveigné A, Shamma S, Mesgarani N. Cortical encoding of melodic expectations in human temporal cortex. eLife 2020; 9:e51784. [PMID: 32122465 PMCID: PMC7053998 DOI: 10.7554/elife.51784] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2019] [Accepted: 01/20/2020] [Indexed: 01/14/2023] Open
Abstract
Human engagement in music rests on underlying elements such as the listeners' cultural background and interest in music. These factors modulate how listeners anticipate musical events, a process inducing instantaneous neural responses as the music confronts these expectations. Measuring such neural correlates would represent a direct window into high-level brain processing. Here we recorded cortical signals as participants listened to Bach melodies. We assessed the relative contributions of acoustic versus melodic components of the music to the neural signal. Melodic features included information on pitch progressions and their tempo, which were extracted from a predictive model of musical structure based on Markov chains. We related the music to brain activity with temporal response functions, demonstrating, for the first time, distinct cortical encoding of pitch and note-onset expectations during naturalistic music listening. This encoding was most pronounced at response latencies up to 350 ms, and in both planum temporale and Heschl's gyrus.
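A minimal sketch of how note-level expectations can be derived from a Markov chain over pitches, along the lines described: bigram transition probabilities are estimated from a melody corpus and each note's surprisal (negative log probability) can then serve as a regressor. The toy corpus, MIDI pitch coding, and smoothing are assumptions for illustration.

import numpy as np
from collections import Counter

corpus = [[60, 62, 64, 65, 67], [60, 64, 67, 72], [62, 64, 65, 67, 69]]  # toy melodies (MIDI pitches)
bigrams = Counter((a, b) for mel in corpus for a, b in zip(mel, mel[1:]))
context = Counter()
for (a, _b), c in bigrams.items():             # total count of each context pitch
    context[a] += c

def surprisal(prev, note, alpha=1.0, vocab=128):
    # -log2 P(note | prev) under the bigram model, with add-alpha smoothing
    p = (bigrams[(prev, note)] + alpha) / (context[prev] + alpha * vocab)
    return -np.log2(p)

melody = [60, 62, 64, 67]
print([round(surprisal(a, b), 2) for a, b in zip(melody, melody[1:])])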
Collapse
Affiliation(s)
- Giovanni M Di Liberto
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
| | - Claire Pelofi
- Department of Psychology, New York University, New York, United States
- Institut de Neurosciences des Systèmes, UMR S 1106, INSERM, Aix Marseille Université, Marseille, France
| | | | - Prachi Patel
- Department of Electrical Engineering, Columbia University, New York, United States
- Mortimer B Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| | - Ashesh D Mehta
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Manhasset, United States
- Feinstein Institute of Medical Research, Northwell Health, Manhasset, United States
| | - Jose L Herrero
- Department of Neurosurgery, Zucker School of Medicine at Hofstra/Northwell, Manhasset, United States
- Feinstein Institute of Medical Research, Northwell Health, Manhasset, United States
| | - Alain de Cheveigné
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
- UCL Ear Institute, London, United Kingdom
| | - Shihab Shamma
- Laboratoire des systèmes perceptifs, Département d’études cognitives, École normale supérieure, PSL University, CNRS, 75005 Paris, France
- Institute for Systems Research, Electrical and Computer Engineering, University of Maryland, College Park, United States
| | - Nima Mesgarani
- Department of Electrical Engineering, Columbia University, New York, United States
- Mortimer B Zuckerman Mind Brain Behavior Institute, Columbia University, New York, United States
| |
Collapse
|
34
|
Abstract
What is the nature of the neural code by which the human brain represents spoken language? New research suggests that previous findings of a language-specific code in cortical responses to speech can be explained solely by simple acoustic features.
Collapse
Affiliation(s)
- Ediz Sohoglu
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge CB2 7EF, UK.
| |
Collapse
|
35
|
Joint Representation of Spatial and Phonetic Features in the Human Core Auditory Cortex. Cell Rep 2020; 24:2051-2062.e2. [PMID: 30134167 DOI: 10.1016/j.celrep.2018.07.076] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2017] [Revised: 04/09/2018] [Accepted: 07/22/2018] [Indexed: 12/12/2022] Open
Abstract
The human auditory cortex simultaneously processes speech and determines the location of a speaker in space. Neuroimaging studies in humans have implicated core auditory areas in processing the spectrotemporal and the spatial content of sound; however, how these features are represented together is unclear. We recorded directly from human subjects implanted bilaterally with depth electrodes in core auditory areas as they listened to speech from different directions. We found local and joint selectivity to spatial and spectrotemporal speech features, where the spatial and spectrotemporal features are organized independently of each other. This representation enables successful decoding of both spatial and phonetic information. Furthermore, we found that the location of the speaker does not change the spectrotemporal tuning of the electrodes but, rather, modulates their mean response level. Our findings contribute to defining the functional organization of responses in the human auditory cortex, with implications for more accurate neurophysiological models of speech processing.
Collapse
|
36
|
Obleser J, Kayser C. Neural Entrainment and Attentional Selection in the Listening Brain. Trends Cogn Sci 2019; 23:913-926. [PMID: 31606386 DOI: 10.1016/j.tics.2019.08.004] [Citation(s) in RCA: 181] [Impact Index Per Article: 36.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2019] [Revised: 08/16/2019] [Accepted: 08/20/2019] [Indexed: 01/07/2023]
Abstract
The streams of sounds we typically attend to abound in acoustic regularities. Neural entrainment is seen as an important mechanism that the listening brain exploits to attune to these regularities and to enhance the representation of attended sounds. We delineate the neurophysiology underlying this mechanism and review entrainment alongside its more pragmatic signature, often called 'speech tracking'. The latter has become a popular analytical approach to trace the reflection of acoustic and linguistic information at different levels of granularity, from neurophysiology to neuroimaging. As we discuss, the concept of entrainment offers both a putative neurophysiological mechanism for selective listening and a versatile window onto the neural basis of hearing and speech comprehension.
Collapse
Affiliation(s)
- Jonas Obleser
- Department of Psychology, University of Lübeck, 23562 Lübeck, Germany.
| | - Christoph Kayser
- Department for Cognitive Neuroscience and Cognitive Interaction Technology, Center of Excellence, Bielefeld University, 33615 Bielefeld, Germany.
| |
Collapse
|
37
|
Ogg M, Carlson TA, Slevc LR. The Rapid Emergence of Auditory Object Representations in Cortex Reflect Central Acoustic Attributes. J Cogn Neurosci 2019; 32:111-123. [PMID: 31560265 DOI: 10.1162/jocn_a_01472] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Human listeners are bombarded by acoustic information that the brain rapidly organizes into coherent percepts of objects and events in the environment, which aids speech and music perception. The efficiency of auditory object recognition belies the critical constraint that acoustic stimuli necessarily require time to unfold. Using magnetoencephalography, we studied the time course of the neural processes that transform dynamic acoustic information into auditory object representations. Participants listened to a diverse set of 36 tokens comprising everyday sounds from a typical human environment. Multivariate pattern analysis was used to decode the sound tokens from the magnetoencephalographic recordings. We show that sound tokens can be decoded from brain activity beginning 90 msec after stimulus onset with peak decoding performance occurring at 155 msec poststimulus onset. Decoding performance was primarily driven by differences between category representations (e.g., environmental vs. instrument sounds), although within-category decoding was better than chance. Representational similarity analysis revealed that these emerging neural representations were related to harmonic and spectrotemporal differences among the stimuli, which correspond to canonical acoustic features processed by the auditory pathway. Our findings begin to link the processing of physical sound properties with the perception of auditory objects and events in cortex.
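A minimal sketch of the representational similarity logic described: a neural dissimilarity matrix over sound tokens is compared with a model dissimilarity matrix built from acoustic features. All data, dimensions, and distance metrics here are simulated placeholders rather than the study's MEG decoding results.

import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(10)
n_tokens, n_sensors, n_acoustic = 36, 160, 12
neural_patterns = rng.standard_normal((n_tokens, n_sensors))   # e.g. MEG patterns at one latency
acoustic_feats = rng.standard_normal((n_tokens, n_acoustic))   # e.g. spectrotemporal statistics

neural_rdm = pdist(neural_patterns, metric="correlation")      # condensed dissimilarity matrix
model_rdm = pdist(acoustic_feats, metric="euclidean")

rho, p = spearmanr(neural_rdm, model_rdm)
print(f"neural-model RDM correlation: rho = {rho:.3f}, p = {p:.3f}")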
Collapse
|
38
|
Broderick MP, Anderson AJ, Lalor EC. Semantic Context Enhances the Early Auditory Encoding of Natural Speech. J Neurosci 2019; 39:7564-7575. [PMID: 31371424 PMCID: PMC6750931 DOI: 10.1523/jneurosci.0584-19.2019] [Citation(s) in RCA: 62] [Impact Index Per Article: 12.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2019] [Revised: 07/20/2019] [Accepted: 07/29/2019] [Indexed: 01/22/2023] Open
Abstract
Speech perception involves the integration of sensory input with expectations based on the context of that speech. Much debate surrounds the issue of whether or not prior knowledge feeds back to affect early auditory encoding in the lower levels of the speech processing hierarchy, or whether perception can be best explained as a purely feedforward process. Although there has been compelling evidence on both sides of this debate, experiments involving naturalistic speech stimuli to address these questions have been lacking. Here, we use a recently introduced method for quantifying the semantic context of speech and relate it to a commonly used method for indexing low-level auditory encoding of speech. The relationship between these measures is taken to be an indication of how semantic context leading up to a word influences how its low-level acoustic and phonetic features are processed. We record EEG from human participants (both male and female) listening to continuous natural speech and find that the early cortical tracking of a word's speech envelope is enhanced by its semantic similarity to its sentential context. Using a forward modeling approach, we find that prediction accuracy of the EEG signal also shows the same effect. Furthermore, this effect shows distinct temporal patterns of correlation depending on the type of speech input representation (acoustic or phonological) used for the model, implicating a top-down propagation of information through the processing hierarchy. These results suggest a mechanism that links top-down prior information with the early cortical entrainment of words in natural, continuous speech.SIGNIFICANCE STATEMENT During natural speech comprehension, we use semantic context when processing information about new incoming words. However, precisely how the neural processing of bottom-up sensory information is affected by top-down context-based predictions remains controversial. We address this discussion using a novel approach that indexes a word's similarity to context and how well a word's acoustic and phonetic features are processed by the brain at the time of its utterance. We relate these two measures and show that lower-level auditory tracking of speech improves for words that are more related to their preceding context. These results suggest a mechanism that links top-down prior information with bottom-up sensory processing in the context of natural, narrative speech listening.
Collapse
Affiliation(s)
- Michael P Broderick
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
| | - Andrew J Anderson
- Department of Biomedical Engineering, and
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York 14627
| | - Edmund C Lalor
- School of Engineering, Trinity Centre for Bioengineering and Trinity College Institute of Neuroscience, Trinity College Dublin, Dublin 2, Ireland
- Department of Biomedical Engineering, and
- Department of Neuroscience and Del Monte Institute for Neuroscience, University of Rochester, Rochester, New York 14627
| |
Collapse
|
39
|
Daube C, Ince RAA, Gross J. Simple Acoustic Features Can Explain Phoneme-Based Predictions of Cortical Responses to Speech. Curr Biol 2019; 29:1924-1937.e9. [PMID: 31130454 PMCID: PMC6584359 DOI: 10.1016/j.cub.2019.04.067] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2018] [Revised: 03/25/2019] [Accepted: 04/25/2019] [Indexed: 01/06/2023]
Abstract
When we listen to speech, we have to make sense of a waveform of sound pressure. Hierarchical models of speech perception assume that, to extract semantic meaning, the signal is transformed into unknown, intermediate neuronal representations. Traditionally, studies of such intermediate representations are guided by linguistically defined concepts, such as phonemes. Here, we argue that in order to arrive at an unbiased understanding of the neuronal responses to speech, we should focus instead on representations obtained directly from the stimulus. We illustrate our view with a data-driven, information theoretic analysis of a dataset of 24 young, healthy humans who listened to a 1 h narrative while their magnetoencephalogram (MEG) was recorded. We find that two recent results, the improved performance of an encoding model in which annotated linguistic and acoustic features were combined and the decoding of phoneme subgroups from phoneme-locked responses, can be explained by an encoding model that is based entirely on acoustic features. These acoustic features capitalize on acoustic edges and outperform Gabor-filtered spectrograms, which can explicitly describe the spectrotemporal characteristics of individual phonemes. By replicating our results in publicly available electroencephalography (EEG) data, we conclude that models of brain responses based on linguistic features can serve as excellent benchmarks. However, we believe that in order to further our understanding of human cortical responses to speech, we should also explore low-level and parsimonious explanations for apparent high-level phenomena.
Collapse
Affiliation(s)
- Christoph Daube
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, UK.
| | - Robin A A Ince
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, UK
| | - Joachim Gross
- Institute of Neuroscience and Psychology, University of Glasgow, 62 Hillhead Street, Glasgow G12 8QB, UK; Institute for Biomagnetism and Biosignalanalysis, University of Münster, Malmedyweg 15, 48149 Münster, Germany
| |
Collapse
|
40
|
Adaptation of the human auditory cortex to changing background noise. Nat Commun 2019; 10:2509. [PMID: 31175304 PMCID: PMC6555798 DOI: 10.1038/s41467-019-10611-4] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2018] [Accepted: 05/21/2019] [Indexed: 11/09/2022] Open
Abstract
Speech communication in real-world environments requires adaptation to changing acoustic conditions. How the human auditory cortex adapts as a new noise source appears in or disappears from the acoustic scene remains unclear. Here, we directly measured neural activity in the auditory cortex of six human subjects as they listened to speech with abruptly changing background noises. We report rapid and selective suppression of acoustic features of noise in the neural responses. This suppression results in enhanced representation and perception of speech acoustic features. The degree of adaptation to different background noises varies across neural sites and is predictable from the tuning properties and speech specificity of the sites. Moreover, adaptation to background noise is unaffected by the attentional focus of the listener. The convergence of these neural and perceptual effects reveals the intrinsic dynamic mechanisms that enable a listener to filter out irrelevant sound sources in a changing acoustic scene.
Collapse
|
41
|
Di Liberto GM, Wong D, Melnik GA, de Cheveigné A. Low-frequency cortical responses to natural speech reflect probabilistic phonotactics. Neuroimage 2019; 196:237-247. [PMID: 30991126 DOI: 10.1016/j.neuroimage.2019.04.037] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/27/2018] [Revised: 03/18/2019] [Accepted: 04/11/2019] [Indexed: 11/19/2022] Open
Abstract
Humans comprehend speech despite various challenges such as mispronunciation and noisy environments. Our auditory system is robust to these thanks to the integration of the sensory input with prior knowledge and expectations built on language-specific regularities. One such regularity concerns the permissible phoneme sequences, which determine the likelihood that a word belongs to a given language (phonotactic probability; "blick" is more likely to be an English word than "bnick"). Previous research demonstrated that violations of these rules modulate brain-evoked responses. However, several fundamental questions remain unresolved, especially regarding the neural encoding and integration strategy of phonotactics in naturalistic conditions, when there are no (or few) violations. Here, we used linear modelling to assess the influence of phonotactic probabilities on the brain responses to narrative speech measured with non-invasive EEG. We found that the relationship between continuous speech and EEG responses is best described when the stimulus descriptor includes phonotactic probabilities. This indicates that low-frequency cortical signals (<9 Hz) reflect the integration of phonotactic information during natural speech perception, providing us with a measure of phonotactic processing at the individual-subject level. Furthermore, phonotactics-related signals showed the strongest speech-EEG interactions at latencies of 100-500 ms, supporting a pre-lexical role of phonotactic information.
Collapse
Affiliation(s)
- Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, France.
| | - Daniel Wong
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, France
| | - Gerda Ana Melnik
- Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, France; Laboratoire de Sciences Cognitives et Psycholinguistique, ENS, EHESS, CNRS, France
| | - Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, France; UCL Ear Institute, London, United Kingdom
| |
Collapse
|
42
|
Xie Z, Reetzke R, Chandrasekaran B. Machine Learning Approaches to Analyze Speech-Evoked Neurophysiological Responses. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2019; 62:587-601. [PMID: 30950746 PMCID: PMC6802895 DOI: 10.1044/2018_jslhr-s-astm-18-0244] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/18/2018] [Revised: 10/28/2018] [Accepted: 11/26/2018] [Indexed: 05/27/2023]
Abstract
Purpose Speech-evoked neurophysiological responses are often collected to answer clinically and theoretically driven questions concerning speech and language processing. Here, we highlight the practical application of machine learning (ML)-based approaches to analyzing speech-evoked neurophysiological responses. Method Two categories of ML-based approaches are introduced: decoding models, which generate a speech stimulus output using the features from the neurophysiological responses, and encoding models, which use speech stimulus features to predict neurophysiological responses. In this review, we focus on (a) a decoding model classification approach, wherein speech-evoked neurophysiological responses are classified as belonging to 1 of a finite set of possible speech events (e.g., phonological categories), and (b) an encoding model temporal response function approach, which quantifies the transformation of a speech stimulus feature to continuous neural activity. Results We illustrate the utility of the classification approach to analyze early electroencephalographic (EEG) responses to Mandarin lexical tone categories from a traditional experimental design, and to classify EEG responses to English phonemes evoked by natural continuous speech (i.e., an audiobook) into phonological categories (plosive, fricative, nasal, and vowel). We also demonstrate the utility of temporal response function to predict EEG responses to natural continuous speech from acoustic features. Neural metrics from the 3 examples all exhibit statistically significant effects at the individual level. Conclusion We propose that ML-based approaches can complement traditional analysis approaches to analyze neurophysiological responses to speech signals and provide a deeper understanding of natural speech and language processing using ecologically valid paradigms in both typical and clinical populations.
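A minimal sketch of the decoding-model classification approach highlighted in this review: single-trial neural response vectors are classified into phonological categories with a cross-validated linear classifier. The data, four-way labels, and SVM settings are illustrative assumptions.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n_trials, n_features = 600, 64 * 50            # e.g. 64 channels x 50 time samples per trial
X = rng.standard_normal((n_trials, n_features))  # placeholder EEG response vectors
y = rng.integers(0, 4, n_trials)               # plosive / fricative / nasal / vowel labels

clf = make_pipeline(StandardScaler(), LinearSVC(dual=False, C=0.01, max_iter=5000))
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f} (chance = 0.25)")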
Collapse
Affiliation(s)
- Zilong Xie
- Department of Communication Sciences and Disorders, The University of Texas at Austin
| | - Rachel Reetzke
- Department of Communication Sciences and Disorders, The University of Texas at Austin
| | - Bharath Chandrasekaran
- Department of Communication Science and Disorders, School of Health and Rehabilitation Sciences, University of Pittsburgh
| |
Collapse
|
43
|
Liu ST, Montes-Lourido P, Wang X, Sadagopan S. Optimal features for auditory categorization. Nat Commun 2019; 10:1302. [PMID: 30899018 PMCID: PMC6428858 DOI: 10.1038/s41467-019-09115-y] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Accepted: 02/20/2019] [Indexed: 01/13/2023] Open
Abstract
Humans and vocal animals use vocalizations to communicate with members of their species. A necessary function of auditory perception is to generalize across the high variability inherent in vocalization production and classify them into behaviorally distinct categories ('words' or 'call types'). Here, we demonstrate that detecting mid-level features in calls achieves production-invariant classification. Starting from randomly chosen marmoset call features, we use a greedy search algorithm to determine the most informative and least redundant features necessary for call classification. High classification performance is achieved using only 10-20 features per call type. Predictions of tuning properties of putative feature-selective neurons accurately match some observed auditory cortical responses. This feature-based approach also succeeds for call categorization in other species, and for other complex classification tasks such as caller identification. Our results suggest that high-level neural representations of sounds are based on task-dependent features optimized for specific computational goals.
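A minimal sketch of greedy forward feature selection in the spirit described: at each step, add the candidate feature that most improves cross-validated classification and stop when no feature helps. The random stand-in features, binary call-type labels, and logistic-regression classifier are assumptions, not the study's feature set or algorithmic details.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_calls, n_candidates = 300, 40
X = rng.standard_normal((n_calls, n_candidates))   # candidate feature detections per call
y = rng.integers(0, 2, n_calls)                    # target call type vs. all others

selected, best_acc = [], 0.0
for _ in range(10):                                # pick at most 10 features
    scores = {}
    for f in range(n_candidates):
        if f in selected:
            continue
        cols = selected + [f]
        clf = LogisticRegression(max_iter=1000)
        scores[f] = cross_val_score(clf, X[:, cols], y, cv=5).mean()
    f_best = max(scores, key=scores.get)
    if scores[f_best] <= best_acc:                 # stop when no feature improves accuracy
        break
    selected.append(f_best)
    best_acc = scores[f_best]

print("selected features:", selected, "accuracy:", round(best_acc, 2))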
Collapse
Affiliation(s)
- Shi Tong Liu
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, 15213, PA, USA
| | - Pilar Montes-Lourido
- Department of Neurobiology, University of Pittsburgh, Pittsburgh, 15213, PA, USA
| | - Xiaoqin Wang
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, 21205, MD, USA
| | - Srivatsun Sadagopan
- Department of Bioengineering, University of Pittsburgh, Pittsburgh, 15213, PA, USA; Department of Neurobiology, University of Pittsburgh, Pittsburgh, 15213, PA, USA; Department of Otolaryngology, University of Pittsburgh, Pittsburgh, 15213, PA, USA.
| |
Collapse
|
44
|
Ogg M, Moraczewski D, Kuchinsky SE, Slevc LR. Separable neural representations of sound sources: Speaker identity and musical timbre. Neuroimage 2019; 191:116-126. [PMID: 30731247 DOI: 10.1016/j.neuroimage.2019.01.075] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2018] [Revised: 12/14/2018] [Accepted: 01/30/2019] [Indexed: 11/28/2022] Open
Abstract
Human listeners can quickly and easily recognize different sound sources (objects and events) in their environment. Understanding how this impressive ability is accomplished can improve signal processing and machine intelligence applications along with assistive listening technologies. However, it is not clear how the brain represents the many sounds that humans can recognize (such as speech and music) at the level of individual sources, categories and acoustic features. To examine the cortical organization of these representations, we used patterns of fMRI responses to decode 1) four individual speakers and instruments from one another (separately, within each category), 2) the superordinate category labels associated with each stimulus (speech or instrument), and 3) a set of simple synthesized sounds that could be differentiated entirely on their acoustic features. Data were collected using an interleaved silent steady state sequence to increase the temporal signal-to-noise ratio, and mitigate issues with auditory stimulus presentation in fMRI. Largely separable clusters of voxels in the temporal lobes supported the decoding of individual speakers and instruments from other stimuli in the same category. Decoding the superordinate category of each sound was more accurate and involved a larger portion of the temporal lobes. However, these clusters all overlapped with areas that could decode simple, acoustically separable stimuli. Thus, individual sound sources from different sound categories are represented in separate regions of the temporal lobes that are situated within regions implicated in more general acoustic processes. These results bridge an important gap in our understanding of cortical representations of sounds and their acoustics.
Collapse
Affiliation(s)
- Mattson Ogg
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA.
| | - Dustin Moraczewski
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA
| | - Stefanie E Kuchinsky
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Center for Advanced Study of Language, University of Maryland, College Park, MD, 20742, USA; Maryland Neuroimaging Center, University of Maryland, College Park, MD, 20742, USA
| | - L Robert Slevc
- Program in Neuroscience and Cognitive Science, University of Maryland, College Park, MD, 20742, USA; Department of Psychology, University of Maryland, College Park, MD, 20742, USA
| |
Collapse
|
45
|
Towards reconstructing intelligible speech from the human auditory cortex. Sci Rep 2019; 9:874. [PMID: 30696881 PMCID: PMC6351601 DOI: 10.1038/s41598-018-37359-z] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2018] [Accepted: 11/30/2018] [Indexed: 11/08/2022] Open
Abstract
Auditory stimulus reconstruction is a technique that finds the best approximation of the acoustic stimulus from the population of evoked neural activity. Reconstructing speech from the human auditory cortex creates the possibility of a speech neuroprosthetic to establish a direct communication with the brain and has been shown to be possible in both overt and covert conditions. However, the low quality of the reconstructed speech has severely limited the utility of this method for brain-computer interface (BCI) applications. To advance the state-of-the-art in speech neuroprosthesis, we combined the recent advances in deep learning with the latest innovations in speech synthesis technologies to reconstruct closed-set intelligible speech from the human auditory cortex. We investigated the dependence of reconstruction accuracy on linear and nonlinear (deep neural network) regression methods and the acoustic representation that is used as the target of reconstruction, including auditory spectrogram and speech synthesis parameters. In addition, we compared the reconstruction accuracy from low and high neural frequency ranges. Our results show that a deep neural network model that directly estimates the parameters of a speech synthesizer from all neural frequencies achieves the highest subjective and objective scores on a digit recognition task, improving the intelligibility by 65% over the baseline method which used linear regression to reconstruct the auditory spectrogram. These results demonstrate the efficacy of deep learning and speech synthesis algorithms for designing the next generation of speech BCI systems, which not only can restore communications for paralyzed patients but also have the potential to transform human-computer interaction technologies.
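A minimal sketch of the linear-regression baseline for stimulus reconstruction mentioned here: time-lagged neural responses are regressed onto each channel of an auditory spectrogram. The deep-network and speech-synthesizer variants of the paper are not shown; all shapes, lags, and data are illustrative assumptions.

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(6)
fs, n_samples, n_electrodes, n_specbands = 100, 5000, 60, 32
neural = rng.standard_normal((n_samples, n_electrodes))   # placeholder neural recordings
spec = rng.standard_normal((n_samples, n_specbands))      # target auditory spectrogram

lags = np.arange(0, int(0.3 * fs))                        # use 0-300 ms of neural history
X = np.zeros((n_samples, n_electrodes * len(lags)))
for i, lag in enumerate(lags):                            # build lagged design matrix
    X[lag:, i * n_electrodes:(i + 1) * n_electrodes] = neural[: n_samples - lag]

train, test = slice(0, 4000), slice(4000, 5000)
model = Ridge(alpha=10.0).fit(X[train], spec[train])
recon = model.predict(X[test])                            # reconstructed spectrogram
r = [np.corrcoef(recon[:, b], spec[test, b])[0, 1] for b in range(n_specbands)]
print(f"mean reconstruction accuracy r = {np.mean(r):.3f}")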
Collapse
|
46
|
McCloy DR, Lee AKC. Investigating the fit between phonological feature systems and brain responses to speech using EEG. LANGUAGE, COGNITION AND NEUROSCIENCE 2019; 34:662-676. [PMID: 32984429 PMCID: PMC7518517 DOI: 10.1080/23273798.2019.1569246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/14/2018] [Accepted: 01/03/2019] [Indexed: 06/11/2023]
Abstract
This paper describes a technique to assess the correspondence between patterns of similarity in the brain's response to speech sounds and the patterns of similarity encoded in phonological feature systems, by quantifying the recoverability of phonological features from the neural data using supervised learning. The technique is applied to EEG recordings collected during passive listening to consonant-vowel syllables. Three published phonological feature systems are compared, and are shown to differ in their ability to recover certain speech sound contrasts from the neural data. For the phonological feature system that best reflects patterns of similarity in the neural data, a leave-one-out analysis indicates some consistency across subjects in which features have greatest impact on the fit, but considerable across-subject heterogeneity remains in the rank ordering of features in this regard.
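A minimal sketch of quantifying the recoverability of phonological features from neural responses with supervised learning: one cross-validated binary classifier per feature. The toy feature table, phoneme set, and simulated EEG vectors are assumptions; the paper compares three published feature systems rather than this toy one.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
phonemes = ["p", "b", "m", "s", "z", "n"]
features = {                     # toy binary feature system: feature -> phoneme values
    "voiced":     {"p": 0, "b": 1, "m": 1, "s": 0, "z": 1, "n": 1},
    "nasal":      {"p": 0, "b": 0, "m": 1, "s": 0, "z": 0, "n": 1},
    "continuant": {"p": 0, "b": 0, "m": 0, "s": 1, "z": 1, "n": 1},
}

X = rng.standard_normal((600, 200))            # placeholder EEG response vectors, one per trial
trial_phoneme = rng.integers(0, len(phonemes), 600)

for feat, table in features.items():
    y = np.array([table[phonemes[p]] for p in trial_phoneme])   # feature value per trial
    acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
    print(f"{feat}: recoverability (accuracy) = {acc:.2f}")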
Collapse
Affiliation(s)
- Daniel R McCloy
- University of Washington, Institute for Learning and Brain Sciences, Seattle, WA, United States
| | - Adrian K C Lee
- University of Washington, Institute for Learning and Brain Sciences, Seattle, WA, United States
| |
Collapse
|
47
|
Semantic-hierarchical model improves classification of spoken-word evoked electrocorticography. J Neurosci Methods 2019; 311:253-258. [PMID: 30389490 DOI: 10.1016/j.jneumeth.2018.10.034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2018] [Revised: 10/22/2018] [Accepted: 10/26/2018] [Indexed: 11/24/2022]
Abstract
Classification of spoken word-evoked potentials is useful for both neuroscientific and clinical applications including brain-computer interfaces (BCIs). By evaluating whether adopting a biology-based structure improves a classifier's accuracy, we can investigate the importance of such structure in human brain circuitry, and advance BCI performance. In this study, we propose a semantic-hierarchical structure for classifying spoken word-evoked cortical responses. The proposed structure decodes the semantic grouping of the words first (e.g., a body part vs. a number) and then decodes which exact word was heard. The proposed classifier structure exhibited a consistent ∼10% improvement of classification accuracy when compared with a non-hierarchical structure. Our result provides a tool for investigating the neural representation of semantic hierarchy and the acoustic properties of spoken words in human brains. Our results suggest an improved algorithm for BCIs operated by decoding heard, and possibly imagined, words.
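A minimal sketch of a semantic-hierarchical classifier of the kind proposed: a first stage decodes the semantic group of the heard word (e.g., body part vs. number) and a second, group-specific stage decodes the exact word. The words, groups, and simulated feature vectors are placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
words = ["hand", "foot", "two", "five"]
groups = {"hand": 0, "foot": 0, "two": 1, "five": 1}   # 0 = body part, 1 = number

X = rng.standard_normal((400, 100))                    # word-evoked neural feature vectors
y_word = rng.integers(0, 4, 400)
y_group = np.array([groups[words[w]] for w in y_word])

tr, te = slice(0, 300), slice(300, 400)
group_clf = LogisticRegression(max_iter=1000).fit(X[tr], y_group[tr])
word_clfs = {g: LogisticRegression(max_iter=1000).fit(
                 X[tr][y_group[tr] == g], y_word[tr][y_group[tr] == g])
             for g in (0, 1)}

pred_group = group_clf.predict(X[te])                  # stage 1: semantic group
pred_word = np.array([word_clfs[g].predict(x[None])[0]  # stage 2: word within predicted group
                      for g, x in zip(pred_group, X[te])])
print("word accuracy:", (pred_word == y_word[te]).mean())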
Collapse
|
48
|
Anderson AJ, Broderick MP, Lalor EC. Neuroscience: Great Expectations at the Speech–Language Interface. Curr Biol 2018; 28:R1396-R1398. [DOI: 10.1016/j.cub.2018.10.063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/27/2022]
|
49
|
de Cheveigné A, Di Liberto GM, Arzounian D, Wong DDE, Hjortkjær J, Fuglsang S, Parra LC. Multiway canonical correlation analysis of brain data. Neuroimage 2018; 186:728-740. [PMID: 30496819 DOI: 10.1016/j.neuroimage.2018.11.026] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2018] [Revised: 10/11/2018] [Accepted: 11/16/2018] [Indexed: 01/12/2023] Open
Abstract
Brain data recorded with electroencephalography (EEG), magnetoencephalography (MEG) and related techniques often have poor signal-to-noise ratios due to the presence of multiple competing sources and artifacts. A common remedy is to average responses over repeats of the same stimulus, but this is not applicable for temporally extended stimuli that are presented only once (speech, music, movies, natural sound). An alternative is to average responses over multiple subjects that were presented with identical stimuli, but differences in geometry of brain sources and sensors reduce the effectiveness of this solution. Multiway canonical correlation analysis (MCCA) brings a solution to this problem by allowing data from multiple subjects to be fused in such a way as to extract components common to all. This paper reviews the method, offers application examples that illustrate its effectiveness, and outlines the caveats and risks entailed by the method.
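A minimal sketch of one common way to compute multiway canonical correlation analysis, read as: PCA-whiten each subject's data, concatenate across subjects, and apply a second PCA to obtain the shared ("summary") components. Dimensions and data are simulated, and this simplified pipeline is an interpretation of the method rather than the authors' toolbox code.

import numpy as np

rng = np.random.default_rng(8)
n_subjects, n_samples, n_channels = 5, 2000, 32
shared = rng.standard_normal((n_samples, 3))           # sources common to all subjects
data = [shared @ rng.standard_normal((3, n_channels))
        + rng.standard_normal((n_samples, n_channels)) for _ in range(n_subjects)]

def whiten(x, keep=20):
    # PCA-whiten: return normalized component time courses with equal variance
    x = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u[:, :keep]

z = np.hstack([whiten(x) for x in data])                # samples x (subjects * keep)
u, s, vt = np.linalg.svd(z - z.mean(axis=0), full_matrices=False)
summary = u[:, :3] * s[:3]                              # time courses of strongest shared components
print("summary component shape:", summary.shape)
print("variance ratio of top components:", (s[:5] ** 2 / (s ** 2).sum()).round(3))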
Collapse
Affiliation(s)
- Alain de Cheveigné
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, Paris, France; UCL Ear Institute, London, United Kingdom.
| | - Giovanni M Di Liberto
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, Paris, France
| | - Dorothée Arzounian
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, Paris, France
| | - Daniel D E Wong
- Laboratoire des Systèmes Perceptifs, UMR 8248, CNRS, France; Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, Paris, France
| | - Jens Hjortkjær
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Denmark; Danish Research Centre for Magnetic Resonance, Centre for Functional and Diagnostic Imaging and Research, Copenhagen University Hospital Hvidovre, Denmark
| | - Søren Fuglsang
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Denmark
| | | |
Collapse
|
50
|
Sankaran N, Swaminathan J, Micheyl C, Kalluri S, Carlile S. Tracking the dynamic representation of consonants from auditory periphery to cortex. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 144:2462. [PMID: 30404465 DOI: 10.1121/1.5065492] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/09/2018] [Accepted: 10/09/2018] [Indexed: 06/08/2023]
Abstract
In order to perceive meaningful speech, the auditory system must recognize different phonemes amidst a noisy and variable acoustic signal. To better understand the processing mechanisms underlying this ability, evoked cortical responses to different spoken consonants were measured with electroencephalography (EEG). Using multivariate pattern analysis (MVPA), binary classifiers attempted to discriminate between the EEG activity evoked by two given consonants at each peri-stimulus time sample, providing a dynamic measure of their cortical dissimilarity. To examine the relationship between representations at the auditory periphery and cortex, MVPA was also applied to modelled auditory-nerve (AN) responses of consonants, and time-evolving AN-based and EEG-based dissimilarities were compared with one another. Cortical dissimilarities between consonants were commensurate with their articulatory distinctions, particularly their manner of articulation, and to a lesser extent, their voicing. Furthermore, cortical distinctions between consonants in two periods of activity, centered at 130 and 400 ms after onset, aligned with their peripheral dissimilarities in distinct onset and post-onset periods, respectively. By relating speech representations across articulatory, peripheral, and cortical domains, these findings advance our understanding of the crucial transformations in the auditory pathway that underlie the ability to perceive speech.
Collapse
Affiliation(s)
- Narayan Sankaran
- Auditory Neuroscience Laboratory, School of Medical Sciences, The University of Sydney, Sydney, New South Wales 2006, Australia
| | - Jayaganesh Swaminathan
- Starkey Hearing Research Center, 2150 Shattuck Avenue, Suite 408, Berkeley, California 94704, USA
| | - Christophe Micheyl
- Starkey Hearing Research Center, 2150 Shattuck Avenue, Suite 408, Berkeley, California 94704, USA
| | - Sridhar Kalluri
- Starkey Hearing Research Center, 2150 Shattuck Avenue, Suite 408, Berkeley, California 94704, USA
| | - Simon Carlile
- Auditory Neuroscience Laboratory, School of Medical Sciences, The University of Sydney, Sydney, New South Wales 2006, Australia
| |
Collapse
|