1
|
Rizzi R, Bidelman GM. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.15.594387. [PMID: 38798410 PMCID: PMC11118460 DOI: 10.1101/2024.05.15.594387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/29/2024]
Abstract
Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient as opposed to a discrete/categorical listener may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two alternative forced choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete (2AFC) vs. continuous (VAS) hearing, respectively. Behaviorally, identification curves were steeper under 2AFC vs. VAS categorization but were relatively immune to noise, suggesting robust access to abstract, phonetic categories even under signal degradation. Behavioral slopes were positively correlated with listeners' QuickSIN scores, suggesting a behavioral advantage for speech in noise comprehension conferred by gradient listening strategy. At the neural level, electrode level data revealed P2 peak amplitudes of the ERPs were modulated by task and noise; responses were larger under VAS vs. 2AFC categorization and showed larger noise-related delay in latency in the VAS vs. 2AFC condition. More gradient responders also had smaller shifts in ERP latency with noise, suggesting their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in left superior temporal gyrus. Our results demonstrate that listening strategy (i.e., being a discrete vs. continuous listener) modulates the categorical organization of speech and behavioral success, with continuous/gradient listening being more advantageous to speech in noise perception.
Collapse
|
2
|
Guilleminot P, Graef C, Butters E, Reichenbach T. Audiotactile Stimulation Can Improve Syllable Discrimination through Multisensory Integration in the Theta Frequency Band. J Cogn Neurosci 2023; 35:1760-1772. [PMID: 37677062 DOI: 10.1162/jocn_a_02045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/09/2023]
Abstract
Syllables are an essential building block of speech. We recently showed that tactile stimuli linked to the perceptual centers of syllables in continuous speech can improve speech comprehension. The rate of syllables lies in the theta frequency range, between 4 and 8 Hz, and the behavioral effect appears linked to multisensory integration in this frequency band. Because this neural activity may be oscillatory, we hypothesized that a behavioral effect may also occur not only while but also after this activity has been evoked or entrained through vibrotactile pulses. Here, we show that audiotactile integration regarding the perception of single syllables, both on the neural and on the behavioral level, is consistent with this hypothesis. We first stimulated participants with a series of vibrotactile pulses and then presented them with a syllable in background noise. We show that, at a delay of 200 msec after the last vibrotactile pulse, audiotactile integration still occurred in the theta band and syllable discrimination was enhanced. Moreover, the dependence of both the neural multisensory integration as well as of the behavioral discrimination on the delay of the audio signal with respect to the last tactile pulse was consistent with a damped oscillation. In addition, the multisensory gain is correlated with the syllable discrimination score. Our results therefore evidence the role of the theta band in audiotactile integration and provide evidence that these effects may involve oscillatory activity that still persists after the tactile stimulation.
Collapse
|
3
|
Rizzi R, Bidelman GM. Duplex perception reveals brainstem auditory representations are modulated by listeners' ongoing percept for speech. Cereb Cortex 2023; 33:10076-10086. [PMID: 37522248 PMCID: PMC10502779 DOI: 10.1093/cercor/bhad266] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Revised: 06/27/2023] [Accepted: 07/10/2023] [Indexed: 08/01/2023] Open
Abstract
So-called duplex speech stimuli with perceptually ambiguous spectral cues to one ear and isolated low- versus high-frequency third formant "chirp" to the opposite ear yield a coherent percept supporting their phonetic categorization. Critically, such dichotic sounds are only perceived categorically upon binaural integration. Here, we used frequency-following responses (FFRs), scalp-recorded potentials reflecting phase-locked subcortical activity, to investigate brainstem responses to fused speech percepts and to determine whether FFRs reflect binaurally integrated category-level representations. We recorded FFRs to diotic and dichotic stop-consonants (/da/, /ga/) that either did or did not require binaural fusion to properly label along with perceptually ambiguous sounds without clear phonetic identity. Behaviorally, listeners showed clear categorization of dichotic speech tokens confirming they were heard with a fused, phonetic percept. Neurally, we found FFRs were stronger for categorically perceived speech relative to category-ambiguous tokens but also differentiated phonetic categories for both diotically and dichotically presented speech sounds. Correlations between neural and behavioral data further showed FFR latency predicted the degree to which listeners labeled tokens as "da" versus "ga." The presence of binaurally integrated, category-level information in FFRs suggests human brainstem processing reflects a surprisingly abstract level of the speech code typically circumscribed to much later cortical processing.
Collapse
Affiliation(s)
- Rose Rizzi
- Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, IN, United States
- Program in Neuroscience, Indiana University, Bloomington, IN, United States
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
| | - Gavin M Bidelman
- Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, IN, United States
- Program in Neuroscience, Indiana University, Bloomington, IN, United States
- Cognitive Science Program, Indiana University, Bloomington, IN, United States
| |
Collapse
|
4
|
Rizzi R, Bidelman GM. Duplex perception reveals brainstem auditory representations are modulated by listeners' ongoing percept for speech. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.05.09.540018. [PMID: 37214801 PMCID: PMC10197666 DOI: 10.1101/2023.05.09.540018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/24/2023]
Abstract
So-called duplex speech stimuli with perceptually ambiguous spectral cues to one ear and isolated low- vs. high-frequency third formant "chirp" to the opposite ear yield a coherent percept supporting their phonetic categorization. Critically, such dichotic sounds are only perceived categorically upon binaural integration. Here, we used frequency-following responses (FFRs), scalp-recorded potentials reflecting phase-locked subcortical activity, to investigate brainstem responses to fused speech percepts and to determine whether FFRs reflect binaurally integrated category-level representations. We recorded FFRs to diotic and dichotic stop-consonants (/da/, /ga/) that either did or did not require binaural fusion to properly label along with perceptually ambiguous sounds without clear phonetic identity. Behaviorally, listeners showed clear categorization of dichotic speech tokens confirming they were heard with a fused, phonetic percept. Neurally, we found FFRs were stronger for categorically perceived speech relative to category-ambiguous tokens but also differentiated phonetic categories for both diotically and dichotically presented speech sounds. Correlations between neural and behavioral data further showed FFR latency predicted the degree to which listeners labeled tokens as "da" vs. "ga". The presence of binaurally integrated, category-level information in FFRs suggests human brainstem processing reflects a surprisingly abstract level of the speech code typically circumscribed to much later cortical processing.
Collapse
Affiliation(s)
- Rose Rizzi
- Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Program in Neuroscience, Indiana University, Bloomington, IN, USA
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA
| | - Gavin M. Bidelman
- Department of Speech, Language, and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Program in Neuroscience, Indiana University, Bloomington, IN, USA
- Cognitive Science Program, Indiana University, Bloomington, IN, USA
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA
| |
Collapse
|
5
|
Bidelman GM, Carter JA. Continuous dynamics in behavior reveal interactions between perceptual warping in categorization and speech-in-noise perception. Front Neurosci 2023; 17:1032369. [PMID: 36937676 PMCID: PMC10014819 DOI: 10.3389/fnins.2023.1032369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Accepted: 02/14/2023] [Indexed: 03/05/2023] Open
Abstract
Introduction Spoken language comprehension requires listeners map continuous features of the speech signal to discrete category labels. Categories are however malleable to surrounding context and stimulus precedence; listeners' percept can dynamically shift depending on the sequencing of adjacent stimuli resulting in a warping of the heard phonetic category. Here, we investigated whether such perceptual warping-which amplify categorical hearing-might alter speech processing in noise-degraded listening scenarios. Methods We measured continuous dynamics in perception and category judgments of an acoustic-phonetic vowel gradient via mouse tracking. Tokens were presented in serial vs. random orders to induce more/less perceptual warping while listeners categorized continua in clean and noise conditions. Results Listeners' responses were faster and their mouse trajectories closer to the ultimate behavioral selection (marked visually on the screen) in serial vs. random order, suggesting increased perceptual attraction to category exemplars. Interestingly, order effects emerged earlier and persisted later in the trial time course when categorizing speech in noise. Discussion These data describe interactions between perceptual warping in categorization and speech-in-noise perception: warping strengthens the behavioral attraction to relevant speech categories, making listeners more decisive (though not necessarily more accurate) in their decisions of both clean and noise-degraded speech.
Collapse
Affiliation(s)
- Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, United States
- Program in Neuroscience, Indiana University, Bloomington, IN, United States
| | - Jared A. Carter
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Hearing Sciences – Scottish Section, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Glasgow, United Kingdom
| |
Collapse
|
6
|
Moinuddin KA, Havugimana F, Al-Fahad R, Bidelman GM, Yeasin M. Unraveling Spatial-Spectral Dynamics of Speech Categorization Speed Using Convolutional Neural Networks. Brain Sci 2022; 13:brainsci13010075. [PMID: 36672055 PMCID: PMC9856675 DOI: 10.3390/brainsci13010075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2022] [Revised: 12/22/2022] [Accepted: 12/24/2022] [Indexed: 12/31/2022] Open
Abstract
The process of categorizing sounds into distinct phonetic categories is known as categorical perception (CP). Response times (RTs) provide a measure of perceptual difficulty during labeling decisions (i.e., categorization). The RT is quasi-stochastic in nature due to individuality and variations in perceptual tasks. To identify the source of RT variation in CP, we have built models to decode the brain regions and frequency bands driving fast, medium and slow response decision speeds. In particular, we implemented a parameter optimized convolutional neural network (CNN) to classify listeners' behavioral RTs from their neural EEG data. We adopted visual interpretation of model response using Guided-GradCAM to identify spatial-spectral correlates of RT. Our framework includes (but is not limited to): (i) a data augmentation technique designed to reduce noise and control the overall variance of EEG dataset; (ii) bandpower topomaps to learn the spatial-spectral representation using CNN; (iii) large-scale Bayesian hyper-parameter optimization to find best performing CNN model; (iv) ANOVA and posthoc analysis on Guided-GradCAM activation values to measure the effect of neural regions and frequency bands on behavioral responses. Using this framework, we observe that α-β (10-20 Hz) activity over left frontal, right prefrontal/frontal, and right cerebellar regions are correlated with RT variation. Our results indicate that attention, template matching, temporal prediction of acoustics, motor control, and decision uncertainty are the most probable factors in RT variation.
Collapse
Affiliation(s)
| | - Felix Havugimana
- Department of EECE, University of Memphis, Memphis, TN 38152, USA
| | - Rakib Al-Fahad
- Department of EECE, University of Memphis, Memphis, TN 38152, USA
| | - Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN 47408, USA
| | - Mohammed Yeasin
- Department of EECE, University of Memphis, Memphis, TN 38152, USA
| |
Collapse
|
7
|
Mankel K, Shrestha U, Tipirneni-Sajja A, Bidelman GM. Functional Plasticity Coupled With Structural Predispositions in Auditory Cortex Shape Successful Music Category Learning. Front Neurosci 2022; 16:897239. [PMID: 35837119 PMCID: PMC9274125 DOI: 10.3389/fnins.2022.897239] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2022] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open
Abstract
Categorizing sounds into meaningful groups helps listeners more efficiently process the auditory scene and is a foundational skill for speech perception and language development. Yet, how auditory categories develop in the brain through learning, particularly for non-speech sounds (e.g., music), is not well understood. Here, we asked musically naïve listeners to complete a brief (∼20 min) training session where they learned to identify sounds from a musical interval continuum (minor-major 3rds). We used multichannel EEG to track behaviorally relevant neuroplastic changes in the auditory event-related potentials (ERPs) pre- to post-training. To rule out mere exposure-induced changes, neural effects were evaluated against a control group of 14 non-musicians who did not undergo training. We also compared individual categorization performance with structural volumetrics of bilateral Heschl's gyrus (HG) from MRI to evaluate neuroanatomical substrates of learning. Behavioral performance revealed steeper (i.e., more categorical) identification functions in the posttest that correlated with better training accuracy. At the neural level, improvement in learners' behavioral identification was characterized by smaller P2 amplitudes at posttest, particularly over right hemisphere. Critically, learning-related changes in the ERPs were not observed in control listeners, ruling out mere exposure effects. Learners also showed smaller and thinner HG bilaterally, indicating superior categorization was associated with structural differences in primary auditory brain regions. Collectively, our data suggest successful auditory categorical learning of music sounds is characterized by short-term functional changes (i.e., greater post-training efficiency) in sensory coding processes superimposed on preexisting structural differences in bilateral auditory cortex.
Collapse
Affiliation(s)
- Kelsey Mankel
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States
- Center for Mind and Brain, University of California, Davis, Davis, CA, United States
| | - Utsav Shrestha
- Department of Biomedical Engineering, University of Memphis, Memphis, TN, United States
| | | | - Gavin M. Bidelman
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, United States
| |
Collapse
|
8
|
Carter JA, Buder EH, Bidelman GM. Nonlinear dynamics in auditory cortical activity reveal the neural basis of perceptual warping in speech categorization. JASA EXPRESS LETTERS 2022; 2:045201. [PMID: 35434716 PMCID: PMC8984957 DOI: 10.1121/10.0009896] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Accepted: 03/03/2022] [Indexed: 06/14/2023]
Abstract
Surrounding context influences speech listening, resulting in dynamic shifts to category percepts. To examine its neural basis, event-related potentials (ERPs) were recorded during vowel identification with continua presented in random, forward, and backward orders to induce perceptual warping. Behaviorally, sequential order shifted individual listeners' categorical boundary, versus random delivery, revealing perceptual warping (biasing) of the heard phonetic category dependent on recent stimulus history. ERPs revealed later (∼300 ms) activity localized to superior temporal and middle/inferior frontal gyri that predicted listeners' hysteresis/enhanced contrast magnitudes. Findings demonstrate that interactions between frontotemporal brain regions govern top-down, stimulus history effects on speech categorization.
Collapse
Affiliation(s)
- Jared A Carter
- Institute for Intelligent Systems, University of Memphis, Memphis, Tennessee 38152, USA
| | - Eugene H Buder
- School of Communication Sciences and Disorders, University of Memphis, Memphis, Tennessee 38152, USA
| | - Gavin M Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, , Bloomington, Indiana 47408, USA , ,
| |
Collapse
|
9
|
Hernández-Pérez H, Mikiel-Hunter J, McAlpine D, Dhar S, Boothalingam S, Monaghan JJM, McMahon CM. Understanding degraded speech leads to perceptual gating of a brainstem reflex in human listeners. PLoS Biol 2021; 19:e3001439. [PMID: 34669696 PMCID: PMC8559948 DOI: 10.1371/journal.pbio.3001439] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Revised: 11/01/2021] [Accepted: 10/07/2021] [Indexed: 11/19/2022] Open
Abstract
The ability to navigate "cocktail party" situations by focusing on sounds of interest over irrelevant, background sounds is often considered in terms of cortical mechanisms. However, subcortical circuits such as the pathway underlying the medial olivocochlear (MOC) reflex modulate the activity of the inner ear itself, supporting the extraction of salient features from auditory scene prior to any cortical processing. To understand the contribution of auditory subcortical nuclei and the cochlea in complex listening tasks, we made physiological recordings along the auditory pathway while listeners engaged in detecting non(sense) words in lists of words. Both naturally spoken and intrinsically noisy, vocoded speech-filtering that mimics processing by a cochlear implant (CI)-significantly activated the MOC reflex, but this was not the case for speech in background noise, which more engaged midbrain and cortical resources. A model of the initial stages of auditory processing reproduced specific effects of each form of speech degradation, providing a rationale for goal-directed gating of the MOC reflex based on enhancing the representation of the energy envelope of the acoustic waveform. Our data reveal the coexistence of 2 strategies in the auditory system that may facilitate speech understanding in situations where the signal is either intrinsically degraded or masked by extrinsic acoustic energy. Whereas intrinsically degraded streams recruit the MOC reflex to improve representation of speech cues peripherally, extrinsically masked streams rely more on higher auditory centres to denoise signals.
Collapse
Affiliation(s)
- Heivet Hernández-Pérez
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
| | - Jason Mikiel-Hunter
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
| | - David McAlpine
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
| | - Sumitrajit Dhar
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, Illinois, United States of America
| | - Sriram Boothalingam
- University of Wisconsin-Madison, Madison, Wisconsin, United States of America
| | - Jessica J. M. Monaghan
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
- National Acoustic Laboratories, Sydney, Australia
| | - Catherine M. McMahon
- Department of Linguistics, The Australian Hearing Hub, Macquarie University, Sydney, Australia
| |
Collapse
|
10
|
OIsen T, Capurro A, Švent M, Pilati N, Large C, Hartell N, Hamann M. Sparsely Distributed, Pre-synaptic Kv3 K + Channels Control Spontaneous Firing and Cross-Unit Synchrony via the Regulation of Synaptic Noise in an Auditory Brainstem Circuit. Front Cell Neurosci 2021; 15:721371. [PMID: 34539351 PMCID: PMC8446535 DOI: 10.3389/fncel.2021.721371] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2021] [Accepted: 08/02/2021] [Indexed: 11/13/2022] Open
Abstract
Spontaneous subthreshold activity in the central nervous system is fundamental to information processing and transmission, as it amplifies and optimizes sub-threshold signals, thereby improving action potential initiation and maintaining reliable firing. This form of spontaneous activity, which is frequently considered noise, is particularly important at auditory synapses where acoustic information is encoded by rapid and temporally precise firing rates. In contrast, when present in excess, this form of noise becomes detrimental to acoustic information as it contributes to the generation and maintenance of auditory disorders such as tinnitus. The most prominent contribution to subthreshold noise is spontaneous synaptic transmission (synaptic noise). Although numerous studies have examined the role of synaptic noise on single cell excitability, little is known about its pre-synaptic modulation owing in part to the difficulties of combining noise modulation with monitoring synaptic release. Here we study synaptic noise in the auditory brainstem dorsal cochlear nucleus (DCN) of mice and show that pharmacological potentiation of Kv3 K+ currents reduces the level of synaptic bombardment onto DCN principal fusiform cells. Using a transgenic mouse line (SyG37) expressing SyGCaMP2-mCherry, a calcium sensor that targets pre-synaptic terminals, we show that positive Kv3 K+ current modulation decreases calcium influx in a fifth of pre-synaptic boutons. Furthermore, while maintaining rapid and precise spike timing, positive Kv3 K+ current modulation increases the synchronization of local circuit neurons by reducing spontaneous activity. In conclusion, our study identifies a unique pre-synaptic mechanism which reduces synaptic noise at auditory synapses and contributes to the coherent activation of neurons in a local auditory brainstem circuit. This form of modulation highlights a new therapeutic target, namely the pre-synaptic bouton, for ameliorating the effects of hearing disorders which are dependent on aberrant spontaneous activity within the central auditory system.
Collapse
Affiliation(s)
- Timothy OIsen
- Department of Neuroscience, Psychology and Behaviour, College of Life Sciences, University of Leicester, Leicester, United Kingdom.,Department of Otolaryngology-Head and Neck Surgery, University of California, San Francisco, San Francisco, CA, United States
| | - Alberto Capurro
- Department of Neuroscience, Psychology and Behaviour, College of Life Sciences, University of Leicester, Leicester, United Kingdom.,Biosciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Maša Švent
- Department of Neuroscience, Psychology and Behaviour, College of Life Sciences, University of Leicester, Leicester, United Kingdom
| | | | - Charles Large
- Autifony Therapeutics Limited, Stevenage Bioscience Catalyst, Stevenage, United Kingdom
| | - Nick Hartell
- Department of Neuroscience, Psychology and Behaviour, College of Life Sciences, University of Leicester, Leicester, United Kingdom
| | - Martine Hamann
- Department of Neuroscience, Psychology and Behaviour, College of Life Sciences, University of Leicester, Leicester, United Kingdom.,Biosciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
| |
Collapse
|
11
|
Mahmud MS, Yeasin M, Bidelman GM. Data-driven machine learning models for decoding speech categorization from evoked brain responses. J Neural Eng 2021; 18. [PMID: 33690177 DOI: 10.1101/2020.08.03.234997] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2020] [Accepted: 03/09/2021] [Indexed: 05/24/2023]
Abstract
Objective.Categorical perception (CP) of audio is critical to understand how the human brain perceives speech sounds despite widespread variability in acoustic properties. Here, we investigated the spatiotemporal characteristics of auditory neural activity that reflects CP for speech (i.e. differentiates phonetic prototypes from ambiguous speech sounds).Approach.We recorded 64-channel electroencephalograms as listeners rapidly classified vowel sounds along an acoustic-phonetic continuum. We used support vector machine classifiers and stability selection to determine when and where in the brain CP was best decoded across space and time via source-level analysis of the event-related potentials.Main results. We found that early (120 ms) whole-brain data decoded speech categories (i.e. prototypical vs. ambiguous tokens) with 95.16% accuracy (area under the curve 95.14%;F1-score 95.00%). Separate analyses on left hemisphere (LH) and right hemisphere (RH) responses showed that LH decoding was more accurate and earlier than RH (89.03% vs. 86.45% accuracy; 140 ms vs. 200 ms). Stability (feature) selection identified 13 regions of interest (ROIs) out of 68 brain regions [including auditory cortex, supramarginal gyrus, and inferior frontal gyrus (IFG)] that showed categorical representation during stimulus encoding (0-260 ms). In contrast, 15 ROIs (including fronto-parietal regions, IFG, motor cortex) were necessary to describe later decision stages (later 300-800 ms) of categorization but these areas were highly associated with the strength of listeners' categorical hearing (i.e. slope of behavioral identification functions).Significance.Our data-driven multivariate models demonstrate that abstract categories emerge surprisingly early (∼120 ms) in the time course of speech processing and are dominated by engagement of a relatively compact fronto-temporal-parietal brain network.
Collapse
Affiliation(s)
- Md Sultan Mahmud
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
| | - Mohammed Yeasin
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
| | - Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States of America
- University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, United States of America
| |
Collapse
|
12
|
Mahmud MS, Yeasin M, Bidelman GM. Data-driven machine learning models for decoding speech categorization from evoked brain responses. J Neural Eng 2021; 18:10.1088/1741-2552/abecf0. [PMID: 33690177 PMCID: PMC8738965 DOI: 10.1088/1741-2552/abecf0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2020] [Accepted: 03/09/2021] [Indexed: 11/12/2022]
Abstract
Objective.Categorical perception (CP) of audio is critical to understand how the human brain perceives speech sounds despite widespread variability in acoustic properties. Here, we investigated the spatiotemporal characteristics of auditory neural activity that reflects CP for speech (i.e. differentiates phonetic prototypes from ambiguous speech sounds).Approach.We recorded 64-channel electroencephalograms as listeners rapidly classified vowel sounds along an acoustic-phonetic continuum. We used support vector machine classifiers and stability selection to determine when and where in the brain CP was best decoded across space and time via source-level analysis of the event-related potentials.Main results. We found that early (120 ms) whole-brain data decoded speech categories (i.e. prototypical vs. ambiguous tokens) with 95.16% accuracy (area under the curve 95.14%;F1-score 95.00%). Separate analyses on left hemisphere (LH) and right hemisphere (RH) responses showed that LH decoding was more accurate and earlier than RH (89.03% vs. 86.45% accuracy; 140 ms vs. 200 ms). Stability (feature) selection identified 13 regions of interest (ROIs) out of 68 brain regions [including auditory cortex, supramarginal gyrus, and inferior frontal gyrus (IFG)] that showed categorical representation during stimulus encoding (0-260 ms). In contrast, 15 ROIs (including fronto-parietal regions, IFG, motor cortex) were necessary to describe later decision stages (later 300-800 ms) of categorization but these areas were highly associated with the strength of listeners' categorical hearing (i.e. slope of behavioral identification functions).Significance.Our data-driven multivariate models demonstrate that abstract categories emerge surprisingly early (∼120 ms) in the time course of speech processing and are dominated by engagement of a relatively compact fronto-temporal-parietal brain network.
Collapse
Affiliation(s)
- Md Sultan Mahmud
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
| | - Mohammed Yeasin
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
| | - Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States of America
- University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, United States of America
| |
Collapse
|
13
|
Mahmud MS, Yeasin M, Bidelman GM. Speech categorization is better described by induced rather than evoked neural activity. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:1644. [PMID: 33765780 PMCID: PMC8267855 DOI: 10.1121/10.0003572] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2023]
Abstract
Categorical perception (CP) describes how the human brain categorizes speech despite inherent acoustic variability. We examined neural correlates of CP in both evoked and induced electroencephalogram (EEG) activity to evaluate which mode best describes the process of speech categorization. Listeners labeled sounds from a vowel gradient while we recorded their EEGs. Using a source reconstructed EEG, we used band-specific evoked and induced neural activity to build parameter optimized support vector machine models to assess how well listeners' speech categorization could be decoded via whole-brain and hemisphere-specific responses. We found whole-brain evoked β-band activity decoded prototypical from ambiguous speech sounds with ∼70% accuracy. However, induced γ-band oscillations showed better decoding of speech categories with ∼95% accuracy compared to evoked β-band activity (∼70% accuracy). Induced high frequency (γ-band) oscillations dominated CP decoding in the left hemisphere, whereas lower frequencies (θ-band) dominated the decoding in the right hemisphere. Moreover, feature selection identified 14 brain regions carrying induced activity and 22 regions of evoked activity that were most salient in describing category-level speech representations. Among the areas and neural regimes explored, induced γ-band modulations were most strongly associated with listeners' behavioral CP. The data suggest that the category-level organization of speech is dominated by relatively high frequency induced brain rhythms.
Collapse
Affiliation(s)
- Md Sultan Mahmud
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, Tennessee 38152, USA
| | - Mohammed Yeasin
- Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, Tennessee 38152, USA
| | - Gavin M Bidelman
- School of Communication Sciences and Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA
| |
Collapse
|
14
|
Carter JA, Bidelman GM. Auditory cortex is susceptible to lexical influence as revealed by informational vs. energetic masking of speech categorization. Brain Res 2021; 1759:147385. [PMID: 33631210 DOI: 10.1016/j.brainres.2021.147385] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2020] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 02/02/2023]
Abstract
Speech perception requires the grouping of acoustic information into meaningful phonetic units via the process of categorical perception (CP). Environmental masking influences speech perception and CP. However, it remains unclear at which stage of processing (encoding, decision, or both) masking affects listeners' categorization of speech signals. The purpose of this study was to determine whether linguistic interference influences the early acoustic-phonetic conversion process inherent to CP. To this end, we measured source level, event related brain potentials (ERPs) from auditory cortex (AC) and inferior frontal gyrus (IFG) as listeners rapidly categorized speech sounds along a /da/ to /ga/ continuum presented in three listening conditions: quiet, and in the presence of forward (informational masker) and time-reversed (energetic masker) 2-talker babble noise. Maskers were matched in overall SNR and spectral content and thus varied only in their degree of linguistic interference (i.e., informational masking). We hypothesized a differential effect of informational versus energetic masking on behavioral and neural categorization responses, where we predicted increased activation of frontal regions when disambiguating speech from noise, especially during lexical-informational maskers. We found (1) informational masking weakens behavioral speech phoneme identification above and beyond energetic masking; (2) low-level AC activity not only codes speech categories but is susceptible to higher-order lexical interference; (3) identifying speech amidst noise recruits a cross hemispheric circuit (ACleft → IFGright) whose engagement varies according to task difficulty. These findings provide corroborating evidence for top-down influences on the early acoustic-phonetic analysis of speech through a coordinated interplay between frontotemporal brain areas.
Collapse
Affiliation(s)
- Jared A Carter
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA.
| | - Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, USA.
| |
Collapse
|
15
|
Bidelman GM, Pearson C, Harrison A. Lexical Influences on Categorical Speech Perception Are Driven by a Temporoparietal Circuit. J Cogn Neurosci 2021; 33:840-852. [PMID: 33464162 DOI: 10.1162/jocn_a_01678] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/04/2022]
Abstract
Categorical judgments of otherwise identical phonemes are biased toward hearing words (i.e., "Ganong effect") suggesting lexical context influences perception of even basic speech primitives. Lexical biasing could manifest via late stage postperceptual mechanisms related to decision or, alternatively, top-down linguistic inference that acts on early perceptual coding. Here, we exploited the temporal sensitivity of EEG to resolve the spatiotemporal dynamics of these context-related influences on speech categorization. Listeners rapidly classified sounds from a /gɪ/-/kɪ/ gradient presented in opposing word-nonword contexts (GIFT-kift vs. giss-KISS), designed to bias perception toward lexical items. Phonetic perception shifted toward the direction of words, establishing a robust Ganong effect behaviorally. ERPs revealed a neural analog of lexical biasing emerging within ~200 msec. Source analyses uncovered a distributed neural network supporting the Ganong including middle temporal gyrus, inferior parietal lobe, and middle frontal cortex. Yet, among Ganong-sensitive regions, only left middle temporal gyrus and inferior parietal lobe predicted behavioral susceptibility to lexical influence. Our findings confirm lexical status rapidly constrains sublexical categorical representations for speech within several hundred milliseconds but likely does so outside the purview of canonical auditory-sensory brain areas.
Collapse
Affiliation(s)
- Gavin M Bidelman
- University of Memphis, TN.,University of Tennessee Health Sciences Center, Memphis, TN
| | | | | |
Collapse
|
16
|
Benítez-Barrera CR, Key AP, Ricketts TA, Tharpe AM. Central auditory system responses from children while listening to speech in noise. Hear Res 2021; 403:108165. [PMID: 33485110 DOI: 10.1016/j.heares.2020.108165] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/21/2019] [Revised: 12/14/2020] [Accepted: 12/30/2020] [Indexed: 10/22/2022]
Abstract
Cortical auditory evoked potentials (CAEPs) have been successfully used to explore the effects of noise on speech processing at the cortical level in adults and children. The purpose of this study was to determine whether +15 dB signal-to-noise ratios (SNRs), often recommended for optimal speech perception in children, elicit higher amplitude CAEPs than more realistic SNRs encountered by children during their daily lives (+10 dB SNR). Moreover, we aimed to investigate whether cortical speech categorization is observable in children in quiet and in noise and whether CAEPs to speech in noise are related to behavioral speech perception in noise performance in children. CAEPs were measured during a passive speech-syllable task in 51 normal hearing children aged 8 to 11 years. The speech syllables /da/ and /ga/ were presented in quiet and in the presence of a 4-talker-babble noise at +15 dB and +10 dB SNR. N1 latencies and P2 amplitudes and latencies varied as a function of SNR, with poorer SNRs (+10 dB) eliciting significantly smaller P2 amplitudes and delayed N1 and P2 latencies relative to the higher SNR (+15 dB). Finally, speech categorization was present at the cortical level in this group of children in quiet and at both SNRs; however, N1 and P2 amplitudes and latencies were not related to behavioral speech-in-noise perception of children.
Collapse
Affiliation(s)
- Carlos R Benítez-Barrera
- Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, Tennessee, United States.
| | - Alexandra P Key
- Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, Tennessee, United States; Vanderbilt University Medical Center, Nashville, Tennessee, United States; Vanderbilt Kennedy Center, Vanderbilt University, Nashville, TN 37232, USA
| | - Todd A Ricketts
- Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, Tennessee, United States; Vanderbilt University Medical Center, Nashville, Tennessee, United States
| | - Anne Marie Tharpe
- Department of Hearing and Speech Sciences, Vanderbilt University, Nashville, Tennessee, United States; Vanderbilt University Medical Center, Nashville, Tennessee, United States
| |
Collapse
|