1. Rizzi R, Bidelman GM. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. Brain Res 2024; 1844:149166. PMID: 39151718; PMCID: PMC11399885; DOI: 10.1016/j.brainres.2024.149166.
Abstract
Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient as opposed to a more discrete/categorical listener may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two-alternative forced choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete vs. continuous hearing, respectively. Behaviorally, identification curves were steeper under 2AFC than VAS categorization but were relatively immune to noise, suggesting robust access to abstract phonetic categories even under signal degradation. Behavioral slopes were correlated with listeners' QuickSIN scores; shallower slopes corresponded with better speech-in-noise performance, suggesting that a more gradient listening strategy confers a perceptual advantage for comprehending noise-degraded speech. At the neural level, P2 amplitudes and latencies of the ERPs were modulated by task and noise; VAS responses were larger and showed greater noise-related latency delays than 2AFC responses. More gradient responders had smaller shifts in ERP latency with noise, suggesting their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in left superior temporal gyrus. Our results demonstrate that listening strategy modulates the categorical organization of speech and behavioral success, with more continuous/gradient listening being advantageous for the perception of sentential speech in noise.
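
As an illustration of the slope metric discussed above, here is a minimal Python sketch (hypothetical data and variable names, not the authors' code) that fits a logistic identification function to labeling responses and correlates the fitted slopes with QuickSIN scores:

```python
# Minimal sketch (hypothetical data, not the authors' code): fit a logistic
# identification function and relate its slope to QuickSIN performance.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr

def logistic(x, x0, k):
    """Psychometric function: proportion of /a/ responses along the continuum."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))  # x0 = boundary, k = slope

steps = np.arange(1, 8)                          # 7-step /u/-/a/ continuum
# Per-listener proportion of /a/ labels at each step (hypothetical)
p_a = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.95, 0.99])
(x0, k), _ = curve_fit(logistic, steps, p_a, p0=[4.0, 1.0])

# Across listeners, relate slope (k) to QuickSIN SNR loss (hypothetical arrays)
slopes = np.array([2.1, 1.4, 0.9, 1.8, 0.7])
quicksin = np.array([1.5, 0.5, -0.5, 1.0, -1.0])
r, p = pearsonr(slopes, quicksin)
print(f"boundary={x0:.2f}, slope={k:.2f}, r={r:.2f} (p={p:.3f})")
```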
Affiliation(s)
- Rose Rizzi: Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA
- Gavin M Bidelman: Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA; Cognitive Science Program, Indiana University, Bloomington, IN, USA

2. Rizzi R, Bidelman GM. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. bioRxiv 2024:2024.05.15.594387. PMID: 38798410; PMCID: PMC11118460; DOI: 10.1101/2024.05.15.594387.
Abstract
Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient as opposed to a discrete/categorical listener may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two-alternative forced choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete (2AFC) vs. continuous (VAS) hearing, respectively. Behaviorally, identification curves were steeper under 2AFC than VAS categorization but were relatively immune to noise, suggesting robust access to abstract phonetic categories even under signal degradation. Behavioral slopes were positively correlated with listeners' QuickSIN scores, suggesting a behavioral advantage for speech-in-noise comprehension conferred by a gradient listening strategy. At the neural level, electrode-level data revealed that P2 peak amplitudes of the ERPs were modulated by task and noise; responses were larger under VAS than 2AFC categorization and showed a larger noise-related latency delay in the VAS condition. More gradient responders also had smaller shifts in ERP latency with noise, suggesting their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in left superior temporal gyrus. Our results demonstrate that listening strategy (i.e., being a discrete vs. continuous listener) modulates the categorical organization of speech and behavioral success, with continuous/gradient listening being more advantageous for speech-in-noise perception.

3. Zhao G, Zhan Y, Zha J, Cao Y, Zhou F, He L. Abnormal intrinsic brain functional network dynamics in patients with cervical spondylotic myelopathy. Cogn Neurodyn 2023; 17:1201-1211. PMID: 37786665; PMCID: PMC10542087; DOI: 10.1007/s11571-022-09807-0.
Abstract
The specific topological changes in dynamic functional networks and their role in the reorganization of brain function in cervical spondylotic myelopathy (CSM) remain unclear. This study investigated dynamic functional connectivity (dFC) in patients with CSM, focusing on the temporal characteristics of functional connectivity state patterns and the variability of network topological organization. Eighty-eight patients with CSM and 77 healthy controls (HCs) were recruited for resting-state functional magnetic resonance imaging. We applied sliding time-window analysis and K-means clustering to capture the dFC variability patterns of the two groups, and used a graph-theoretical approach to investigate variance in the topological organization of whole-brain functional networks. Four distinct dFC states were identified across participants. The mean dwell time in state 2 differed significantly between the two groups, being significantly longer in the CSM group than in the healthy control group. Across the four states, switching between brain networks mainly involved the executive control network (ECN), salience network (SN), default mode network (DMN), language network (LN), visual network (VN), auditory network (AN), precuneus network (PN), and sensorimotor network (SMN). Additionally, the topological properties of the dynamic network were variable in patients with CSM. Dynamic functional connectivity states may offer new insights into intrinsic functional activity in CSM brain networks, and the variance in topological organization may suggest instability of brain networks in patients with CSM.
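
For readers unfamiliar with the pipeline summarized above, the sketch below illustrates sliding-window dFC, K-means state clustering, and mean dwell time on hypothetical data; the window length, step, cluster count, and ROI count are illustrative assumptions, not the study's settings:

```python
# Minimal sketch (not the authors' pipeline): sliding-window dynamic FC,
# k-means state clustering, and mean dwell time on hypothetical data.
import numpy as np
from sklearn.cluster import KMeans

ts = np.random.randn(200, 90)             # 200 TRs x 90 ROIs (hypothetical)
win, step = 30, 2                          # window length / step in TRs
iu = np.triu_indices(ts.shape[1], k=1)     # upper-triangle FC entries

windows = [np.corrcoef(ts[s:s + win].T)[iu]
           for s in range(0, ts.shape[0] - win + 1, step)]
fc = np.array(windows)                     # (n_windows, n_connections)

states = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(fc)

def mean_dwell_time(labels, state):
    """Average run length (in windows) spent in `state` before switching."""
    runs, run = [], 0
    for s in labels:
        if s == state:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    return np.mean(runs) if runs else 0.0

print([mean_dwell_time(states, k) for k in range(4)])
```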
Affiliation(s)
- Guoshu Zhao: Department of Radiology, The First Affiliated Hospital of Nanchang University, No. 17 Yongwaizheng Street, Nanchang, Jiangxi 330006, People’s Republic of China; Neuroimaging Lab, Jiangxi Province Medical Imaging Research Institute, Nanchang 330006, People’s Republic of China
- Yaru Zhan: Department of Radiology, The First Affiliated Hospital of Nanchang University, No. 17 Yongwaizheng Street, Nanchang, Jiangxi 330006, People’s Republic of China; Neuroimaging Lab, Jiangxi Province Medical Imaging Research Institute, Nanchang 330006, People’s Republic of China
- Jing Zha: The 908th Hospital of Chinese People’s Liberation Army Joint Logistic Support Force, Fuzhou 330006, People’s Republic of China
- Yuan Cao: Department of Nuclear Medicine, West China Hospital of Sichuan University, Chengdu 610041, People’s Republic of China; Huaxi MR Research Center (HMRRC), Department of Radiology, West China Hospital of Sichuan University, Chengdu 610041, People’s Republic of China; Neuroimaging Lab, Jiangxi Province Medical Imaging Research Institute, Nanchang 330006, People’s Republic of China
- Fuqing Zhou: Department of Radiology, The First Affiliated Hospital of Nanchang University, No. 17 Yongwaizheng Street, Nanchang, Jiangxi 330006, People’s Republic of China; Neuroimaging Lab, Jiangxi Province Medical Imaging Research Institute, Nanchang 330006, People’s Republic of China
- Laichang He: Department of Radiology, The First Affiliated Hospital of Nanchang University, No. 17 Yongwaizheng Street, Nanchang, Jiangxi 330006, People’s Republic of China; Neuroimaging Lab, Jiangxi Province Medical Imaging Research Institute, Nanchang 330006, People’s Republic of China

4. Carter JA, Bidelman GM. Perceptual warping exposes categorical representations for speech in human brainstem responses. Neuroimage 2023; 269:119899. PMID: 36720437; PMCID: PMC9992300; DOI: 10.1016/j.neuroimage.2023.119899.
Abstract
The brain transforms continuous acoustic events into discrete category representations to downsample the speech signal for our perceptual-cognitive systems. Such phonetic categories are highly malleable, and their percepts can change depending on surrounding stimulus context. Previous work suggests this acoustic-phonetic mapping and perceptual warping of speech emerge in the brain no earlier than auditory cortex. Here, we examined whether these auditory-category phenomena inherent to speech perception occur even earlier in the human brain, at the level of the auditory brainstem. We recorded speech-evoked frequency-following responses (FFRs) during a task designed to induce more/less warping of listeners' perceptual categories depending on the stimulus presentation order of a speech continuum (random, forward, or backward directions). We used a novel clustered stimulus paradigm to rapidly record the high trial counts needed for FFRs concurrent with active behavioral tasks. We found that serial stimulus order caused perceptual shifts (hysteresis) near listeners' category boundary, confirming that identical speech tokens are perceived differentially depending on stimulus context. Critically, we further show that neural FFRs during active (but not passive) listening are enhanced for prototypical vs. category-ambiguous tokens and are biased in the direction of listeners' phonetic label even for acoustically identical speech stimuli. These findings were not observed in the stimulus acoustics nor in model FFRs generated via a computational model of cochlear and auditory nerve transduction, confirming a central origin to the effects. Our data reveal that FFRs carry category-level information and suggest that top-down processing actively shapes the neural encoding and categorization of speech at subcortical levels. These findings suggest the acoustic-phonetic mapping and perceptual warping in speech perception occur surprisingly early along the auditory neuraxis, which might aid understanding by reducing ambiguity inherent to the speech signal.
Affiliation(s)
- Jared A Carter: Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, USA; Hearing Sciences - Scottish Section, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Glasgow, Scotland, UK
- Gavin M Bidelman: Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA; Program in Neuroscience, Indiana University, Bloomington, IN, USA

5. Bidelman GM, Carter JA. Continuous dynamics in behavior reveal interactions between perceptual warping in categorization and speech-in-noise perception. Front Neurosci 2023; 17:1032369. PMID: 36937676; PMCID: PMC10014819; DOI: 10.3389/fnins.2023.1032369.
Abstract
Introduction: Spoken language comprehension requires that listeners map continuous features of the speech signal to discrete category labels. Categories are, however, malleable to surrounding context and stimulus precedence; listeners' percepts can dynamically shift depending on the sequencing of adjacent stimuli, resulting in a warping of the heard phonetic category. Here, we investigated whether such perceptual warping (which amplifies categorical hearing) might alter speech processing in noise-degraded listening scenarios. Methods: We measured continuous dynamics in perception and category judgments of an acoustic-phonetic vowel gradient via mouse tracking. Tokens were presented in serial vs. random orders to induce more/less perceptual warping while listeners categorized continua in clean and noise conditions. Results: Listeners' responses were faster and their mouse trajectories closer to the ultimate behavioral selection (marked visually on the screen) in serial vs. random order, suggesting increased perceptual attraction to category exemplars. Interestingly, order effects emerged earlier and persisted later in the trial time course when categorizing speech in noise. Discussion: These data describe interactions between perceptual warping in categorization and speech-in-noise perception: warping strengthens the behavioral attraction to relevant speech categories, making listeners more decisive (though not necessarily more accurate) in their decisions for both clean and noise-degraded speech.
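
The sketch below illustrates one common mouse-tracking index consistent with the trajectory analysis described above: the maximum deviation of the cursor path from a straight start-to-end line. It is a generic illustration on hypothetical coordinates, not the authors' analysis code:

```python
# Minimal sketch (hypothetical data): maximum deviation (MD) of a mouse
# trajectory from the straight line connecting start and end points, a
# standard index of attraction toward the competing response option.
import numpy as np

def max_deviation(xy):
    """xy: (n_samples, 2) cursor positions from trial start to response click."""
    start, end = xy[0], xy[-1]
    dx, dy = end - start
    # Perpendicular distance of each sample from the start->end line
    d = np.abs(dx * (xy[:, 1] - start[1]) - dy * (xy[:, 0] - start[0]))
    return d.max() / np.hypot(dx, dy)

traj = np.array([[0.0, 0.0], [0.1, 0.3], [0.4, 0.5], [1.0, 1.0]])  # hypothetical
print(max_deviation(traj))
```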
Affiliation(s)
- Gavin M. Bidelman: Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, United States; Program in Neuroscience, Indiana University, Bloomington, IN, United States
- Jared A. Carter: School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States; Hearing Sciences – Scottish Section, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Glasgow, United Kingdom

6. Papatzikis E, Agapaki M, Selvan RN, Pandey V, Zeba F. Quality standards and recommendations for research in music and neuroplasticity. Ann N Y Acad Sci 2023; 1520:20-33. PMID: 36478395; DOI: 10.1111/nyas.14944.
Abstract
Research on how music influences brain plasticity has gained momentum in recent years. Given the nonuniform methodological standards implemented, however, the findings end up being nonreplicable and less generalizable. To address the need for a standardized baseline of research quality, we gathered all the studies in the music and neuroplasticity field in 2019 and appraised their methodological rigor systematically and critically. The aim was to provide a preliminary, minimally acceptable quality threshold (and, ipso facto, suggested recommendations) whereupon further discussion and development may take place. Quality appraisal was performed on 89 articles by three independent raters, following a standardized scoring system. The raters' scores were cross-referenced using an inter-rater reliability measure and further studied through multiple rating comparisons and matrix analyses. Methodological quality was at a quite good level (quantitative articles: mean = 0.737, SD = 0.084; qualitative articles: mean = 0.677, SD = 0.144), with a moderate but statistically significant level of agreement between the raters (W = 0.44, χ2 = 117.249, p = 0.020). We conclude that the standards for implementation and reporting are of high quality; however, certain improvements are needed to reach the stringent levels presumed for such an influential interdisciplinary scientific field.
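
Kendall's coefficient of concordance (the W = 0.44 statistic reported above) can be computed directly from a raters-by-items score matrix. A minimal sketch follows (hypothetical ratings; tie correction omitted):

```python
# Minimal sketch: Kendall's coefficient of concordance (W) for m raters
# scoring n articles. Tie correction omitted; data are hypothetical.
import numpy as np
from scipy.stats import rankdata

def kendalls_w(scores):
    """scores: (m_raters, n_items) matrix of quality ratings."""
    m, n = scores.shape
    ranks = np.vstack([rankdata(row) for row in scores])  # rank within rater
    R = ranks.sum(axis=0)                                  # rank sums per item
    S = ((R - R.mean()) ** 2).sum()
    return 12 * S / (m ** 2 * (n ** 3 - n))

ratings = np.array([[8, 6, 9, 4],      # 3 raters x 4 items (hypothetical)
                    [7, 5, 9, 5],
                    [8, 7, 10, 3]])
print(kendalls_w(ratings))              # 1 = perfect agreement, 0 = none
```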
Affiliation(s)
- Efthymios Papatzikis: Department of Early Childhood Education and Care, Oslo Metropolitan University, Oslo, Norway
- Maria Agapaki: Department of Early Childhood Education and Care, Oslo Metropolitan University, Oslo, Norway
- Rosari Naveena Selvan: Institute for Physics 3 - Biophysics and Bernstein Center for Computational Neuroscience (BCCN), University of Göttingen, Göttingen, Germany; Department of Psychology, University of Münster, Münster, Germany
- Fathima Zeba: School of Humanities and Social Sciences, Manipal Academy of Higher Education Dubai, Dubai, United Arab Emirates

7. Moinuddin KA, Havugimana F, Al-Fahad R, Bidelman GM, Yeasin M. Unraveling Spatial-Spectral Dynamics of Speech Categorization Speed Using Convolutional Neural Networks. Brain Sci 2022; 13:75. PMID: 36672055; PMCID: PMC9856675; DOI: 10.3390/brainsci13010075.
Abstract
The process of categorizing sounds into distinct phonetic categories is known as categorical perception (CP). Response times (RTs) provide a measure of perceptual difficulty during labeling decisions (i.e., categorization). RT is quasi-stochastic in nature due to individuality and variations in perceptual tasks. To identify the sources of RT variation in CP, we built models to decode the brain regions and frequency bands driving fast, medium, and slow response decision speeds. In particular, we implemented a parameter-optimized convolutional neural network (CNN) to classify listeners' behavioral RTs from their neural EEG data. We adopted visual interpretation of model responses using Guided-GradCAM to identify spatial-spectral correlates of RT. Our framework includes (but is not limited to): (i) a data augmentation technique designed to reduce noise and control the overall variance of the EEG dataset; (ii) bandpower topomaps to learn the spatial-spectral representation using the CNN; (iii) large-scale Bayesian hyperparameter optimization to find the best-performing CNN model; and (iv) ANOVA and post hoc analysis on Guided-GradCAM activation values to measure the effect of neural regions and frequency bands on behavioral responses. Using this framework, we observe that α-β (10-20 Hz) activity over left frontal, right prefrontal/frontal, and right cerebellar regions is correlated with RT variation. Our results indicate that attention, template matching, temporal prediction of acoustics, motor control, and decision uncertainty are the most probable factors in RT variation.
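
A minimal sketch of the general approach (a small CNN classifying multi-band topomap images into RT classes) is shown below; the input size, channel counts, and layer sizes are illustrative assumptions, not the authors' Bayesian-optimized architecture:

```python
# Minimal sketch (illustrative architecture, not the authors' optimized model):
# a small CNN mapping per-band EEG topomap images to fast/medium/slow RT classes.
import torch
import torch.nn as nn

class TopomapCNN(nn.Module):
    def __init__(self, n_bands: int = 5, n_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_bands, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                     # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                     # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, n_classes)

    def forward(self, x):                        # x: (batch, n_bands, 32, 32)
        return self.classifier(self.features(x).flatten(1))

model = TopomapCNN()
dummy = torch.randn(4, 5, 32, 32)                # 4 trials x 5 band topomaps
logits = model(dummy)                            # (4, 3): fast / medium / slow
```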
Affiliation(s)
- Felix Havugimana: Department of EECE, University of Memphis, Memphis, TN 38152, USA
- Rakib Al-Fahad: Department of EECE, University of Memphis, Memphis, TN 38152, USA
- Gavin M. Bidelman: Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN 47408, USA
- Mohammed Yeasin: Department of EECE, University of Memphis, Memphis, TN 38152, USA

8. Carter JA, Buder EH, Bidelman GM. Nonlinear dynamics in auditory cortical activity reveal the neural basis of perceptual warping in speech categorization. JASA Express Lett 2022; 2:045201. PMID: 35434716; PMCID: PMC8984957; DOI: 10.1121/10.0009896.
Abstract
Surrounding context influences speech listening, resulting in dynamic shifts to category percepts. To examine its neural basis, event-related potentials (ERPs) were recorded during vowel identification with continua presented in random, forward, and backward orders to induce perceptual warping. Behaviorally, sequential (vs. random) delivery shifted individual listeners' categorical boundary, revealing perceptual warping (biasing) of the heard phonetic category dependent on recent stimulus history. ERPs revealed later (∼300 ms) activity localized to superior temporal and middle/inferior frontal gyri that predicted listeners' hysteresis/enhanced-contrast magnitudes. Findings demonstrate that interactions between frontotemporal brain regions govern top-down, stimulus-history effects on speech categorization.
Affiliation(s)
- Jared A Carter: Institute for Intelligent Systems, University of Memphis, Memphis, Tennessee 38152, USA
- Eugene H Buder: School of Communication Sciences and Disorders, University of Memphis, Memphis, Tennessee 38152, USA
- Gavin M Bidelman: Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana 47408, USA

9. Fan Y, Fang K, Sun R, Shen D, Yang J, Tang Y, Fang G. Hierarchical auditory perception for species discrimination and individual recognition in the music frog. Curr Zool 2021; 68:581-591. DOI: 10.1093/cz/zoab085.
Abstract
The ability to discriminate species and recognize individuals is crucial for reproductive success and/or survival in most animals. However, the temporal order and neural localization of these decision-making processes have remained unclear. In this study, event-related potentials (ERPs) were measured in the telencephalon, diencephalon, and mesencephalon of the music frog Nidirana daunchina. These ERPs were elicited by calls from 1 group of heterospecifics (recorded from a sympatric anuran species) and 2 groups of conspecifics that differed in their fundamental frequencies. In terms of polarity and position within the ERP waveform, auditory ERPs generally consist of 4 main components linked to selective attention (N1), stimulus evaluation (P2), identification (N2), and classification (P3), occurring around 100, 200, 250, and 300 ms after stimulus onset, respectively. Our results show that N1 amplitudes differed significantly between heterospecific and conspecific calls, but not between the 2 groups of conspecific calls that differed in fundamental frequency. On the other hand, N2 amplitudes differed significantly between the 2 groups of conspecific calls, suggesting that the music frogs discriminated the species first, followed by individual identification, since N1 and N2 relate to selective attention and stimulus identification, respectively. Moreover, P2 amplitudes evoked in females were significantly greater than those in males, indicating sexual dimorphism in auditory discrimination. In addition, both the N1 amplitudes in the left diencephalon and the P2 amplitudes in the left telencephalon were greater than in other brain areas, suggesting left hemispheric dominance in auditory perception. Taken together, our results support the hypothesis that species discrimination and identification of individual characteristics are accomplished sequentially, and that auditory perception exhibits sex differences and spatial dominance.
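
Extracting such component amplitudes typically amounts to taking window-limited extrema of the averaged waveform. A minimal sketch on hypothetical data follows (the window edges are illustrative, chosen around the component latencies cited above):

```python
# Minimal sketch (hypothetical data and windows): extract N1/P2/N2/P3 peak
# amplitudes from an averaged ERP as window-limited extrema.
import numpy as np

fs = 1000                                    # sampling rate (Hz)
erp = np.random.randn(800)                   # 0-800 ms post-stimulus (hypothetical)
t = np.arange(erp.size) / fs * 1000          # time axis in ms

windows = {"N1": (80, 130), "P2": (150, 250), "N2": (220, 280), "P3": (250, 400)}
peaks = {}
for name, (t0, t1) in windows.items():
    seg = erp[(t >= t0) & (t <= t1)]
    # Negative components (N1, N2) -> minimum; positive (P2, P3) -> maximum
    peaks[name] = seg.min() if name.startswith("N") else seg.max()
print(peaks)
```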
Affiliation(s)
- Yanzhu Fan: Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Ke Fang: Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; School of Life Science, Anhui University, Hefei 230601, China
- Ruolei Sun: Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; School of Life Science, Anhui University, Hefei 230601, China
- Di Shen: Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Jing Yang: Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Yezhong Tang: Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Guangzhan Fang: Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu 610041, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China

10. Mahmud MS, Yeasin M, Bidelman GM. Data-driven machine learning models for decoding speech categorization from evoked brain responses. J Neural Eng 2021; 18. PMID: 33690177; PMCID: PMC8738965; DOI: 10.1088/1741-2552/abecf0.
Abstract
Objective. Categorical perception (CP) of audio is critical to understanding how the human brain perceives speech sounds despite widespread variability in acoustic properties. Here, we investigated the spatiotemporal characteristics of auditory neural activity that reflects CP for speech (i.e., differentiates phonetic prototypes from ambiguous speech sounds). Approach. We recorded 64-channel electroencephalograms as listeners rapidly classified vowel sounds along an acoustic-phonetic continuum. We used support vector machine classifiers and stability selection to determine when and where in the brain CP was best decoded across space and time via source-level analysis of the event-related potentials. Main results. We found that early (120 ms) whole-brain data decoded speech categories (i.e., prototypical vs. ambiguous tokens) with 95.16% accuracy (area under the curve 95.14%; F1-score 95.00%). Separate analyses of left hemisphere (LH) and right hemisphere (RH) responses showed that LH decoding was more accurate and earlier than RH (89.03% vs. 86.45% accuracy; 140 ms vs. 200 ms). Stability (feature) selection identified 13 regions of interest (ROIs) out of 68 brain regions [including auditory cortex, supramarginal gyrus, and inferior frontal gyrus (IFG)] that showed categorical representation during stimulus encoding (0-260 ms). In contrast, 15 ROIs (including fronto-parietal regions, IFG, and motor cortex) were necessary to describe later decision stages (300-800 ms) of categorization, but these areas were highly associated with the strength of listeners' categorical hearing (i.e., the slope of behavioral identification functions). Significance. Our data-driven multivariate models demonstrate that abstract categories emerge surprisingly early (∼120 ms) in the time course of speech processing and are dominated by engagement of a relatively compact fronto-temporal-parietal brain network.
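
A minimal sketch of the decoding logic described above (SVM classification of prototypical vs. ambiguous trials, scored with accuracy, AUC, and F1) is given below on hypothetical features; it omits the authors' source localization and stability selection steps:

```python
# Minimal sketch (hypothetical data, not the authors' full pipeline): SVM
# decoding of speech category from trial-wise neural features.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, roc_auc_score, f1_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 68))      # 200 trials x 68 ROI amplitudes
y = rng.integers(0, 2, 200)             # 0 = ambiguous, 1 = prototypical

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)
print(f"acc={accuracy_score(y, pred):.2%}  "
      f"AUC={roc_auc_score(y, proba):.2%}  F1={f1_score(y, pred):.2%}")
```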
Affiliation(s)
- Md Sultan Mahmud: Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America; Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
- Mohammed Yeasin: Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States of America; Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America
- Gavin M Bidelman: Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States of America; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States of America; Department of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, TN, United States of America

11. Mahmud MS, Yeasin M, Bidelman GM. Speech categorization is better described by induced rather than evoked neural activity. J Acoust Soc Am 2021; 149:1644. PMID: 33765780; PMCID: PMC8267855; DOI: 10.1121/10.0003572.
Abstract
Categorical perception (CP) describes how the human brain categorizes speech despite inherent acoustic variability. We examined neural correlates of CP in both evoked and induced electroencephalogram (EEG) activity to evaluate which mode best describes the process of speech categorization. Listeners labeled sounds from a vowel gradient while we recorded their EEGs. Using source-reconstructed EEG, we used band-specific evoked and induced neural activity to build parameter-optimized support vector machine models and assess how well listeners' speech categorization could be decoded via whole-brain and hemisphere-specific responses. We found that whole-brain evoked β-band activity decoded prototypical from ambiguous speech sounds with ∼70% accuracy, whereas induced γ-band oscillations decoded speech categories with ∼95% accuracy. Induced high-frequency (γ-band) oscillations dominated CP decoding in the left hemisphere, whereas lower frequencies (θ-band) dominated decoding in the right hemisphere. Moreover, feature selection identified 14 brain regions carrying induced activity and 22 regions of evoked activity that were most salient in describing category-level speech representations. Among the areas and neural regimes explored, induced γ-band modulations were most strongly associated with listeners' behavioral CP. The data suggest that the category-level organization of speech is dominated by relatively high-frequency induced brain rhythms.
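
The evoked/induced distinction above follows a standard recipe: evoked power is computed from the phase-locked trial average, induced power from single trials after removing that average. A minimal sketch on hypothetical data (band edges and filter settings are illustrative):

```python
# Minimal sketch (hypothetical data): separating evoked (phase-locked) from
# induced (non-phase-locked) band power via the trial average.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 500
trials = np.random.randn(100, 2 * fs)         # 100 trials x 2 s of EEG

b, a = butter(4, [30, 50], btype="bandpass", fs=fs)   # gamma band (30-50 Hz)

def band_power(x):
    return np.abs(hilbert(filtfilt(b, a, x))) ** 2

evoked = band_power(trials.mean(axis=0))                          # phase-locked
induced = band_power(trials - trials.mean(axis=0)).mean(axis=0)   # residual
print(evoked.mean(), induced.mean())
```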
Affiliation(s)
- Md Sultan Mahmud: Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, Tennessee 38152, USA
- Mohammed Yeasin: Department of Electrical and Computer Engineering, University of Memphis, 3815 Central Avenue, Memphis, Tennessee 38152, USA
- Gavin M Bidelman: School of Communication Sciences and Disorders, University of Memphis, 4055 North Park Loop, Memphis, Tennessee 38152, USA

12. Carter JA, Bidelman GM. Auditory cortex is susceptible to lexical influence as revealed by informational vs. energetic masking of speech categorization. Brain Res 2021; 1759:147385. PMID: 33631210; DOI: 10.1016/j.brainres.2021.147385.
Abstract
Speech perception requires the grouping of acoustic information into meaningful phonetic units via the process of categorical perception (CP). Environmental masking influences speech perception and CP. However, it remains unclear at which stage of processing (encoding, decision, or both) masking affects listeners' categorization of speech signals. The purpose of this study was to determine whether linguistic interference influences the early acoustic-phonetic conversion process inherent to CP. To this end, we measured source-level event-related brain potentials (ERPs) from auditory cortex (AC) and inferior frontal gyrus (IFG) as listeners rapidly categorized speech sounds along a /da/ to /ga/ continuum presented in three listening conditions: quiet, and in the presence of forward (informational masker) and time-reversed (energetic masker) 2-talker babble noise. Maskers were matched in overall SNR and spectral content and thus varied only in their degree of linguistic interference (i.e., informational masking). We hypothesized a differential effect of informational versus energetic masking on behavioral and neural categorization responses, predicting increased activation of frontal regions when disambiguating speech from noise, especially for lexical-informational maskers. We found that (1) informational masking weakens behavioral speech phoneme identification above and beyond energetic masking; (2) low-level AC activity not only codes speech categories but is susceptible to higher-order lexical interference; and (3) identifying speech amidst noise recruits a cross-hemispheric circuit (left AC → right IFG) whose engagement varies according to task difficulty. These findings provide corroborating evidence for top-down influences on the early acoustic-phonetic analysis of speech through a coordinated interplay between frontotemporal brain areas.
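
The masker manipulation described above can be sketched in a few lines: time-reversing the babble destroys its linguistic (informational) content while preserving its long-term spectrum and level, leaving a purely energetic masker. A toy illustration (white noise standing in for recorded babble):

```python
# Toy illustration of the informational vs. energetic masker construction:
# reversing babble in time removes linguistic content but preserves its
# long-term spectrum and RMS level. White noise stands in for real babble.
import numpy as np

rng = np.random.default_rng(0)
babble = rng.standard_normal(16000)          # stand-in for 2-talker babble
informational = babble                        # forward: carries linguistic content
energetic = babble[::-1].copy()               # reversed: spectrum/level matched

rms = lambda x: np.sqrt(np.mean(x ** 2))
assert np.isclose(rms(informational), rms(energetic))  # levels match by design
```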
Affiliation(s)
- Jared A Carter: Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA
- Gavin M Bidelman: Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; Department of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, TN, USA

13. Auditory categorical processing for speech is modulated by inherent musical listening skills. Neuroreport 2020; 31:162-166. DOI: 10.1097/wnr.0000000000001369.
Abstract
During successful auditory perception, the human brain classifies diverse acoustic information into meaningful groupings, a process known as categorical perception (CP). Intense auditory experiences (e.g., musical training and language expertise) shape categorical representations necessary for speech identification and novel sound-to-meaning learning, but little is known concerning the role of innate auditory function in CP. Here, we tested whether listeners vary in their intrinsic abilities to categorize complex sounds and individual differences in the underlying auditory brain mechanisms. To this end, we recorded EEGs in individuals without formal music training but who differed in their inherent auditory perceptual abilities (i.e., musicality) as they rapidly categorized sounds along a speech vowel continuum. Behaviorally, individuals with naturally more adept listening skills ("musical sleepers") showed enhanced speech categorization in the form of faster identification. At the neural level, inverse modeling parsed EEG data into different sources to evaluate the contribution of region-specific activity [i.e., auditory cortex (AC)] to categorical neural coding. We found stronger categorical processing in musical sleepers around the timeframe of P2 (~180 ms) in the right AC compared to those with poorer musical listening abilities. Our data show that listeners with naturally more adept auditory skills map sound to meaning more efficiently than their peers, which may aid novel sound learning related to language and music acquisition.

14. Bidelman GM, Pearson C, Harrison A. Lexical Influences on Categorical Speech Perception Are Driven by a Temporoparietal Circuit. J Cogn Neurosci 2021; 33:840-852. PMID: 33464162; DOI: 10.1162/jocn_a_01678.
Abstract
Categorical judgments of otherwise identical phonemes are biased toward hearing words (i.e., "Ganong effect") suggesting lexical context influences perception of even basic speech primitives. Lexical biasing could manifest via late stage postperceptual mechanisms related to decision or, alternatively, top-down linguistic inference that acts on early perceptual coding. Here, we exploited the temporal sensitivity of EEG to resolve the spatiotemporal dynamics of these context-related influences on speech categorization. Listeners rapidly classified sounds from a /gɪ/-/kɪ/ gradient presented in opposing word-nonword contexts (GIFT-kift vs. giss-KISS), designed to bias perception toward lexical items. Phonetic perception shifted toward the direction of words, establishing a robust Ganong effect behaviorally. ERPs revealed a neural analog of lexical biasing emerging within ~200 msec. Source analyses uncovered a distributed neural network supporting the Ganong including middle temporal gyrus, inferior parietal lobe, and middle frontal cortex. Yet, among Ganong-sensitive regions, only left middle temporal gyrus and inferior parietal lobe predicted behavioral susceptibility to lexical influence. Our findings confirm lexical status rapidly constrains sublexical categorical representations for speech within several hundred milliseconds but likely does so outside the purview of canonical auditory-sensory brain areas.
Affiliation(s)
- Gavin M Bidelman: University of Memphis, Memphis, TN; University of Tennessee Health Sciences Center, Memphis, TN

15. Mahmud MS, Ahmed F, Al-Fahad R, Moinuddin KA, Yeasin M, Alain C, Bidelman GM. Decoding Hearing-Related Changes in Older Adults' Spatiotemporal Neural Processing of Speech Using Machine Learning. Front Neurosci 2020; 14:748. PMID: 32765215; PMCID: PMC7378401; DOI: 10.3389/fnins.2020.00748.
Abstract
Speech perception in noisy environments depends on complex interactions between sensory and cognitive systems. In older adults, such interactions may be affected, especially in those individuals who have more severe age-related hearing loss. Using a data-driven approach, we assessed the temporal (when in time) and spatial (where in the brain) characteristics of cortical speech-evoked responses that distinguish older adults with or without mild hearing loss. We performed source analyses to estimate cortical surface signals from EEG recordings during a phoneme discrimination task conducted under clear and noise-degraded conditions. We computed source-level ERPs (i.e., mean activation within each ROI) from each of the 68 ROIs of the Desikan-Killiany (DK) atlas, averaged over 100 randomly chosen trials (without replacement) to form feature vectors. We adopted a multivariate feature selection method, stability selection and control, to choose features that are consistent over a range of model parameters, and used a parameter-optimized support vector machine (SVM) classifier to investigate the time course and brain regions that segregate groups and speech clarity. For clear speech perception, whole-brain data revealed a classification accuracy of 81.50% [area under the curve (AUC) 80.73%; F1-score 82.00%], distinguishing groups within ∼60 ms after speech onset (i.e., as early as the P1 wave). We observed lower accuracy of 78.12% [AUC 77.64%; F1-score 78.00%] and delayed classification performance when speech was embedded in noise, with group segregation at 80 ms. Separate analyses using left (LH) and right hemisphere (RH) regions showed that LH speech activity was better at distinguishing hearing groups than activity measured in the RH. Moreover, stability selection analysis identified 12 brain regions (among 1428 total spatiotemporal features from 68 regions) where source activity segregated groups with >80% accuracy (clear speech), whereas 16 regions were needed for noise-degraded speech to achieve a comparable level of group segregation (78.7% accuracy). Our results identify critical time courses and brain regions that distinguish mild hearing loss from normal hearing in older adults and confirm a larger number of active areas, particularly in the RH, when processing noise-degraded speech information.
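
A minimal sketch of the stability selection idea (repeatedly refitting a sparse classifier on random subsamples and keeping features selected in a high fraction of fits) is shown below on hypothetical data; the subsample scheme, regularization strength, and 0.7 threshold are illustrative, not the authors' settings:

```python
# Minimal sketch (illustrative settings, hypothetical data): stability
# selection via repeated L1-regularized fits on random half-subsamples.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 68))             # trials x ROI features
y = rng.integers(0, 2, 300)                     # hearing-group labels

n_boot, freq = 100, np.zeros(X.shape[1])
for _ in range(n_boot):
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)
    lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    lasso.fit(X[idx], y[idx])
    freq += (lasso.coef_.ravel() != 0)          # count feature selections

stable = np.where(freq / n_boot >= 0.7)[0]      # keep features picked >= 70%
print("stable feature indices:", stable)
```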
Affiliation(s)
- Md Sultan Mahmud: Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
- Faruk Ahmed: Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
- Rakib Al-Fahad: Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
- Kazi Ashraf Moinuddin: Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
- Mohammed Yeasin: Department of Electrical and Computer Engineering, The University of Memphis, Memphis, TN, United States
- Claude Alain: Rotman Research Institute, Baycrest Centre for Geriatric Care, Toronto, ON, Canada; Department of Psychology, University of Toronto, Toronto, ON, Canada; Institute of Medical Sciences, University of Toronto, Toronto, ON, Canada
- Gavin M Bidelman: Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States; Department of Anatomy and Neurobiology, University of Tennessee Health Science Center, Memphis, TN, United States

16. Bidelman GM, Bush LC, Boudreaux AM. Effects of Noise on the Behavioral and Neural Categorization of Speech. Front Neurosci 2020; 14:153. PMID: 32180700; PMCID: PMC7057933; DOI: 10.3389/fnins.2020.00153.
Abstract
We investigated whether the categorical perception (CP) of speech might also provide a mechanism that aids its perception in noise. We varied signal-to-noise ratio (SNR) [clear, 0 dB, -5 dB] while listeners classified an acoustic-phonetic continuum (/u/ to /a/). Noise-related changes in behavioral categorization were only observed at the lowest SNR. Event-related brain potentials (ERPs) differentiated category vs. category-ambiguous speech by the P2 wave (~180-320 ms). Paralleling behavior, neural responses to speech with clear phonetic status (i.e., continuum endpoints) were robust to noise down to -5 dB SNR, whereas responses to ambiguous tokens declined with decreasing SNR. Results demonstrate that phonetic speech representations are more resistant to degradation than corresponding acoustic representations. Findings suggest the mere process of binning speech sounds into categories provides a robust mechanism to aid figure-ground speech perception by fortifying abstract categories from the acoustic signal and making the speech code more resistant to external interferences.
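
The SNR manipulation above reduces to scaling the masker so the speech-to-noise level difference hits a target (e.g., 0 or -5 dB) before mixing. A minimal sketch (synthetic signals standing in for the vowel tokens and noise):

```python
# Minimal sketch (synthetic stand-in signals): scale a noise masker so the
# speech-to-noise ratio hits a target SNR (in dB), then mix.
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    noise = noise[:len(speech)]
    scale = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return speech + scale * noise

rng = np.random.default_rng(0)
vowel = np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)  # stand-in token
babble = rng.standard_normal(8000)                          # stand-in noise
noisy = mix_at_snr(vowel, babble, snr_db=-5)                # -5 dB SNR condition
```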
Affiliation(s)
- Gavin M Bidelman: Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States; Department of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, TN, United States
- Lauren C Bush: School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Alex M Boudreaux: School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States

17. Al-Fahad R, Yeasin M, Bidelman GM. Decoding of single-trial EEG reveals unique states of functional brain connectivity that drive rapid speech categorization decisions. J Neural Eng 2020; 17:016045. PMID: 31822643; PMCID: PMC7004853; DOI: 10.1088/1741-2552/ab6040.
Abstract
OBJECTIVE Categorical perception (CP) is an inherent property of speech perception. The response time (RT) of listeners' perceptual speech identification is highly sensitive to individual differences. While the neural correlates of CP have been well studied in terms of the regional contributions of the brain to behavior, the functional connectivity patterns that signify individual differences in listeners' speed (RT) of speech categorization are less clear. In this study, we introduce a novel approach to address these questions. APPROACH We applied several computational approaches to the EEG, including graph mining, machine learning (i.e., support vector machines), and stability selection, to investigate the unique brain states (functional neural connectivity) that predict the speed of listeners' behavioral decisions. MAIN RESULTS We infer that (i) listeners' perceptual speed is directly related to dynamic variations in their brain connectomics; (ii) global network assortativity and efficiency distinguished fast, medium, and slow RTs; (iii) the functional network underlying speeded decisions increases in negative assortativity (i.e., becomes disassortative) for slower RTs; (iv) slower categorical speech decisions cause excessive use of neural resources and more aberrant information flow within the CP circuitry; and (v) slower responders tended to utilize functional brain networks excessively (or inappropriately), whereas fast responders (with lower global efficiency) utilized the same neural pathways but with more restricted organization. SIGNIFICANCE Findings show that neural classifiers (SVMs) coupled with stability selection correctly classify behavioral RTs from functional connectivity alone with over 92% accuracy (AUC = 0.9). Our results corroborate previous studies by supporting the engagement of similar temporal (STG), parietal, motor, and prefrontal regions in CP using an entirely data-driven approach.
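
The two graph metrics highlighted above can be computed from a thresholded connectivity matrix with networkx; the correlation threshold and binarization below are illustrative choices, not the study's parameters:

```python
# Minimal sketch (illustrative threshold, hypothetical data): degree
# assortativity and global efficiency of a binarized functional network.
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
fc = np.abs(np.corrcoef(rng.standard_normal((64, 500))))   # 64-channel FC
np.fill_diagonal(fc, 0)
adj = (fc > 0.3).astype(int)                                # binarize at r > 0.3

G = nx.from_numpy_array(adj)
print("assortativity:", nx.degree_assortativity_coefficient(G))
print("global efficiency:", nx.global_efficiency(G))
```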
Affiliation(s)
- Rakib Al-Fahad: Department of Electrical and Computer Engineering, University of Memphis, Memphis, TN 38152, USA
- Mohammed Yeasin: Department of Electrical and Computer Engineering, University of Memphis, Memphis, TN 38152, USA; Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA
- Gavin M. Bidelman: Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; Department of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, TN, USA

18. Bidelman GM, Myers MH. Frontal cortex selectively overrides auditory processing to bias perception for looming sonic motion. Brain Res 2019; 1726:146507. PMID: 31606413; DOI: 10.1016/j.brainres.2019.146507.
Abstract
Rising-intensity sounds signal approaching objects traveling toward an observer. A variety of species preferentially respond to looming over receding auditory motion, reflecting an evolutionary perceptual bias for recognizing approaching threats. We probed the neural origins of this stark perceptual anisotropy to reveal how the brain creates privilege for auditory looming events. While recording neural activity via electroencephalography (EEG), human listeners rapidly judged whether dynamic (intensity-varying) tones were looming or receding in percept. Behaviorally, listeners responded faster to auditory looms, confirming a perceptual bias for approaching signals. EEG source analysis revealed sensory activation localized to primary auditory cortex (PAC) and decision-related activity in prefrontal cortex (PFC) within 200 ms after sound onset, followed by additional expansive PFC activation by 500 ms. Notably, early PFC (but not PAC) activity rapidly differentiated looming and receding stimuli, and this effect roughly co-occurred with sound arrival in auditory cortex. Brain-behavior correlations revealed an association between PFC neural latencies and listeners' speed of sonic motion judgments. Directed functional connectivity revealed stronger information flow from PFC → PAC during looming vs. receding sounds. Our electrophysiological data reveal a critical, previously undocumented role of prefrontal cortex in judging dynamic sonic motion. Both the faster neural bias and the functional override of obligatory sensory processing, via selective, directional PFC signaling toward the auditory system, establish the perceptual privilege for approaching (looming) sounds.
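
Directed functional connectivity can be indexed in several ways; the sketch below uses Granger causality between two simulated regional time series purely as an illustration of PFC → PAC directed flow, not as the estimator used in the study:

```python
# Illustration only (simulated signals; not the study's connectivity method):
# Granger causality as one simple index of directed PFC -> PAC information flow.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
pfc = rng.standard_normal(500)
pac = 0.6 * np.roll(pfc, 2) + rng.standard_normal(500)  # PAC lags PFC by 2 samples

# Column order matters: the test asks whether column 2 Granger-causes column 1.
data = np.column_stack([pac, pfc])                      # PFC -> PAC direction
res = grangercausalitytests(data, maxlag=4, verbose=False)
print(res[2][0]["ssr_ftest"])                           # (F, p, df_denom, df_num) at lag 2
```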
Affiliation(s)
- Gavin M Bidelman: Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; Department of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, TN, USA
- Mark H Myers: Department of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, TN, USA