1. Rizzi R, Bidelman GM. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. bioRxiv 2024:2024.05.15.594387. [PMID: 38798410 PMCID: PMC11118460 DOI: 10.1101/2024.05.15.594387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient as opposed to a discrete/categorical listener may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two-alternative forced-choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete (2AFC) vs. continuous (VAS) hearing, respectively. Behaviorally, identification curves were steeper under 2AFC vs. VAS categorization but were relatively immune to noise, suggesting robust access to abstract, phonetic categories even under signal degradation. Behavioral slopes were positively correlated with listeners' QuickSIN scores, suggesting a behavioral advantage for speech-in-noise comprehension conferred by a gradient listening strategy. At the neural level, electrode-level data revealed that P2 peak amplitudes of the ERPs were modulated by task and noise; responses were larger under VAS vs. 2AFC categorization and showed a larger noise-related delay in latency in the VAS vs. 2AFC condition. More gradient responders also had smaller shifts in ERP latency with noise, suggesting their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in left superior temporal gyrus. Our results demonstrate that listening strategy (i.e., being a discrete vs. continuous listener) modulates the categorical organization of speech and behavioral success, with continuous/gradient listening being more advantageous to speech-in-noise perception.
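The abstract compares the steepness of identification curves across tasks but does not state how slope was computed. As a minimal sketch, assuming a logistic psychometric function fitted by ordinary least squares on logit-transformed response proportions (an illustrative choice, not necessarily the authors' method), slope and category boundary could be estimated as below. The function name `logistic_slope` and all response proportions are hypothetical.

```python
import math

def logistic_slope(steps, p_resp, eps=0.01):
    """Estimate slope and category boundary of an identification curve.

    Fits logit(p) = b0 + b1 * step by ordinary least squares after clipping
    proportions to [eps, 1 - eps]. Returns (b1, boundary), where boundary is
    the continuum step at which the fitted p equals 0.5.
    """
    logits = []
    for p in p_resp:
        p = min(max(p, eps), 1.0 - eps)        # avoid log(0) at the endpoints
        logits.append(math.log(p / (1.0 - p)))
    n = len(steps)
    mean_x = sum(steps) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in steps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, logits))
    b1 = sxy / sxx                              # slope in logit units per step
    b0 = mean_y - b1 * mean_x
    return b1, -b0 / b1

# Hypothetical proportions of one response label along a 5-step continuum.
steps = [1, 2, 3, 4, 5]
categorical = [0.02, 0.05, 0.50, 0.95, 0.98]   # step-like, 2AFC-style labeling
gradient = [0.10, 0.30, 0.50, 0.70, 0.90]      # shallow, VAS-style labeling

slope_cat, bound_cat = logistic_slope(steps, categorical)
slope_grad, bound_grad = logistic_slope(steps, gradient)
print(slope_cat > slope_grad)                   # steeper curve, larger slope
```

Under these assumptions a steeper (more categorical) identification function yields a larger fitted slope, while both hypothetical curves share a boundary at the continuum midpoint.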
2. Bidelman GM, Bernard F, Skubic K. Hearing in categories aids speech streaming at the "cocktail party". bioRxiv 2024:2024.04.03.587795. [PMID: 38617284 PMCID: PMC11014555 DOI: 10.1101/2024.04.03.587795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2024]
Abstract
Our perceptual system bins elements of the speech signal into categories to make speech perception manageable. Here, we aimed to test whether hearing speech in categories (as opposed to a continuous/gradient fashion) affords yet another benefit to speech recognition: parsing noisy speech at the "cocktail party." We measured speech recognition in a simulated 3D cocktail party environment. We manipulated task difficulty by varying the number of additional maskers presented at other spatial locations in the horizontal soundfield (1-4 talkers) and via forward vs. time-reversed maskers, promoting more and less informational masking (IM), respectively. In separate tasks, we measured isolated phoneme categorization using two-alternative forced choice (2AFC) and visual analog scaling (VAS) tasks designed to promote more/less categorical hearing and thus test putative links between categorization and real-world speech-in-noise skills. We first show that listeners can only monitor up to ~3 talkers despite up to 5 in the soundscape and streaming is not related to extended high-frequency hearing thresholds (though QuickSIN scores are). We then confirm speech streaming accuracy and speed decline with additional competing talkers and amidst forward compared to reverse maskers with added IM. Dividing listeners into "discrete" vs. "continuous" categorizers based on their VAS labeling (i.e., whether responses were binary or continuous judgments), we then show the degree of IM experienced at the cocktail party is predicted by their degree of categoricity in phoneme labeling; more discrete listeners are less susceptible to IM than their gradient responding peers. Our results establish a link between speech categorization skills and cocktail party processing, with a categorical (rather than gradient) listening strategy benefiting degraded speech perception. These findings imply figure-ground deficits common in many disorders might arise through a surprisingly simple mechanism: a failure to properly bin sounds into categories.
Affiliation(s)
- Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, USA
- Program in Neuroscience, Indiana University, Bloomington, IN, USA
- Cognitive Science Program, Indiana University, Bloomington, IN, USA
- Fallon Bernard
- School of Communication Sciences & Disorders, University of Memphis, Memphis TN, USA
- Kimberly Skubic
- School of Communication Sciences & Disorders, University of Memphis, Memphis TN, USA
3. Bidelman GM, Carter JA. Continuous dynamics in behavior reveal interactions between perceptual warping in categorization and speech-in-noise perception. Front Neurosci 2023; 17:1032369. [PMID: 36937676 PMCID: PMC10014819 DOI: 10.3389/fnins.2023.1032369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Accepted: 02/14/2023] [Indexed: 03/05/2023] Open
Abstract
Introduction: Spoken language comprehension requires that listeners map continuous features of the speech signal to discrete category labels. Categories are, however, malleable to surrounding context and stimulus precedence; listeners' percepts can dynamically shift depending on the sequencing of adjacent stimuli, resulting in a warping of the heard phonetic category. Here, we investigated whether such perceptual warping (which amplifies categorical hearing) might alter speech processing in noise-degraded listening scenarios. Methods: We measured continuous dynamics in perception and category judgments of an acoustic-phonetic vowel gradient via mouse tracking. Tokens were presented in serial vs. random orders to induce more/less perceptual warping while listeners categorized continua in clean and noise conditions. Results: Listeners' responses were faster and their mouse trajectories closer to the ultimate behavioral selection (marked visually on the screen) in serial vs. random order, suggesting increased perceptual attraction to category exemplars. Interestingly, order effects emerged earlier and persisted later in the trial time course when categorizing speech in noise. Discussion: These data describe interactions between perceptual warping in categorization and speech-in-noise perception: warping strengthens the behavioral attraction to relevant speech categories, making listeners more decisive (though not necessarily more accurate) in their decisions of both clean and noise-degraded speech.
Affiliation(s)
- Gavin M. Bidelman
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, IN, United States
- Program in Neuroscience, Indiana University, Bloomington, IN, United States
- Jared A. Carter
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Hearing Sciences – Scottish Section, Division of Clinical Neuroscience, School of Medicine, University of Nottingham, Glasgow, United Kingdom
4. Carter JA, Bidelman GM. Auditory cortex is susceptible to lexical influence as revealed by informational vs. energetic masking of speech categorization. Brain Res 2021; 1759:147385. [PMID: 33631210 DOI: 10.1016/j.brainres.2021.147385] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/20/2020] [Revised: 02/15/2021] [Accepted: 02/16/2021] [Indexed: 02/02/2023]
Abstract
Speech perception requires the grouping of acoustic information into meaningful phonetic units via the process of categorical perception (CP). Environmental masking influences speech perception and CP. However, it remains unclear at which stage of processing (encoding, decision, or both) masking affects listeners' categorization of speech signals. The purpose of this study was to determine whether linguistic interference influences the early acoustic-phonetic conversion process inherent to CP. To this end, we measured source-level event-related brain potentials (ERPs) from auditory cortex (AC) and inferior frontal gyrus (IFG) as listeners rapidly categorized speech sounds along a /da/ to /ga/ continuum presented in three listening conditions: quiet, and in the presence of forward (informational masker) and time-reversed (energetic masker) 2-talker babble noise. Maskers were matched in overall SNR and spectral content and thus varied only in their degree of linguistic interference (i.e., informational masking). We hypothesized a differential effect of informational versus energetic masking on behavioral and neural categorization responses, where we predicted increased activation of frontal regions when disambiguating speech from noise, especially during lexical-informational maskers. We found (1) informational masking weakens behavioral speech phoneme identification above and beyond energetic masking; (2) low-level AC activity not only codes speech categories but is susceptible to higher-order lexical interference; (3) identifying speech amidst noise recruits a cross-hemispheric circuit (left AC → right IFG) whose engagement varies according to task difficulty. These findings provide corroborating evidence for top-down influences on the early acoustic-phonetic analysis of speech through a coordinated interplay between frontotemporal brain areas.
Affiliation(s)
- Jared A Carter
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA.
- Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, USA.
5. Auditory categorical processing for speech is modulated by inherent musical listening skills. Neuroreport 2021; 31:162-166. [PMID: 31834142 DOI: 10.1097/wnr.0000000000001369] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 11/25/2022]
Abstract
During successful auditory perception, the human brain classifies diverse acoustic information into meaningful groupings, a process known as categorical perception (CP). Intense auditory experiences (e.g., musical training and language expertise) shape categorical representations necessary for speech identification and novel sound-to-meaning learning, but little is known concerning the role of innate auditory function in CP. Here, we tested whether listeners vary in their intrinsic abilities to categorize complex sounds and individual differences in the underlying auditory brain mechanisms. To this end, we recorded EEGs in individuals without formal music training but who differed in their inherent auditory perceptual abilities (i.e., musicality) as they rapidly categorized sounds along a speech vowel continuum. Behaviorally, individuals with naturally more adept listening skills ("musical sleepers") showed enhanced speech categorization in the form of faster identification. At the neural level, inverse modeling parsed EEG data into different sources to evaluate the contribution of region-specific activity [i.e., auditory cortex (AC)] to categorical neural coding. We found stronger categorical processing in musical sleepers around the timeframe of P2 (~180 ms) in the right AC compared to those with poorer musical listening abilities. Our data show that listeners with naturally more adept auditory skills map sound to meaning more efficiently than their peers, which may aid novel sound learning related to language and music acquisition.
6. Bidelman GM, Bush LC, Boudreaux AM. Effects of Noise on the Behavioral and Neural Categorization of Speech. Front Neurosci 2020; 14:153. [PMID: 32180700 PMCID: PMC7057933 DOI: 10.3389/fnins.2020.00153] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/17/2019] [Accepted: 02/10/2020] [Indexed: 02/02/2023] Open
Abstract
We investigated whether the categorical perception (CP) of speech might also provide a mechanism that aids its perception in noise. We varied signal-to-noise ratio (SNR) [clear, 0 dB, -5 dB] while listeners classified an acoustic-phonetic continuum (/u/ to /a/). Noise-related changes in behavioral categorization were only observed at the lowest SNR. Event-related brain potentials (ERPs) differentiated category vs. category-ambiguous speech by the P2 wave (~180-320 ms). Paralleling behavior, neural responses to speech with clear phonetic status (i.e., continuum endpoints) were robust to noise down to -5 dB SNR, whereas responses to ambiguous tokens declined with decreasing SNR. Results demonstrate that phonetic speech representations are more resistant to degradation than corresponding acoustic representations. Findings suggest the mere process of binning speech sounds into categories provides a robust mechanism to aid figure-ground speech perception by fortifying abstract categories from the acoustic signal and making the speech code more resistant to external interferences.
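The abstract lists the SNR conditions (clear, 0 dB, -5 dB) but not how the noise was scaled. As a minimal sketch, assuming SNR is defined on root-mean-square (RMS) amplitude and that the noise (rather than the speech) is rescaled, a token could be mixed at a target SNR as below. The helper names `rms`, `snr_db`, and `mix_at_snr` are hypothetical, and the tone and Gaussian noise are stand-ins for real speech and masker recordings.

```python
import math
import random

def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def snr_db(signal, noise):
    """SNR in dB computed from RMS amplitudes."""
    return 20.0 * math.log10(rms(signal) / rms(noise))

def mix_at_snr(signal, noise, target_db):
    """Rescale noise so snr_db(signal, scaled) equals target_db, then add it.

    Returns (mixture, scaled_noise); the signal level is left unchanged.
    """
    gain = rms(signal) / (rms(noise) * 10.0 ** (target_db / 20.0))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled)]
    return mixture, scaled

# Stand-in stimuli (assumptions): a sine tone for speech, Gaussian noise as masker.
rng = random.Random(0)
signal = [math.sin(2.0 * math.pi * 5.0 * t / 1000.0) for t in range(1000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]

mixed_0, noise_0 = mix_at_snr(signal, noise, 0.0)     # 0 dB SNR condition
mixed_m5, noise_m5 = mix_at_snr(signal, noise, -5.0)  # -5 dB SNR condition
print(round(snr_db(signal, noise_0), 6), round(snr_db(signal, noise_m5), 6))
```

Holding the speech level fixed and scaling only the masker keeps the target token identical across conditions, which is one common design choice; scaling the speech instead would be equally valid under this definition of SNR.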
Affiliation(s)
- Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States; School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States; Department of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, TN, United States
- Lauren C Bush
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
- Alex M Boudreaux
- School of Communication Sciences and Disorders, University of Memphis, Memphis, TN, United States
7. Lewis GA, Bidelman GM. Autonomic Nervous System Correlates of Speech Categorization Revealed Through Pupillometry. Front Neurosci 2020; 13:1418. [PMID: 31998068 PMCID: PMC6967406 DOI: 10.3389/fnins.2019.01418] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/04/2019] [Accepted: 12/16/2019] [Indexed: 02/06/2023] Open
Abstract
Human perception requires the many-to-one mapping between continuous sensory elements and discrete categorical representations. This grouping operation underlies the phenomenon of categorical perception (CP), the experience of perceiving discrete categories rather than gradual variations in signal input. Speech perception requires CP because acoustic cues do not share constant relations with perceptual-phonetic representations. Beyond facilitating perception of unmasked speech, we reasoned CP might also aid the extraction of target speech percepts from interfering sound sources (i.e., noise) by generating additional perceptual constancy and reducing listening effort. Specifically, we investigated how noise interference impacts cognitive load and perceptual identification of unambiguous (i.e., categorical) vs. ambiguous stimuli. Listeners classified a speech vowel continuum (/u/-/a/) at various signal-to-noise ratios (SNRs [unmasked, 0 and -5 dB]). Continuous recordings of pupil dilation measured processing effort, with larger, later dilations reflecting increased listening demand. Critical comparisons were between time-locked changes in eye data in response to unambiguous (i.e., continuum endpoints) tokens vs. ambiguous tokens (i.e., continuum midpoint). Unmasked speech elicited faster responses and sharper psychometric functions, which steadily declined in noise. Noise increased pupil dilation across stimulus conditions, but not straightforwardly. Noise-masked speech modulated peak pupil size (i.e., [0 and -5 dB] > unmasked). In contrast, peak dilation latency varied with both token and SNR. Interestingly, categorical tokens elicited earlier pupil dilation relative to ambiguous tokens. Our pupillary data suggest CP reconstructs auditory percepts under challenging listening conditions through interactions between stimulus salience and listeners' internalized effort and/or arousal.
Affiliation(s)
- Gwyneth A Lewis
- Institute for Intelligent Systems, The University of Memphis, Memphis, TN, United States; School of Communication Sciences and Disorders, The University of Memphis, Memphis, TN, United States
- Gavin M Bidelman
- Institute for Intelligent Systems, The University of Memphis, Memphis, TN, United States; School of Communication Sciences and Disorders, The University of Memphis, Memphis, TN, United States; Department of Anatomy and Neurobiology, University of Tennessee Health Sciences Center, Memphis, TN, United States
8. Bidelman GM, Walker B. Plasticity in auditory categorization is supported by differential engagement of the auditory-linguistic network. Neuroimage 2019; 201:116022. [PMID: 31310863 DOI: 10.1016/j.neuroimage.2019.116022] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 05/13/2019] [Revised: 06/30/2019] [Accepted: 07/12/2019] [Indexed: 12/21/2022] Open
Abstract
To construct our perceptual world, the brain categorizes variable sensory cues into behaviorally-relevant groupings. Categorical representations are apparent within a distributed fronto-temporo-parietal brain network but how this neural circuitry is shaped by experience remains undefined. Here, we asked whether speech and music categories might be formed within different auditory-linguistic brain regions depending on listeners' auditory expertise. We recorded EEG in highly skilled (musicians) vs. less experienced (nonmusicians) perceivers as they rapidly categorized speech and musical sounds. Musicians showed perceptual enhancements across domains, yet source EEG data revealed a double dissociation in the neurobiological mechanisms supporting categorization between groups. Whereas musicians coded categories in primary auditory cortex (PAC), nonmusicians recruited non-auditory regions (e.g., inferior frontal gyrus, IFG) to generate category-level information. Functional connectivity confirmed nonmusicians' increased left IFG involvement reflects stronger routing of signal from PAC directed to IFG, presumably because sensory coding is insufficient to construct categories in less experienced listeners. Our findings establish auditory experience modulates specific engagement and inter-regional communication in the auditory-linguistic network supporting categorical perception. Whereas early canonical PAC representations are sufficient to generate categories in highly trained ears, less experienced perceivers broadcast information downstream to higher-order linguistic brain areas (IFG) to construct abstract sound labels.
Affiliation(s)
- Gavin M Bidelman
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; School of Communication Sciences & Disorders, University of Memphis, Memphis, TN, USA; University of Tennessee Health Sciences Center, Department of Anatomy and Neurobiology, Memphis, TN, USA.
- Breya Walker
- Institute for Intelligent Systems, University of Memphis, Memphis, TN, USA; Department of Psychology, University of Memphis, Memphis, TN, USA; Department of Mathematical Sciences, University of Memphis, Memphis, TN, USA