1. Crinnion AM, Heffner CC, Myers EB. Individual differences in the use of top-down versus bottom-up cues to resolve phonetic ambiguity. Atten Percept Psychophys 2024. PMID: 38811489. DOI: 10.3758/s13414-024-02889-4.
Abstract
How listeners weight a wide variety of information to interpret ambiguities in the speech signal is a question of interest in speech perception, particularly when understanding how listeners process speech in the context of phrases or sentences. Dominant views of cue use for language comprehension posit that listeners integrate multiple sources of information to interpret ambiguities in the speech signal. Here, we study how semantic context, sentence rate, and vowel length all influence identification of word-final stops. We find that while at the group level all sources of information appear to influence how listeners interpret ambiguities in speech, at the level of the individual listener, we observe systematic differences in cue reliance, such that some individual listeners favor certain cues (e.g., speech rate and vowel length) to the exclusion of others (e.g., semantic context). While listeners exhibit a range of cue preferences, across participants we find a negative relationship between individuals' weighting of semantic and acoustic-phonetic (sentence rate, vowel length) cues. Additionally, we find that these weightings are stable within individuals over a period of 1 month. Taken as a whole, these findings suggest that theories of cue integration and speech processing may fail to capture the rich individual differences that exist between listeners, which could arise due to mechanistic differences between individuals in speech perception.
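The per-listener cue weighting described above can be made concrete with a small simulation. The sketch below is not the authors' analysis: the listener profiles, cue weights, and trial counts are invented for illustration. It fits a minimal logistic regression to each simulated listener's binary stop identifications, with the fitted coefficients serving as indices of reliance on semantic versus acoustic-phonetic cues.

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_logistic(X, y, lr=0.5, n_steps=2000):
    """Plain logistic regression via gradient descent; the fitted weights
    index how strongly each cue drives the listener's responses."""
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

# Three z-scored cues per trial: semantic context, sentence rate, vowel length.
X = rng.normal(size=(500, 3))
profiles = {  # hypothetical listeners with invented true cue weights
    "acoustic-leaning": np.array([0.3, 1.5, 1.5]),
    "semantic-leaning": np.array([1.5, 0.3, 0.3]),
}
fits = {}
for name, w_true in profiles.items():
    p_true = 1.0 / (1.0 + np.exp(-X @ w_true))
    y = (rng.random(500) < p_true).astype(float)  # simulated /d/-vs-/t/ responses
    fits[name] = fit_logistic(X, y)
```

Recovered weights for the two simulated listeners should mirror their generating profiles, which is the sense in which a regression coefficient can stand in for "cue reliance" at the individual level.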
2. Rizzi R, Bidelman GM. Functional benefits of continuous vs. categorical listening strategies on the neural encoding and perception of noise-degraded speech. bioRxiv [preprint] 2024. PMID: 38798410. PMCID: PMC11118460. DOI: 10.1101/2024.05.15.594387.
Abstract
Acoustic information in speech changes continuously, yet listeners form discrete perceptual categories to ease the demands of perception. Being a more continuous/gradient listener, as opposed to a discrete/categorical one, may be further advantageous for understanding speech in noise by increasing perceptual flexibility and resolving ambiguity. The degree to which a listener's responses to a continuum of speech sounds are categorical versus continuous can be quantified using visual analog scaling (VAS) during speech labeling tasks. Here, we recorded event-related brain potentials (ERPs) to vowels along an acoustic-phonetic continuum (/u/ to /a/) while listeners categorized phonemes in both clean and noise conditions. Behavior was assessed using standard two-alternative forced choice (2AFC) and VAS paradigms to evaluate categorization under task structures that promote discrete (2AFC) vs. continuous (VAS) hearing, respectively. Behaviorally, identification curves were steeper under 2AFC than VAS categorization but were relatively immune to noise, suggesting robust access to abstract phonetic categories even under signal degradation. Behavioral slopes were positively correlated with listeners' QuickSIN scores, suggesting a speech-in-noise comprehension advantage conferred by a gradient listening strategy. At the neural level, electrode-level data revealed that P2 peak amplitudes of the ERPs were modulated by task and noise; responses were larger under VAS than 2AFC categorization and showed a larger noise-related latency delay in the VAS condition. More gradient responders also had smaller noise-related shifts in ERP latency, suggesting their neural encoding of speech was more resilient to noise degradation. Interestingly, source-resolved ERPs showed that more gradient listening was also correlated with stronger neural responses in the left superior temporal gyrus. Our results demonstrate that listening strategy (i.e., being a discrete vs. continuous listener) modulates the categorical organization of speech and behavioral success, with continuous/gradient listening being more advantageous for speech-in-noise perception.
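The continuous-vs.-categorical distinction that VAS labeling makes measurable can be illustrated with a toy gradiency index. In this sketch the response distributions and the index itself are illustrative assumptions, not the paper's metric: endpoint-piling ratings of ambiguous tokens mark a categorical listener, while mid-scale ratings mark a gradient one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated VAS ratings (0 = clearly /u/, 1 = clearly /a/) for ambiguous,
# mid-continuum tokens from two hypothetical listeners.
categorical = np.where(rng.random(200) < 0.5,
                       rng.beta(8, 1, 200),   # piles near 1
                       rng.beta(1, 8, 200))   # piles near 0
gradient = rng.beta(2, 2, 200)                # spreads across the scale

def gradiency(ratings):
    """Mean distance from the nearer scale endpoint: near 0 for strictly
    categorical (endpoint-only) responding, up to 0.5 for fully gradient use."""
    return float(np.mean(np.minimum(ratings, 1 - ratings)))
```

A simple summary like this captures the intuition; the study itself relates gradiency to identification-curve slopes and neural measures rather than to this particular statistic.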
3. Sorensen E, Oleson J, Kutlu E, McMurray B. A Bayesian hierarchical model for the analysis of visual analogue scaling tasks. Stat Methods Med Res 2024. PMID: 38573790. DOI: 10.1177/09622802241242319.
Abstract
In psychophysics and psychometrics, a method integral to the discipline involves charting how a person's response pattern changes along a continuum of stimuli. For instance, in hearing science, Visual Analog Scaling tasks are experiments in which listeners hear sounds across a speech continuum and give a numeric rating between 0 and 100 conveying whether the sound they heard was more like word "a" or more like word "b" (i.e., each participant gives a continuous categorization response). By taking all the continuous categorization responses across the speech continuum, a parametric curve model can be fit to the data and used to analyze any individual's response pattern along the continuum. Standard statistical modeling techniques cannot accommodate all of the specific requirements needed to analyze these data. Thus, Bayesian hierarchical modeling techniques are employed to accommodate group-level non-linear curves, individual-specific non-linear curves, continuum-level random effects, and a subject-specific variance that is predicted by other model parameters. In this paper, a Bayesian hierarchical model is constructed to model the data from a Visual Analog Scaling task study of monolingual and bilingual participants. Any nonlinear curve function could be used; we demonstrate the technique using the 4-parameter logistic function. Overall, the model fit the data from the study particularly well, and results suggested that the magnitude of the slope was what most defined the differences in response patterns between continua.
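A rough, dependency-light sketch of the core idea (fitting a 4-parameter logistic to continuous 0-100 categorization responses) is below. It is not the paper's Bayesian hierarchical model: it fits a single simulated participant by least-squares grid search rather than estimating group- and individual-level curves with priors, and all data values are invented.

```python
import numpy as np

def four_pl(x, lo, hi, slope, mid):
    """4-parameter logistic: lower/upper asymptotes, slope, crossover point."""
    return lo + (hi - lo) / (1.0 + np.exp(-slope * (x - mid)))

rng = np.random.default_rng(1)
steps = np.repeat(np.arange(1, 8), 20).astype(float)   # 7-step continuum, 20 trials each
ratings = np.clip(four_pl(steps, 5, 95, 1.6, 4.0)      # invented "true" curve
                  + rng.normal(0, 6, steps.size), 0, 100)

# Crude least-squares grid search; asymptotes fixed at the observed extremes.
lo, hi = ratings.min(), ratings.max()
slope_hat, mid_hat = min(
    ((s, m) for s in np.linspace(0.2, 4.0, 40) for m in np.linspace(2.0, 6.0, 41)),
    key=lambda p: np.sum((ratings - four_pl(steps, lo, hi, *p)) ** 2))
```

In the actual model the slope was the parameter that best separated response patterns across continua; here `slope_hat` and `mid_hat` should land near the generating values, which is all the sketch is meant to show.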
Affiliation(s)
- Eldon Sorensen
  - Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Jacob Oleson
  - Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Ethan Kutlu
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
  - Department of Linguistics, University of Iowa, Iowa City, IA, USA
- Bob McMurray
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
  - Department of Linguistics, University of Iowa, Iowa City, IA, USA
4. Sarrett ME, Toscano JC. Decoding speech sounds from neurophysiological data: Practical considerations and theoretical implications. Psychophysiology 2024; 61:e14475. PMID: 37947235. DOI: 10.1111/psyp.14475.
Abstract
Machine learning techniques have proven to be a useful tool in cognitive neuroscience. However, their implementation in scalp-recorded electroencephalography (EEG) is relatively limited. To address this, we present three analyses using data from a previous study that examined event-related potential (ERP) responses to a wide range of naturally-produced speech sounds. First, we explore which features of the EEG signal best maximize machine learning accuracy for a voicing distinction, using a support vector machine (SVM). We manipulate three dimensions of the EEG signal as input to the SVM: number of trials averaged, number of time points averaged, and polynomial fit. We discuss the trade-offs in using different feature sets and offer some recommendations for researchers using machine learning. Next, we use SVMs to classify specific pairs of phonemes, finding that we can detect differences in the EEG signal that are not otherwise detectable using conventional ERP analyses. Finally, we characterize the timecourse of phonetic feature decoding across three phonological dimensions (voicing, manner of articulation, and place of articulation), and find that voicing and manner are decodable from neural activity, whereas place of articulation is not. This set of analyses addresses both practical considerations in the application of machine learning to EEG, particularly for speech studies, and also sheds light on current issues regarding the nature of perceptual representations of speech.
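One of the feature-set trade-offs the authors examine (averaging trials before classification) can be sketched with simulated data. The code below is an illustration under invented parameters, and it substitutes a nearest-centroid classifier for their SVM so it stays dependency-free; the qualitative point (averaging trades training-set size for signal-to-noise) carries over.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_trials(n_trials, effect, noise=1.0, n_time=50):
    """Single-trial 'ERP' epochs: a class-specific waveform plus trial noise."""
    waveform = effect * np.sin(np.linspace(0, np.pi, n_time))
    return waveform + rng.normal(0, noise, (n_trials, n_time))

def average_in_groups(trials, k):
    """Average every k consecutive trials into one higher-SNR sample."""
    n = (len(trials) // k) * k
    return trials[:n].reshape(-1, k, trials.shape[1]).mean(axis=1)

def centroid_accuracy(train_a, train_b, test_a, test_b):
    """Nearest-centroid stand-in for an SVM: assign to the closer class mean."""
    ca, cb = train_a.mean(axis=0), train_b.mean(axis=0)
    is_a = lambda x: (np.linalg.norm(x - ca, axis=1)
                      < np.linalg.norm(x - cb, axis=1))
    return float(np.mean(np.r_[is_a(test_a), ~is_a(test_b)]))

# Hypothetical voiced vs. voiceless classes with a small amplitude difference.
voiced, voiceless = simulate_trials(400, 0.15), simulate_trials(400, -0.15)
acc_single = centroid_accuracy(voiced[:200], voiceless[:200],
                               voiced[200:], voiceless[200:])
avg_v, avg_u = average_in_groups(voiced, 10), average_in_groups(voiceless, 10)
acc_avg = centroid_accuracy(avg_v[:20], avg_u[:20], avg_v[20:], avg_u[20:])
```

Averaging ten trials should raise decoding accuracy despite shrinking the training set twentyfold, which is the kind of trade-off the paper quantifies more carefully for real EEG.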
Affiliation(s)
- McCall E Sarrett
  - Department of Psychological and Brain Sciences, Villanova University, Villanova, Pennsylvania, USA
  - Psychology Department, Gonzaga University, Spokane, Washington, USA
- Joseph C Toscano
  - Department of Psychological and Brain Sciences, Villanova University, Villanova, Pennsylvania, USA
5. Mai A, Riès S, Ben-Haim S, Shih JJ, Gentner TQ. Acoustic and language-specific sources for phonemic abstraction from speech. Nat Commun 2024; 15:677. PMID: 38263364. PMCID: PMC10805762. DOI: 10.1038/s41467-024-44844-9.
Abstract
Spoken language comprehension requires abstraction of linguistic information from speech, but the interaction between auditory and linguistic processing of speech remains poorly understood. Here, we investigate the nature of this abstraction using neural responses recorded intracranially while participants listened to conversational English speech. Capitalizing on multiple, language-specific patterns where phonological and acoustic information diverge, we demonstrate the causal efficacy of the phoneme as a unit of analysis and dissociate the unique contributions of phonemic and spectrographic information to neural responses. Quantitative higher-order response models also reveal that unique contributions of phonological information are carried in the covariance structure of the stimulus-response relationship. This suggests that linguistic abstraction is shaped by neurobiological mechanisms that involve integration across multiple spectro-temporal features and prior phonological information. These results link speech acoustics to phonology and morphosyntax, substantiating predictions about abstractness in linguistic theory and providing evidence for the acoustic features that support that abstraction.
Affiliation(s)
- Anna Mai
  - University of California, San Diego, Linguistics, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Stephanie Riès
  - San Diego State University, School of Speech, Language, and Hearing Sciences, 5500 Campanile Drive, San Diego, CA 92182, USA
  - San Diego State University, Center for Clinical and Cognitive Sciences, 5500 Campanile Drive, San Diego, CA 92182, USA
- Sharona Ben-Haim
  - University of California, San Diego, Neurological Surgery, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Jerry J Shih
  - University of California, San Diego, Neurosciences, 9500 Gilman Dr., La Jolla, CA 92093, USA
- Timothy Q Gentner
  - University of California, San Diego, Psychology, 9500 Gilman Dr., La Jolla, CA 92093, USA
  - University of California, San Diego, Neurobiology, 9500 Gilman Dr., La Jolla, CA 92093, USA
  - University of California, San Diego, Kavli Institute for Brain and Mind, 9500 Gilman Dr., La Jolla, CA 92093, USA
6. Kocsis Z, Jenison RL, Taylor PN, Calmus RM, McMurray B, Rhone AE, Sarrett ME, Deifelt Streese C, Kikuchi Y, Gander PE, Berger JI, Kovach CK, Choi I, Greenlee JD, Kawasaki H, Cope TE, Griffiths TD, Howard MA, Petkov CI. Immediate neural impact and incomplete compensation after semantic hub disconnection. Nat Commun 2023; 14:6264. PMID: 37805497. PMCID: PMC10560235. DOI: 10.1038/s41467-023-42088-7.
Abstract
The human brain extracts meaning using an extensive neural system for semantic knowledge. Whether broadly distributed systems depend on, or can compensate after losing, a highly interconnected hub is controversial. We report intracranial recordings from two patients during a speech prediction task, obtained minutes before and after neurosurgical treatment requiring disconnection of the left anterior temporal lobe (ATL), a candidate semantic knowledge hub. Informed by modern diaschisis and predictive coding frameworks, we tested hypotheses ranging from solely neural network disruption to complete compensation by the indirectly affected language-related and speech-processing sites. Immediately after ATL disconnection, we observed neurophysiological alterations in the recorded frontal and auditory sites, providing direct evidence for the importance of the ATL as a semantic hub. We also obtained evidence for rapid, albeit incomplete, attempts at neural network compensation, with the neural impact largely taking the forms stipulated by the predictive coding framework specifically and the modern diaschisis framework more generally. The overall results validate these frameworks and reveal the immediate impact of losing a brain hub and the human brain's capability to adjust to that loss.
Affiliation(s)
- Zsuzsanna Kocsis
  - Department of Neurosurgery, University of Iowa, Iowa City, IA, USA
  - Biosciences Institute, Newcastle University Medical School, Newcastle upon Tyne, UK
  - Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
- Rick L Jenison
  - Departments of Neuroscience and Psychology, University of Wisconsin, Madison, WI, USA
- Peter N Taylor
  - CNNP Lab, Interdisciplinary Computing and Complex BioSystems Group, School of Computing, Newcastle University, Newcastle upon Tyne, UK
  - UCL Institute of Neurology, Queen Square, London, UK
- Ryan M Calmus
  - Department of Neurosurgery, University of Iowa, Iowa City, IA, USA
  - Biosciences Institute, Newcastle University Medical School, Newcastle upon Tyne, UK
- Bob McMurray
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, USA
- Ariane E Rhone
  - Department of Neurosurgery, University of Iowa, Iowa City, IA, USA
- Yukiko Kikuchi
  - Biosciences Institute, Newcastle University Medical School, Newcastle upon Tyne, UK
- Phillip E Gander
  - Department of Neurosurgery, University of Iowa, Iowa City, IA, USA
  - Department of Radiology, University of Iowa, Iowa City, IA, USA
  - Iowa Neuroscience Institute, University of Iowa, Iowa City, IA, USA
- Joel I Berger
  - Department of Neurosurgery, University of Iowa, Iowa City, IA, USA
- Inyong Choi
  - Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, USA
- Hiroto Kawasaki
  - Department of Neurosurgery, University of Iowa, Iowa City, IA, USA
- Thomas E Cope
  - Department of Clinical Neurosciences, Cambridge University, Cambridge, UK
  - MRC Cognition and Brain Sciences Unit, Cambridge University, Cambridge, UK
- Timothy D Griffiths
  - Biosciences Institute, Newcastle University Medical School, Newcastle upon Tyne, UK
- Matthew A Howard
  - Department of Neurosurgery, University of Iowa, Iowa City, IA, USA
- Christopher I Petkov
  - Department of Neurosurgery, University of Iowa, Iowa City, IA, USA
  - Biosciences Institute, Newcastle University Medical School, Newcastle upon Tyne, UK
7. Berger JI, Gander PE, Kim S, Schwalje AT, Woo J, Na YM, Holmes A, Hong JM, Dunn CC, Hansen MR, Gantz BJ, McMurray B, Griffiths TD, Choi I. Neural Correlates of Individual Differences in Speech-in-Noise Performance in a Large Cohort of Cochlear Implant Users. Ear Hear 2023; 44:1107-1120. PMID: 37144890. PMCID: PMC10426791. DOI: 10.1097/aud.0000000000001357.
Abstract
OBJECTIVES Understanding speech-in-noise (SiN) is a complex task that recruits multiple cortical subsystems. Individuals vary in their ability to understand SiN. This variability cannot be explained by simple peripheral hearing profiles, but recent work by our group (Kim et al. 2021, Neuroimage) highlighted central neural factors underlying the variance in SiN ability in normal-hearing (NH) subjects. The present study examined neural predictors of SiN ability in a large cohort of cochlear-implant (CI) users. DESIGN We recorded electroencephalography in 114 postlingually deafened CI users while they completed the California Consonant Test: a word-in-noise task. In many subjects, data were also collected on two other commonly used clinical measures of speech perception: a word-in-quiet task (consonant-nucleus-consonant [CNC] words) and a sentence-in-noise task (AzBio sentences). Neural activity was assessed at a vertex electrode (Cz), which could help maximize eventual generalizability to clinical situations. The N1-P2 complex of event-related potentials (ERPs) at this location was included in multiple linear regression analyses, along with several other demographic and hearing factors, as predictors of SiN performance. RESULTS In general, there was good agreement among the scores on the three speech perception tasks. ERP amplitudes did not predict AzBio performance, which was instead predicted by the duration of device use, low-frequency hearing thresholds, and age. However, ERP amplitudes were strong predictors of performance on both word recognition tasks: the California Consonant Test (conducted simultaneously with the electroencephalography recording) and the consonant-nucleus-consonant task (conducted offline). These correlations held even after accounting for known predictors of performance, including residual low-frequency hearing thresholds. In CI users, better performance was predicted by an increased cortical response to the target word, in contrast to previous reports in normal-hearing subjects, in whom speech perception ability was accounted for by the ability to suppress noise. CONCLUSIONS These data indicate a neurophysiological correlate of SiN performance, revealing a richer profile of an individual's hearing performance than shown by psychoacoustic measures alone. These results also highlight important differences between sentence and word recognition measures of performance and suggest that individual differences in these measures may be underwritten by different mechanisms. Finally, the contrast with prior reports of NH listeners on the same task suggests that CI users' performance may be explained by a different weighting of neural processes than in NH listeners.
Affiliation(s)
- Joel I. Berger
  - Department of Neurosurgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
- Phillip E. Gander
  - Department of Neurosurgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
- Subong Kim
  - Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana, USA
- Adam T. Schwalje
  - Department of Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
- Jihwan Woo
  - Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea
- Young-min Na
  - Department of Biomedical Engineering, University of Ulsan, Ulsan, South Korea
- Ann Holmes
  - Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky, USA
- Jean M. Hong
  - Department of Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
- Camille C. Dunn
  - Department of Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
- Marlan R. Hansen
  - Department of Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
- Bruce J. Gantz
  - Department of Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
- Bob McMurray
  - Department of Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa City, Iowa, USA
  - Department of Communication Sciences and Disorders, University of Iowa, Iowa City, Iowa, USA
- Timothy D. Griffiths
  - Biosciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
- Inyong Choi
  - Department of Otolaryngology – Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa, USA
  - Department of Communication Sciences and Disorders, University of Iowa, Iowa City, Iowa, USA
8. McMurray B. I'm not sure that curve means what you think it means: Toward a [more] realistic understanding of the role of eye-movement generation in the Visual World Paradigm. Psychon Bull Rev 2023; 30:102-146. PMID: 35962241. PMCID: PMC10964151. DOI: 10.3758/s13423-022-02143-8.
Abstract
The Visual World Paradigm (VWP) is a powerful experimental paradigm for language research. Listeners respond to speech in a "visual world" containing potential referents of the speech. Fixations to these referents provide insight into the preliminary states of language processing as decisions unfold. The VWP has become the dominant paradigm in psycholinguistics and has been extended to every level of language, to development, and to disorders. Part of its impact stems from impressive data visualizations that reveal the millisecond-by-millisecond time course of processing, and advances have been made in developing new analyses that precisely characterize this time course. All theoretical and statistical approaches make the tacit assumption that the time course of fixations is closely related to the underlying activation in the system. However, given the serial nature of fixations and their long refractory period, it is unclear how closely the observed dynamics of the fixation curves are actually coupled to the underlying dynamics of activation. I investigated this assumption with a series of simulations. Each simulation starts with a set of true underlying activation functions and generates simulated fixations using a simple stochastic sampling procedure that respects the sequential nature of fixations. I then analyzed the results to determine the conditions under which the observed fixation curves match the underlying functions, the reliability of the observed data, and the implications for Type I error and power. These simulations demonstrate that even under the simplest fixation-based models, observed fixation curves are systematically biased relative to the underlying activation functions, and they are substantially noisier, with important implications for reliability and power. I then present a potential generative model that may ultimately overcome many of these issues.
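The core of the simulation logic (sequential fixations with a refractory period, sampled from an underlying activation function) can be sketched as follows. This is a simplified stand-in for the paper's generative procedure: the activation curve, refractory duration, and subject count are invented, and the paper's actual fixation generation is richer.

```python
import numpy as np

rng = np.random.default_rng(4)
dt = 4
t = np.arange(0, 2000, dt)                       # time in msec
activation = 1 / (1 + np.exp(-(t - 600) / 120))  # "true" target activation

def simulate_subject(act, refractory=200):
    """At each saccade opportunity, fixate the target with probability equal
    to the current activation, then hold that fixation for `refractory` ms."""
    hold = refractory // dt
    looks = np.zeros(len(act), dtype=bool)
    i = int(rng.integers(1, hold + 1))           # random phase across subjects
    while i < len(act):
        looks[i:i + hold] = rng.random() < act[i]
        i += hold
    return looks

observed = np.mean([simulate_subject(activation) for _ in range(300)], axis=0)

# Because a fixation persists after it is launched, the observed proportion
# curve crosses 50% later than the underlying activation does.
obs_cross = t[np.argmin(np.abs(observed - 0.5))]
act_cross = t[np.argmin(np.abs(activation - 0.5))]
```

Even this minimal sampler shows the lag and bias at issue: the observed curve is a smeared, delayed copy of the activation function, which is why fixation curves should not be read as activation directly.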
Affiliation(s)
- Bob McMurray
  - Department of Psychological and Brain Sciences, 278 PBSB, University of Iowa, Iowa City, IA 52242, USA
  - Department of Communication Sciences and Disorders, University of Iowa, Iowa City, IA, USA
  - Department of Linguistics, University of Iowa, Iowa City, IA, USA
  - Department of Otolaryngology, University of Iowa, Iowa City, IA, USA
9. McMurray B. The myth of categorical perception. J Acoust Soc Am 2022; 152:3819. PMID: 36586868. PMCID: PMC9803395. DOI: 10.1121/10.0016614.
Abstract
Categorical perception (CP) is likely the single finding from speech perception with the biggest impact on cognitive science. However, within speech perception, it is widely known to be an artifact of task demands. CP is empirically defined as a relationship between phoneme identification and discrimination. As discrimination tasks do not appear to require categorization, this was thought to support the claim that listeners perceive speech solely in terms of linguistic categories. However, 50 years of work using discrimination tasks, priming, the visual world paradigm, and event-related potentials has rejected the strongest forms of CP and provided little strong evidence for any form of it. This paper reviews the origins and impact of this scientific meme and the work challenging it. It discusses work showing that the encoding of auditory input is largely continuous, not categorical, and describes the modern theoretical synthesis in which listeners preserve fine-grained detail to enable more flexible processing. This synthesis is fundamentally inconsistent with CP. This leads to a different understanding of how to use and interpret the most basic paradigm in speech perception (phoneme identification along a continuum) and has implications for understanding language and hearing disorders, development, and multilingualism.
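The empirical definition mentioned above (discrimination predicted from identification) is worth making concrete. Under the classic strong-CP assumption, listeners discriminate two tokens only when they label them differently, and guess otherwise; the sketch below computes that Haskins-style prediction for an invented identification function, not data from this paper.

```python
import numpy as np

# Illustrative identification probabilities p(/b/) along an 8-step continuum.
p_b = np.array([0.98, 0.95, 0.90, 0.70, 0.30, 0.10, 0.05, 0.02])

# For two-step pairs (i, i+2): probability the two tokens get different labels.
p_i, p_j = p_b[:-2], p_b[2:]
p_diff_label = p_i * (1 - p_j) + (1 - p_i) * p_j

# Strong-CP prediction: correct when labels differ, chance (0.5) when they match.
predicted_acc = 0.5 + 0.5 * p_diff_label
```

The prediction peaks at the category boundary and sits near chance within categories; the work reviewed here argues that listeners' actual sensitivity does not collapse to this function, which is part of the case against strong CP.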
Affiliation(s)
- Bob McMurray
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa City, Iowa 52242, USA
10. Apfelbaum KS, Kutlu E, McMurray B, Kapnoula EC. Don't force it! Gradient speech categorization calls for continuous categorization tasks. J Acoust Soc Am 2022; 152:3728. PMID: 36586841. PMCID: PMC9894657. DOI: 10.1121/10.0015201.
Abstract
Research on speech categorization and phoneme recognition has relied heavily on tasks in which participants listen to stimuli from a speech continuum and are asked to either classify each stimulus (identification) or discriminate between them (discrimination). Such tasks rest on assumptions about how perception maps onto discrete responses that have not been thoroughly investigated. Here, we identify critical challenges in the link between these tasks and theories of speech categorization. In particular, we show that patterns that have traditionally been linked to categorical perception could arise despite continuous underlying perception and that patterns that run counter to categorical perception could arise despite underlying categorical perception. We describe an alternative measure of speech perception using a visual analog scale that better differentiates between processes at play in speech categorization, and we review some recent findings that show how this task can be used to better inform our theories.
Affiliation(s)
- Keith S Apfelbaum
  - Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Ethan Kutlu
  - Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Bob McMurray
  - Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Efthymia C Kapnoula
  - BCBL, Basque Center on Cognition, Brain and Language, Mikeletegi 69, 20009 Donostia, Spain
11. Kutlu E, Chiu S, McMurray B. Moving away from deficiency models: Gradiency in bilingual speech categorization. Front Psychol 2022; 13:1033825. PMID: 36507048. PMCID: PMC9730410. DOI: 10.3389/fpsyg.2022.1033825.
Abstract
For much of its history, categorical perception was treated as a foundational theory of speech perception, which suggested that quasi-discrete categorization was a goal of speech perception. This had a profound impact on bilingualism research which adopted similar tasks to use as measures of nativeness or native-like processing, implicitly assuming that any deviation from discreteness was a deficit. This is particularly problematic for listeners like heritage speakers whose language proficiency, both in their heritage language and their majority language, is questioned. However, we now know that in the monolingual listener, speech perception is gradient and listeners use this gradiency to adjust subphonetic details, recover from ambiguity, and aid learning and adaptation. This calls for new theoretical and methodological approaches to bilingualism. We present the Visual Analogue Scaling task which avoids the discrete and binary assumptions of categorical perception and can capture gradiency more precisely than other measures. Our goal is to provide bilingualism researchers new conceptual and empirical tools that can help examine speech categorization in different bilingual communities without the necessity of forcing their speech categorization into discrete units and without assuming a deficit model.
Affiliation(s)
- Ethan Kutlu
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States
  - Department of Linguistics, University of Iowa, Iowa City, IA, United States
- Samantha Chiu
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States
- Bob McMurray
  - Department of Psychological and Brain Sciences, University of Iowa, Iowa City, IA, United States
  - Department of Linguistics, University of Iowa, Iowa City, IA, United States
12. McMurray B, Sarrett ME, Chiu S, Black AK, Wang A, Canale R, Aslin RN. Decoding the temporal dynamics of spoken word and nonword processing from EEG. Neuroimage 2022; 260:119457. PMID: 35842096. PMCID: PMC10875705. DOI: 10.1016/j.neuroimage.2022.119457.
Abstract
The efficiency of spoken word recognition is essential for real-time communication. There is consensus that this efficiency relies on an implicit process of activating multiple word candidates that compete for recognition as the acoustic signal unfolds in real time. However, few methods capture the neural basis of this dynamic competition on a msec-by-msec basis. This is crucial for understanding the neuroscience of language, and for understanding hearing, language, and cognitive disorders in people for whom current behavioral methods are not suitable. We applied machine-learning techniques to standard EEG signals to decode which word was heard on each trial and analyzed the patterns of confusion over time. Results mirrored psycholinguistic findings: early on, the decoder was equally likely to report the target (e.g., baggage) or a similar-sounding competitor (badger), but by around 500 msec, competitors were suppressed. Follow-up analyses show that this result is robust across EEG systems (gel and saline), with fewer channels, and with fewer trials. Results are robust within individuals and show high reliability. This suggests a powerful and simple paradigm that can assess the neural dynamics of speech decoding, with potential applications for understanding lexical development in a variety of clinical disorders.
Collapse
Affiliation(s)
- Bob McMurray
- Dept. of Psychological and Brain Sciences, Dept. of Communication Sciences and Disorders, Dept. of Linguistics and Dept. of Otolaryngology, University of Iowa.
| | - McCall E Sarrett
- Interdisciplinary Graduate Program in Neuroscience, University of Iowa
| | - Samantha Chiu
- Dept. of Psychological and Brain Sciences, University of Iowa
| | - Alexis K Black
- School of Audiology and Speech Sciences, University of British Columbia, Haskins Laboratories
| | - Alice Wang
- Dept. of Psychology, University of Oregon, Haskins Laboratories
| | - Rebecca Canale
- Dept. of Psychological Sciences, University of Connecticut, Haskins Laboratories
| | - Richard N Aslin
- Haskins Laboratories, Department of Psychology and Child Study Center, Yale University, Department of Psychology, University of Connecticut
| |
Collapse
|
13
|
Lexical Access Changes Based on Listener Needs: Real-Time Word Recognition in Continuous Speech in Cochlear Implant Users. Ear Hear 2022; 43:1487-1501. [PMID: 35067570 PMCID: PMC9300769 DOI: 10.1097/aud.0000000000001203] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]
Abstract
OBJECTIVES A key challenge in word recognition is the temporary ambiguity created by the fact that speech unfolds over time. In normal hearing (NH) listeners, this temporary ambiguity is resolved through incremental processing and competition among lexical candidates. Post-lingually deafened cochlear implant (CI) users show similar incremental processing and competition but with slight delays. However, even brief delays could lead to drastic changes when compounded across multiple words in a phrase. This study asks whether words presented in non-informative continuous speech (a carrier phrase) are processed differently than in isolation and whether NH listeners and CI users exhibit different effects of a carrier phrase. DESIGN In a Visual World Paradigm experiment, listeners heard words either in isolation or in non-informative carrier phrases (e.g., "click on the…"). Listeners selected the picture corresponding to the target word from among four items including the target word (e.g., mustard), a cohort competitor (e.g., mustache), a rhyme competitor (e.g., custard), and an unrelated item (e.g., penguin). Eye movements were tracked as an index of the relative activation of each lexical candidate as competition unfolds over the course of word recognition. Participants included 21 post-lingually deafened cochlear implant users and 21 NH controls. A replication experiment presented in the Supplemental Digital Content, http://links.lww.com/EANDH/A999 included an additional 22 post-lingually deafened CI users and 18 NH controls. RESULTS Both CI users and the NH controls were accurate at recognizing the words both in continuous speech and in isolation. The time course of lexical activation (indexed by the fixations) differed substantially between groups. CI users were delayed in fixating the target relative to NH controls. Additionally, CI users showed less competition from cohorts than NH controls (even as previous studies have often reported increased competition). However, CI users took longer to suppress the cohort and suppressed it less fully than the NH controls. For both CI users and NH controls, embedding words in carrier phrases led to more immediacy in lexical access as observed by increases in cohort competition relative to when words were presented in isolation. However, CI users were not differentially affected by the carriers. CONCLUSIONS Unlike prior work, CI users appeared to exhibit a "wait-and-see" profile, in which lexical access is delayed, minimizing early competition. However, CI users simultaneously sustained competitor activation late in the trial, possibly to preserve flexibility. This hybrid profile has not been observed previously. When target words are heard in continuous speech, both CI users and NH controls more heavily weight early information. However, CI users (but not NH listeners) also commit less fully to the target, potentially keeping options open if they need to recover from a misperception. This mix of patterns reflects a lexical system that is extremely flexible and adapts to fit the needs of a listener.
Collapse
|
14
|
Kapnoula EC, McMurray B. Idiosyncratic use of bottom-up and top-down information leads to differences in speech perception flexibility: Converging evidence from ERPs and eye-tracking. BRAIN AND LANGUAGE 2021; 223:105031. [PMID: 34628259 DOI: 10.1016/j.bandl.2021.105031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/15/2021] [Revised: 07/29/2021] [Accepted: 09/22/2021] [Indexed: 06/13/2023]
Abstract
Listeners generally categorize speech sounds in a gradient manner. However, recent work, using a visual analogue scaling (VAS) task, suggests that some listeners show more categorical performance, leading to less flexible cue integration and poorer recovery from misperceptions (Kapnoula et al., 2017, 2021). We asked how individual differences in speech gradiency can be reconciled with the well-established gradiency in the modal listener, showing how VAS performance relates to both Visual World Paradigm and EEG measures of gradiency. We also investigated three potential sources of these individual differences: inhibitory control; lexical inhibition; and early cue encoding. We used the N1 ERP component to track pre-categorical encoding of Voice Onset Time (VOT). The N1 linearly tracked VOT, reflecting a fundamentally gradient speech perception; however, for less gradient listeners, this linearity was disrupted near the boundary. Thus, while all listeners are gradient, they may show idiosyncratic encoding of specific cues, affecting downstream processing.
Collapse
Affiliation(s)
- Efthymia C Kapnoula
- Dept. of Psychological and Brain Sciences, University of Iowa, United States; DeLTA Center, University of Iowa, United States; Basque Center on Cognition, Brain and Language, Spain.
| | - Bob McMurray
- Dept. of Psychological and Brain Sciences, University of Iowa, United States; DeLTA Center, University of Iowa, United States; Dept. of Communication Sciences and Disorders, DeLTA Center, University of Iowa, United States; Dept. of Linguistics, DeLTA Center, University of Iowa, United States
| |
Collapse
|
15
|
Apfelbaum KS, Klein-Packard J, McMurray B. The pictures who shall not be named: Empirical support for benefits of preview in the Visual World Paradigm. JOURNAL OF MEMORY AND LANGUAGE 2021; 121:104279. [PMID: 34326570 PMCID: PMC8315347 DOI: 10.1016/j.jml.2021.104279] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
A common critique of the Visual World Paradigm (VWP) in psycholinguistic studies is that what is designed as a measure of language processes is meaningfully altered by the visual context of the task. This is crucial, particularly in studies of spoken word recognition, where the displayed images are usually seen as just a part of the measure and are not of fundamental interest. Many variants of the VWP allow participants to sample the visual scene before a trial begins. However, this could bias their interpretations of the later speech or even lead to abnormal processing strategies (e.g., comparing the input to only preactivated working memory representations). Prior work has focused only on whether preview duration changes fixation patterns. However, preview could affect a number of processes, such as visual search, that would not challenge the interpretation of the VWP. The present study uses a series of targeted manipulations of the preview period to ask if preview alters looking behavior during a trial, and why. Results show that evidence of incremental processing and phonological competition seen in the VWP are not dependent on preview, and are not enhanced by manipulations that directly encourage phonological prenaming. Moreover, some forms of preview can eliminate nuisance variance deriving from object recognition and visual search demands in order to produce a more sensitive measure of linguistic processing. These results deepen our understanding of how the visual scene interacts with language processing to drive fixation patterns in the VWP, and reinforce the value of the VWP as a tool for measuring real-time language processing. Stimuli, data and analysis scripts are available at https://osf.io/b7q65/.
Collapse
Affiliation(s)
| | | | - Bob McMurray
- Dept. of Psychological and Brain Sciences University of Iowa
- Dept. of Communication Sciences and Disorders, Dept. of Linguistics, Dept. of Otolaryngology, University of Iowa
| |
Collapse
|