1
Crinnion AM, Heffner CC, Myers EB. Individual differences in the use of top-down versus bottom-up cues to resolve phonetic ambiguity. Atten Percept Psychophys 2024. PMID: 38811489. DOI: 10.3758/s13414-024-02889-4.
Abstract
How listeners weight a wide variety of information to interpret ambiguities in the speech signal is a question of interest in speech perception, particularly for understanding how listeners process speech in the context of phrases or sentences. Dominant views of cue use for language comprehension posit that listeners integrate multiple sources of information to interpret ambiguities in the speech signal. Here, we study how semantic context, sentence rate, and vowel length all influence identification of word-final stops. We find that while at the group level all sources of information appear to influence how listeners interpret ambiguities in speech, at the level of the individual listener we observe systematic differences in cue reliance, such that some listeners favor certain cues (e.g., speech rate and vowel length) to the exclusion of others (e.g., semantic context). While listeners exhibit a range of cue preferences, across participants we find a negative relationship between individuals' weighting of semantic and acoustic-phonetic (sentence rate, vowel length) cues. Additionally, we find that these weightings are stable within individuals over a period of one month. Taken as a whole, these findings suggest that theories of cue integration and speech processing may fail to capture the rich individual differences that exist between listeners, which could arise from mechanistic differences between individuals in speech perception.
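Per-listener cue weights of the kind reported here are often estimated as coefficients of a logistic regression that predicts a listener's trial-by-trial categorization responses from the value of each cue; the larger a standardized coefficient, the heavier that listener's reliance on the corresponding cue. The sketch below illustrates this general approach only, not the authors' actual analysis; the cue names and all trial data are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_trials = 200

# Invented, standardized per-trial cue values: semantic context,
# sentence rate, and vowel length.
X = rng.standard_normal((n_trials, 3))

# Simulate a listener who leans on the acoustic-phonetic cues while
# largely ignoring semantic context (true weights 0.2, 1.5, 1.5).
logit = X @ np.array([0.2, 1.5, 1.5]) + rng.standard_normal(n_trials)
y = (logit > 0).astype(int)  # e.g., 1 = "voiced final stop" response

# The fitted coefficients serve as this listener's estimated cue weights.
weights = LogisticRegression().fit(X, y).coef_[0]
for cue, w in zip(["semantic context", "sentence rate", "vowel length"], weights):
    print(f"{cue}: {w:+.2f}")
```

Repeating the fit per participant and correlating the resulting semantic and acoustic-phonetic weights across listeners is one way to probe the kind of negative relationship the abstract describes.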
2
Papoutsi C, Zimianiti E, Bosker HR, Frost RLA. Statistical learning at a virtual cocktail party. Psychon Bull Rev 2024; 31:849-861. PMID: 37783898. PMCID: PMC11061050. DOI: 10.3758/s13423-023-02384-1.
Abstract
Statistical learning - the ability to extract distributional regularities from input - is suggested to be key to language acquisition. Yet, evidence for the human capacity for statistical learning comes mainly from studies conducted in carefully controlled settings without auditory distraction. While such conditions permit careful examination of learning, they do not reflect the naturalistic language learning experience, which is replete with auditory distraction - including competing talkers. Here, we examine how statistical language learning proceeds in a virtual cocktail party environment, where the to-be-learned input is presented alongside a competing speech stream with its own distributional regularities. During exposure, participants in the Dual Talker group concurrently heard two novel languages, one produced by a female talker and one by a male talker, with each talker virtually positioned at opposite sides of the listener (left/right) using binaural acoustic manipulations. Selective attention was manipulated by instructing participants to attend to only one of the two talkers. At test, participants were asked to distinguish words from part-words for both the attended and the unattended languages. Results indicated that participants' accuracy was significantly higher for trials from the attended vs. unattended language. Further, the performance of this Dual Talker group was no different compared to a control group who heard only one language from a single talker (Single Talker group). We thus conclude that statistical learning is modulated by selective attention, being relatively robust against the additional cognitive load provided by competing speech, emphasizing its efficiency in naturalistic language learning situations.
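The words-versus-part-words test in this paradigm hinges on transitional probabilities between adjacent syllables: syllable pairs inside a "word" of the artificial language recur reliably, whereas pairs spanning a word boundary do not. As an illustration of the underlying computation only, not the authors' materials, the sketch below estimates forward transitional probabilities from a toy syllable stream built from two invented trisyllabic words.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate forward transitional probabilities P(next | current)
    from a sequence of syllables."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Toy stream built from the invented words "bupada" and "golatu".
stream = "bu pa da bu pa da go la tu bu pa da go la tu go la tu".split()
tps = transitional_probabilities(stream)

print(tps[("bu", "pa")])  # within-word pair: 1.0
print(tps[("da", "go")])  # pair spanning a word boundary: ~0.67
```

A listener who has tracked these statistics should accept "bupada" as a word but be less certain about boundary-spanning part-words such as "dagola".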
Affiliation(s)
- Christina Papoutsi
- Max Planck Institute for Psycholinguistics, PO Box 9104, 6500 HE, Nijmegen, The Netherlands
- Eleni Zimianiti
- Max Planck Institute for Psycholinguistics, PO Box 9104, 6500 HE, Nijmegen, The Netherlands
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 9104, 6500 HE, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, The Netherlands
- Rebecca L A Frost
- Max Planck Institute for Psycholinguistics, PO Box 9104, 6500 HE, Nijmegen, The Netherlands
- Edge Hill University, Edge Hill, UK
3
King CJ, Sharpe CM, Shorey AE, Stilp CE. The effects of variability on context effects and psychometric function slopes in speaking rate normalization. J Acoust Soc Am 2024; 155:2099-2113. PMID: 38483206. PMCID: PMC10942802. DOI: 10.1121/10.0025292.
Abstract
Acoustic context influences speech perception, but contextual variability restricts this influence. Assgari and Stilp [J. Acoust. Soc. Am. 138, 3023-3032 (2015)] demonstrated that when categorizing vowels, variability in who spoke the preceding context sentence on each trial but not the sentence contents diminished the resulting spectral contrast effects (perceptual shifts in categorization stemming from spectral differences between sounds). Yet, how such contextual variability affects temporal contrast effects (TCEs) (also known as speaking rate normalization; categorization shifts stemming from temporal differences) is unknown. Here, stimuli were the same context sentences and conditions (one talker saying one sentence, one talker saying 200 sentences, 200 talkers saying 200 sentences) used in Assgari and Stilp [J. Acoust. Soc. Am. 138, 3023-3032 (2015)], but set to fast or slow speaking rates to encourage perception of target words as "tier" or "deer," respectively. In Experiment 1, sentence variability and talker variability each diminished TCE magnitudes; talker variability also produced shallower psychometric function slopes. In Experiment 2, when speaking rates were matched across the 200-sentences conditions, neither TCE magnitudes nor slopes differed across conditions. In Experiment 3, matching slow and fast rates across all conditions failed to produce equal TCEs and slopes everywhere. Results suggest a complex interplay between acoustic, talker, and sentence variability in shaping TCEs in speech perception.
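The psychometric function slopes at issue here are standardly obtained by fitting a sigmoid, commonly a logistic, to the proportion of one response category across the duration continuum: the boundary parameter captures categorization shifts such as TCEs, and the slope parameter captures how sharply listeners categorize. The sketch below is a generic illustration under those standard assumptions; the continuum steps and response proportions are invented, not the study's data.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Two-parameter logistic: x0 is the category boundary (50% point),
    k is the slope at that boundary."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Invented continuum steps (ms) and proportions of "deer" responses.
steps = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
p_deer = np.array([0.05, 0.10, 0.30, 0.55, 0.80, 0.92, 0.97])

(boundary, slope), _ = curve_fit(logistic, steps, p_deer, p0=[40.0, 0.1])
print(f"boundary = {boundary:.1f} ms, slope = {slope:.3f}")

# A TCE appears as a boundary shift between fast- and slow-context
# conditions; the shallower categorization reported under talker
# variability appears as a smaller fitted k.
```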
Affiliation(s)
- Caleb J King
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Chloe M Sharpe
- School of Psychology, Xavier University, Cincinnati, Ohio 45207, USA
- Anya E Shorey
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
4
Derawi H, Reinisch E, Gabay Y. Internal Cognitive Load Differentially Influences Acoustic and Lexical Context Effects in Speech Perception: Evidence From a Population With Attention-Deficit/Hyperactivity Disorder. J Speech Lang Hear Res 2023; 66:3721-3734. PMID: 37696049. DOI: 10.1044/2023_jslhr-23-00188.
Abstract
Background: To overcome variability in spoken language, listeners utilize various types of context information to disambiguate speech sounds. Context effects have been shown to be affected by cognitive load; however, previous results are mixed regarding the influence of cognitive load on the use of context information in speech perception.
Purpose: We tested a population with attention-deficit/hyperactivity disorder (ADHD) to better understand the relationship between attention (or internal cognitive load) and context effects.
Method: The use of acoustic versus lexical properties of the surrounding signal to disambiguate speech sounds was examined in listeners with ADHD and in neurotypical listeners.
Results: Compared to neurotypical listeners, individuals with ADHD relied more strongly on lexical context for speech perception; reliance on acoustic context information from speech rate, however, did not differ.
Conclusion: These findings confirm that cognitive load impacts the use of high-level but not low-level context information in speech and imply that speech recognition deficits in ADHD likely arise from impaired higher-order cognitive processes.
Affiliation(s)
- Hadeer Derawi
- Department of Special Education, University of Haifa, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, Israel
- Eva Reinisch
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
- Yafit Gabay
- Department of Special Education, University of Haifa, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, Israel
5
Gabay Y, Reinisch E, Even D, Binur N, Hadad BS. Intact Utilization of Contextual Information in Speech Categorization in Autism. J Autism Dev Disord 2023. PMID: 37787847. DOI: 10.1007/s10803-023-06106-3.
Abstract
Current theories of Autism Spectrum Disorder (ASD) suggest atypical use of context in ASD, but little is known about how these atypicalities influence speech perception. We examined the influence of contextual information (lexical, spectral, and temporal) on phoneme categorization in people with ASD and in typically developed (TD) people. Across three experiments, we found that people with ASD used all types of contextual information for disambiguating speech sounds to the same extent as TD people; yet they exhibited a shallower identification curve when phoneme categorization required temporal processing. Overall, the results suggest that the observed atypicalities in speech perception in ASD, including the reduced sensitivity observed here, cannot be attributed merely to a limited ability to utilize context during speech perception.
Affiliation(s)
- Yafit Gabay
- Department of Special Education, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Eva Reinisch
- Acoustics Research Institute, Austrian Academy of Sciences, Wohllebengasse 12-14, Vienna, 1040, Austria
- Dana Even
- Department of Special Education, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Nahal Binur
- Department of Special Education, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Bat-Sheva Hadad
- Department of Special Education, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
- Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, 199 Abba Khoushy Ave, Haifa, 3498838, Israel
6
Encoding speech rate in challenging listening conditions: White noise and reverberation. Atten Percept Psychophys 2022; 84:2303-2318. PMID: 35996057. PMCID: PMC9481500. DOI: 10.3758/s13414-022-02554-8.
Abstract
Temporal contrasts in speech are perceived relative to the speech rate of the surrounding context. That is, following a fast context sentence, listeners interpret a given target sound as longer than following a slow context, and vice versa. This rate effect, often referred to as "rate-dependent speech perception," has been suggested to be the result of a robust, low-level perceptual process, typically examined in quiet laboratory settings. However, speech perception often occurs in more challenging listening conditions. Therefore, we asked whether rate-dependent perception would be (partially) compromised by signal degradation relative to a clear listening condition. Specifically, we tested effects of white noise and reverberation, with the latter specifically distorting temporal information. We hypothesized that signal degradation would reduce the precision of encoding the speech rate in the context and thereby reduce the rate effect relative to a clear context. This prediction was borne out for both types of degradation in Experiment 1, where the context sentences but not the subsequent target words were degraded. However, in Experiment 2, which compared rate effects when contexts and targets were coherent in terms of signal quality, no reduction of the rate effect was found. This suggests that, when confronted with coherently degraded signals, listeners adapt to challenging listening situations, eliminating the difference between rate-dependent perception in clear and degraded conditions. Overall, the present study contributes towards understanding the consequences of different types of listening environments on the functioning of low-level perceptual processes that listeners use during speech perception.
7
Heffner CC, Myers EB, Gracco VL. Impaired perceptual phonetic plasticity in Parkinson's disease. J Acoust Soc Am 2022; 152:511. PMID: 35931533. PMCID: PMC9299957. DOI: 10.1121/10.0012884.
Abstract
Parkinson's disease (PD) is a neurodegenerative condition primarily associated with its motor consequences. Although much of the research within the speech domain has focused on PD's consequences for production, people with PD have been shown to differ from age-matched controls in the perception of emotional prosody, loudness, and speech rate. The current study targeted the effect of PD on perceptual phonetic plasticity, defined as the ability to learn and adjust to novel phonetic input, both in second-language and native-language contexts. People with PD were compared to age-matched controls (and, for three of the studies, a younger control population) on tasks of explicit non-native speech learning and adaptation to variation in native speech (compressed rate, accent, and the use of timing information within a sentence to parse ambiguities). The participants with PD showed significantly worse performance on the compressed-rate task and used the duration of an ambiguous fricative to segment speech to a lesser degree than age-matched controls, indicating impaired speech perceptual abilities. Exploratory comparisons also showed that people with PD who were on medication performed significantly worse than their peers off medication on those two tasks and on the task of explicit non-native learning.
Affiliation(s)
- Christopher C Heffner
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269, USA
- Emily B Myers
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269, USA
8
Listeners track talker-specific prosody to deal with talker-variability. Brain Res 2021; 1769:147605. PMID: 34363790. DOI: 10.1016/j.brainres.2021.147605.
Abstract
One of the challenges in speech perception is that listeners must deal with considerable segmental and suprasegmental variability in the acoustic signal due to differences between talkers. Most previous studies have focused on how listeners deal with segmental variability. In this EEG experiment, we investigated whether listeners track talker-specific usage of suprasegmental cues to lexical stress to recognize spoken words correctly. In a three-day training phase, Dutch participants learned to map non-word minimal stress pairs onto different object referents (e.g., USklot meant "lamp"; usKLOT meant "train"). These non-words were produced by two male talkers. Critically, each talker used only one suprasegmental cue to signal stress (e.g., Talker A used only F0 and Talker B only intensity). We expected participants to learn which talker used which cue to signal stress. In the test phase, participants indicated whether spoken sentences including these non-words were correct ("The word for lamp is…"). We found that participants were slower to indicate that a stimulus was correct if the non-word was produced with the unexpected cue (e.g., Talker A using intensity). That is, if in training Talker A used F0 to signal stress, participants experienced a mismatch between predicted and perceived phonological word-forms if, at test, Talker A unexpectedly used intensity to cue stress. In contrast, the N200 amplitude, an event-related potential related to phonological prediction, was not modulated by the cue mismatch. Theoretical implications of these contrasting results are discussed. The behavioral findings illustrate talker-specific prediction of prosodic cues, picked up through perceptual learning during training.
9
Rhythmic and speech rate effects in the perception of durational cues. Atten Percept Psychophys 2021; 83:3162-3182. PMID: 34254267. DOI: 10.3758/s13414-021-02334-w.
Abstract
Listeners' perception of temporal contrasts in spoken language is highly sensitive to contextual information, such as variation in speech rate. The present study tests how rate-dependent perception is also mediated by distal (i.e., temporally removed) rhythmic patterns. In four experiments, the role of rhythmic alternations and their interaction with speech rate effects is tested. Experiment 1 shows that proximal speech rate (contrast) effects obtain based on changes in the local context. Experiment 2 shows that these effects disappear with the addition of distal rhythmic alternations, indicating that rhythmic grouping shifts listeners' perception even when the proximal context conflicts. Experiments 3 and 4 explore how orthogonal variation in overall speech rate impacts these effects and find that trial-to-trial (i.e., global) speech rate variation eliminates rhythmic grouping effects, both with and without variation in proximal (immediately preceding) context. Together, these results suggest a role for rhythmic patterning in listeners' processing of durational cues in speech, one that interacts in various ways with proximal, distal, and global rate contexts.
10
Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies. Behav Res Methods 2021; 53:1945-1953. PMID: 33694079. PMCID: PMC8516752. DOI: 10.3758/s13428-021-01542-4.
Abstract
Many studies of speech perception assess the intelligibility of spoken sentence stimuli by means of transcription tasks ('type out what you hear'). The intelligibility of a given stimulus is then often expressed in terms of percentage of words correctly reported from the target sentence. Yet scoring the participants' raw responses for words correctly identified from the target sentence is a time-consuming task, and hence resource-intensive. Moreover, there is no consensus among speech scientists about what specific protocol to use for the human scoring, limiting the reliability of human scores. The present paper evaluates various forms of fuzzy string matching between participants' responses and target sentences, as automated metrics of listener transcript accuracy. We demonstrate that one particular metric, the token sort ratio, is a consistent, highly efficient, and accurate metric for automated assessment of listener transcripts, as evidenced by high correlations with human-generated scores (best correlation: r = 0.940) and a strong relationship to acoustic markers of speech intelligibility. Thus, fuzzy string matching provides a practical tool for assessment of listener transcript accuracy in large-scale speech intelligibility studies. See https://tokensortratio.netlify.app for an online implementation.
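The token sort ratio tokenizes both strings, sorts the tokens alphabetically, rejoins them, and scores the similarity of the two sorted strings, so reordered words in a response are not penalized while missing or misheard words are. The sketch below is a simplified stand-in: it uses difflib's ratio in place of the Levenshtein-based ratio computed by fuzzy-matching libraries such as rapidfuzz (fuzz.token_sort_ratio), so its scores will differ slightly from the paper's metric and its online implementation.

```python
from difflib import SequenceMatcher

def sort_tokens(s: str) -> str:
    """Lowercase, split into words, sort, and rejoin."""
    return " ".join(sorted(s.lower().split()))

def token_sort_ratio(a: str, b: str) -> float:
    """Approximate token sort ratio on a 0-100 scale; difflib's ratio
    stands in for the Levenshtein-based ratio used by rapidfuzz."""
    return 100.0 * SequenceMatcher(None, sort_tokens(a), sort_tokens(b)).ratio()

# Hypothetical target sentence and listener transcript.
target = "the boy gave the dog a bone"
response = "the boy gave a bone to the dog"
print(round(token_sort_ratio(target, response), 1))  # high despite reordering
```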
11
Bosker HR, Peeters D. Beat gestures influence which speech sounds you hear. Proc Biol Sci 2021; 288:20202419. DOI: 10.1098/rspb.2020.2419.
Abstract
Beat gestures - spontaneously produced biphasic movements of the hand - are among the most frequently encountered co-speech gestures in human communication. They are closely temporally aligned to the prosodic characteristics of the speech signal, typically occurring on lexically stressed syllables. Despite their prevalence across speakers of the world's languages, how beat gestures impact spoken word recognition is unclear. Can these simple 'flicks of the hand' influence speech perception? Across a range of experiments, we demonstrate that beat gestures influence the explicit and implicit perception of lexical stress (e.g., distinguishing OBject from obJECT), and in turn can influence what vowels listeners hear. Thus, we provide converging evidence for a manual McGurk effect: relatively simple and widely occurring hand movements influence which speech sounds we hear.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- David Peeters
- Department of Communication and Cognition, TiCC, Tilburg University, Tilburg, The Netherlands