1
Papoutsi C, Zimianiti E, Bosker HR, Frost RLA. Statistical learning at a virtual cocktail party. Psychon Bull Rev 2024;31:849-861. PMID: 37783898; PMCID: PMC11061050; DOI: 10.3758/s13423-023-02384-1.
Abstract
Statistical learning - the ability to extract distributional regularities from input - is suggested to be key to language acquisition. Yet, evidence for the human capacity for statistical learning comes mainly from studies conducted in carefully controlled settings without auditory distraction. While such conditions permit careful examination of learning, they do not reflect the naturalistic language learning experience, which is replete with auditory distraction, including competing talkers. Here, we examine how statistical language learning proceeds in a virtual cocktail party environment, where the to-be-learned input is presented alongside a competing speech stream with its own distributional regularities. During exposure, participants in the Dual Talker group concurrently heard two novel languages, one produced by a female talker and one by a male talker, with the talkers virtually positioned on opposite sides of the listener (left/right) using binaural acoustic manipulations. Selective attention was manipulated by instructing participants to attend to only one of the two talkers. At test, participants were asked to distinguish words from part-words for both the attended and the unattended languages. Accuracy was significantly higher for trials from the attended than from the unattended language. Further, the Dual Talker group performed no differently from a control group who heard only one language from a single talker (Single Talker group). We thus conclude that statistical learning is modulated by selective attention yet remains relatively robust against the additional cognitive load imposed by competing speech, underscoring its efficiency in naturalistic language learning situations.
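The abstract mentions binaural acoustic manipulations for placing the two talkers at opposite sides of the listener but does not spell them out. Below is a minimal Python sketch of one standard approach, interaural time and level differences (ITD/ILD); the sample rate and ITD/ILD values are illustrative assumptions, not the study's parameters.

```python
import numpy as np

FS = 44100  # sample rate in Hz (assumption, not from the paper)

def lateralize(mono, fs=FS, itd_s=0.0006, ild_db=10.0, side="left"):
    """Place a mono signal to one side by delaying and attenuating the
    contralateral (far) ear; ITD/ILD values here are illustrative only."""
    delay = int(round(itd_s * fs))           # interaural time difference, samples
    attenuation = 10 ** (-ild_db / 20)       # interaural level difference, linear gain
    far = np.concatenate([np.zeros(delay), mono])[:len(mono)] * attenuation
    left, right = (mono, far) if side == "left" else (far, mono)
    return np.column_stack([left, right])    # stereo array, shape (n_samples, 2)

# Mix two competing talkers, one per side, into a single stereo stream.
rng = np.random.default_rng(0)
female = rng.standard_normal(2 * FS)  # stand-ins for the two speech streams
male = rng.standard_normal(2 * FS)
stereo_mix = lateralize(female, side="left") + lateralize(male, side="right")
```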
Affiliation(s)
- Christina Papoutsi
- Max Planck Institute for Psycholinguistics, PO Box 9104, 6500 HE, Nijmegen, The Netherlands
- Eleni Zimianiti
- Max Planck Institute for Psycholinguistics, PO Box 9104, 6500 HE, Nijmegen, The Netherlands
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 9104, 6500 HE, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition, and Behaviour, Radboud University, Nijmegen, the Netherlands.
- Rebecca L A Frost
- Max Planck Institute for Psycholinguistics, PO Box 9104, 6500 HE, Nijmegen, The Netherlands
- Edge Hill University, Edge Hill, UK
2
Wu M, Bosker HR, Riecke L. Sentential Contextual Facilitation of Auditory Word Processing Builds Up during Sentence Tracking. J Cogn Neurosci 2023:1-17. PMID: 37172122; DOI: 10.1162/jocn_a_02007.
Abstract
While listening to meaningful speech, auditory input is processed more rapidly near the end (vs. beginning) of sentences. Although several studies have shown such word-to-word changes in auditory input processing, it is still unclear from which processing level these word-to-word dynamics originate. We investigated whether predictions derived from sentential context can result in auditory word-processing dynamics during sentence tracking. We presented healthy human participants with auditory stimuli consisting of word sequences, arranged into either predictable (coherent sentences) or less predictable (unstructured, random word sequences) 42-Hz amplitude-modulated speech, and a continuous 25-Hz amplitude-modulated distractor tone. We recorded RTs and frequency-tagged neuroelectric responses (auditory steady-state responses) to individual words at multiple temporal positions within the sentences, and quantified sentential context effects at each position while controlling for individual word characteristics (i.e., phonetics, frequency, and familiarity). We found that sentential context increasingly facilitates auditory word processing, as evidenced by accelerated RTs and increased auditory steady-state responses to later-occurring words within sentences. These purely top-down, contextually driven auditory word-processing dynamics occurred only when listeners focused their attention on the speech and did not transfer to the auditory processing of the concurrent distractor tone. These findings indicate that auditory word-processing dynamics during sentence tracking can originate from sentential predictions. The predictions depend on the listeners' attention to the speech, and affect only the processing of the parsed speech, not that of concurrently presented auditory streams.
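The design tags each stream at its own modulation frequency (42 Hz for speech, 25 Hz for the distractor tone) so that stream-specific responses can be read out at those frequencies. Here is a minimal sketch of the two core operations, imposing sinusoidal amplitude modulation and measuring response power at the tag frequency; the modulation depth and the use of a plain FFT peak read-out are illustrative assumptions, not the paper's analysis pipeline.

```python
import numpy as np

def amplitude_modulate(signal, fs, fm, depth=1.0):
    """Impose sinusoidal amplitude modulation at fm Hz on a signal."""
    t = np.arange(len(signal)) / fs
    modulator = 1.0 + depth * np.sin(2 * np.pi * fm * t)
    return signal * modulator / (1.0 + depth)   # normalize peak gain

def power_at(response, fs, f_target):
    """Spectral power of a (neural) response at the tagging frequency."""
    spectrum = np.abs(np.fft.rfft(response)) ** 2
    freqs = np.fft.rfftfreq(len(response), 1.0 / fs)
    return spectrum[np.argmin(np.abs(freqs - f_target))]

# e.g., tag speech at 42 Hz and the distractor tone at 25 Hz, then compare
# power_at(eeg_segment, fs, 42.0) across word positions within the sentence.
```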
Affiliation(s)
- Min Wu
- Maastricht University, The Netherlands
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, The Netherlands
- Radboud University, Nijmegen, The Netherlands
3
Severijnen GGA, Di Dona G, Bosker HR, McQueen JM. Tracking talker-specific cues to lexical stress: Evidence from perceptual learning. J Exp Psychol Hum Percept Perform 2023;49:549-565. PMID: 37184938; DOI: 10.1037/xhp0001105.
Abstract
When recognizing spoken words, listeners are confronted by variability in the speech signal caused by talker differences. Previous research has focused on segmental talker variability; less is known about how suprasegmental variability is handled. Here we investigated the use of perceptual learning to deal with between-talker differences in lexical stress. Two groups of participants heard Dutch minimal stress pairs (e.g., VOORnaam vs. voorNAAM, "first name" vs. "respectable") spoken by two male talkers. Group 1 heard Talker 1 use only F0 to signal stress (intensity and duration values were ambiguous), while Talker 2 used only intensity (F0 and duration were ambiguous). Group 2 heard the reverse talker-cue mappings. After training, participants were tested on words from both talkers containing conflicting stress cues ("mixed items"; e.g., one spoken by Talker 1 with F0 signaling initial stress and intensity signaling final stress). We found that listeners used previously learned information about which talker used which cue to interpret the mixed items. For example, the mixed item described above tended to be interpreted as having initial stress by Group 1 but as having final stress by Group 2. This demonstrates that listeners learn how individual talkers signal stress and use that knowledge in spoken-word recognition. (PsycInfo Database Record (c) 2023 APA, all rights reserved).
Affiliation(s)
- Giuseppe Di Dona
- Dipartimento di Psicologia e Scienze Cognitive, Università degli Studi di Trento
- Hans Rutger Bosker
- Donders Institute for Brain, Cognition and Behaviour, Radboud University
- James M McQueen
- Donders Institute for Brain, Cognition and Behaviour, Radboud University
4
Abstract
Individuals vary in how they produce speech. This variability affects both the segments (vowels and consonants) and the suprasegmental properties of their speech (prosody). Previous literature has demonstrated that listeners can adapt to variability in how different talkers pronounce the segments of speech. This study shows that listeners can also adapt to variability in how talkers produce lexical stress. Experiment 1 demonstrates a selective adaptation effect in lexical stress perception: repeatedly hearing Dutch trochaic words biased perception of a subsequent lexical stress continuum towards more iamb responses. Experiment 2 demonstrates a recalibration effect in lexical stress perception: when ambiguous suprasegmental cues to lexical stress were disambiguated by lexical orthographic context as signaling a trochaic word in an exposure phase, Dutch participants categorized a subsequent test continuum as more trochee-like. Moreover, the selective adaptation and recalibration effects generalized to novel words, not encountered during exposure. Together, the experiments demonstrate that listeners also flexibly adapt to variability in the suprasegmental properties of speech, thus expanding our understanding of the utility of listener adaptation in speech perception. Moreover, the combined outcomes speak for an architecture of spoken word recognition involving abstract prosodic representations at a prelexical level of analysis.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands
5
Affiliation(s)
- Hans Rutger Bosker
- Psychology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Martin Corley
- Psychology, PPLS, University of Edinburgh, Edinburgh, UK
6
Abstract
Beat gestures, spontaneously produced biphasic movements of the hand, are among the most frequently encountered co-speech gestures in human communication. They are closely temporally aligned to the prosodic characteristics of the speech signal, typically occurring on lexically stressed syllables. Despite their prevalence across speakers of the world's languages, how beat gestures impact spoken word recognition is unclear. Can these simple 'flicks of the hand' influence speech perception? Across a range of experiments, we demonstrate that beat gestures influence the explicit and implicit perception of lexical stress (e.g. distinguishing OBject from obJECT), and in turn can influence what vowels listeners hear. Thus, we provide converging evidence for a manual McGurk effect: relatively simple and widely occurring hand movements influence which speech sounds we hear.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- David Peeters
- Department of Communication and Cognition, TiCC, Tilburg University, Tilburg, The Netherlands
7
Abstract
To comprehend speech sounds, listeners tune in to speech rate information in the proximal (immediately adjacent), distal (nonadjacent), and global context (further removed preceding and following sentences). Effects of global contextual speech rate cues on speech perception have been shown to follow constraints not found for proximal and distal speech rate. Therefore, listeners may process such global cues at distinct time points during word recognition. We conducted a printed-word eye-tracking experiment to compare the time courses of distal and global rate effects. Results indicated that the distal rate effect emerged immediately after target sound presentation, in line with a general-auditory account. The global rate effect, however, arose more than 200 ms later than the distal rate effect, indicating that distal and global context effects involve distinct processing mechanisms. Results are interpreted in a 2-stage model of acoustic context effects. This model posits that distal context effects involve very early perceptual processes, while global context effects arise at a later stage, involving cognitive adjustments conditioned by higher-level information. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
8
Abstract
Speech sounds are perceived relative to spectral properties of surrounding speech. For instance, target words that are ambiguous between /bɪt/ (with low F1) and /bɛt/ (with high F1) are more likely to be perceived as "bet" after a "low F1" sentence, but as "bit" after a "high F1" sentence. However, it is unclear how these spectral contrast effects (SCEs) operate in multi-talker listening conditions. Recently, Feng and Oxenham (J. Exp. Psychol. Hum. Percept. Perform. 44(9), 1447-1457, 2018b) reported that selective attention affected SCEs to a small degree, using two simultaneously presented sentences produced by a single talker. The present study assessed the role of selective attention in more naturalistic "cocktail party" settings, with 200 lexically unique sentences, 20 target words, and different talkers. Results indicate that selective attention to one talker in one ear (while ignoring another talker in the other ear) modulates SCEs in such a way that only the spectral properties of the attended talker influence target perception. However, SCEs were much smaller in multi-talker settings (Experiment 2) than in single-talker settings (Experiment 1). Therefore, the influence of SCEs on speech comprehension in more naturalistic settings (i.e., with competing talkers) may be smaller than estimated based on studies without competing talkers.
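Designs like this typically create "low F1" and "high F1" contexts by filtering the context sentence to cut or boost energy in the F1 region. A minimal sketch of that manipulation follows; the band edges and gain are illustrative assumptions, not this study's stimulus parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def shift_f1_region(context, fs, band=(100.0, 400.0), gain_db=-20.0):
    """Cut (negative gain_db) or boost (positive) a presumed F1 band in a
    context sentence; band edges and gain are illustrative, not the study's."""
    nyq = fs / 2.0
    sos = butter(4, [band[0] / nyq, band[1] / nyq],
                 btype="bandpass", output="sos")
    f1_part = sosfiltfilt(sos, context)          # energy in the F1 band
    gain = 10.0 ** (gain_db / 20.0)
    return context - f1_part + gain * f1_part    # rescale only that band
```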
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands.
- Matthias J Sjerps
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Eva Reinisch
- Institute of Phonetics and Speech Processing, Ludwig Maximilian University Munich, Munich, Germany
- Institute of General Linguistics, Ludwig Maximilian University Munich, Munich, Germany
9
Kösem A, Bosker HR, Jensen O, Hagoort P, Riecke L. Biasing the Perception of Spoken Words with Transcranial Alternating Current Stimulation. J Cogn Neurosci 2020;32:1428-1437. PMID: 32427072; DOI: 10.1162/jocn_a_01579.
Abstract
Recent neuroimaging evidence suggests that the frequency of entrained oscillations in auditory cortices influences the perceived duration of speech segments, impacting word perception [Kösem, A., Bosker, H. R., Takashima, A., Meyer, A., Jensen, O., & Hagoort, P. Neural entrainment determines the words we hear. Current Biology, 28, 2867-2875, 2018]. We further tested the causal influence of neural entrainment frequency during speech processing by manipulating entrainment with continuous transcranial alternating current stimulation (tACS) at distinct oscillatory frequencies (3 and 5.5 Hz) above the auditory cortices. Dutch participants listened to speech and were asked to report their percept of a target Dutch word, which contained a vowel with an ambiguous duration. Target words were presented either in isolation (first experiment) or at the end of spoken sentences (second experiment). We predicted that the tACS frequency would influence neural entrainment and thereby how speech is perceptually sampled, leading to a perceptual overestimation or underestimation of the vowel's duration. Whereas results from Experiment 1 did not confirm this prediction, results from Experiment 2 suggested a small effect of tACS frequency on target word perception: faster tACS led to more long-vowel word percepts, in line with the previous neuroimaging findings. Importantly, the difference in word perception induced by the different tACS frequencies was significantly larger in Experiment 2 than in Experiment 1, suggesting that the impact of tACS is dependent on the sensory context. tACS may have a stronger effect on spoken word perception when the words are presented in continuous speech as compared to when they are isolated, potentially because prior (stimulus-induced) entrainment of brain oscillations might be a prerequisite for tACS to be effective.
Affiliation(s)
- Anne Kösem
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Radboud University, Nijmegen, The Netherlands
- Université Lyon 1
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Radboud University, Nijmegen, The Netherlands
- Peter Hagoort
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Radboud University, Nijmegen, The Netherlands
10
Abstract
Two fundamental properties of perception are selective attention and perceptual contrast, but how these two processes interact remains unknown. Does an attended stimulus history exert a larger contrastive influence on the perception of a following target than unattended stimuli? Dutch listeners categorized target sounds with a reduced prefix "ge-" marking tense (e.g., ambiguous between gegaan-gaan "gone-go"). In 'single talker' Experiments 1-2, participants perceived the reduced syllable (reporting gegaan) when the target was heard after a fast sentence, but not after a slow sentence (reporting gaan). In 'selective attention' Experiments 3-5, participants listened to two simultaneous sentences from two different talkers, followed by the same target sounds, with instructions to attend only one of the two talkers. Critically, the speech rates of attended and unattended talkers were found to equally influence target perception - even when participants could watch the attended talker speak. In fact, participants' target perception in 'selective attention' Experiments 3-5 did not differ from participants who were explicitly instructed to divide their attention equally across the two talkers (Experiment 6). This suggests that contrast effects of speech rate are immune to selective attention, largely operating prior to attentional stream segregation in the auditory processing hierarchy.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands.
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands.
- Matthias J Sjerps
- Max Planck Institute for Psycholinguistics, PO Box 310, 6500 AH, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
- Eva Reinisch
- Institute of Phonetics and Speech Processing, Ludwig Maximilian University Munich, Munich, Germany
- Acoustics Research Institute, Austrian Academy of Sciences, Vienna, Austria
11
Abstract
Spoken words are highly variable and therefore listeners interpret speech sounds relative to the surrounding acoustic context, such as the speech rate of a preceding sentence. For instance, a vowel midway between short /ɑ/ and long /a:/ in Dutch is perceived as short /ɑ/ in the context of preceding slow speech, but as long /a:/ if preceded by a fast context. Despite the well-established influence of visual articulatory cues on speech comprehension, it remains unclear whether visual cues to speech rate also influence subsequent spoken word recognition. In two "Go Fish"-like experiments, participants were presented with audio-only (auditory speech + fixation cross), visual-only (muted videos of a talking head), and audiovisual (speech + videos) context sentences, followed by ambiguous target words containing vowels midway between short /ɑ/ and long /a:/. In Experiment 1, target words were always presented auditorily, without visual articulatory cues. Although the audio-only and audiovisual contexts induced a rate effect (i.e., more long /a:/ responses after fast contexts), the visual-only condition did not. When, in Experiment 2, target words were presented audiovisually, rate effects were observed in all three conditions, including visual-only. This suggests that visual cues to speech rate in a context sentence influence the perception of following visual target cues (e.g., duration of lip aperture), which at an audiovisual integration stage bias participants' target categorisation responses. These findings contribute to a better understanding of how what we see influences what we hear.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- David Peeters
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Tilburg Center for Cognition and Communication (TiCC), Department of Communication and Cognition, Tilburg University, Tilburg, The Netherlands
- Judith Holler
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
12
Kaufeld G, Ravenschlag A, Meyer AS, Martin AE, Bosker HR. Knowledge-based and signal-based cues are weighted flexibly during spoken language comprehension. J Exp Psychol Learn Mem Cogn 2020;46:549-562. DOI: 10.1037/xlm0000744.
13
Bosker HR, Cooke M. Enhanced amplitude modulations contribute to the Lombard intelligibility benefit: Evidence from the Nijmegen Corpus of Lombard Speech. J Acoust Soc Am 2020;147:721. PMID: 32113258; DOI: 10.1121/10.0000646.
Abstract
Speakers adjust their voice when talking in noise, which is known as Lombard speech. These acoustic adjustments facilitate speech comprehension in noise relative to plain speech (i.e., speech produced in quiet). However, exactly which characteristics of Lombard speech drive this intelligibility benefit in noise remains unclear. This study assessed the contribution of enhanced amplitude modulations to the Lombard speech intelligibility benefit by demonstrating that (1) native speakers of Dutch in the Nijmegen Corpus of Lombard Speech produce more pronounced amplitude modulations in noise vs in quiet; (2) more enhanced amplitude modulations correlate positively with intelligibility in a speech-in-noise perception experiment; (3) transplanting the amplitude modulations from Lombard speech onto plain speech leads to an intelligibility improvement, suggesting that enhanced amplitude modulations in Lombard speech contribute towards intelligibility in noise. Results are discussed in light of recent neurobiological models of speech perception with reference to neural oscillators phase-locking to the amplitude modulations in speech, guiding the processing of speech.
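Point (3) involves transplanting the amplitude modulations of Lombard speech onto plain speech. The paper's exact signal-processing pipeline is not given here, so the following is a minimal sketch of one common approach (Hilbert envelope, low-pass smoothed, divided out and swapped); the cutoff frequency and the assumption of duration-matched recordings are mine.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def slow_envelope(x, fs, cutoff_hz=10.0):
    """Slow amplitude envelope: magnitude of the analytic signal,
    low-pass filtered (cutoff is an illustrative choice)."""
    env = np.abs(hilbert(x))
    b, a = butter(2, cutoff_hz / (fs / 2.0), btype="low")
    return np.maximum(filtfilt(b, a, env), 1e-9)  # floor avoids division by zero

def transplant_envelope(plain, lombard, fs):
    """Impose the Lombard envelope on plain speech: divide out plain's own
    envelope, multiply in Lombard's (recordings assumed duration-matched)."""
    n = min(len(plain), len(lombard))
    plain, lombard = plain[:n], lombard[:n]
    return plain / slow_envelope(plain, fs) * slow_envelope(lombard, fs)
```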
Affiliation(s)
- Hans Rutger Bosker
- Psychology of Language department, Max Planck Institute for Psycholinguistics, Wundtlaan 1, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Martin Cooke
- Language and Speech Laboratory, Universidad del País Vasco, calle Justo Vélez de Elorriaga 1, Vitoria, 01006, Spain
14
Rodd J, Bosker HR, Ernestus M, Alday PM, Meyer AS, Ten Bosch L. Control of speaking rate is achieved by switching between qualitatively distinct cognitive "gaits": Evidence from simulation. Psychol Rev 2019;127:281-304. PMID: 31886696; DOI: 10.1037/rev0000172.
Abstract
That speakers can vary their speaking rate is evident, but how they accomplish this has hardly been studied. Consider this analogy: when walking, speed can be continuously increased, within limits, but to speed up further, humans must run. Are there multiple qualitatively distinct speech "gaits" that resemble walking and running? Or is control achieved by continuous modulation of a single gait? This study investigates these possibilities through simulations of a new connectionist computational model of the cognitive process of speech production, EPONA, which borrows from Dell, Burger, and Svec's (1997) model. The model has parameters that can be adjusted to fit the temporal characteristics of speech at different speaking rates. We trained the model on a corpus of disyllabic Dutch words produced at different speaking rates. During training, different clusters of parameter values (regimes) were identified for different speaking rates. In a one-gait system, the regimes used to achieve fast and slow speech are qualitatively similar but quantitatively different. In a multiple-gait system, there is no linear relationship between the parameter settings associated with each gait, resulting in an abrupt shift in parameter values to move from speaking slowly to speaking fast. After training, the model achieved good fits at all three speaking rates. The parameter settings associated with each speaking rate were not linearly related, suggesting the presence of cognitive gaits. Thus, we provide the first computationally explicit account of the ability to modulate the speech production system to achieve different speaking styles. (PsycINFO Database Record (c) 2020 APA, all rights reserved).
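The gaits-vs.-single-gait question boils down to whether fitted parameter vectors for different rates fall into discrete regimes or lie along one continuous axis. The sketch below is not the paper's analysis code, only a hedged illustration of that logic using k-means on hypothetical fitted parameters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in: one fitted parameter vector per training run,
# ten runs at each of three speaking rates (the real model's parameters
# and fitting procedure are described in the paper, not reproduced here).
rng = np.random.default_rng(1)
params = rng.standard_normal((30, 5))
rates = np.repeat(["fast", "medium", "slow"], 10)

# Distinct "gaits" predict that runs cluster by rate in parameter space,
# rather than falling along a single continuum.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(params)
for rate in ("fast", "medium", "slow"):
    print(rate, np.bincount(labels[rates == rate], minlength=2))
```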
Affiliation(s)
- Joe Rodd
- Psychology of Language Department, Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
15
Maslowski M, Meyer AS, Bosker HR. Listeners normalize speech for contextual speech rate even without an explicit recognition task. J Acoust Soc Am 2019;146:179. PMID: 31370593; DOI: 10.1121/1.5116004.
Abstract
Speech can be produced at different rates. Listeners take this rate variation into account by normalizing vowel duration for contextual speech rate: an ambiguous Dutch word midway between /mɑt/ and /ma:t/ is perceived as short /mɑt/ when embedded in a slow context, but as long /ma:t/ in a fast context. While some have argued that this rate normalization involves low-level automatic perceptual processing, there is also evidence that it arises at higher-level cognitive processing stages, such as decision making. Prior research on rate-dependent speech perception has only used explicit recognition tasks to investigate the phenomenon, involving both perceptual processing and decision making. This study tested whether speech rate normalization can be observed without explicit decision making, using a cross-modal repetition priming paradigm. Results show that a fast precursor sentence makes an embedded ambiguous prime (between /mɑt/ and /ma:t/) sound (implicitly) more /a:/-like, facilitating lexical access to the long target word "maat" in an (explicit) lexical decision task. This result suggests that rate normalization is automatic, taking place even in the absence of an explicit recognition task. Thus, rate normalization is placed within the realm of everyday spoken conversation, where explicit categorization of ambiguous sounds is rare.
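Studies like this construct their ambiguous primes from a duration continuum between the short and long vowel. A minimal sketch of one way to build such a continuum by time-stretching a vowel token with librosa's phase vocoder; the step count and stretch factors are illustrative assumptions, not the study's stimulus construction.

```python
import numpy as np
import librosa

def duration_continuum(vowel, sr, steps=7, shortest=0.8, longest=1.4):
    """Time-stretch a vowel token to a range of durations (factors relative
    to the original); librosa's rate > 1 shortens, so invert the factor."""
    factors = np.linspace(shortest, longest, steps)
    return [librosa.effects.time_stretch(vowel, rate=1.0 / f) for f in factors]

# e.g., steps from 0.8x (clearly short /ɑ/) to 1.4x (clearly long /a:/),
# with the middle steps serving as ambiguous primes.
```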
Affiliation(s)
- Merel Maslowski
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Antje S Meyer
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
16
Rodd J, Bosker HR, Ten Bosch L, Ernestus M. Deriving the onset and offset times of planning units from acoustic and articulatory measurements. J Acoust Soc Am 2019;145:EL161. PMID: 30823812; DOI: 10.1121/1.5089456.
Abstract
Many psycholinguistic models of speech sequence planning make claims about the onset and offset times of planning units, such as words, syllables, and phonemes. These predictions typically go untested, however, since psycholinguists have assumed that the temporal dynamics of the speech signal are a poor index of the temporal dynamics of the underlying speech planning process. This article argues that this problem is tractable, and presents and validates two simple metrics that derive planning unit onset and offset times from the acoustic signal and articulographic data.
Affiliation(s)
- Joe Rodd
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Louis Ten Bosch
- Radboud University, Centre for Language Studies, Nijmegen, The Netherlands
- Mirjam Ernestus
- Radboud University, Centre for Language Studies, Nijmegen, The Netherlands
17
Abstract
Recently, the world's attention was caught by an audio clip that was perceived as either "Laurel" or "Yanny." Opinions were sharply split: many could not believe that others heard something different from their own perception. However, a crowd-sourced experiment with >500 participants shows that it is possible to make people hear Laurel where they previously heard Yanny by manipulating the preceding acoustic context. This study is not only the first to reveal within-listener variation in Laurel/Yanny percepts, but also the first to demonstrate contrast effects for global spectral information in larger frequency regions. Thus, it highlights the intricacies of human perception underlying these social media phenomena.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
18
Abstract
Speakers adjust their voice when talking in noise (known as Lombard speech), facilitating speech comprehension. Recent neurobiological models of speech perception emphasize the role of amplitude modulations in speech-in-noise comprehension, helping neural oscillators to "track" the attended speech. This study tested whether talkers produce more pronounced amplitude modulations in noise. Across four different corpora, modulation spectra showed greater power in amplitude modulations below 4 Hz in Lombard speech compared to matching plain speech. This suggests that noise-induced speech contains more pronounced amplitude modulations, potentially helping the listening brain to entrain to the attended talker, aiding comprehension.
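The key measurement here is the modulation spectrum: the spectrum of the amplitude envelope, with Lombard speech predicted to show more power below 4 Hz. A minimal sketch of that computation follows, assuming a mono waveform; the envelope method and normalization are illustrative choices rather than the corpora's exact pipeline.

```python
import numpy as np
from scipy.signal import hilbert

def modulation_power_below(x, fs, f_max=4.0):
    """Fraction of amplitude-envelope modulation power below f_max Hz."""
    env = np.abs(hilbert(x))                    # amplitude envelope
    env = env - env.mean()                      # drop the DC component
    power = np.abs(np.fft.rfft(env)) ** 2
    freqs = np.fft.rfftfreq(len(env), 1.0 / fs)
    return power[(freqs > 0) & (freqs < f_max)].sum() / power[freqs > 0].sum()

# Prediction from the paper: for matched utterances,
# modulation_power_below(lombard, fs) > modulation_power_below(plain, fs)
```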
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, P.O. Box 310, 6500 AH, Nijmegen, The Netherlands
- Martin Cooke
- Language and Speech Laboratory, Universidad del País Vasco, Vitoria, 01006, Spain
19
Abstract
Anecdotal evidence suggests that unfamiliar languages sound faster than one's native language. Empirical evidence for this impression has, so far, come from explicit rate judgments. The aim of the present study was to test whether such perceived rate differences between native and foreign languages (FLs) have effects on implicit speech processing. Our measure of implicit rate perception was "normalization for speech rate": an ambiguous vowel between short /a/ and long /a:/ is interpreted as /a:/ following a fast but as /a/ following a slow carrier sentence. That is, listeners did not judge speech rate itself; instead, they categorized ambiguous vowels whose perception was implicitly affected by the rate of the context. We asked whether a bias towards long /a:/ might be observed when the context is not actually faster but simply spoken in a FL. A fully symmetrical experimental design was used: Dutch and German participants listened to rate-matched (fast and slow) sentences in both languages spoken by the same bilingual speaker. Sentences were followed by non-words that contained vowels from an /a/-/a:/ duration continuum. Results from Experiments 1 and 2 showed a consistent effect of rate normalization for both listener groups. Moreover, for German listeners, across the two experiments, foreign sentences triggered more /a:/ responses than (rate-matched) native sentences, suggesting that foreign sentences were indeed perceived as faster. This FL effect was modulated by participants' ability to understand the FL: participants who scored higher on a FL translation task showed less of a FL effect. However, opposite effects were found for the Dutch listeners: for them, their native rather than the foreign language induced more /a:/ responses. Nevertheless, this reversed effect could be reduced when additional spectral properties of the context were controlled for. Experiment 3, using explicit rate judgments, replicated the effect for German but not Dutch listeners. We therefore conclude that the subjective impression that FLs sound fast may have an effect on implicit speech processing, with implications for how language learners perceive spoken segments in a FL.
Affiliation(s)
- Hans Rutger Bosker
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
- Eva Reinisch
- Institute of Phonetics and Speech Processing, Ludwig Maximilian University of Munich, Munich, Germany
20
Abstract
In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through 6 experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing prerecorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicated Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings.