1
McLaughlin DJ, Van Engen KJ. Social Priming: Exploring the Effects of Speaker Race and Ethnicity on Perception of Second Language Accents. Language and Speech 2024;67:821-845. PMID: 37772514. DOI: 10.1177/00238309231199245.
Abstract
Listeners use more than just acoustic information when processing speech. Social information, such as a speaker's perceived race or ethnicity, can also affect the processing of the speech signal, in some cases facilitating perception ("social priming"). We aimed to replicate and extend this line of inquiry, examining effects of multiple social primes (i.e., a Middle Eastern, White, or East Asian face, or a control silhouette image) on the perception of Mandarin Chinese-accented English and Arabic-accented English. By including uncommon priming combinations (e.g., a Middle Eastern prime for a Mandarin accent), we aimed to test the specificity of social primes: For example, can a Middle Eastern face facilitate perception of both Arabic-accented English and Mandarin-accented English? Contrary to our predictions, our results indicated no facilitative social priming effects for either of the second language (L2) accents. Results for our examination of specificity were mixed. Trends in the data indicated that the combination of an East Asian prime with Arabic accent resulted in lower accuracy as compared with a White prime, but the combination of a Middle Eastern prime with a Mandarin accent did not (and may have actually benefited listeners to some degree). We conclude that the specificity of priming effects may depend on listeners' level of familiarity with a given accent and/or racial/ethnic group and that the mixed outcomes in the current work motivate further inquiries to determine whether social priming effects for L2-accented speech may be smaller than previously hypothesized and/or highly dependent on listener experience.
Affiliation(s)
- Drew J McLaughlin
- Department of Psychological & Brain Sciences, Washington University in St. Louis, USA; Basque Center on Cognition, Brain and Language, Spain
- Kristin J Van Engen
- Department of Psychological & Brain Sciences, Washington University in St. Louis, USA
2
Caudrelier T, Ménard L, Beausoleil MM, Martin CD, Samuel AG. When Jack isn't Jacques: Simultaneous opposite language-specific speech perceptual learning in French-English bilinguals. PNAS Nexus 2024;3:pgae354. PMID: 39246670. PMCID: PMC11378075. DOI: 10.1093/pnasnexus/pgae354.
Abstract
Humans are remarkably good at understanding spoken language, despite the huge variability of the signal as a function of the talker, the situation, and the environment. This success relies on having access to stable representations based on years of speech input, coupled with the ability to adapt to short-term deviations from these norms, e.g. accented speech or speech altered by ambient noise. In the last two decades, there has been a robust research effort focused on a possible mechanism for adjusting to accented speech. In these studies, listeners typically hear 15-20 words in which a speech sound has been altered, creating a short-term deviation from its longer-term representation. After exposure to these items, listeners demonstrate "lexically driven phonetic recalibration"-they alter their categorization of speech sounds, expanding a speech category to take into account the recently heard deviations from their long-term representations. In the current study, we investigate such adjustments by bilingual listeners. French-English bilinguals were first exposed to nonstandard pronunciations of a sound (/s/ or /f/) in one language and tested for recalibration in both languages. Then, the exposure continued with the original type of mispronunciation in the same language, plus mispronunciations in the other language, in the opposite direction. In a final test, we found simultaneous recalibration in opposite directions for the two languages-listeners shifted their French perception in one direction and their English in the other: Bilinguals can maintain separate adjustments, for the same sounds, when a talker's speech differs across two languages.
Affiliation(s)
- Tiphaine Caudrelier
- Laboratoire d'Etude des Mécanismes Cognitifs, Université Lumière Lyon 2, 5 avenue Pierre Mendès France, 69676 BRON Cedex, Lyon, France
- Basque Center on Cognition Brain and Language (BCBL), Paseo Mikeletegi 69, Gipuzkoa, San Sebastian 20009, Spain
- Lucie Ménard
- Département de Linguistique, Pavillon Hubert-Aquin, A-3405, 400 rue Sainte-Catherine Est, Université du Québec à Montréal, Montréal, QC H2L 2C5, Canada
- Center for Research on Brain, Language, and Music (CRBLM), 2001 av McGill College, 6th Floor, Montréal, QC H3A 1G1, Canada
- Marie-Michèle Beausoleil
- Département de Linguistique, Pavillon Hubert-Aquin, A-3405, 400 rue Sainte-Catherine Est, Université du Québec à Montréal, Montréal, QC H2L 2C5, Canada
- Center for Research on Brain, Language, and Music (CRBLM), 2001 av McGill College, 6th Floor, Montréal, QC H3A 1G1, Canada
- Clara D Martin
- Basque Center on Cognition Brain and Language (BCBL), Paseo Mikeletegi 69, Gipuzkoa, San Sebastian 20009, Spain
- Ikerbasque, Basque Foundation for Science, Plaza Euskadi 5, 48009 Bilbao, Bizkaia, Spain
- Arthur G Samuel
- Basque Center on Cognition Brain and Language (BCBL), Paseo Mikeletegi 69, Gipuzkoa, San Sebastian 20009, Spain
- Ikerbasque, Basque Foundation for Science, Plaza Euskadi 5, 48009 Bilbao, Bizkaia, Spain
- Department of Psychology, Stony Brook University, 100 Nicolls Road, Stony Brook, NY 11794, USA
3
Wilcox EG, Pimentel T, Meister C, Cotterell R. An information-theoretic analysis of targeted regressions during reading. Cognition 2024;249:105765. PMID: 38772254. DOI: 10.1016/j.cognition.2024.105765.
Abstract
Regressions, or backward saccades, are common during reading, accounting for between 5% and 20% of all saccades. And yet, relatively little is known about what causes them. We provide an information-theoretic operationalization for two previous qualitative hypotheses about regressions, which we dub reactivation and reanalysis. We argue that these hypotheses make different predictions about the pointwise mutual information or pmi between a regression's source and target. Intuitively, the pmi between two words measures how much more (or less) likely one word is to be present given the other. On one hand, the reactivation hypothesis predicts that regressions occur between words that are associated, implying high positive values of pmi. On the other hand, the reanalysis hypothesis predicts that regressions should occur between words that are not associated with each other, implying negative, low values of pmi. As a second theoretical contribution, we expand on previous theories by considering not only pmi but also expected values of pmi, E[pmi], where the expectation is taken over all possible realizations of the regression's target. The rationale for this is that language processing involves making inferences under uncertainty, and readers may be uncertain about what they have read, especially if a previous word was skipped. To test both theories, we use contemporary language models to estimate pmi-based statistics over word pairs in three corpora of eye tracking data in English, as well as in six languages across three language families (Indo-European, Uralic, and Turkic). Our results are consistent across languages and models tested: Positive values of pmi and E[pmi] consistently help to predict the patterns of regressions during reading, whereas negative values of pmi and E[pmi] do not. Our information-theoretic interpretation increases the predictive scope of both theories and our studies present the first systematic crosslinguistic analysis of regressions in the literature. Our results support the reactivation hypothesis and, more broadly, they expand the number of language processing behaviors that can be linked to information-theoretic principles.
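In standard notation (reconstructed from the abstract's definitions rather than copied from the paper), with w_s a regression's source word and w_t its target, the two statistics are:

```latex
% Pointwise mutual information between a regression's target and source,
% with probabilities estimated by a language model:
\[
\mathrm{pmi}(w_t; w_s) = \log \frac{p(w_t, w_s)}{p(w_t)\,p(w_s)}
\]
% Expected pmi, marginalizing over possible realizations of the target,
% reflecting the reader's uncertainty about what was read:
\[
\mathbb{E}[\mathrm{pmi}] = \sum_{w_t} p(w_t \mid \mathrm{context}) \, \mathrm{pmi}(w_t; w_s)
\]
```

Positive pmi then corresponds to the reactivation hypothesis (associated words), and negative pmi to the reanalysis hypothesis (unassociated words).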
Affiliation(s)
- Tiago Pimentel
- Department of Computer Science, ETH Zürich, Switzerland; Department of Computer Science, University of Cambridge, United Kingdom.
- Clara Meister
- Department of Computer Science, ETH Zürich, Switzerland.
- Ryan Cotterell
- Department of Computer Science, ETH Zürich, Switzerland.
4
Drouin JR, Davis CP. Individual differences in visual pattern completion predict adaptation to degraded speech. Brain and Language 2024;255:105449. PMID: 39083999. DOI: 10.1016/j.bandl.2024.105449.
Abstract
Recognizing acoustically degraded speech relies on predictive processing whereby incomplete auditory cues are mapped to stored linguistic representations via pattern recognition processes. While listeners vary in their ability to recognize degraded speech, performance improves when a written transcription is presented, allowing completion of the partial sensory pattern to preexisting representations. Building on work characterizing predictive processing as pattern completion, we examined the relationship between domain-general pattern recognition and individual variation in degraded speech learning. Participants completed a visual pattern recognition task to measure individual-level tendency towards pattern completion. Participants were also trained to recognize noise-vocoded speech with written transcriptions and tested on speech recognition pre- and post-training using a retrieval-based transcription task. Listeners significantly improved in recognizing speech after training, and pattern completion on the visual task predicted improvement for novel items. The results implicate pattern completion as a domain-general learning mechanism that can facilitate speech adaptation in challenging contexts.
Affiliation(s)
- Julia R Drouin
- Division of Speech and Hearing Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Communication Sciences and Disorders, California State University Fullerton, Fullerton, CA 92831, USA.
- Charles P Davis
- Department of Psychology & Neuroscience, Duke University, Durham, NC 27708, USA
5
Shin GH. Good-enough processing, home language proficiency, cognitive skills, and task effects for Korean heritage speakers' sentence comprehension. Front Psychol 2024;15:1382668. PMID: 39149703. PMCID: PMC11324561. DOI: 10.3389/fpsyg.2024.1382668.
Abstract
The present study investigates how heritage speakers conduct good-enough processing at the interface of home-language proficiency, cognitive skills (inhibitory control; working memory), and task types (acceptability judgement; self-paced reading). For this purpose, we employ two word-order patterns (verb-final vs. verb-initial) of two clausal constructions in Korean-suffixal passive and morphological causative-which contrast in the mapping between thematic roles and case-marking and in the interpretive procedures driven by verbal morphology. We find that, while Korean heritage speakers demonstrate the same kind of acceptability-rating behaviour as monolingual Korean speakers, their reading-time patterns are notably modulated by construction-specific properties, cognitive skills, and proficiency. This suggests that heritage speakers are able and willing to engage both parsing routes, induced by linguistic cues in a non-dominant language, in proportion to the computational complexity of those cues. These findings are expected to advance our understanding of learners' minds for underrepresented languages and populations.
Affiliation(s)
- Gyu-Ho Shin
- Department of Linguistics, University of Illinois at Chicago, Chicago, IL, United States
6
Xu K, Zeng T. Cross-linguistic syntactic priming as rational expectation for syntactic repetition in the bilingual environment. PLoS One 2024;19:e0307504. PMID: 39028739. PMCID: PMC11259290. DOI: 10.1371/journal.pone.0307504.
Abstract
Recent research suggests that syntactic priming in language comprehension-the facilitated processing of repeated syntactic structures-arises from the expectation for syntactic repetition due to rational adaptation to the linguistic environment. To further evaluate the generalizability of this expectation adaptation account in cross-linguistic syntactic priming and explore the influence of second language (L2) proficiency, we conducted a self-paced reading study with Chinese L2 learners of English by utilizing the sentential complement-direct object (SC-DO) ambiguity. The results showed that participants exposed to clusters of SC structures subsequently processed repetitions of this structure more rapidly (i.e., larger priming effects) than those exposed to the same number of SC structures but spaced in time, despite the prime and target being in two different languages (Chinese and English). Furthermore, this difference in priming strength was more pronounced for participants with higher L2 (English) proficiency. These findings demonstrate that cross-linguistic syntactic priming is consistent with the expectation for syntactic repetition that rationally adapts to syntactic clustering properties in surrounding bilingual environments, and such adaptation is enhanced as L2 proficiency increases. Taken together, our study extends the expectation adaptation account to cross-linguistic syntactic priming and integrates the role of L2 proficiency, which can shed new light on the mechanisms underlying syntactic priming, bilingual shared syntactic representations and expectation-based sentence processing.
Affiliation(s)
- Kexin Xu
- College of Foreign Languages, Hunan University, Changsha, China
- Tao Zeng
- College of Foreign Languages, Hunan University, Changsha, China
- Hunan Provincial Research Center for Language and Cognition, Changsha, China
7
Kurumada C, Rivera R, Allen P, Bennetto L. Perception and adaptation of receptive prosody in autistic adolescents. Sci Rep 2024;14:16409. PMID: 39013983. PMCID: PMC11252140. DOI: 10.1038/s41598-024-66569-x.
Abstract
A fundamental aspect of language processing is inferring others' minds from subtle variations in speech. The same word or sentence can often convey different meanings depending on its tempo, timing, and intonation-features often referred to as prosody. Although autistic children and adults are known to experience difficulty in making such inferences, it remains unclear why. We hypothesize that detail-oriented perception in autism may interfere with the inference process if it lacks the adaptivity required to cope with the variability ubiquitous in human speech. Using a novel prosodic continuum that shifts the sentence meaning gradiently from a statement (e.g., "It's raining") to a question (e.g., "It's raining?"), we investigated the perception and adaptation of receptive prosody in autistic adolescents and two groups of non-autistic controls. Autistic adolescents showed attenuated adaptivity in categorizing prosody, whereas they were equivalent to controls in terms of discrimination accuracy. Combined with recent findings in segmental (e.g., phoneme) recognition, the current results provide the basis for an emerging research framework for attenuated flexibility and reduced influence of contextual feedback as a possible source of deficits that hinder linguistic and social communication in autism.
Affiliation(s)
- Chigusa Kurumada
- Brain and Cognitive Sciences, University of Rochester, Rochester, 14627, USA.
- Rachel Rivera
- Psychology, University of Rochester, Rochester, 14627, USA
- Paul Allen
- Psychology, University of Rochester, Rochester, 14627, USA
- Otolaryngology, University of Rochester Medical Center, Rochester, 14642, USA
- Loisa Bennetto
- Psychology, University of Rochester, Rochester, 14627, USA
8
Chen AM, Palacci A, Vélez N, Hawkins RD, Gershman SJ. A Hierarchical Bayesian Model of Adaptive Teaching. Cogn Sci 2024;48:e13477. PMID: 38980989. DOI: 10.1111/cogs.13477.
Abstract
How do teachers learn about what learners already know? How do learners aid teachers by providing them with information about their background knowledge and what they find confusing? We formalize this collaborative reasoning process using a hierarchical Bayesian model of pedagogy. We then evaluate this model in two online behavioral experiments (N = 312 adults). In Experiment 1, we show that teachers select examples that account for learners' background knowledge, and adjust their examples based on learners' feedback. In Experiment 2, we show that learners strategically provide more feedback when teachers' examples deviate from their background knowledge. These findings provide a foundation for extending computational accounts of pedagogy to richer interactive settings.
Affiliation(s)
- Alicia M Chen
- Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
- Samuel J Gershman
- Department of Psychology, Harvard University
- Center for Brains, Minds, and Machines, Massachusetts Institute of Technology
9
Murphy TK, Nozari N, Holt LL. Transfer of statistical learning from passive speech perception to speech production. Psychon Bull Rev 2024;31:1193-1205. PMID: 37884779. PMCID: PMC11192850. DOI: 10.3758/s13423-023-02399-8.
Abstract
Communicating with a speaker with a different accent can affect one's own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/-/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0xVOT relationship reversed to create an "accent" with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners' own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening and these adjustments transfer to influence listeners' own speech productions.
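To make the notion of cue down-weighting concrete, the sketch below casts /b/-/p/ categorization as a logistic combination of VOT and F0, with statistical learning modeled as a reduction of the F0 weight. This is a toy illustration, not the authors' analysis code; the normalization constants, weights, and stimulus values are invented for the example.

```python
import math

def p_voiceless(vot_ms, f0_hz, w_vot, w_f0, bias=0.0):
    """Logistic categorization: probability of /p/ given VOT and F0 cues.
    Cues are normalized against assumed category-boundary values."""
    vot = (vot_ms - 30.0) / 20.0   # rough normalization (assumed values)
    f0 = (f0_hz - 220.0) / 40.0
    return 1.0 / (1.0 + math.exp(-(bias + w_vot * vot + w_f0 * f0)))

ambiguous = dict(vot_ms=30.0, f0_hz=260.0)  # VOT-ambiguous, high-F0 test item

# Canonical condition: both cues are weighted; high F0 signals /p/.
print(p_voiceless(**ambiguous, w_vot=2.0, w_f0=1.0))  # ~0.73, /p/-biased

# Reverse ("accent") condition: F0 is down-weighted after exposure,
# so the same high-F0 item is categorized near chance.
print(p_voiceless(**ambiguous, w_vot=2.0, w_f0=0.1))  # ~0.52, F0 nearly ignored
```

On this view, the paper's production finding corresponds to the reduced F0 weight carrying over into the F0 listeners produce in their own utterances.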
Affiliation(s)
- Timothy K Murphy
- Department of Psychology, Carnegie Mellon University, Baker Hall, Floor 3, Frew St, Pittsburgh, PA, 15213, USA.
- Center for the Neural Basis of Cognition, Pittsburgh, PA, 15213, USA.
- Nazbanou Nozari
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, 47405, USA
- Lori L Holt
- Department of Psychology, University of Texas at Austin, Austin, TX, 78712, USA
10
Zhao C, Ong JH, Veic A, Patel AD, Jiang C, Fogel AR, Wang L, Hou Q, Das D, Crasto C, Chakrabarti B, Williams TI, Loutrari A, Liu F. Predictive processing of music and language in autism: Evidence from Mandarin and English speakers. Autism Res 2024;17:1230-1257. PMID: 38651566. DOI: 10.1002/aur.3133.
Abstract
Atypical predictive processing has been associated with autism across multiple domains, based mainly on artificial antecedents and consequents. As structured sequences where expectations derive from implicit learning of combinatorial principles, language and music provide naturalistic stimuli for investigating predictive processing. In this study, we matched melodic and sentence stimuli in cloze probabilities and examined musical and linguistic prediction in Mandarin- (Experiment 1) and English-speaking (Experiment 2) autistic and non-autistic individuals using both production and perception tasks. In the production tasks, participants listened to unfinished melodies/sentences and then produced the final notes/words to complete these items. In the perception tasks, participants provided expectedness ratings of the completed melodies/sentences based on the most frequent notes/words in the norms. While Experiment 1 showed intact musical prediction but atypical linguistic prediction in autism in the Mandarin sample, in which musical training experience and receptive vocabulary skills were imbalanced between groups, the group difference disappeared in a more closely matched sample of English speakers in Experiment 2. These findings suggest the importance of taking an individual differences approach when investigating predictive processing in music and language in autism, as the difficulty in prediction in autism may not be due to generalized problems with prediction in any type of complex sequence processing.
Affiliation(s)
- Chen Zhao
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Jia Hoong Ong
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Anamarija Veic
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Aniruddh D Patel
- Department of Psychology, Tufts University, Medford, Massachusetts, USA
- Program in Brain, Mind, and Consciousness, Canadian Institute for Advanced Research (CIFAR), Toronto, Canada
- Cunmei Jiang
- Music College, Shanghai Normal University, Shanghai, China
- Allison R Fogel
- Department of Psychology, Tufts University, Medford, Massachusetts, USA
- Li Wang
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Qingqi Hou
- Department of Music and Dance, Nanjing Normal University of Special Education, Nanjing, China
- Dipsikha Das
- School of Psychology, Keele University, Staffordshire, UK
- Cara Crasto
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Bhismadev Chakrabarti
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Tim I Williams
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Ariadne Loutrari
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
- Fang Liu
- School of Psychology and Clinical Language Sciences, University of Reading, Reading, UK
11
Xie X, Kurumada C. From first encounters to longitudinal exposure: a repeated exposure-test paradigm for monitoring speech adaptation. Front Psychol 2024;15:1383904. PMID: 38873525. PMCID: PMC11169900. DOI: 10.3389/fpsyg.2024.1383904.
Abstract
Perceptual difficulty with an unfamiliar accent can dissipate within short time scales (e.g., within minutes), reflecting rapid adaptation effects. At the same time, long-term familiarity with an accent is also known to yield stable perceptual benefits. However, whether the long-term effects reflect sustained, cumulative progression from shorter-term adaptation remains unknown. To fill this gap, we developed a web-based, repeated exposure-test paradigm. In this paradigm, short test blocks alternate with exposure blocks, and this exposure-test sequence is repeated multiple times. This design allows for the testing of adaptive speech perception both (a) within the first moments of encountering an unfamiliar accent and (b) over longer time scales such as days and weeks. In addition, we used a Bayesian ideal observer approach to select natural speech stimuli that increase the statistical power to detect adaptation. The current report presents results from a first application of this paradigm, investigating changes in the recognition accuracy of Mandarin-accented speech by native English listeners over five sessions spanning 3 weeks. We found that the recognition of an accent feature (a syllable-final /d/, as in feed, sounding /t/-like) improved steadily over the three-week period. Unexpectedly, however, the improvement was seen with or without exposure to the accent. We discuss possible reasons for this result and implications for conducting future longitudinal studies with repeated exposure and testing.
Affiliation(s)
- Xin Xie
- Department of Language Science, University of California, Irvine, Irvine, CA, United States
- Chigusa Kurumada
- Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, United States
12
de Zubicaray GI, Arciuli J, Guenther FH, McMahon KL, Kearney E. Non-arbitrary mappings between size and sound of English words: Form typicality effects during lexical access and memory. Q J Exp Psychol (Hove) 2024;77:943-963. PMID: 37332149. PMCID: PMC11032636. DOI: 10.1177/17470218231184940.
Abstract
A century of research has provided evidence of limited size sound symbolism in English, that is, certain vowels are non-arbitrarily associated with words denoting small versus large referents (e.g., /i/ as in teensy and /ɑ/ as in tall). In the present study, we investigated more extensive statistical regularities between surface form properties of English words and ratings of their semantic size, that is, form typicality, and its impact on language and memory processing. Our findings provide the first evidence of significant word form typicality for semantic size. In five empirical studies using behavioural megastudy data sets of performance on written and auditory lexical decision, reading aloud, semantic decision, and recognition memory tasks, we show that form typicality for size is a stronger and more consistent predictor of lexical access during word comprehension and production than semantic size, in addition to playing a significant role in verbal memory. The empirical results demonstrate that statistical information about non-arbitrary form-size mappings is accessed automatically during language and verbal memory processing, unlike semantic size that is largely dependent on task contexts that explicitly require participants to access size knowledge. We discuss how a priori knowledge about non-arbitrary form-meaning associations in the lexicon might be incorporated in models of language processing that implement Bayesian statistical inference.
Affiliation(s)
- Greig I de Zubicaray
- School of Psychology and Counselling, Faculty of Health, Queensland University of Technology, Brisbane, QLD, Australia
- Joanne Arciuli
- College of Nursing and Health Sciences, Flinders University, Adelaide, SA, Australia
- Frank H Guenther
- Department of Speech, Language & Hearing Sciences, Boston University, Boston, MA, USA
- Department of Biomedical Engineering, Boston University, Boston, MA, USA
- Katie L McMahon
- School of Clinical Sciences, Centre for Biomedical Technologies, Queensland University of Technology, Brisbane, QLD, Australia
- Herston Imaging Research Facility, Royal Brisbane and Women’s Hospital, Herston, QLD, Australia
- Elaine Kearney
- School of Psychology and Counselling, Faculty of Health, Queensland University of Technology, Brisbane, QLD, Australia
13
Nour Eddine S, Brothers T, Wang L, Spratling M, Kuperberg GR. A predictive coding model of the N400. Cognition 2024;246:105755. PMID: 38428168. PMCID: PMC10984641. DOI: 10.1016/j.cognition.2024.105755.
Abstract
The N400 event-related component has been widely used to investigate the neural mechanisms underlying real-time language comprehension. However, despite decades of research, there is still no unifying theory that can explain both its temporal dynamics and functional properties. In this work, we show that predictive coding - a biologically plausible algorithm for approximating Bayesian inference - offers a promising framework for characterizing the N400. Using an implemented predictive coding computational model, we demonstrate how the N400 can be formalized as the lexico-semantic prediction error produced as the brain infers meaning from the linguistic form of incoming words. We show that the magnitude of lexico-semantic prediction error mirrors the functional sensitivity of the N400 to various lexical variables, priming, contextual effects, as well as their higher-order interactions. We further show that the dynamics of the predictive coding algorithm provides a natural explanation for the temporal dynamics of the N400, and a biologically plausible link to neural activity. Together, these findings directly situate the N400 within the broader context of predictive coding research. More generally, they raise the possibility that the brain may use the same computational mechanism for inference across linguistic and non-linguistic domains.
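A minimal sketch of the settling dynamics behind such a model may help (one linear layer only; the published model has structured orthographic, lexical, and semantic levels that this toy omits). The assumption here is that the N400 analogue is the summed magnitude of prediction error as inference unfolds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy generative model: a higher level predicts lower-level (lexical)
# activity through weights W; inference iteratively reduces the error.
n_low, n_high = 8, 4
W = rng.normal(size=(n_low, n_high)) * 0.5

def settle(x_input, n_iters=50, lr=0.1):
    """Infer the higher-level cause r of input x, tracking total |error|
    per iteration -- the stand-in for lexico-semantic prediction error."""
    r = np.zeros(n_high)
    trace = []
    for _ in range(n_iters):
        error = x_input - W @ r      # prediction error at the lower level
        r += lr * (W.T @ error)      # update beliefs to reduce that error
        trace.append(np.abs(error).sum())
    return r, trace

expected = W @ np.array([1.0, 0.0, 0.0, 0.0])  # input the model predicts well
unexpected = rng.normal(size=n_low)            # input unrelated to the model

_, trace_exp = settle(expected)
_, trace_unexp = settle(unexpected)
# The unexpected input yields larger, slower-resolving error: a bigger "N400".
print(trace_exp[0], trace_exp[-1])
print(trace_unexp[0], trace_unexp[-1])
```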
Affiliation(s)
- Samer Nour Eddine
- Department of Psychology and Center for Cognitive Science, Tufts University, United States of America.
- Trevor Brothers
- Department of Psychology and Center for Cognitive Science, Tufts University, United States of America; Department of Psychology, North Carolina A&T, United States of America
- Lin Wang
- Department of Psychology and Center for Cognitive Science, Tufts University, United States of America; Department of Psychiatry and the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, United States of America
- Gina R Kuperberg
- Department of Psychology and Center for Cognitive Science, Tufts University, United States of America; Department of Psychiatry and the Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Harvard Medical School, United States of America
14
Magnotti JF, Lado A, Zhang Y, Maasø A, Nath A, Beauchamp MS. Repeatedly experiencing the McGurk effect induces long-lasting changes in auditory speech perception. Communications Psychology 2024;2:25. PMID: 39242734. PMCID: PMC11332120. DOI: 10.1038/s44271-024-00073-w.
Abstract
In the McGurk effect, presentation of incongruent auditory and visual speech evokes a fusion percept different than either component modality. We show that repeatedly experiencing the McGurk effect for 14 days induces a change in auditory-only speech perception: the auditory component of the McGurk stimulus begins to evoke the fusion percept, even when presented on its own without accompanying visual speech. This perceptual change, termed fusion-induced recalibration (FIR), was talker-specific and syllable-specific and persisted for a year or more in some participants without any additional McGurk exposure. Participants who did not experience the McGurk effect did not experience FIR, showing that recalibration was driven by multisensory prediction error. A causal inference model of speech perception incorporating multisensory cue conflict accurately predicted individual differences in FIR. Just as the McGurk effect demonstrates that visual speech can alter the perception of auditory speech, FIR shows that these alterations can persist for months or years. The ability to induce seemingly permanent changes in auditory speech perception will be useful for studying plasticity in brain networks for language and may provide new strategies for improving language learning.
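The causal-inference computation invoked here has a standard form (Körding et al., 2007): the observer evaluates the posterior probability that the auditory and visual signals arose from one common cause, and fusion percepts dominate when that posterior stays high despite cue conflict. The sketch below implements that textbook form with invented variances and cue positions; it is not the paper's fitted model.

```python
import math

def normpdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def p_common(xa, xv, var_a, var_v, var_prior, prior_c=0.5):
    """Posterior probability that auditory (xa) and visual (xv) cues on a
    1-D phonetic axis share a common cause (all parameters assumed)."""
    # Likelihood under a common cause, integrating out the shared source.
    var_sum = var_a * var_v + var_a * var_prior + var_v * var_prior
    like_c1 = math.exp(-((xa - xv) ** 2 * var_prior
                         + xa ** 2 * var_v + xv ** 2 * var_a)
                       / (2 * var_sum)) / (2 * math.pi * math.sqrt(var_sum))
    # Likelihood under two independent causes.
    like_c2 = (normpdf(xa, 0, var_a + var_prior)
               * normpdf(xv, 0, var_v + var_prior))
    return prior_c * like_c1 / (prior_c * like_c1 + (1 - prior_c) * like_c2)

# Small audiovisual conflict (McGurk-like): a common cause stays plausible.
print(p_common(xa=-0.3, xv=0.3, var_a=0.5, var_v=0.5, var_prior=2.0))  # ~0.59
# Large conflict: the cues are attributed to separate causes.
print(p_common(xa=-2.0, xv=2.0, var_a=0.5, var_v=0.5, var_prior=2.0))  # ~0.003
```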
Affiliation(s)
- John F Magnotti
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Anastasia Lado
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yue Zhang
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, USA
- Arnt Maasø
- Institute for Media and Communications, University of Oslo, Oslo, Norway
- Audrey Nath
- Department of Neurosurgery, University of Texas Medical Branch, Galveston, TX, USA
- Michael S Beauchamp
- Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
15
Kim SG, De Martino F, Overath T. Linguistic modulation of the neural encoding of phonemes. Cereb Cortex 2024;34:bhae155. PMID: 38687241. PMCID: PMC11059272. DOI: 10.1093/cercor/bhae155.
Abstract
Speech comprehension entails the neural mapping of the acoustic speech signal onto learned linguistic units. This acousto-linguistic transformation is bi-directional, whereby higher-level linguistic processes (e.g. semantics) modulate the acoustic analysis of individual linguistic units. Here, we investigated the cortical topography and linguistic modulation of the most fundamental linguistic unit, the phoneme. We presented natural speech and "phoneme quilts" (pseudo-randomly shuffled phonemes) in either a familiar (English) or unfamiliar (Korean) language to native English speakers while recording functional magnetic resonance imaging. This allowed us to dissociate the contribution of acoustic vs. linguistic processes toward phoneme analysis. We show that (i) the acoustic analysis of phonemes is modulated by linguistic analysis and (ii) for this modulation, both acoustic and phonetic information need to be incorporated. These results suggest that the linguistic modulation of cortical sensitivity to phoneme classes minimizes prediction error during natural speech perception, thereby aiding speech comprehension in challenging listening situations.
Affiliation(s)
- Seung-Goo Kim
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Federico De Martino
- Faculty of Psychology and Neuroscience, University of Maastricht, Universiteitssingel 40, 6229 ER Maastricht, Netherlands
- Tobias Overath
- Department of Psychology and Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Duke Institute for Brain Sciences, Duke University, 308 Research Dr, Durham, NC 27708, United States
- Center for Cognitive Neuroscience, Duke University, 308 Research Dr, Durham, NC 27708, United States
16
Crinnion AM, Luthra S, Gaston P, Magnuson JS. Resolving competing predictions in speech: How qualitatively different cues and cue reliability contribute to phoneme identification. Atten Percept Psychophys 2024;86:942-961. PMID: 38383914. PMCID: PMC11233028. DOI: 10.3758/s13414-024-02849-y.
Abstract
Listeners have many sources of information available in interpreting speech. Numerous theoretical frameworks and paradigms have established that various constraints impact the processing of speech sounds, but it remains unclear how listeners might simultaneously consider multiple cues, especially those that differ qualitatively (i.e., with respect to timing and/or modality) or quantitatively (i.e., with respect to cue reliability). Here, we establish that cross-modal identity priming can influence the interpretation of ambiguous phonemes (Exp. 1, N = 40) and show that two qualitatively distinct cues - namely, cross-modal identity priming and auditory co-articulatory context - have additive effects on phoneme identification (Exp. 2, N = 40). However, we find no effect of quantitative variation in a cue - specifically, changes in the reliability of the priming cue did not influence phoneme identification (Exp. 3a, N = 40; Exp. 3b, N = 40). Overall, we find that qualitatively distinct cues can additively influence phoneme identification. While many existing theoretical frameworks address constraint integration to some degree, our results provide a step towards understanding how information that differs in both timing and modality is integrated in online speech perception.
Affiliation(s)
- James S Magnuson
- University of Connecticut, Storrs, CT, USA
- BCBL. Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Spain
- Ikerbasque. Basque Foundation for Science, Bilbao, Spain
17
Sala M, Vespignani F, Casalino L, Peressotti F. I know how you'll say it: evidence of speaker-specific speech prediction. Psychon Bull Rev 2024. PMID: 38528302. DOI: 10.3758/s13423-024-02488-2.
Abstract
Most models of language comprehension assume that the linguistic system is able to pre-activate phonological information. However, the evidence for phonological prediction is mixed and controversial. In this study, we implement a paradigm that capitalizes on the fact that foreign speakers usually make phonological errors. We investigate whether speaker identity (native vs. foreign) is used to make specific phonological predictions. Fifty-two participants were recruited to read sentence frames followed by a last spoken word which was uttered by either a native or a foreign speaker. They were required to perform a lexical decision on the last spoken word, which could be either semantically predictable or not. Speaker identity (native vs. foreign) may or may not be cued by the face of the speaker. We observed that the face cue is effective in speeding up the lexical decision when the word is predictable, but it is not effective when the word is not predictable. This result shows that speech prediction takes into account the phonological variability between speakers, suggesting that it is possible to pre-activate in a detailed and specific way the phonological representation of a predictable word.
Affiliation(s)
- Marco Sala
- Department of Developmental Psychology and Socialization, University of Padua, Padova, Italy.
- Francesco Vespignani
- Department of Developmental Psychology and Socialization, University of Padua, Padova, Italy
- Laura Casalino
- Department of Developmental Psychology and Socialization, University of Padua, Padova, Italy
- Francesca Peressotti
- Department of Developmental Psychology and Socialization, University of Padua, Padova, Italy.
- Padua Neuroscience Center, University of Padua, Padova, Italy.
18
Blanchette F, Flannery E, Jackson C, Reed P. Adaptation at the Syntax-Semantics Interface: Evidence From a Vernacular Structure. Language and Speech 2024;67:140-165. PMID: 37161280. PMCID: PMC10916346. DOI: 10.1177/00238309231164972.
Abstract
Expanding on psycholinguistic research on linguistic adaptation, the phenomenon whereby speakers change how they comprehend or produce structures as a result of cumulative exposure to less frequent or unfamiliar linguistic structures, this study asked whether speakers can learn semantic and syntactic properties of the American English vernacular negative auxiliary inversion (NAI) structure (e.g., didn't everybody eat, meaning "not everybody ate") during the course of an experiment. Formal theoretical analyses of NAI informed the design of a task in which American English-speaking participants unfamiliar with this structure were exposed to NAI sentences in either semantically ambiguous or unambiguous contexts. Participants rapidly adapted to the interpretive properties of NAI, selecting responses similar to what would be expected of a native speaker after only limited exposure to semantically ambiguous input. On a separate ratings task, participants displayed knowledge of syntactic restrictions on NAI subject type, despite having no previous exposure. We discuss the results in the context of other experimental studies of adaptation and suggest the implementation of top-down strategies via analogy to other familiar structure types as possible explanations for the behaviors observed in this study. The study illustrates the value of integrating insights from formal theoretical research and psycholinguistic methods in research on adaptation and highlights the need for more interdisciplinary and cross-disciplinary work in both experimental and naturalistic contexts to understand this phenomenon.
Affiliation(s)
- Frances Blanchette
- Department of Psychology, The Pennsylvania State University, 111 Moore Building, University Park, PA 16802, USA
19
Dempsey J, Liu Q, Christianson K. Syntactic adaptation leads to updated knowledge for local structural frequencies. Q J Exp Psychol (Hove) 2024;77:363-382. PMID: 37082989. DOI: 10.1177/17470218231172908.
Abstract
Syntactic adaptation has been shown to occur for various temporarily ambiguous structures, wherein an initially unexpected resolution becomes easier to process after repeated exposure. More controversial and less replicated is the claim that this adaptation towards a locally frequent structure occurs due to a strategic shifting of expectations to match short-term statistical regularities such that readers adapt away from the a priori more frequent structure. Experiment 1 replicates the initial adaptation towards a coordination garden path structure using self-paced reading; however, this paradigm has been criticised for its low reliability for detecting such small effects. To this end, Experiments 2 and 3 use a combination of self-paced reading and sentence completion tasks to replicate initial adaptation towards both coordination and reduced relative garden path structures and show evidence for a preference for these structures over their a priori more frequent alternatives. Together, these data reveal that participants may be tracking local structural statistics in real time; however, they may not be able to rapidly use that information to update processing behaviours.
Affiliation(s)
- Jack Dempsey
- Department of Educational Psychology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Qiawen Liu
- Department of Psychology, University of Wisconsin-Madison, Madison, WI, USA
- Kiel Christianson
- Department of Educational Psychology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Beckman Institute for Advanced Science & Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
20
Luthra S. Why are listeners hindered by talker variability? Psychon Bull Rev 2024;31:104-121. PMID: 37580454. PMCID: PMC10864679. DOI: 10.3758/s13423-023-02355-6.
Abstract
Though listeners readily recognize speech from a variety of talkers, accommodating talker variability comes at a cost: Myriad studies have shown that listeners are slower to recognize a spoken word when there is talker variability compared with when talker is held constant. This review focuses on two possible theoretical mechanisms for the emergence of these processing penalties. One view is that multitalker processing costs arise through a resource-demanding talker accommodation process, wherein listeners compare sensory representations against hypothesized perceptual candidates and error signals are used to adjust the acoustic-to-phonetic mapping (an active control process known as contextual tuning). An alternative proposal is that these processing costs arise because talker changes involve salient stimulus-level discontinuities that disrupt auditory attention. Some recent data suggest that multitalker processing costs may be driven by both mechanisms operating over different time scales. Fully evaluating this claim requires a foundational understanding of both talker accommodation and auditory streaming; this article provides a primer on each literature and also reviews several studies that have observed multitalker processing costs. The review closes by underscoring a need for comprehensive theories of speech perception that better integrate auditory attention and by highlighting important considerations for future research in this area.
Affiliation(s)
- Sahil Luthra
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, USA.
21
Merritt B, Bent T, Kilgore R, Eads C. Auditory free classification of gender diverse speakers. The Journal of the Acoustical Society of America 2024;155:1422-1436. PMID: 38364044. DOI: 10.1121/10.0024521.
Abstract
Auditory attribution of speaker gender has historically been assumed to operate within a binary framework. The prevalence of gender diversity and its associated sociophonetic variability motivates an examination of how listeners perceptually represent these diverse voices. Utterances from 30 transgender (1 agender individual, 15 non-binary individuals, 7 transgender men, and 7 transgender women) and 30 cisgender (15 men and 15 women) speakers were used in an auditory free classification paradigm, in which cisgender listeners classified the speakers on perceived general similarity and gender identity. Multidimensional scaling of listeners' classifications revealed two-dimensional solutions as the best fit for general similarity classifications. The first dimension was interpreted as masculinity/femininity, where listeners organized speakers from high to low fundamental frequency and first formant frequency. The second was interpreted as gender prototypicality, where listeners separated speakers with fundamental frequency and first formant frequency at upper and lower extreme values from more intermediate values. Listeners' classifications for gender identity collapsed into a one-dimensional space interpreted as masculinity/femininity. Results suggest that listeners engage in fine-grained analysis of speaker gender that cannot be adequately captured by a gender dichotomy. Further, varying terminology used in instructions may bias listeners' gender judgements.
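To make the analysis route concrete, the sketch below shows the usual path from free-classification responses to an MDS solution: speakers that listeners group together accrue pairwise similarity, and the complementary dissimilarities are embedded in two dimensions. The data here are hypothetical, and the paper's exact preprocessing is not reproduced.

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical free-classification data: each listener partitions the same
# six speakers (indices 0-5) into groups of perceived similarity.
classifications = [
    [[0, 1, 2], [3, 4, 5]],
    [[0, 1], [2, 3], [4, 5]],
    [[0, 1, 2, 3], [4, 5]],
]

n = 6
similarity = np.zeros((n, n))
for groups in classifications:
    for group in groups:
        for i in group:
            for j in group:
                similarity[i, j] += 1  # co-grouping counts as similarity

dissimilarity = similarity.max() - similarity
np.fill_diagonal(dissimilarity, 0)

# Two-dimensional solution, as reported for the general-similarity task;
# dimensions are then interpreted post hoc, e.g., by correlating them
# with each speaker's fundamental frequency and F1 measurements.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissimilarity)
print(coords)
```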
Affiliation(s)
- Brandon Merritt
- Department of Speech, Language, and Hearing Sciences, The University of Texas at El Paso, El Paso, Texas 79968, USA
- Tessa Bent
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana 47408, USA
- Rowan Kilgore
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana 47408, USA
- Cameron Eads
- Department of Speech, Language and Hearing Sciences, Indiana University, Bloomington, Indiana 47408, USA
22
Elkhafif B, Havelka J, Burke MR, Weighall A. Are syntactic representations similar in both reading and listening? Evidence from priming in first and second languages. Q J Exp Psychol (Hove) 2024;77:160-173. PMID: 36802975. DOI: 10.1177/17470218231159588.
Abstract
It is unclear to what extent natural differences between reading and listening result in differences in the syntactic representations formed in each modality. The present study investigated the occurrence of syntactic priming bidirectionally, from reading to listening and vice versa, to examine whether reading and listening share the same syntactic representations in both first language (L1) and second language (L2). Participants performed a lexical decision task in which the experimental words were embedded in sentences with either an ambiguous or a familiar structure. These structures were alternated to produce a priming effect. The modality was manipulated whereby participants (a) first read part of the sentence list and then listened to the rest of the list (reading-listening group), or (b) listened and then read (listening-reading group). In addition, the study involved two within-modality lists in which participants either read or listened to the whole list. The L1 group showed within-modal priming in both listening and reading as well as a cross-modal priming effect. Although L2 speakers showed priming in reading, the effect was absent in listening and weak in the listening-reading condition. The absence of priming in L2 listening was attributed to difficulties in L2 listening rather than to an inability to produce abstract priming.
Affiliation(s)
- Basma Elkhafif
- School of Psychology, University of Leeds, Leeds, UK
- Department of Educational Psychology, Faculty of Education, Helwan University, Cairo, Egypt
- Anna Weighall
- School of Education, University of Sheffield, Sheffield, UK
23
Tzeng CY, Russell ML, Nygaard LC. Attention modulates perceptual learning of non-native-accented speech. Atten Percept Psychophys 2024;86:339-353. PMID: 37872434. DOI: 10.3758/s13414-023-02790-6.
Abstract
Listeners readily adapt to variation in non-native-accented speech, learning to disambiguate between talker-specific and accent-based variation. We asked (1) which linguistic and indexical features of the spoken utterance are relevant for this learning to occur and (2) whether task-driven attention to these features affects the extent to which learning generalizes to novel utterances and voices. In two experiments, listeners heard English sentences (Experiment 1) or words (Experiment 2) produced by Spanish-accented talkers during an exposure phase. Listeners' attention was directed to lexical content (transcription), indexical cues (talker identification), or both (transcription + talker identification). In Experiment 1, listeners' test transcription of novel English sentences spoken by Spanish-accented talkers showed generalized perceptual learning to previously unheard voices and utterances for all training conditions. In Experiment 2, generalized learning occurred only in the transcription + talker identification condition, suggesting that attention to both linguistic and indexical cues optimizes listeners' ability to distinguish between individual talker- and group-based variation, especially with the reduced availability of sentence-length prosodic information. Collectively, these findings highlight the role of attentional processes in the encoding of speech input and underscore the interdependency of indexical and lexical characteristics in spoken language processing.
Affiliation(s)
- Christina Y Tzeng
- Department of Psychology, San José State University, 1 Washington Sq, San José, CA, 95192, USA.
- Marissa L Russell
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA
- Lynne C Nygaard
- Department of Psychology, Emory University, Atlanta, GA, USA
24
Meylan SC, Foushee R, Wong NH, Bergelson E, Levy RP. How adults understand what young children say. Nat Hum Behav 2023;7:2111-2125. PMID: 37884678. PMCID: PMC11033618. DOI: 10.1038/s41562-023-01698-3.
Abstract
Children's early speech often bears little resemblance to that of adults, and yet parents and other caregivers are able to interpret that speech and react accordingly. Here we investigate how adult listeners' inferences reflect sophisticated beliefs about what children are trying to communicate, as well as how children are likely to pronounce words. Using a Bayesian framework for modelling spoken word recognition, we find that computational models can replicate adult interpretations of children's speech only when they include strong, context-specific prior expectations about the messages that children will want to communicate. This points to a critical role of adult cognitive processes in supporting early communication and reveals how children can actively prompt adults to take actions on their behalf even when they have only a nascent understanding of the adult language. We discuss the wide-ranging implications of the powerful listening capabilities of adults for theories of first language acquisition.
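In outline, the framework treats the adult's interpretation as posterior inference; a generic statement of the model class is below (the paper's actual likelihood and prior components, e.g., fine-grained child pronunciation models, are richer than this schematic):

```latex
% The adult's inference about the child's intended word w given the
% acoustic data d: a child-pronunciation model p(d | w), combined with a
% strong context-specific prior over the messages a child would produce.
\[
p(w \mid d, \mathrm{context}) \;\propto\; p(d \mid w)\, p(w \mid \mathrm{context})
\]
```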
Affiliation(s)
- Stephan C Meylan: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Psychology and Neuroscience, Duke University, Durham, NC, USA
- Ruthe Foushee: Department of Psychology, University of Chicago, Chicago, IL, USA
- Nicole H Wong: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
- Elika Bergelson: Department of Psychology, Harvard University, Cambridge, MA, USA
- Roger P Levy: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA

25
Petrova K, Jasmin K, Saito K, Tierney AT. Extensive residence in a second language environment modifies perceptual strategies for suprasegmental categorization. J Exp Psychol Learn Mem Cogn 2023; 49:1943-1955. [PMID: 38127498 PMCID: PMC10734206 DOI: 10.1037/xlm0001246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2022] [Revised: 02/08/2023] [Accepted: 03/06/2023] [Indexed: 12/23/2023]
Abstract
Languages differ in the importance of acoustic dimensions for speech categorization. This poses a potential challenge for second language (L2) learners, and the extent to which adult L2 learners can acquire new perceptual strategies for speech categorization remains unclear. This study investigated the effects of extensive English L2 immersion on speech perception strategies and dimension-selective-attention ability in native Mandarin speakers. Experienced first language (L1) Mandarin speakers (length of U.K. residence > 3 years) demonstrated more native-like weighting of cues to L2 suprasegmental categorization relative to inexperienced Mandarin speakers (length of residence < 1 year), weighting duration more highly. However, both the experienced and the inexperienced Mandarin speakers continued to weight duration less highly and pitch more highly during musical beat categorization and struggled to ignore pitch and selectively attend to amplitude in speech, relative to native English speakers. These results suggest that adult L2 experience can lead to retuning of perceptual strategies in specific contexts, but global acoustic salience is more resistant to change.
Affiliation(s)
- Katya Petrova: Department of Culture, Communication & Media, Institute of Education, University College London
- Kyle Jasmin: Department of Psychology, Royal Holloway University of London
- Kazuya Saito: Department of Culture, Communication & Media, Institute of Education, University College London
- Adam T Tierney: Department of Psychological Sciences, Birkbeck University of London

26
Borrie SA, Hepworth TJ, Wynn CJ, Hustad KC, Barrett TS, Lansford KL. Perceptual Learning of Dysarthria in Adolescence. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023; 66:3791-3803. [PMID: 37616225 PMCID: PMC10713018 DOI: 10.1044/2023_jslhr-23-00231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/06/2023] [Revised: 05/28/2023] [Accepted: 06/20/2023] [Indexed: 08/26/2023]
Abstract
PURPOSE As evidenced by perceptual learning studies involving adult listeners and speakers with dysarthria, adaptation to dysarthric speech is driven by signal predictability (speaker property) and a flexible speech perception system (listener property). Here, we extend adaptation investigations to adolescent populations and examine whether adult and adolescent listeners can learn to better understand an adolescent speaker with dysarthria. METHOD Classified by developmental stage, adult (n = 42) and adolescent (n = 40) listeners completed a three-phase perceptual learning protocol (pretest, familiarization, and posttest). During pretest and posttest, all listeners transcribed speech produced by a 13-year-old adolescent with spastic dysarthria associated with cerebral palsy. During familiarization, half of the adult and adolescent listeners engaged in structured familiarization (audio and lexical feedback) with the speech of the adolescent speaker with dysarthria; and the other half, with the speech of a neurotypical adolescent speaker (control). RESULTS Intelligibility scores increased from pretest to posttest for all listeners. However, listeners who received dysarthria familiarization achieved greater intelligibility improvements than those who received control familiarization. Furthermore, there was a significant effect of developmental stage, where the adults achieved greater intelligibility improvements relative to the adolescents. CONCLUSIONS This study provides the first tranche of evidence that adolescent dysarthric speech is learnable, a finding that holds even for adolescent listeners whose speech perception systems are not yet fully developed. Given the formative role that social interactions play during adolescence, these findings of improved intelligibility afford important clinical implications.
Affiliation(s)
- Stephanie A. Borrie: Department of Communicative Disorders and Deaf Education, Utah State University, Logan
- Taylor J. Hepworth: Department of Communicative Disorders and Deaf Education, Utah State University, Logan
- Camille J. Wynn: Department of Communication Science and Disorders, University of Houston
- Katherine C. Hustad: Waisman Center, University of Wisconsin–Madison; Department of Communication Sciences and Disorders, University of Wisconsin–Madison
- Kaitlin L. Lansford: Department of Communication Science and Disorders, Florida State University, Tallahassee

27
Aoki NB, Zellou G. Visual information affects adaptation to novel talkers: Ethnicity-specific and ethnicity-independent learning of L2-accented speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 154:2290-2304. [PMID: 37843380 DOI: 10.1121/10.0021289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/13/2023] [Accepted: 09/13/2023] [Indexed: 10/17/2023]
Abstract
Prior work demonstrates that exposure to speakers of the same accent facilitates comprehension of a novel talker with the same accent (accent-specific learning). Moreover, exposure to speakers of multiple different accents enhances understanding of a talker with a novel accent (accent-independent learning). Although bottom-up acoustic information about accent constrains adaptation to novel talkers, the effect of top-down social information remains unclear. The current study examined effects of apparent ethnicity on adaptation to novel L2-accented ("non-native") talkers while keeping bottom-up information constant. Native English listeners transcribed sentences in noise for three Mandarin-accented English speakers and then a fourth (novel) Mandarin-accented English speaker. Transcription accuracy for the novel talker improves when (a) all speakers are presented with East Asian faces (ethnicity-specific learning) or (b) the exposure speakers are paired with different, non-East Asian ethnicities and the novel talker has an East Asian face (ethnicity-independent learning). However, accuracy does not improve when all speakers have White faces or when the exposure speakers have White faces and the test talker has an East Asian face. This study demonstrates that apparent ethnicity affects adaptation to novel L2-accented talkers, thus underscoring the importance of social expectations in perceptual learning and cross-talker generalization.
Affiliation(s)
- Nicholas B Aoki: Department of Linguistics, University of California Davis, Davis, California 95616, USA
- Georgia Zellou: Department of Linguistics, University of California Davis, Davis, California 95616, USA

28
Shorey AE, King CJ, Theodore RM, Stilp CE. Talker adaptation or "talker" adaptation? Musical instrument variability impedes pitch perception. Atten Percept Psychophys 2023; 85:2488-2501. [PMID: 37258892 DOI: 10.3758/s13414-023-02722-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/26/2023] [Indexed: 06/02/2023]
Abstract
Listeners show perceptual benefits (faster and/or more accurate responses) when perceiving speech spoken by a single talker versus multiple talkers, known as talker adaptation. While near-exclusively studied in speech and with talkers, some aspects of talker adaptation might reflect domain-general processes. Music, like speech, is a sound class replete with acoustic variation, such as a multitude of pitch and instrument possibilities. Thus, it was hypothesized that perceptual benefits from structure in the acoustic signal (i.e., hearing the same sound source on every trial) are not specific to speech but rather a general auditory response. Forty nonmusician participants completed a simple musical task that mirrored talker adaptation paradigms. Low- or high-pitched notes were presented in single- and mixed-instrument blocks. Reflecting both music research on pitch and timbre interdependence and mirroring traditional "talker" adaptation paradigms, listeners were faster to make their pitch judgments when presented with a single instrument timbre relative to when the timbre was selected from one of four instruments from trial to trial. A second experiment ruled out the possibility that participants were responding faster to the specific instrument chosen as the single-instrument timbre. Consistent with general theoretical approaches to perception, perceptual benefits from signal structure are not limited to speech.
Affiliation(s)
- Anya E Shorey: Department of Psychological and Brain Sciences, University of Louisville, 317 Life Sciences Building, Louisville, KY 40272, USA
- Caleb J King: Department of Psychological and Brain Sciences, University of Louisville, 317 Life Sciences Building, Louisville, KY 40272, USA
- Rachel M Theodore: Department of Speech, Language, and Hearing Sciences, University of Connecticut, 2 Alethia Drive, Unit 1085, Storrs, CT 06269-1085, USA; Connecticut Institute for the Brain and Cognitive Sciences, University of Connecticut, 337 Mansfield Road, Unit 1272, Storrs, CT 06269-1272, USA
- Christian E Stilp: Department of Psychological and Brain Sciences, University of Louisville, 317 Life Sciences Building, Louisville, KY 40272, USA

29
Charoy J, Samuel AG. Bad maps may not always get you lost: Lexically driven perceptual recalibration for substituted phonemes. Atten Percept Psychophys 2023; 85:2437-2458. [PMID: 37264293 PMCID: PMC10234583 DOI: 10.3758/s13414-023-02725-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/25/2023] [Indexed: 06/03/2023]
Abstract
The speech perception system adjusts its phoneme categories based on the current speech input and lexical context. This is known as lexically driven perceptual recalibration, and it is often assumed to underlie accommodation to non-native accented speech. However, recalibration studies have focused on maximally ambiguous sounds (e.g., a sound ambiguous between "sh" and "s" in a word like "superpower"), a scenario that does not represent the full range of variation present in accented speech. Indeed, non-native speakers sometimes completely substitute a phoneme for another, rather than produce an ambiguous segment (e.g., saying "shuperpower"). This has been called a "bad map" in the literature. In this study, we scale up the lexically driven recalibration paradigm to such cases. Because previous research suggests that the position of the critically accented phoneme modulates the success of recalibration, we include such a manipulation in our study. And to ensure that participants treat all critical items as words (an important point for successful recalibration), we use a new exposure task that incentivizes them to do so. Our findings suggest that while recalibration is most robust after exposure to ambiguous sounds, it also occurs after exposure to bad maps. But interestingly, positional effects may be reversed: recalibration was more likely for ambiguous sounds late in words, but more likely for bad maps occurring early in words. Finally, a comparison of an online versus in-lab version of these conditions shows that experimental setting may have a non-trivial effect on the results of recalibration studies.
Affiliation(s)
- Jeanne Charoy: Department of Psychology, Stony Brook University, New York, NY, USA
- Arthur G Samuel: Department of Psychology, Stony Brook University, New York, NY, USA; Basque Center on Cognition Brain and Language, Donostia-San Sebastian, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain

30
Benjamin AS, Kording KP. A role for cortical interneurons as adversarial discriminators. PLoS Comput Biol 2023; 19:e1011484. [PMID: 37768890 PMCID: PMC10538760 DOI: 10.1371/journal.pcbi.1011484] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Accepted: 08/31/2023] [Indexed: 09/30/2023] Open
Abstract
The brain learns representations of sensory information from experience, but the algorithms by which it does so remain unknown. One popular theory formalizes representations as inferred factors in a generative model of sensory stimuli, meaning that learning must improve this generative model and inference procedure. This framework underlies many classic computational theories of sensory learning, such as Boltzmann machines, the Wake/Sleep algorithm, and a more recent proposal that the brain learns with an adversarial algorithm that compares waking and dreaming activity. However, in order for such theories to provide insights into the cellular mechanisms of sensory learning, they must first be linked to the cell types in the brain that mediate them. In this study, we examine whether a subtype of cortical interneurons might mediate sensory learning by serving as discriminators, a crucial component in an adversarial algorithm for representation learning. We describe how such interneurons would be characterized by a plasticity rule that switches from Hebbian plasticity during waking states to anti-Hebbian plasticity in dreaming states. Evaluating the computational advantages and disadvantages of this algorithm, we find that it excels at learning representations in networks with recurrent connections but scales poorly with network size. This limitation can be partially addressed if the network also oscillates between evoked activity and generative samples on faster timescales. Consequently, we propose that an adversarial algorithm with interneurons as discriminators is a plausible and testable strategy for sensory learning in biological systems.
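The proposed state-dependent plasticity rule can be caricatured in a few lines. The toy below is my illustration, not the paper's code: a single logistic "interneuron" acts as a discriminator whose weight update strengthens responses to waking (real) inputs and weakens responses to dreaming (generated) inputs, loosely mirroring the Hebbian/anti-Hebbian switch; the network size, learning rate, and data are arbitrary.

```python
# Toy discriminator with a wake/dream plasticity switch (assumptions mine).
import numpy as np

rng = np.random.default_rng(0)
n_inputs, lr = 20, 0.01
w = rng.normal(scale=0.1, size=n_inputs)     # discriminator weights

def discriminator(x):
    return 1.0 / (1.0 + np.exp(-w @ x))      # P(input is "wake")

def update(x, waking: bool):
    """Hebbian-like on wake inputs, anti-Hebbian on dream inputs."""
    global w
    y = discriminator(x)
    if waking:
        w += lr * (1.0 - y) * x              # strengthen response to real activity
    else:
        w -= lr * y * x                      # weaken response to generated activity

# Toy "wake" data lie along one direction; "dreams" start as noise.
direction = rng.normal(size=n_inputs)
for _ in range(2000):
    update(direction + 0.3 * rng.normal(size=n_inputs), waking=True)
    update(rng.normal(size=n_inputs), waking=False)

print(discriminator(direction))                   # high: judged "wake"
print(discriminator(rng.normal(size=n_inputs)))   # lower: judged "dream"
```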
Affiliation(s)
- Ari S. Benjamin: Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Konrad P. Kording: Department of Bioengineering, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America

31
Xie X, Jaeger TF, Kurumada C. What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review. Cortex 2023; 166:377-424. [PMID: 37506665 DOI: 10.1016/j.cortex.2023.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Revised: 12/23/2022] [Accepted: 05/05/2023] [Indexed: 07/30/2023]
Abstract
Speech from unfamiliar talkers can be difficult to comprehend initially. These difficulties tend to dissipate with exposure, sometimes within minutes or less. Adaptivity in response to unfamiliar input is now considered a fundamental property of speech perception, and research over the past two decades has made substantial progress in identifying its characteristics. The mechanisms underlying adaptive speech perception, however, remain unknown. Past work has attributed facilitatory effects of exposure to any one of three qualitatively different hypothesized mechanisms: (1) low-level, pre-linguistic, signal normalization, (2) changes in/selection of linguistic representations, or (3) changes in post-perceptual decision-making. Direct comparisons of these hypotheses, or combinations thereof, have been lacking. We describe a general computational framework for adaptive speech perception (ASP) that, for the first time, implements all three mechanisms. We demonstrate how the framework can be used to derive predictions for experiments on perception from the acoustic properties of the stimuli. Using this approach, we find that, at the level of data analysis presently employed by most studies in the field, the signature results of influential experimental paradigms do not distinguish between the three mechanisms. This highlights the need for a change in research practices, so that future experiments provide more informative results. We recommend specific changes to experimental paradigms and data analysis. All data and code for this study are shared via OSF, including the R markdown document that this article is generated from, and an R library that implements the models we present.
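The reviewed point, that the three mechanism classes can be observationally equivalent, can be seen in a toy categorization model. The sketch below is a much-simplified stand-in for the authors' framework (their full implementation is shared on OSF in R); all numbers here are illustrative. Each mechanism alone produces the same signature increase in /s/ responses along a cue continuum.

```python
# Toy two-category Gaussian model: three routes to the same response shift.
import numpy as np
from scipy.stats import norm

continuum = np.linspace(-2, 2, 9)   # acoustic cue, e.g., fricative centroid
mu_s, mu_sh, sd = 1.0, -1.0, 1.0    # long-term /s/ and /ʃ/ category means

def p_s(cue, shift_signal=0.0, shift_category=0.0, decision_bias=0.0):
    """P(/s/ | cue) with hooks for each hypothesized mechanism."""
    x = cue - shift_signal                            # (1) pre-linguistic normalization
    like_s = norm.pdf(x, mu_s + shift_category, sd)   # (2) representation change
    like_sh = norm.pdf(x, mu_sh, sd)
    post = like_s / (like_s + like_sh)
    return np.clip(post + decision_bias, 0, 1)        # (3) decision-level bias

print("baseline", np.round(p_s(continuum), 2))
for kwargs in ({"shift_signal": -0.5},
               {"shift_category": -0.5},
               {"decision_bias": 0.1}):
    print(kwargs, np.round(p_s(continuum, **kwargs), 2))
```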
Affiliation(s)
- Xin Xie: Language Science, University of California, Irvine, USA
- T Florian Jaeger: Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA; Computer Science, University of Rochester, Rochester, NY, USA
- Chigusa Kurumada: Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA

32
Wolfrum V, Lehner K, Heim S, Ziegler W. Clinical Assessment of Communication-Related Speech Parameters in Dysarthria: The Impact of Perceptual Adaptation. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023:1-21. [PMID: 37486782 DOI: 10.1044/2023_jslhr-23-00105] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 07/26/2023]
Abstract
PURPOSE In current clinical practice, intelligibility of dysarthric speech is commonly assessed by speech-language therapists (SLTs), in most cases by the therapist caring for the patient being diagnosed. Since SLTs are familiar with dysarthria in general and with the speech of the individual patient to be assessed in particular, they have an adaptation advantage in understanding the patient's utterances. We examined whether and how listeners' assessments of communication-related speech parameters vary as a function of their familiarity with dysarthria in general and with the diagnosed patients in particular. METHOD Intelligibility, speech naturalness, and perceived listener effort were assessed in 20 persons with dysarthria (PWD). Patients' speech samples were judged by the individual treating therapists, five dysarthria experts who were unfamiliar with the patients, and crowdsourced naïve listeners. Adaptation effects were analyzed using (a) linear mixed models of overall scoring levels, (b) regression models of severity dependence, (c) network analyses of between-listener and between-parameter relationships, and (d) measures of intra- and interobserver consistency. RESULTS Significant advantages of dysarthria experts over laypeople were found in all parameters. An overall advantage of the treating therapists over nonfamiliar experts was only seen in listening effort. Severity-dependent adaptation effects occurred in all parameters. The therapists' responses were heterogeneous and inconsistent with those of the unfamiliar experts and the naïve listeners. CONCLUSIONS The way SLTs evaluate communication-relevant speech parameters of the PWD whom they care for is influenced not only by adaptation benefits but also by therapeutic biases. This finding weakens the validity of assessments of communication-relevant speech parameters by the treating therapists themselves and encourages the development and use of alternative methods.
Affiliation(s)
- Vera Wolfrum: Department of Neurology, Faculty of Medicine, RWTH Aachen University, Germany
- Katharina Lehner: Clinical Neuropsychology Research Group, Institute for Phonetics and Speech Processing, Ludwig Maximilian University of Munich, Germany
- Stefan Heim: Department of Psychiatry, Psychotherapy, and Psychosomatics, Faculty of Medicine, RWTH Aachen University, Germany; Research Center Jülich, Institute of Neurosciences and Medicine (INM-1), Germany; JARA - Translational Brain Medicine, Aachen, Germany
- Wolfram Ziegler: Clinical Neuropsychology Research Group, Institute for Phonetics and Speech Processing, Ludwig Maximilian University of Munich, Germany

33
Floegel M, Kasper J, Perrier P, Kell CA. How the conception of control influences our understanding of actions. Nat Rev Neurosci 2023; 24:313-329. [PMID: 36997716 DOI: 10.1038/s41583-023-00691-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 02/28/2023] [Indexed: 04/01/2023]
Abstract
Wilful movement requires neural control. Commonly, neural computations are thought to generate motor commands that bring the musculoskeletal system - that is, the plant - from its current physical state into a desired physical state. The current state can be estimated from past motor commands and from sensory information. Modelling movement on the basis of this concept of plant control strives to explain behaviour by identifying the computational principles for control signals that can reproduce the observed features of movements. From an alternative perspective, movements emerge in a dynamically coupled agent-environment system from the pursuit of subjective perceptual goals. Modelling movement on the basis of this concept of perceptual control aims to identify the controlled percepts and their coupling rules that can give rise to the observed characteristics of behaviour. In this Perspective, we discuss a broad spectrum of approaches to modelling human motor control and their notions of control signals, internal models, handling of sensory feedback delays and learning. We focus on the influence that the plant control and the perceptual control perspective may have on decisions when modelling empirical data, which may in turn shape our understanding of actions.
Affiliation(s)
- Mareike Floegel: Department of Neurology and Brain Imaging Center, Goethe University Frankfurt, Frankfurt, Germany
- Johannes Kasper: Department of Neurology and Brain Imaging Center, Goethe University Frankfurt, Frankfurt, Germany
- Pascal Perrier: Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, Grenoble, France
- Christian A Kell: Department of Neurology and Brain Imaging Center, Goethe University Frankfurt, Frankfurt, Germany

34
Holmes E, Johnsrude IS. Intelligibility benefit for familiar voices is not accompanied by better discrimination of fundamental frequency or vocal tract length. Hear Res 2023; 429:108704. [PMID: 36701896 DOI: 10.1016/j.heares.2023.108704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/30/2022] [Revised: 11/11/2022] [Accepted: 01/19/2023] [Indexed: 01/21/2023]
Abstract
Speech is more intelligible when it is spoken by familiar than unfamiliar people. If this benefit arises because key voice characteristics like perceptual correlates of fundamental frequency or vocal tract length (VTL) are more accurately represented for familiar voices, listeners may be able to discriminate smaller manipulations to such characteristics for familiar than unfamiliar voices. We measured participants' (N = 17) thresholds for discriminating pitch (correlate of fundamental frequency, or glottal pulse rate) and formant spacing (correlate of VTL; 'VTL-timbre') for voices that were familiar (participants' friends) and unfamiliar (other participants' friends). As expected, familiar voices were more intelligible. However, discrimination thresholds were no smaller for the same familiar voices. The size of the intelligibility benefit for a familiar over an unfamiliar voice did not relate to the difference in discrimination thresholds for the same voices. Also, the familiar-voice intelligibility benefit was just as large following perceptible manipulations to pitch and VTL-timbre. These results are more consistent with cognitive accounts of speech perception than traditional accounts that predict better discrimination.
Affiliation(s)
- Emma Holmes: Department of Speech Hearing and Phonetic Sciences, UCL, London WC1N 1PF, UK; Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada
- Ingrid S Johnsrude: Brain and Mind Institute, University of Western Ontario, London, Ontario N6A 3K7, Canada; School of Communication Sciences and Disorders, University of Western Ontario, London, Ontario N6G 1H1, Canada

35
Hearing is believing: Lexically guided perceptual learning is graded to reflect the quantity of evidence in speech input. Cognition 2023; 235:105404. [PMID: 36812836 DOI: 10.1016/j.cognition.2023.105404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 11/29/2022] [Accepted: 02/07/2023] [Indexed: 02/22/2023]
Abstract
There is wide variability in the acoustic patterns that are produced for a given linguistic message, including variability that is conditioned on who is speaking. Listeners solve this lack of invariance problem, at least in part, by dynamically modifying the mapping to speech sounds in response to structured variation in the input. Here we test a primary tenet of the ideal adapter framework of speech adaptation, which posits that perceptual learning reflects the incremental updating of cue-sound mappings to incorporate observed evidence with prior beliefs. Our investigation draws on the influential lexically guided perceptual learning paradigm. During an exposure phase, listeners heard a talker who produced fricative energy ambiguous between /ʃ/ and /s/. Lexical context differentially biased interpretation of the ambiguity as either /s/ or /ʃ/, and, across two behavioral experiments (n = 500), we manipulated the quantity of evidence and the consistency of evidence that was provided during exposure. Following exposure, listeners categorized tokens from an ashi - asi continuum to assess learning. The ideal adapter framework was formalized through computational simulations, which predicted that learning would be graded to reflect the quantity, but not the consistency, of the exposure input. These predictions were upheld in human listeners; the magnitude of the learning effect monotonically increased given exposure to four, 10, or 20 critical productions, and there was no evidence that learning differed given consistent versus inconsistent exposure. These results (1) provide support for a primary tenet of the ideal adapter framework, (2) establish quantity of evidence as a key determinant of adaptation in human listeners, and (3) provide critical evidence that lexically guided perceptual learning is not a binary outcome. In doing so, the current work provides foundational knowledge to support theoretical advances that consider perceptual learning as a graded outcome that is tightly linked to input statistics in the speech stream.
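The quantity-sensitive updating at the heart of the ideal adapter account can be illustrated with a conjugate normal-normal update. The sketch below is a toy formalization, not the paper's simulation code; the prior, noise variance, and cue values are invented. What it preserves is the reported graded pattern: the belief shift grows monotonically with exposure quantity.

```python
# Toy ideal-adapter update of a talker-specific belief about where a
# fricative category's cue mean falls (all parameter values hypothetical).
mu0, var0 = 0.0, 1.0        # prior belief about the talker's cue mean
noise_var = 4.0             # assumed within-category variability
ambiguous_cue = 2.0         # exposure tokens sit between the two categories

def posterior_mean(n_tokens: int) -> float:
    """Conjugate normal-normal update after n identical exposure tokens."""
    precision = 1 / var0 + n_tokens / noise_var
    return (mu0 / var0 + n_tokens * ambiguous_cue / noise_var) / precision

for n in (4, 10, 20):       # the exposure quantities used in the experiments
    print(n, round(posterior_mean(n), 3))   # shift increases monotonically
```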
36
Pezzelle S, Fernández R. Semantic Adaptation to the Interpretation of Gradable Adjectives via Active Linguistic Interaction. Cogn Sci 2023; 47:e13248. [PMID: 36739522 PMCID: PMC10078314 DOI: 10.1111/cogs.13248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2022] [Revised: 11/21/2022] [Accepted: 12/18/2022] [Indexed: 02/06/2023]
Abstract
When communicating, people adapt their linguistic representations to those of their interlocutors. Previous studies have shown that this also occurs at the semantic level for vague and context-dependent terms such as quantifiers and uncertainty expressions. However, work to date has mostly focused on passive exposure to a given speaker's interpretation, without considering the possible role of active linguistic interaction. In this study, we focus on gradable adjectives big and small and develop a novel experimental paradigm that allows participants to ask clarification questions to figure out their interlocutor's interpretation. We find that, when in doubt, speakers do resort to this strategy, despite its inherent cognitive cost, and that doing so results in higher semantic alignment measured in terms of communicative success. While not all question-answer pairs are equally informative, we show that speakers become better questioners as the interaction progresses. Yet, the higher semantic alignment observed when speakers are able to ask questions does not increase over time. This suggests that conversational interaction's key advantage may be to boost coordination without committing to long-term semantic updates. Our findings shed new light on the mechanisms used by speakers to achieve semantic alignment and on how language is shaped by communication.
Affiliation(s)
- Sandro Pezzelle: Institute for Logic, Language and Computation, University of Amsterdam
- Raquel Fernández: Institute for Logic, Language and Computation, University of Amsterdam

37
Sachdeva S, Ruan H, Hamarneh G, Behne DM, Jongman A, Sereno JA, Wang Y. Plain-to-clear speech video conversion for enhanced intelligibility. INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY 2023; 26:163-184. [PMID: 37008883 PMCID: PMC10042924 DOI: 10.1007/s10772-023-10018-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/28/2022] [Accepted: 01/08/2023] [Indexed: 06/19/2023]
Abstract
Clearly articulated speech, relative to plain-style speech, has been shown to improve intelligibility. We examine whether visible speech cues in video alone can be systematically modified to enhance clear-speech visual features and improve intelligibility. We extract clear-speech visual features of English words varying in vowels produced by multiple male and female talkers. Via a frame-by-frame image-warping based video generation method with a controllable parameter (displacement factor), we apply the extracted clear-speech visual features to videos of plain speech to synthesize clear speech videos. We evaluate the generated videos using a robust, state-of-the-art AI lip reader as well as human intelligibility testing. The contributions of this study are: (1) we successfully extract relevant visual cues for video modifications across speech styles, and have achieved enhanced intelligibility for AI; (2) this work suggests that universal talker-independent clear-speech features may be utilized to modify any talker's visual speech style; (3) we introduce "displacement factor" as a way of systematically scaling the magnitude of displacement modifications between speech styles; and (4) the high-definition generated videos are ideal candidates for human-centric intelligibility and perceptual training studies.
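The warping step reduces to remapping pixels along a displacement field scaled by the displacement factor. The sketch below is illustrative only: the published pipeline derives its displacement fields from extracted clear-speech features, whereas the frame and flow field here are synthetic, and OpenCV is assumed as the warping backend.

```python
# Toy plain-to-clear frame warp scaled by a displacement factor (alpha).
import cv2
import numpy as np

def warp_frame(frame: np.ndarray, flow: np.ndarray, alpha: float) -> np.ndarray:
    """Inverse-warp: each output pixel samples the input at a coordinate
    displaced by alpha * flow, so alpha scales the modification magnitude."""
    h, w = frame.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + alpha * flow[..., 0]).astype(np.float32)
    map_y = (grid_y + alpha * flow[..., 1]).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)

frame = np.random.randint(0, 255, (120, 160, 3), dtype=np.uint8)  # stand-in frame
flow = np.zeros((120, 160, 2), dtype=np.float32)
flow[..., 1] = 2.0                      # e.g., exaggerate vertical mouth motion
clearer = warp_frame(frame, flow, alpha=1.5)   # alpha > 1 amplifies the change
```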
Affiliation(s)
- Shubam Sachdeva: Language and Brain Lab, Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada
- Haoyao Ruan: Language and Brain Lab, Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada
- Ghassan Hamarneh: Medical Image Analysis Research Group, School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
- Dawn M. Behne: NTNU Speech Lab, Department of Psychology, Norwegian University of Science and Technology, Trondheim, Norway
- Allard Jongman: KU Phonetics and Psycholinguistics Lab, Department of Linguistics, University of Kansas, Lawrence, KS, USA
- Joan A. Sereno: KU Phonetics and Psycholinguistics Lab, Department of Linguistics, University of Kansas, Lawrence, KS, USA
- Yue Wang: Language and Brain Lab, Department of Linguistics, Simon Fraser University, Burnaby, BC, Canada

38
Lansford KL, Barrett TS, Borrie SA. Cognitive Predictors of Perception and Adaptation to Dysarthric Speech in Young Adult Listeners. JOURNAL OF SPEECH, LANGUAGE, AND HEARING RESEARCH : JSLHR 2023; 66:30-47. [PMID: 36480697 PMCID: PMC10023189 DOI: 10.1044/2022_jslhr-22-00391] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/01/2022] [Revised: 08/09/2022] [Accepted: 09/02/2022] [Indexed: 06/17/2023]
Abstract
PURPOSE Although recruitment of cognitive-linguistic resources to support dysarthric speech perception and adaptation is presumed by theoretical accounts of effortful listening and supported by cross-disciplinary empirical findings, prospective relationships have received limited attention in the disordered speech literature. This study aimed to examine the predictive relationships between cognitive-linguistic parameters and intelligibility outcomes associated with familiarization with dysarthric speech in young adult listeners. METHOD A cohort of 156 listener participants between the ages of 18 and 50 years completed a three-phase perceptual training protocol (pretest, training, and posttest) with one of three speakers with dysarthria. Additionally, listeners completed the National Institutes of Health Toolbox Cognition Battery to obtain measures of the following cognitive-linguistic constructs: working memory, inhibitory control of attention, cognitive flexibility, processing speed, and vocabulary knowledge. RESULTS Elastic net regression models revealed that select cognitive-linguistic measures and their two-way interactions predicted both initial intelligibility and intelligibility improvement of dysarthric speech. While some consistency across models was shown, unique constellations of select cognitive factors and their interactions predicted initial intelligibility and intelligibility improvement of the three different speakers with dysarthria. CONCLUSIONS Current findings extend empirical support for theoretical models of speech perception in adverse listening conditions to dysarthric speech signals. Although predictive relationships were complex, vocabulary knowledge, working memory, and cognitive flexibility often emerged as important variables across the models.
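The modelling approach is straightforward to reproduce in outline. The sketch below substitutes simulated data for the NIH Toolbox scores and intelligibility outcomes, but keeps the reported structure: an elastic net fit over cognitive-linguistic main effects plus their two-way interactions, with the penalty mixture chosen by cross-validation.

```python
# Elastic net over main effects + two-way interactions (simulated data).
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
n = 156                                   # listeners, matching the study's cohort
X = rng.normal(size=(n, 5))               # e.g., working memory, inhibition, ...
# Hypothetical ground truth: one main effect and one interaction drive outcomes.
y = 0.5 * X[:, 0] + 0.3 * X[:, 0] * X[:, 2] + rng.normal(scale=0.5, size=n)

model = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5),   # mixes L1 and L2 penalties
)
model.fit(X, y)
print(model[-1].coef_)   # sparse: selected main effects and interactions
```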
Affiliation(s)
- Kaitlin L. Lansford: School of Communication Science & Disorders, Florida State University, Tallahassee
- Stephanie A. Borrie: Department of Communicative Disorders and Deaf Education, Utah State University, Logan

39
Kapadia AM, Tin JAA, Perrachione TK. Multiple sources of acoustic variation affect speech processing efficiency. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2023; 153:209. [PMID: 36732274 PMCID: PMC9836727 DOI: 10.1121/10.0016611] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Revised: 11/14/2022] [Accepted: 12/07/2022] [Indexed: 05/29/2023]
Abstract
Phonetic variability across talkers imposes additional processing costs during speech perception, evident in performance decrements when listening to speech from multiple talkers. However, within-talker phonetic variation is a less well-understood source of variability in speech, and it is unknown how processing costs from within-talker variation compare to those from between-talker variation. Here, listeners performed a speeded word identification task in which three dimensions of variability were factorially manipulated: between-talker variability (single vs multiple talkers), within-talker variability (single vs multiple acoustically distinct recordings per word), and word-choice variability (two- vs six-word choices). All three sources of variability led to reduced speech processing efficiency. Between-talker variability affected both word-identification accuracy and response time, but within-talker variability affected only response time. Furthermore, between-talker variability, but not within-talker variability, had a greater impact when the target phonological contrasts were more similar. Together, these results suggest that natural between- and within-talker variability reflect two distinct magnitudes of common acoustic-phonetic variability: Both affect speech processing efficiency, but they appear to have qualitatively and quantitatively unique effects due to differences in their potential to obscure acoustic-phonemic correspondences across utterances.
Affiliation(s)
- Alexandra M Kapadia: Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Jessica A A Tin: Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Tyler K Perrachione: Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA

40
Pourhashemi F, Baart M, van Laarhoven T, Vroomen J. Want to quickly adapt to distorted speech and become a better listener? Read lips, not text. PLoS One 2022; 17:e0278986. [PMID: 36580461 PMCID: PMC9799298 DOI: 10.1371/journal.pone.0278986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Accepted: 11/28/2022] [Indexed: 12/30/2022] Open
Abstract
When listening to distorted speech, does one become a better listener by looking at the face of the speaker or by reading subtitles that are presented along with the speech signal? We examined this question in two experiments in which we presented participants with spectrally distorted speech (4-channel noise-vocoded speech). During short training sessions, listeners received auditorily distorted words or pseudowords that were partially disambiguated by concurrently presented lipread information or text. After each training session, listeners were tested with new degraded auditory words. Learning effects (based on proportions of correctly identified words) were stronger if listeners had trained with words rather than with pseudowords (a lexical boost), and adding lipread information during training was more effective than adding text (a lipread boost). Moreover, the advantage of lipread speech over text training was also found when participants were tested more than a month later. The current results thus suggest that lipread speech may have surprisingly long-lasting effects on adaptation to distorted speech.
Affiliation(s)
- Faezeh Pourhashemi: Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Martijn Baart: Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands; BCBL, Basque Center on Cognition, Brain, and Language, Donostia, Spain
- Thijs van Laarhoven: Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands
- Jean Vroomen: Dept. of Cognitive Neuropsychology, Tilburg University, Tilburg, The Netherlands

41
Apfelbaum KS, Kutlu E, McMurray B, Kapnoula EC. Don't force it! Gradient speech categorization calls for continuous categorization tasks. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:3728. [PMID: 36586841 PMCID: PMC9894657 DOI: 10.1121/10.0015201] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Revised: 09/12/2022] [Accepted: 10/20/2022] [Indexed: 05/29/2023]
Abstract
Research on speech categorization and phoneme recognition has relied heavily on tasks in which participants listen to stimuli from a speech continuum and are asked to either classify each stimulus (identification) or discriminate between them (discrimination). Such tasks rest on assumptions about how perception maps onto discrete responses that have not been thoroughly investigated. Here, we identify critical challenges in the link between these tasks and theories of speech categorization. In particular, we show that patterns that have traditionally been linked to categorical perception could arise despite continuous underlying perception and that patterns that run counter to categorical perception could arise despite underlying categorical perception. We describe an alternative measure of speech perception using a visual analog scale that better differentiates between processes at play in speech categorization, and we review some recent findings that show how this task can be used to better inform our theories.
Affiliation(s)
- Keith S Apfelbaum: Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Ethan Kutlu: Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Bob McMurray: Department of Psychological and Brain Sciences, G60 Psychological and Brain Sciences Building, University of Iowa, Iowa City, Iowa 52242-1407, USA
- Efthymia C Kapnoula: BCBL, Basque Center on Cognition, Brain and Language, Mikeletegi 69, 20009 Donostia, Spain

42
McMurray B. The myth of categorical perception. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:3819. [PMID: 36586868 PMCID: PMC9803395 DOI: 10.1121/10.0016614] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/21/2022] [Revised: 11/26/2022] [Accepted: 12/06/2022] [Indexed: 05/29/2023]
Abstract
Categorical perception (CP) is likely the single finding from speech perception with the biggest impact on cognitive science. However, within speech perception, it is widely known to be an artifact of task demands. CP is empirically defined as a relationship between phoneme identification and discrimination. As discrimination tasks do not appear to require categorization, this was thought to support the claim that listeners perceive speech solely in terms of linguistic categories. However, 50 years of work using discrimination tasks, priming, the visual world paradigm, and event related potentials has rejected the strongest forms of CP and provided little strong evidence for any form of it. This paper reviews the origins and impact of this scientific meme and the work challenging it. It discusses work showing that the encoding of auditory input is largely continuous, not categorical, and describes the modern theoretical synthesis in which listeners preserve fine-grained detail to enable more flexible processing. This synthesis is fundamentally inconsistent with CP. This leads to a different understanding of how to use and interpret the most basic paradigms in speech perception (phoneme identification along a continuum) and has implications for understanding language and hearing disorders, development, and multilingualism.
Affiliation(s)
- Bob McMurray: Department of Psychological and Brain Sciences, University of Iowa, Iowa City, Iowa 52242, USA

43
Dole M, Vilain C, Haldin C, Baciu M, Cousin E, Lamalle L, Lœvenbruck H, Vilain A, Schwartz JL. Comparing the selectivity of vowel representations in cortical auditory vs. motor areas: A repetition-suppression study. Neuropsychologia 2022; 176:108392. [DOI: 10.1016/j.neuropsychologia.2022.108392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2022] [Revised: 09/22/2022] [Accepted: 10/03/2022] [Indexed: 10/31/2022]
44
Melguy YV, Johnson K. Perceptual adaptation to a novel accent: Phonetic category expansion or category shift? THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:2090. [PMID: 36319220 DOI: 10.1121/10.0014602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/25/2022] [Accepted: 09/20/2022] [Indexed: 06/16/2023]
Abstract
Listeners can rapidly adapt to an unfamiliar accent. For example, following exposure to a speaker whose /f/ sound is ambiguous between [s] and [f], they categorize more sounds along an [s]-[f] phonetic continuum as /f/. We investigated the adaptation mechanism underlying such perceptual changes: do listeners shift the target sound in phonetic space (category shift), or do they adopt a more general mechanism of broadening the category (category expansion)? In experiment 1, we trained listeners on an accent containing ambiguous /θ/ = [θ/s] and then tested them on categorizing phonetic continua spanning [θ]-[s] or [θ]-[f]. Listeners tested on the [θ]-[s] continua showed a significant increase in proportion of /θ/ responses vs controls, while those tested on [θ]-[f] did not. Experiment 2 investigated how acoustic-phonetic similarity may modulate the mechanism underlying recalibration. Listeners were trained on the same /θ/ = [θ/s] accent as in experiment 1 but were tested on a different continuum, [θ]-[ʃ]. This time, trained listeners showed a significant increase in proportion of /θ/ responses with the novel phonetic contrast. This suggests that phonetic recalibration involves some degree of non-uniform category expansion, constrained by phonetic similarity between training and test sounds.
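The two candidate mechanisms make predictions that are easy to see in a one-dimensional Gaussian toy model. The sketch below is my illustration, not the authors' analysis; the cue values and category parameters are invented. A shift raises /θ/ likelihood only on the [s] side of the category, whereas an expansion raises it on both sides, which is what would support generalization to a novel contrast.

```python
# Category shift vs. category expansion for a Gaussian /θ/ category.
from scipy.stats import norm

theta_mean, theta_sd = 0.0, 1.0
s_side, f_side = 3.0, -3.0        # test cues on opposite sides of /θ/

def p_theta(cue, mean, sd):
    return norm.pdf(cue, mean, sd)    # likelihood of /θ/ at this cue

# Shift (mean moves toward [s]): only [s]-side test sounds gain /θ/ support.
print(p_theta(s_side, 1.0, theta_sd) > p_theta(s_side, theta_mean, theta_sd))  # True
print(p_theta(f_side, 1.0, theta_sd) > p_theta(f_side, theta_mean, theta_sd))  # False

# Expansion (sd grows): both sides gain, predicting cross-contrast transfer.
print(p_theta(s_side, theta_mean, 1.5) > p_theta(s_side, theta_mean, theta_sd))  # True
print(p_theta(f_side, theta_mean, 1.5) > p_theta(f_side, theta_mean, theta_sd))  # True
```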
Affiliation(s)
- Keith Johnson: Department of Linguistics, University of California, Berkeley, Berkeley, California 94704, USA

45
Perceptual learning of multiple talkers: Determinants, characteristics, and limitations. Atten Percept Psychophys 2022; 84:2335-2359. [PMID: 36076119 DOI: 10.3758/s13414-022-02556-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 08/08/2022] [Indexed: 11/08/2022]
Abstract
Research suggests that listeners simultaneously update talker-specific generative models to reflect structured phonetic variation. Because past investigations exposed listeners to talkers of different genders, it is unknown whether adaptation is talker specific or rather linked to a broader sociophonetic class. Here, we test determinants of listeners' ability to update and apply talker-specific models for speech perception. In six experiments (n = 480), listeners were first exposed to the speech of two talkers who produced ambiguous fricative energy. The talkers' speech was interleaved during exposure, and lexical context differentially biased interpretation of the ambiguity as either /s/ or /ʃ/ for each talker. At test, listeners categorized tokens from ashi-asi continua, one for each talker. Across conditions and experiments, we manipulated exposure quantity, talker gender, blocked versus interleaved talker structure at test, and the degree to which fricative acoustics differed between talkers. When test was blocked by talker, learning was observed for different but not same gender talkers. When talkers were interleaved at test, learning was observed for both different and same gender talkers, which was attenuated when fricative acoustics were constant across talkers. There was no strong evidence to suggest that adaptation to multiple talkers required increased quantity of exposure beyond that required to adapt to a single talker. These results suggest that perceptual learning for speech is achieved via a mechanism that represents a context-dependent, cumulative integration of experience with speech input and identify critical constraints on listeners' ability to dynamically apply multiple generative models in mixed talker listening environments.
46
Nenadić F, Tucker BV, Ten Bosch L. Computational Modeling of an Auditory Lexical Decision Experiment Using DIANA. LANGUAGE AND SPEECH 2022:238309221111752. [PMID: 36000386 PMCID: PMC10394956 DOI: 10.1177/00238309221111752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/15/2023]
Abstract
We present an implementation of DIANA, a computational model of spoken word recognition, to model responses collected in the Massive Auditory Lexical Decision (MALD) project. DIANA is an end-to-end model, including an activation and decision component that takes the acoustic signal as input, activates internal word representations, and outputs lexicality judgments and estimated response latencies. Simulation 1 presents the process of creating acoustic models required by DIANA to analyze novel speech input. Simulation 2 investigates DIANA's performance in determining whether the input signal is a word present in the lexicon or a pseudoword. In Simulation 3, we generate estimates of response latency and correlate them with general tendencies in participant responses in MALD data. We find that DIANA performs fairly well in free word recognition and lexical decision. However, the current approach for estimating response latency provides estimates opposite to those found in behavioral data. We discuss these findings and offer suggestions as to what a contemporary model of spoken word recognition should be able to do.
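DIANA's end-to-end character, from input signal to a lexicality judgment and a latency estimate, can be caricatured as evidence accumulation to a decision margin. The sketch below is an abstraction of that flow, not DIANA itself: the word "activations" are toy match rates rather than HMM-based acoustic scores, and the margin, step size, and noise level are arbitrary.

```python
# Toy activation-and-decision loop in the spirit of an end-to-end recognizer.
import numpy as np

rng = np.random.default_rng(2)
LEXICON = {"cat": 1.0, "cap": 0.6, "can": 0.5}   # toy match rates to the input

def recognize(margin: float = 3.0, max_steps: int = 200, step_ms: float = 5.0):
    """Accumulate noisy evidence; respond once the leader's margin is reached."""
    evidence = {w: 0.0 for w in LEXICON}
    for step in range(1, max_steps + 1):
        for w, rate in LEXICON.items():
            evidence[w] += rate * 0.1 + rng.normal(scale=0.1)
        ranked = sorted(evidence.values(), reverse=True)
        if ranked[0] - ranked[1] >= margin:
            winner = max(evidence, key=evidence.get)
            return winner, step * step_ms        # word + estimated latency (ms)
    return None, max_steps * step_ms             # no decision: pseudoword-like

print(recognize())
```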
Affiliation(s)
- Filip Nenadić: University of Alberta, Canada; Singidunum University, Serbia

47
Lee JJ, Perrachione TK. Implicit and explicit learning in talker identification. Atten Percept Psychophys 2022; 84:2002-2015. [PMID: 35534783 PMCID: PMC10081569 DOI: 10.3758/s13414-022-02500-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 04/23/2022] [Indexed: 11/08/2022]
Abstract
In the real world, listeners seem to implicitly learn talkers' vocal identities during interactions that prioritize attending to the content of talkers' speech. In contrast, most laboratory experiments of talker identification employ training paradigms that require listeners to explicitly practice identifying voices. Here, we investigated whether listeners become familiar with talkers' vocal identities during initial exposures that do not involve explicit talker identification. Participants were assigned to one of three exposure tasks, in which they heard identical stimuli but were differentially required to attend to the talkers' vocal identity or to the verbal content of their speech: (1) matching the talker to a concurrent visual cue (talker-matching); (2) discriminating whether the talker was the same as the prior trial (talker 1-back); or (3) discriminating whether speech content matched the previous trial (verbal 1-back). All participants were then tested on their ability to learn to identify talkers from novel speech content. Critically, we manipulated whether the talkers during this post-test differed from those heard during training. Compared to learning to identify novel talkers, listeners were significantly more accurate learning to identify the talkers they had previously been exposed to in the talker-matching and verbal 1-back tasks, but not the talker 1-back task. The correlation between talker identification test performance and exposure task performance was also greater when the talkers were the same in both tasks. These results suggest that listeners learn talkers' vocal identity implicitly during speech perception, even if they are not explicitly attending to the talkers' identity.
Affiliation(s)
- Jayden J Lee: Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA
- Tyler K Perrachione: Department of Speech, Language, & Hearing Sciences, Boston University, 635 Commonwealth Ave, Boston, MA 02215, USA

48
Krumbiegel J, Ufer C, Blank H. Influence of voice properties on vowel perception depends on speaker context. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:820. [PMID: 36050169 DOI: 10.1121/10.0013363] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Accepted: 07/13/2022] [Indexed: 06/15/2023]
Abstract
Different speakers produce the same intended vowel with very different physical properties. Fundamental frequency (F0) and formant frequencies (FF), the two main parameters that discriminate between voices, also influence vowel perception. While it has been shown that listeners comprehend speech more accurately if they are familiar with a talker's voice, it is still unclear how such prior information is used when decoding the speech stream. In three online experiments, we examined the influence of speaker context via F0 and FF shifts on the perception of /o/-/u/ vowel contrasts. Participants perceived vowels from an /o/-/u/ continuum shifted toward /u/ when F0 was lowered or FF increased relative to the original speaker's voice and vice versa. This shift was reduced when the speakers were presented in a block-wise context compared to random order. Conversely, the original base voice was perceived to be shifted toward /u/ when presented in the context of a low F0 or high FF speaker, compared to a shift toward /o/ with high F0 or low FF speaker context. These findings demonstrate that F0 and FF jointly influence vowel perception in speaker context.
Affiliation(s)
- Julius Krumbiegel: Institute for Systems Neuroscience, University Hospital Hamburg-Eppendorf, Hamburg, Germany
- Carina Ufer: Institute for Systems Neuroscience, University Hospital Hamburg-Eppendorf, Hamburg, Germany
- Helen Blank: Institute for Systems Neuroscience, University Hospital Hamburg-Eppendorf, Hamburg, Germany

49
Ozernov-Palchik O, Beach SD, Brown M, Centanni TM, Gaab N, Kuperberg G, Perrachione TK, Gabrieli JDE. Speech-specific perceptual adaptation deficits in children and adults with dyslexia. J Exp Psychol Gen 2022; 151:1556-1572. [PMID: 34843363 PMCID: PMC9148384 DOI: 10.1037/xge0001145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
According to several influential theoretical frameworks, phonological deficits in dyslexia result from reduced sensitivity to acoustic cues that are essential for the development of robust phonemic representations. Some accounts suggest that these deficits arise from impairments in rapid auditory adaptation processes that are either speech-specific or domain-general. Here, we examined the specificity of auditory adaptation deficits in dyslexia using a nonlinguistic tone anchoring (adaptation) task and a linguistic selective adaptation task in children and adults with and without dyslexia. Children and adults with dyslexia had elevated tone-frequency discrimination thresholds, but both groups benefited from anchoring to repeated stimuli to the same extent as typical readers. Additionally, although both dyslexia groups had overall reduced accuracy for speech sound identification, only the child group had reduced categorical perception for speech. Across both age groups, individuals with dyslexia had reduced perceptual adaptation to speech. These results highlight broad auditory perceptual deficits across development in individuals with dyslexia for both linguistic and nonlinguistic domains, but speech-specific adaptation deficits. Finally, mediation models in children and adults revealed that the causal pathways from basic perception and adaptation to phonological awareness through speech categorization were not significant. Thus, rather than having causal effects, perceptual deficits may co-occur with the phonological deficits in dyslexia across development.
Affiliation(s)
- Ola Ozernov-Palchik: McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Harvard Graduate School of Education, Harvard University, Cambridge, Massachusetts, USA
- Sara D. Beach: McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA
- Meredith Brown: Department of Psychology, Tufts University, Medford, Massachusetts, USA
- Tracy M. Centanni: Department of Psychology, Texas Christian University, Fort Worth, Texas, USA
- Nadine Gaab: Harvard Graduate School of Education, Harvard University, Cambridge, Massachusetts, USA
- Gina Kuperberg: Department of Psychology, Tufts University, Medford, Massachusetts, USA
- Tyler K. Perrachione: Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, USA
- John D. E. Gabrieli: McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA; Program in Speech and Hearing Bioscience and Technology, Harvard Medical School, Boston, MA, USA

50
Adaptation to Social-Linguistic Associations in Audio-Visual Speech. Brain Sci 2022; 12:brainsci12070845. [PMID: 35884648 PMCID: PMC9312963 DOI: 10.3390/brainsci12070845] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Revised: 06/23/2022] [Accepted: 06/25/2022] [Indexed: 02/04/2023] Open
Abstract
Listeners entertain hypotheses about how social characteristics affect a speaker’s pronunciation. While some of these hypotheses may be representative of a demographic, thus facilitating spoken language processing, others may be erroneous stereotypes that impede comprehension. As a case in point, listeners’ stereotypes of language and ethnicity pairings in varieties of North American English can improve intelligibility and comprehension, or hinder these processes. Using audio-visual speech, this study examines how listeners adapt to speech in noise from four speakers who are representative of selected accent-ethnicity associations in the local speech community: an Asian English-L1 speaker, a white English-L1 speaker, an Asian English-L2 speaker, and a white English-L2 speaker. The results suggest congruent accent-ethnicity associations facilitate adaptation, and that the mainstream local accent is associated with a more diverse speech community.