1
Houghton ZN, Kapatsinski V. Are your random effects normal? A simulation study of methods for estimating whether subjects or items come from more than one population by examining the distribution of random effects in mixed-effects logistic regression. Behav Res Methods 2024;56:5557-5587. PMID: 38017204. DOI: 10.3758/s13428-023-02287-y.
Abstract
As mixed-effects regression models have become a mainstream tool for psycholinguists, the need to understand them fully has grown. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to estimate whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods are successful in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. This then allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can be incorporated into the regression analysis.
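The BIC comparison described in this abstract can be sketched as follows. This is a hypothetical illustration, not the authors' code: it uses simulated random intercepts and scikit-learn Gaussian mixtures in place of random effects extracted from a fitted mixed-effects logistic regression; all names and parameter values are illustrative.

```python
# Hypothetical sketch: do subjects' random intercepts come from one
# population or two? Compare BIC of 1- vs 2-component Gaussian mixtures
# fit to the (here simulated) inferred random effects.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Simulated random intercepts for 60 subjects drawn from two latent groups.
intercepts = np.concatenate([rng.normal(-1.5, 0.4, 30),
                             rng.normal(1.5, 0.4, 30)]).reshape(-1, 1)

bic = {k: GaussianMixture(n_components=k, random_state=0)
          .fit(intercepts).bic(intercepts)
       for k in (1, 2)}

# A substantially lower BIC for the two-cluster solution suggests the
# single-subject-population assumption is violated.
delta_bic = bic[1] - bic[2]
```

With multivariate random effects (e.g., correlated intercepts and slopes), the same comparison applies with multi-column input; a normality test on the inferred random effects can serve as a first screen before clustering.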
Affiliation(s)
- Zachary N Houghton
- Department of Linguistics, University of California, Davis, Kerr Hall, Davis, CA, 95616, USA
- Department of Linguistics, University of Oregon, 1290 University of Oregon, Eugene, OR, 97403, USA
- Vsevolod Kapatsinski
- Department of Linguistics, University of Oregon, 1290 University of Oregon, Eugene, OR, 97403, USA.
2
McDonald M, Kaushanskaya M. Bilingual Children Shift and Relax Second-Language Phoneme Categorization in Response to Accented L2 and Native L1 Speech Exposure. Language and Speech 2024;67:617-638. PMID: 37401753. DOI: 10.1177/00238309231176760.
Abstract
Listeners adjust their perception to match that of presented speech through shifting and relaxation of categorical boundaries. This allows for processing of speech variation, but may be detrimental to processing efficiency. Bilingual children are exposed to many types of speech in their linguistic environment, including native and non-native speech. This study examined how first language (L1) Spanish/second language (L2) English bilingual children shifted and relaxed phoneme categorization along the cue of voice onset time (VOT) during English speech processing after three types of language exposure: native English exposure, native Spanish exposure, and Spanish-accented English exposure. After exposure to Spanish-accented English speech, bilingual children shifted categorical boundaries in the direction of native English speech boundaries. After exposure to native Spanish speech, children shifted to a smaller extent in the same direction and relaxed boundaries leading to weaker differentiation between categories. These results suggest that prior exposure can affect processing of a second language in bilingual children, but different mechanisms are used when adapting to different types of speech variation.
Affiliation(s)
- Margarethe McDonald
- Department of Linguistics and School of Psychology, University of Ottawa, Canada; Waisman Center, University of Wisconsin-Madison, USA
- Margarita Kaushanskaya
- Department of Communication Sciences and Disorders and Waisman Center, University of Wisconsin-Madison, USA
3
Kapatsinski V, Bramlett AA, Idemaru K. What do you learn from a single cue? Dimensional reweighting and cue reassociation from experience with a newly unreliable phonetic cue. Cognition 2024;249:105818. PMID: 38772253. DOI: 10.1016/j.cognition.2024.105818.
Abstract
In language comprehension, we use perceptual cues to infer meanings. Some of these cues reside on perceptual dimensions. For example, the difference between bear and pear is cued by a difference in voice onset time (VOT), which is a continuous perceptual dimension. The present paper asks whether, and when, experience with a single value on a dimension behaving unexpectedly is used by the learner to reweight the whole dimension. We show that learners reweight the whole VOT dimension when exposed to a single VOT value (e.g., 45 ms) and provided with feedback indicating that the speaker intended to produce a /b/ 50% of the time and a /p/ the other 50% of the time. Importantly, dimensional reweighting occurs only if 1) the 50/50 feedback is unexpected for the VOT value, and 2) there is another dimension that is predictive of feedback. When no predictive dimension is available, listeners reassociate the experienced VOT value with the more surprising outcome but do not downweight the entire VOT dimension. These results provide support for perceptual representations of speech sounds that combine cues and dimensions, for viewing perceptual learning in speech as a combination of error-driven cue reassociation and dimensional reweighting, and for considering dimensional reweighting to be reallocation of attention that occurs only when there is evidence that reallocating attention would improve prediction accuracy (Harmon, Z., Idemaru, K., & Kapatsinski, V. 2019. Learning mechanisms in cue reweighting. Cognition, 189, 76-88.).
Affiliation(s)
- Vsevolod Kapatsinski
- University of Oregon, Department of Linguistics, 161 Straub Hall, University of Oregon, Eugene, OR 97403-1290, United States of America.
- Adam A Bramlett
- Carnegie-Mellon University, Department of Modern Languages, 341 Posner Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213, United States of America.
- Kaori Idemaru
- University of Oregon, Department of East Asian Languages and Literatures, 114 Friendly Hall, University of Oregon, Eugene, OR 97403-1248, United States of America.
4
Murphy TK, Nozari N, Holt LL. Transfer of statistical learning from passive speech perception to speech production. Psychon Bull Rev 2024;31:1193-1205. PMID: 37884779. PMCID: PMC11192850. DOI: 10.3758/s13423-023-02399-8.
Abstract
Communicating with a speaker with a different accent can affect one's own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/-/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0xVOT relationship reversed to create an "accent" with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production, with systematic down-weighting of F0 in listeners' own speech productions in reverse compared with canonical contexts that was robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening and these adjustments transfer to influence listeners' own speech productions.
Affiliation(s)
- Timothy K Murphy
- Department of Psychology, Carnegie Mellon University, Baker Hall, Floor 3, Frew St, Pittsburgh, PA, 15213, USA.
- Center for the Neural Basis of Cognition, Pittsburgh, PA, 15213, USA.
- Nazbanou Nozari
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, 47405, USA
- Lori L Holt
- Department of Psychology, University of Texas at Austin, Austin, TX, 78712, USA
5
Hodson AJ, Shinn-Cunningham BG, Holt LL. Statistical learning across passive listening adjusts perceptual weights of speech input dimensions. Cognition 2023;238:105473. PMID: 37210878. DOI: 10.1016/j.cognition.2023.105473.
Abstract
Statistical learning across passive exposure has been theoretically situated within unsupervised learning. However, when input statistics accumulate over established representations - like speech syllables, for example - there is the possibility that prediction derived from activation of rich, existing representations may support error-driven learning. Here, across five experiments, we present evidence for error-driven learning across passive speech listening. Young adults passively listened to a string of eight beer - pier speech tokens with distributional regularities following either a canonical American-English acoustic dimension correlation or a correlation reversed to create an accent. A sequence-final test stimulus assayed the perceptual weight - the effectiveness - of the secondary dimension in signaling category membership as a function of preceding sequence regularities. Perceptual weight flexibly adjusted according to the passively experienced regularities even when the preceding regularities shifted on a trial-by-trial basis. The findings align with a theoretical view that activation of established internal representations can support learning across statistical regularities via error-driven learning. At the broadest level, this suggests that not all statistical learning need be unsupervised. Moreover, these findings help to account for how cognitive systems may accommodate competing demands for flexibility and stability: instead of overwriting existing representations when short-term input distributions depart from the norms, the mapping from input to category representations may be dynamically - and rapidly - adjusted via error-driven learning from predictions derived from internal representations.
Affiliation(s)
- Alana J Hodson
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA.
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA
6
Xie X, Jaeger TF, Kurumada C. What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review. Cortex 2023;166:377-424. PMID: 37506665. DOI: 10.1016/j.cortex.2023.05.003.
Abstract
Speech from unfamiliar talkers can be difficult to comprehend initially. These difficulties tend to dissipate with exposure, sometimes within minutes or less. Adaptivity in response to unfamiliar input is now considered a fundamental property of speech perception, and research over the past two decades has made substantial progress in identifying its characteristics. The mechanisms underlying adaptive speech perception, however, remain unknown. Past work has attributed facilitatory effects of exposure to any one of three qualitatively different hypothesized mechanisms: (1) low-level, pre-linguistic, signal normalization, (2) changes in/selection of linguistic representations, or (3) changes in post-perceptual decision-making. Direct comparisons of these hypotheses, or combinations thereof, have been lacking. We describe a general computational framework for adaptive speech perception (ASP) that-for the first time-implements all three mechanisms. We demonstrate how the framework can be used to derive predictions for experiments on perception from the acoustic properties of the stimuli. Using this approach, we find that-at the level of data analysis presently employed by most studies in the field-the signature results of influential experimental paradigms do not distinguish between the three mechanisms. This highlights the need for a change in research practices, so that future experiments provide more informative results. We recommend specific changes to experimental paradigms and data analysis. All data and code for this study are shared via OSF, including the R markdown document that this article is generated from, and an R library that implements the models we present.
Affiliation(s)
- Xin Xie
- Language Science, University of California, Irvine, USA.
- T Florian Jaeger
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA; Computer Science, University of Rochester, Rochester, NY, USA
- Chigusa Kurumada
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
7
Dong B, Liang J, Liu C. Cue Weighting in Perception of the Retroflex and Non-Retroflex Laterals in the Zibo Dialect of Chinese. Behav Sci (Basel) 2023;13:469. PMID: 37366720. DOI: 10.3390/bs13060469.
Abstract
This study investigated cue weighting in the perception of the retroflex and non-retroflex lateral contrast in the monosyllabic words /ɭə/ and /lə/ in the Zibo dialect of Chinese. A binary forced-choice identification task was carried out with 32 native speakers, using computer-modified natural speech situated in a two-dimensional acoustic space. The results showed that both acoustic cues had a significant main effect on lateral identification, with the F1 of the following schwa as the primary cue and the consonant-to-vowel (C/V) duration ratio as a secondary cue. No interaction effect was found between the two acoustic cues. Moreover, the results indicated that the acoustic cues were not equally weighted in the production and perception of the syllables /ɭə/ and /lə/ in the Zibo dialect. Future studies could examine other acoustic cues (e.g., the F1 of the laterals) or add noise to the identification task to better characterize listeners' perceptual strategies for the two laterals in the Zibo dialect.
Affiliation(s)
- Bing Dong
- Department of English, School of Foreign Languages, Tongji University, Shanghai 200092, China
- Jie Liang
- Department of English, School of Foreign Languages, Tongji University, Shanghai 200092, China
- Chang Liu
- Department of Speech, Language, and Hearing Sciences, The University of Texas at Austin, Austin, TX 78712, USA
8
Kong EJ, Kang S. Individual Differences in Categorical Judgment of L2 Stops: A Link to Proficiency and Acoustic Cue-Weighting. Language and Speech 2023;66:354-380. PMID: 35822267. DOI: 10.1177/00238309221108647.
Abstract
This study investigated individual differences in Korean adult learners' categorical perception of L2 English stops with an aim to explore the relationship of gradient categorizations to perceptual sensitivity to acoustic cues and L2 proficiency. Korean young adult L2 learners of English (N = 49) participated in two speech perception tasks (visual analog scaling and forced-choice identification) in which they listened to English voiced and voiceless stops and Korean lax and aspirated stops with Voice Onset Time (VOT) and F0 manipulated to form a continuum. It was found that in both L1 and L2 stop perception, listeners' gradient category judgment was associated with greater reliance on language-specific redundant cues (i.e., F0 in L2 English and VOT in L1 Korean) and that in the perception of L2 stops, categorical listeners who tended to be less sensitive to F0 were the ones with a higher level of L2 English proficiency. The results suggest that the categorical manner of judging L2 stops reflects learners' better knowledge of L2-specific acoustic cue-weightings, based on which less relevant acoustic information is effectively suppressed.
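Cue weights of the kind discussed in this abstract are commonly estimated by regressing binary identification responses on the manipulated acoustic dimensions. The sketch below is a hypothetical illustration of that general approach (simulated data, scikit-learn), not the study's actual analysis; all variable names and parameter values are made up for the example.

```python
# Hypothetical sketch: estimating perceptual cue weights from a
# forced-choice identification task via logistic regression. The
# simulated listener relies heavily on VOT and only weakly on F0.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
vot = rng.uniform(-1, 1, n)  # standardized VOT of each stimulus
f0 = rng.uniform(-1, 1, n)   # standardized F0 of each stimulus

# Simulated voiceless responses: true reliance 4:1 in favor of VOT.
p_voiceless = 1 / (1 + np.exp(-(4.0 * vot + 1.0 * f0)))
resp = rng.random(n) < p_voiceless

model = LogisticRegression().fit(np.column_stack([vot, f0]), resp)
w_vot, w_f0 = np.abs(model.coef_[0])

# Normalized weights sum to 1; the larger value marks the primary cue.
weights = {"VOT": w_vot / (w_vot + w_f0), "F0": w_f0 / (w_vot + w_f0)}
```

In this framing, a "categorical" listener who suppresses a redundant cue would show a weight near zero for that dimension, while a gradient listener would show a more even split.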
Affiliation(s)
- Eun Jong Kong
- Department of English, Korea Aerospace University, South Korea
- Soyoung Kang
- School of Linguistics and Language Studies, Carleton University, Canada
9
Jasmin K, Tierney A, Obasih C, Holt L. Short-term perceptual reweighting in suprasegmental categorization. Psychon Bull Rev 2023;30:373-382. PMID: 35915382. PMCID: PMC9971089. DOI: 10.3758/s13423-022-02146-5.
Abstract
Segmental speech units such as phonemes are described as multidimensional categories whose perception involves contributions from multiple acoustic input dimensions, and the relative perceptual weights of these dimensions respond dynamically to context. For example, when speech is altered to create an "accent" in which two acoustic dimensions are correlated in a manner opposite that of long-term experience, the dimension that carries less perceptual weight is down-weighted to contribute less in category decisions. It remains unclear, however, whether this short-term reweighting extends to perception of suprasegmental features that span multiple phonemes, syllables, or words, in part because it has remained debatable whether suprasegmental features are perceived categorically. Here, we investigated the relative contribution of two acoustic dimensions to word emphasis. Participants categorized instances of a two-word phrase pronounced with typical covariation of fundamental frequency (F0) and duration, and in the context of an artificial "accent" in which F0 and duration (established in prior research on English speech as "primary" and "secondary" dimensions, respectively) covaried atypically. When categorizing "accented" speech, listeners rapidly down-weighted the secondary dimension (duration). This result indicates that listeners continually track short-term regularities across speech input and dynamically adjust the weight of acoustic evidence for suprasegmental decisions. Thus, dimension-based statistical learning appears to be a widespread phenomenon in speech perception extending to both segmental and suprasegmental categorization.
Affiliation(s)
- Kyle Jasmin
- Department of Psychology, Wolfson Building, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK.
- Lori Holt
- Carnegie Mellon University, Pittsburgh, PA, USA
10
Wu YC, Holt LL. Phonetic category activation predicts the direction and magnitude of perceptual adaptation to accented speech. J Exp Psychol Hum Percept Perform 2022;48:913-925. PMID: 35849375. PMCID: PMC10236200. DOI: 10.1037/xhp0001037.
Abstract
Unfamiliar accents can systematically shift speech acoustics away from community norms and reduce comprehension. Yet, limited exposure improves comprehension. This perceptual adaptation indicates that the mapping from acoustics to speech representations is dynamic, rather than fixed. But, what drives adjustments is debated. Supervised learning accounts posit that activation of an internal speech representation via disambiguating information generates predictions about patterns of speech input typically associated with the representation. When actual input mismatches predictions, the mapping is adjusted. We tested two hypotheses of this account across consonants and vowels as listeners categorized speech conveying an English-like acoustic regularity or an artificial accent. Across conditions, signal manipulations impacted which of two acoustic dimensions best conveyed category identity, and predicted which dimension would exhibit the effects of perceptual adaptation. Moreover, the strength of phonetic category activation, as estimated by categorization responses reliant on the dominant acoustic dimension, predicted the magnitude of adaptation observed across listeners. The results align with predictions of supervised learning accounts, suggesting that perceptual adaptation arises from speech category activation, corresponding predictions about the patterns of acoustic input that align with the category, and adjustments in subsequent speech perception when input mismatches these expectations. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
11
Liu X. Individual differences in processing non-speech acoustic signals influence cue weighting strategies for L2 speech contrasts. Journal of Psycholinguistic Research 2022;51:903-916. PMID: 35320458. DOI: 10.1007/s10936-022-09869-5.
Abstract
How might individual differences in processing non-speech acoustic signals influence listeners' cue-weighting strategies for L2 speech contrasts? The present study investigated this question by testing forty L1 Chinese-L2 English listeners with two tasks: one testing the listeners' sensitivity to pitch and temporal information in non-speech acoustic signals, and the other testing their cue-weighting (VOT, F0) strategies for distinguishing voicing contrasts in English stop consonants. The results showed that the more sensitive the listeners were to temporal differences in non-speech acoustic signals, the more they relied on VOT to differentiate the voicing contrasts in English stop consonants. No such association was found between listeners' sensitivity to pitch changes in non-speech acoustic signals and their reliance on F0 to cue the voicing contrasts. These results shed light on the different processing mechanisms for the pitch and temporal information of acoustic signals.
Affiliation(s)
- Xiaoluan Liu
- Department of English, East China Normal University, 200241, Shanghai, China.
12
Grover V, Namasivayam A, Mahendra N. A Viewpoint on Accent Services: Framing and Terminology Matter. American Journal of Speech-Language Pathology 2022;31:639-648. PMID: 34903038. DOI: 10.1044/2021_ajslp-20-00376.
Abstract
PURPOSE: The purpose of this article is to offer a contemporary viewpoint on accent services and to contend that an equity-minded reframing of accent services in speech-language pathology is long overdue. Such reframing should directly address the use of nonpejorative terminology and the need to nurture global linguistic diversity and practitioner diversity in speech-language pathology. The authors offer their perspective on affirmative and least-biased accent services, provide an in-depth scoping review of the literature on accent modification, and discuss using terms that communicate unconditional respect for speaker identity and an understanding of the impact of accent services on accented speakers.
CONCLUSIONS: Given ongoing discussions about the urgent need to diversify the profession of speech-language pathology, critical attention is needed to existing biases against accented speakers and to how such biases manifest both in the way accent services are provided and in how clinicians conceptualize their role in working with accented speakers. The authors conclude by discussing alternative terms and offer recommendations for accent services provided by speech-language pathologists.
Affiliation(s)
- Vikas Grover
- Department of Speech-Language Pathology, New York Medical College, Valhalla
- Aravind Namasivayam
- Department of Speech-Language Pathology, University of Toronto, Ontario, Canada
- Nidhi Mahendra
- Department of Communicative Disorders & Sciences, San José State University, CA
13
Kapnoula EC, McMurray B. Idiosyncratic use of bottom-up and top-down information leads to differences in speech perception flexibility: Converging evidence from ERPs and eye-tracking. Brain and Language 2021;223:105031. PMID: 34628259. PMCID: PMC11251822. DOI: 10.1016/j.bandl.2021.105031.
Abstract
Listeners generally categorize speech sounds in a gradient manner. However, recent work, using a visual analogue scaling (VAS) task, suggests that some listeners show more categorical performance, leading to less flexible cue integration and poorer recovery from misperceptions (Kapnoula et al., 2017, 2021). We asked how individual differences in speech gradiency can be reconciled with the well-established gradiency in the modal listener, showing how VAS performance relates to both Visual World Paradigm and EEG measures of gradiency. We also investigated three potential sources of these individual differences: inhibitory control; lexical inhibition; and early cue encoding. We used the N1 ERP component to track pre-categorical encoding of Voice Onset Time (VOT). The N1 linearly tracked VOT, reflecting a fundamentally gradient speech perception; however, for less gradient listeners, this linearity was disrupted near the boundary. Thus, while all listeners are gradient, they may show idiosyncratic encoding of specific cues, affecting downstream processing.
Affiliation(s)
- Efthymia C Kapnoula
- Dept. of Psychological and Brain Sciences, University of Iowa, United States; DeLTA Center, University of Iowa, United States; Basque Center on Cognition, Brain and Language, Spain.
- Bob McMurray
- Dept. of Psychological and Brain Sciences, University of Iowa, United States; DeLTA Center, University of Iowa, United States; Dept. of Communication Sciences and Disorders, DeLTA Center, University of Iowa, United States; Dept. of Linguistics, DeLTA Center, University of Iowa, United States
14
Ou J, Yu ACL, Xiang M. Individual Differences in Categorization Gradience As Predicted by Online Processing of Phonetic Cues During Spoken Word Recognition: Evidence From Eye Movements. Cogn Sci 2021;45:e12948. PMID: 33682211. DOI: 10.1111/cogs.12948.
Abstract
Recent studies have documented substantial variability among typical listeners in how gradiently they categorize speech sounds, and this variability in categorization gradience may link to how listeners weight different cues in the incoming signal. The present study tested the relationship between categorization gradience and cue weighting across two sets of English contrasts, each varying orthogonally in two acoustic dimensions. Participants performed a four-alternative forced-choice identification task in a visual world paradigm while their eye movements were monitored. We found that (a) greater categorization gradience derived from behavioral identification responses corresponds to larger secondary cue weights derived from eye movements; (b) the relationship between categorization gradience and secondary cue weighting is observed across cues and contrasts, suggesting that categorization gradience may be a consistent within-individual property in speech perception; and (c) listeners who showed greater categorization gradience tend to adopt a buffered processing strategy, especially when cues arrive asynchronously in time.
Affiliation(s)
- Jinghua Ou
- Department of Linguistics, University of Chicago
- Alan C L Yu
- Department of Linguistics, University of Chicago
- Ming Xiang
- Department of Linguistics, University of Chicago
15
Zhang X, Wu YC, Holt LL. The Learning Signal in Perceptual Tuning of Speech: Bottom Up Versus Top-Down Information. Cogn Sci 2021;45:e12947. PMID: 33682208. DOI: 10.1111/cogs.12947.
Abstract
Cognitive systems face a tension between stability and plasticity. The maintenance of long-term representations that reflect the global regularities of the environment is often at odds with pressure to flexibly adjust to short-term input regularities that may deviate from the norm. This tension is abundantly clear in speech communication when talkers with accents or dialects produce input that deviates from a listener's language community norms. Prior research demonstrates that when bottom-up acoustic information or top-down word knowledge is available to disambiguate speech input, there is short-term adaptive plasticity such that subsequent speech perception is shifted even in the absence of the disambiguating information. Although such effects are well-documented, it is not yet known whether bottom-up and top-down resolution of ambiguity may operate through common processes, or how these information sources may interact in guiding the adaptive plasticity of speech perception. The present study investigates the joint contributions of bottom-up information from the acoustic signal and top-down information from lexical knowledge in the adaptive plasticity of speech categorization according to short-term input regularities. The results implicate speech category activation, whether from top-down or bottom-up sources, in driving rapid adjustment of listeners' reliance on acoustic dimensions in speech categorization. Broadly, this pattern of perception is consistent with dynamic mapping of input to category representations that is flexibly tuned according to interactive processing accommodating both lexical knowledge and idiosyncrasies of the acoustic input.
Affiliation(s)
- Xujin Zhang, Department of Psychology, Carnegie Mellon University
- Lori L Holt, Department of Psychology, Carnegie Mellon University
16
Jasmin K, Dick F, Holt LL, Tierney A. Tailored perception: Individuals' speech and music perception strategies fit their perceptual abilities. J Exp Psychol Gen 2020; 149:914-934. [PMID: 31589067] [PMCID: PMC7133494] [DOI: 10.1037/xge0000688]
Abstract
Perception involves integration of multiple dimensions that often serve overlapping, redundant functions, for example, pitch, duration, and amplitude in speech. Individuals tend to prioritize these dimensions differently (stable, individualized perceptual strategies), but the reason for this has remained unclear. Here we show that perceptual strategies relate to perceptual abilities. In a speech cue weighting experiment (trial N = 990), we first demonstrate that individuals with a severe deficit for pitch perception (congenital amusics; N = 11) categorize linguistic stimuli similarly to controls (N = 11) when the main distinguishing cue is duration, which they perceive normally. In contrast, in a prosodic task where pitch cues are the main distinguishing factor, we show that amusics place less importance on pitch and instead rely more on duration cues-even when pitch differences in the stimuli are large enough for amusics to discern. In a second experiment testing musical and prosodic phrase interpretation (N = 16 amusics; 15 controls), we found that relying on duration allowed amusics to overcome their pitch deficits to perceive speech and music successfully. We conclude that auditory signals, because of their redundant nature, are robust to impairments for specific dimensions, and that optimal speech and music perception strategies depend not only on invariant acoustic dimensions (the physical signal), but on perceptual dimensions whose precision varies across individuals. Computational models of speech perception (indeed, all types of perception involving redundant cues e.g., vision and touch) should therefore aim to account for the precision of perceptual dimensions and characterize individuals as well as groups.
Affiliation(s)
- Fred Dick, Department of Psychological Sciences
17
Abstract
Recent research demonstrates that the relationship between an acoustic dimension and speech categories is not static. Rather, it is influenced by the evolving distribution of dimensional regularities experienced across time and is specific to the individual sounds experienced. Three studies examine the nature of this perceptual, dimension-based statistical learning of artificially accented [b] and [p] speech categories in online word recognition by testing generalization of learning across contexts and by testing the effect of a larger word list across which learning is induced. The results indicate that whereas learning of accented [b] and [p] generalizes across contexts, generalization to contexts not experienced in the accent is weaker, even for the same speech categories [b] and [p] spoken by the same speaker. The results support a rich model of speech representation that is sensitive to context-dependent variation in the way acoustic dimensions are related to speech categories.
18
Schertz J, Clare EJ. Phonetic cue weighting in perception and production. Wiley Interdiscip Rev Cogn Sci 2019; 11:e1521. [DOI: 10.1002/wcs.1521]
Affiliation(s)
- Jessamyn Schertz, Department of Language Studies, University of Toronto Mississauga, Mississauga, Ontario, Canada
- Emily J. Clare, Department of Linguistics, University of Toronto, Toronto, Ontario, Canada
19
Bushong W, Jaeger TF. Dynamic re-weighting of acoustic and contextual cues in spoken word recognition. J Acoust Soc Am 2019; 146:EL135. [PMID: 31472578] [PMCID: PMC7273512] [DOI: 10.1121/1.5119271]
Abstract
Listeners integrate acoustic and contextual cues during word recognition. However, experiments investigating this integration disrupt natural cue correlations. It was investigated whether changes in correlational structure affect listeners' relative cue weightings. Two groups of participants engaged in a word recognition task. In one group, acoustic (voice onset time) and contextual (lexical bias) cues followed natural correlations; in the other, cues were uncorrelated. When cues were correlated, cue weights were stable throughout the experiment; when cues were uncorrelated, contextual cues were down-weighted. Listeners thus can re-weight cues based on their statistical structure. Studies failing to account for re-weighting risk over/under-estimating cue importance.
Affiliation(s)
- Wednesday Bushong, Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
- T Florian Jaeger, Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, USA
20
Schertz J, Chow CTY, Kamal NSN. The influence of tone language experience and speech style on the use of intonation in language discrimination. J Acoust Soc Am 2019; 146:EL58. [PMID: 31370592] [DOI: 10.1121/1.5117167]
Abstract
This work tests whether listeners' use of suprasegmental information in speech perception is modulated by language background and speech style. Native Mandarin (tone language) and Malay (non-tone language) listeners completed an AX language discrimination task with four levels of signal degradation and two speech styles. Listeners in both groups showed more benefit from pitch information in read than in spontaneous speech. Mandarin listeners showed a greater benefit than Malay listeners from the inclusion of f0 information in a segmentally degraded signal, suggesting that experience with lexical tone may extend to increased attention and/or sensitivity to phrase-level pitch contours.
Affiliation(s)
- Jessamyn Schertz, Department of Language Studies, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6, Canada
- Crystal Tze Ying Chow, Department of Language Studies, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6, Canada
- Nur Sakinah Nor Kamal, Department of Language Studies, University of Toronto Mississauga, Mississauga, Ontario L5L 1C6, Canada
21
Zhang X, Holt LL. Simultaneous tracking of coevolving distributional regularities in speech. J Exp Psychol Hum Percept Perform 2018; 44:1760-1779. [PMID: 30272462] [PMCID: PMC6205888] [DOI: 10.1037/xhp0000569]
Abstract
Speech processing depends upon mapping variable acoustic speech input in a manner that reflects the long-term regularities of the native language. Yet, these mappings are flexible such that introduction of short-term distributional regularities in speech input, like those arising from foreign accents or talker idiosyncrasies, leads to rapid adjustments in the effectiveness of acoustic dimensions in signaling phonetic categories. The present experiments investigate whether the system is able to track simultaneous short-term distributional statistics present in speech input or if, instead, the global regularity jointly defined by these distributions dominates. Three experiments establish that adult listeners are able to track distinct simultaneously evolving regularities across time, given information to support the "binning" of acoustic instances. Both voice quality and visual information to indicate talker supported tracking of coevolving distributional regularities, even when the regularities are opposing and even when the acoustic speech tokens contributing to the distinct distributions are identical. This indicates that reweighting of perceptual dimensions in response to short-term regularities in speech input is not simply an accumulation of acoustic instances. Rather, the system is able to track multiple context-sensitive regularities simultaneously, with rapid context-dependent adaptive adjustments in how acoustic speech input maps to phonetic categories.
Affiliation(s)
- Xujin Zhang, Department of Psychology, Carnegie Mellon University
- Lori L Holt, Department of Psychology, Carnegie Mellon University
22
Kong EJ, Lee H. Attentional Modulation and Individual Differences in Explaining the Changing Role of Fundamental Frequency in Korean Laryngeal Stop Perception. Lang Speech 2018; 61:384-408. [PMID: 28937301] [DOI: 10.1177/0023830917729840]
Abstract
Previous research has shown differential degrees of attention in processing hierarchical linguistic information where higher order cues require greater attention in speech processing. The current study investigated the influence of attentional resources on acoustic cue weightings in speech perception by examining Korean listeners' identifications of the three-way laryngeal stops (tense vs. lax vs. aspirated). Using a dual-task paradigm, we presented 28 adult Korean listeners with identification tasks blocked by no-distractor versus distractor conditions where arithmetic calculations distracted the listeners' speech processing. Auditory stimuli were prepared by combining voice-onset times (VOTs) and fundamental frequencies (F0s) based on natural production. Group analyses revealed that VOT was an informative parameter across the three stop laryngeal categories and the listeners' reliance on VOT was consistently reduced under the distracting condition. Subsequent individual-level analysis further showed that listeners with heavier perceptual reliance on VOT were hindered by the distractor more than others in utilizing VOT. Unlike VOT, the F0 cue did not systematically interact with the distracting listening condition. The findings indicated that VOT (but not F0) required greater attention in processing the Korean laryngeal stops, and was presumably a higher order acoustic cue than F0. The current study contributes to the understanding of attention and cue primacy in general as well as to the clarification of the relative roles of VOT and F0 for the Korean stop laryngeal contrast.
23
Cooper A, Bradlow A. Training-induced pattern-specific phonetic adjustments by first and second language listeners. J Phon 2018; 68:32-49. [PMID: 30270945] [PMCID: PMC6155987] [DOI: 10.1016/j.wocn.2018.02.002]
Abstract
The current study investigated the phonetic adjustment mechanisms that underlie perceptual adaptation in first and second language (Dutch-English) listeners by exposing them to a novel English accent containing controlled deviations from the standard accent (e.g. /i/-to-/ɪ/ yielding /krɪm/ instead of /krim/ for 'cream'). These deviations involved contrasts that either were contrastive or were not contrastive in Dutch. Following accent exposure with disambiguating feedback, listeners completed lexical decision and word identification tasks. Both native and second language listeners demonstrated adaptation, evidenced by higher lexical endorsement rates and word identification accuracy than untrained control listeners for items containing trained accent patterns. However, for L2 listeners, adaptation was modulated by the phonemic contrast, that is, whether or not it was contrastive in the listeners' native language. Specifically, the training-induced criterion loosening for the L2 listeners was limited to contrasts that exist in both their L1, Dutch, and L2, English. For contrasts that are either absent or neutralized in Dutch, the L2 listeners demonstrated relatively loose pre-training criteria compared to L1 listeners. The results indicate that accent exposure induces both a general increase in tolerance for atypical speech input as well as targeted adjustments to specific categories for both L1 and L2 listeners.
Affiliation(s)
- Angela Cooper (corresponding author), Department of Psychology, University of Toronto, 3359 Mississauga Rd., Mississauga, ON L5L 1C6. Tel: 647-774-8967.
24
Kong EJ, Edwards J. Individual differences in categorical perception of speech: Cue weighting and executive function. J Phon 2016; 59:40-57. [PMID: 28503007] [PMCID: PMC5423668] [DOI: 10.1016/j.wocn.2016.08.006]
Abstract
This study examined individual differences in categorical perception and the use of multiple acoustic cues in the perception of the stop voicing contrast. Goals were to investigate whether gradiency of speech perception was related to listeners' differential sensitivity to acoustic cues and to individual differences in executive function. The experiment included two speech perception tasks (visual analogue scaling [VAS] and anticipatory eye movement [AEM]) administered to 30 English-speaking adults in two separate experimental sessions. Stimuli were a /ta/ to /da/ continuum that systematically varied VOT and f0. Findings were that some listeners had a more gradient pattern of responses on the VAS task; the listeners who had a gradient response pattern on the VAS task also showed more sensitivity to f0 on the AEM task. The patterns were consistent across individuals tested on two separate occasions. These results suggest that variability in how categorically listeners perceive speech sounds is consistent and systematic within individuals.
Affiliation(s)
- Eun Jong Kong, Korea Aerospace University, 100 Hanggongdae-gil, Hwajeon-dong, Deogyang-gu, Goyang-city, Gyeonggi-do 412-791, South Korea
- Jan Edwards, University of Wisconsin-Madison, 301 Goodnight Hall, 1975 Willow Dr., Madison, WI 53706, USA
25
Schertz J, Cho T, Lotto A, Warner N. Individual differences in phonetic cue use in production and perception of a non-native sound contrast. J Phon 2015; 52:183-204. [PMID: 26644630] [PMCID: PMC4669969] [DOI: 10.1016/j.wocn.2015.07.003]
Abstract
The current work examines native Korean speakers' perception and production of stop contrasts in their native language (L1, Korean) and second language (L2, English), focusing on three acoustic dimensions that are all used, albeit to different extents, in both languages: voice onset time (VOT), f0 at vowel onset, and closure duration. Participants used all three cues to distinguish the L1 Korean three-way stop distinction in both production and perception. Speakers' productions of the L2 English contrasts were reliably distinguished using both VOT and f0 (even though f0 is only a very weak cue to the English contrast), and, to a lesser extent, closure duration. In contrast to the relative homogeneity of the L2 productions, group patterns on a forced-choice perception task were less clear-cut, due to considerable individual differences in perceptual categorization strategies, with listeners using either primarily VOT duration, primarily f0, or both dimensions equally to distinguish the L2 English contrast. Differences in perception, which were stable across experimental sessions, were not predicted by individual variation in production patterns. This work suggests that reliance on multiple cues in representation of a phonetic contrast can form the basis for distinct individual cue-weighting strategies in phonetic categorization.
Affiliation(s)
- Jessamyn Schertz, Centre for French and Linguistics, University of Toronto Scarborough, 1265 Military Trail, HW413, Toronto, ON M1C 1A4, Canada
- Taehong Cho, Hanyang Phonetics and Psycholinguistics Lab, #104, College of Humanities, Hanyang University, Seoul 133-791, Korea
- Andrew Lotto, Department of Speech, Language and Hearing Sciences, The University of Arizona, P.O. Box 210071, Tucson, AZ 85721, USA
- Natasha Warner, Department of Linguistics, The University of Arizona, P.O. Box 210025, Tucson, AZ 85721, USA