1
Jasmin K, Tierney A, Obasih C, Holt L. Short-term perceptual reweighting in suprasegmental categorization. Psychon Bull Rev 2023; 30:373-382. [PMID: 35915382 PMCID: PMC9971089 DOI: 10.3758/s13423-022-02146-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Accepted: 07/05/2022] [Indexed: 11/08/2022]
Abstract
Segmental speech units such as phonemes are described as multidimensional categories whose perception involves contributions from multiple acoustic input dimensions, and the relative perceptual weights of these dimensions respond dynamically to context. For example, when speech is altered to create an "accent" in which two acoustic dimensions are correlated in a manner opposite that of long-term experience, the dimension that carries less perceptual weight is down-weighted to contribute less in category decisions. It remains unclear, however, whether this short-term reweighting extends to perception of suprasegmental features that span multiple phonemes, syllables, or words, in part because it has remained debatable whether suprasegmental features are perceived categorically. Here, we investigated the relative contribution of two acoustic dimensions to word emphasis. Participants categorized instances of a two-word phrase pronounced with typical covariation of fundamental frequency (F0) and duration, and in the context of an artificial "accent" in which F0 and duration (established in prior research on English speech as "primary" and "secondary" dimensions, respectively) covaried atypically. When categorizing "accented" speech, listeners rapidly down-weighted the secondary dimension (duration). This result indicates that listeners continually track short-term regularities across speech input and dynamically adjust the weight of acoustic evidence for suprasegmental decisions. Thus, dimension-based statistical learning appears to be a widespread phenomenon in speech perception extending to both segmental and suprasegmental categorization.
Affiliation(s)
- Kyle Jasmin
- Department of Psychology, Wolfson Building, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK.
- Lori Holt
- Carnegie Mellon University, Pittsburgh, PA, USA
2
Wu YC, Holt LL. Phonetic category activation predicts the direction and magnitude of perceptual adaptation to accented speech. J Exp Psychol Hum Percept Perform 2022; 48:913-925. [PMID: 35849375 PMCID: PMC10236200 DOI: 10.1037/xhp0001037] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
Unfamiliar accents can systematically shift speech acoustics away from community norms and reduce comprehension. Yet, limited exposure improves comprehension. This perceptual adaptation indicates that the mapping from acoustics to speech representations is dynamic, rather than fixed. But, what drives adjustments is debated. Supervised learning accounts posit that activation of an internal speech representation via disambiguating information generates predictions about patterns of speech input typically associated with the representation. When actual input mismatches predictions, the mapping is adjusted. We tested two hypotheses of this account across consonants and vowels as listeners categorized speech conveying an English-like acoustic regularity or an artificial accent. Across conditions, signal manipulations impacted which of two acoustic dimensions best conveyed category identity, and predicted which dimension would exhibit the effects of perceptual adaptation. Moreover, the strength of phonetic category activation, as estimated by categorization responses reliant on the dominant acoustic dimension, predicted the magnitude of adaptation observed across listeners. The results align with predictions of supervised learning accounts, suggesting that perceptual adaptation arises from speech category activation, corresponding predictions about the patterns of acoustic input that align with the category, and adjustments in subsequent speech perception when input mismatches these expectations. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
3
Yu ACL. Perceptual Cue Weighting Is Influenced by the Listener's Gender and Subjective Evaluations of the Speaker: The Case of English Stop Voicing. Front Psychol 2022; 13:840291. [PMID: 35529558 PMCID: PMC9067435 DOI: 10.3389/fpsyg.2022.840291] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/21/2021] [Accepted: 02/28/2022] [Indexed: 11/13/2022]
Abstract
Speech categories are defined by multiple acoustic dimensions and their boundaries are generally fuzzy and ambiguous, in part because listeners often give differential weighting to these cue dimensions during phonetic categorization. This study explored how a listener's perception of a speaker's socio-indexical and personality characteristics influences the listener's perceptual cue weighting. In a matched-guise study, three groups of listeners classified a series of gender-neutral /b/-/p/ continua that vary in VOT and F0 at the onset of the following vowel. Listeners were assigned to one of three prompt conditions (i.e., a visually male talker, a visually female talker, or audio-only) and rated the talker in terms of vocal (and facial, in the visual prompt conditions) gender prototypicality, attractiveness, friendliness, confidence, trustworthiness, and gayness. Male listeners and listeners who saw a male face showed less reliance on VOT compared to listeners in the other conditions. Listeners' visual evaluation of the talker also affected their weighting of VOT and onset F0 cues, although the effects of facial impressions differ depending on the gender of the listener. The results demonstrate that individual differences in perceptual cue weighting are modulated by the listener's gender and his/her subjective evaluation of the talker. These findings lend support to exemplar-based models of speech perception and production in which socio-indexical features are encoded as part of the episodic traces in the listeners' mental lexicon. This study also sheds light on the relationship between individual variation in cue weighting and community-level sound change by demonstrating that VOT and onset F0 co-variation in North American English has acquired a certain degree of socio-indexical significance.
Affiliation(s)
- Alan C L Yu
- Chicago Phonology Laboratory, Department of Linguistics, University of Chicago, Chicago, IL, United States
4
Zhang H, Wiener S, Holt LL. Adjustment of cue weighting in speech by speakers and listeners: Evidence from amplitude and duration modifications of Mandarin Chinese tone. J Acoust Soc Am 2022; 151:992. [PMID: 35232077 PMCID: PMC8846952 DOI: 10.1121/10.0009378] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/01/2021] [Revised: 01/07/2022] [Accepted: 01/10/2022] [Indexed: 06/14/2023]
Abstract
Speech contrasts are signaled by multiple acoustic dimensions, but these dimensions are not equally diagnostic. Moreover, the relative diagnosticity, or weight, of acoustic dimensions in speech can shift in different communicative contexts for both speech perception and speech production. However, the literature remains unclear on whether, and if so how, talkers adjust speech to emphasize different acoustic dimensions in the context of changing communicative demands. Here, we examine the interplay of flexible cue weights in speech production and perception across amplitude and duration, secondary non-spectral acoustic dimensions for phonated Mandarin Chinese lexical tone, across natural speech and whispering, which eliminates fundamental frequency contour, the primary acoustic dimension. Phonated and whispered Mandarin productions from native talkers revealed enhancement of both duration and amplitude cues in whispered, compared to phonated speech. When nonspeech amplitude-modulated noises modeled these patterns of enhancement, identification of the noises as Mandarin lexical tone categories was more accurate than identification of noises modeling phonated speech amplitude and duration cues. Thus, speakers exaggerate secondary cues in whispered speech and listeners make use of this information. Yet, enhancement is not symmetric among the four Mandarin lexical tones, indicating possible constraints on the realization of this enhancement.
Affiliation(s)
- Hui Zhang
- Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China
- Seth Wiener
- Department of Modern Languages, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
- Lori L Holt
- Department of Psychology and Neuroscience Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA
5
Listeners track talker-specific prosody to deal with talker-variability. Brain Res 2021; 1769:147605. [PMID: 34363790 DOI: 10.1016/j.brainres.2021.147605] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/15/2020] [Revised: 07/21/2021] [Accepted: 07/27/2021] [Indexed: 11/20/2022]
Abstract
One of the challenges in speech perception is that listeners must deal with considerable segmental and suprasegmental variability in the acoustic signal due to differences between talkers. Most previous studies have focused on how listeners deal with segmental variability. In this EEG experiment, we investigated whether listeners track talker-specific usage of suprasegmental cues to lexical stress to recognize spoken words correctly. In a three-day training phase, Dutch participants learned to map non-word minimal stress pairs onto different object referents (e.g., USklot meant "lamp"; usKLOT meant "train"). These non-words were produced by two male talkers. Critically, each talker used only one suprasegmental cue to signal stress (e.g., Talker A used only F0 and Talker B only intensity). We expected participants to learn which talker used which cue to signal stress. In the test phase, participants indicated whether spoken sentences including these non-words were correct ("The word for lamp is…"). We found that participants were slower to indicate that a stimulus was correct if the non-word was produced with the unexpected cue (e.g., Talker A using intensity). That is, if in training Talker A used F0 to signal stress, participants experienced a mismatch between predicted and perceived phonological word-forms if, at test, Talker A unexpectedly used intensity to cue stress. In contrast, the N200 amplitude, an event-related potential related to phonological prediction, was not modulated by the cue mismatch. Theoretical implications of these contrasting results are discussed. The behavioral findings illustrate talker-specific prediction of prosodic cues, picked up through perceptual learning during training.
6
Zhang X, Wu YC, Holt LL. The Learning Signal in Perceptual Tuning of Speech: Bottom Up Versus Top-Down Information. Cogn Sci 2021; 45:e12947. [PMID: 33682208 DOI: 10.1111/cogs.12947] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/14/2017] [Revised: 01/04/2021] [Accepted: 01/05/2021] [Indexed: 12/01/2022]
Abstract
Cognitive systems face a tension between stability and plasticity. The maintenance of long-term representations that reflect the global regularities of the environment is often at odds with pressure to flexibly adjust to short-term input regularities that may deviate from the norm. This tension is abundantly clear in speech communication when talkers with accents or dialects produce input that deviates from a listener's language community norms. Prior research demonstrates that when bottom-up acoustic information or top-down word knowledge is available to disambiguate speech input, there is short-term adaptive plasticity such that subsequent speech perception is shifted even in the absence of the disambiguating information. Although such effects are well-documented, it is not yet known whether bottom-up and top-down resolution of ambiguity may operate through common processes, or how these information sources may interact in guiding the adaptive plasticity of speech perception. The present study investigates the joint contributions of bottom-up information from the acoustic signal and top-down information from lexical knowledge in the adaptive plasticity of speech categorization according to short-term input regularities. The results implicate speech category activation, whether from top-down or bottom-up sources, in driving rapid adjustment of listeners' reliance on acoustic dimensions in speech categorization. Broadly, this pattern of perception is consistent with dynamic mapping of input to category representations that is flexibly tuned according to interactive processing accommodating both lexical knowledge and idiosyncrasies of the acoustic input.
Affiliation(s)
- Xujin Zhang
- Department of Psychology, Carnegie Mellon University
- Lori L Holt
- Department of Psychology, Carnegie Mellon University
7
Idemaru K, Vaughn C. Perceptual tracking of distinct distributional regularities within a single voice. J Acoust Soc Am 2020; 148:EL427. [PMID: 33379901 DOI: 10.1121/10.0002762] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/12/2020] [Accepted: 11/09/2020] [Indexed: 06/12/2023]
Abstract
The speech signal is inherently variable and listeners need to recalibrate when local, short-term distributions of acoustic dimensions deviate from long-term representation. The present experiment investigated the specificity of this perceptual adjustment, addressing whether the perceptual system is capable of tracking differing simultaneous short-term acoustic distributions of the same speech categories, conditioned by context. The results indicated that instead of aggregating over the contextual variation, listeners tracked separate distributional statistics for instances of speech categories experienced in different phonetic/lexical contexts, suggesting that perceptual learning is not only influenced by distributional statistics, but also by external factors such as contextual information.
Affiliation(s)
- Kaori Idemaru
- Department of East Asian Languages and Literatures, 1248 University of Oregon, Eugene, Oregon 97403-1248, USA
- Charlotte Vaughn
- Department of Linguistics, 1290 University of Oregon, Eugene, Oregon 97403-1290, USA
8
Jasmin K, Dick F, Holt LL, Tierney A. Tailored perception: Individuals' speech and music perception strategies fit their perceptual abilities. J Exp Psychol Gen 2020; 149:914-934. [PMID: 31589067 PMCID: PMC7133494 DOI: 10.1037/xge0000688] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/09/2018] [Revised: 08/09/2019] [Accepted: 08/12/2019] [Indexed: 01/09/2023]
Abstract
Perception involves integration of multiple dimensions that often serve overlapping, redundant functions, for example, pitch, duration, and amplitude in speech. Individuals tend to prioritize these dimensions differently (stable, individualized perceptual strategies), but the reason for this has remained unclear. Here we show that perceptual strategies relate to perceptual abilities. In a speech cue weighting experiment (trial N = 990), we first demonstrate that individuals with a severe deficit for pitch perception (congenital amusics; N = 11) categorize linguistic stimuli similarly to controls (N = 11) when the main distinguishing cue is duration, which they perceive normally. In contrast, in a prosodic task where pitch cues are the main distinguishing factor, we show that amusics place less importance on pitch and instead rely more on duration cues, even when pitch differences in the stimuli are large enough for amusics to discern. In a second experiment testing musical and prosodic phrase interpretation (N = 16 amusics; 15 controls), we found that relying on duration allowed amusics to overcome their pitch deficits to perceive speech and music successfully. We conclude that auditory signals, because of their redundant nature, are robust to impairments for specific dimensions, and that optimal speech and music perception strategies depend not only on invariant acoustic dimensions (the physical signal), but on perceptual dimensions whose precision varies across individuals. Computational models of speech perception (indeed, all types of perception involving redundant cues, e.g., vision and touch) should therefore aim to account for the precision of perceptual dimensions and characterize individuals as well as groups. (PsycInfo Database Record (c) 2020 APA, all rights reserved).
Affiliation(s)
- Fred Dick
- Department of Psychological Sciences
9
Alexander JED, Nygaard LC. Specificity and generalization in perceptual adaptation to accented speech. J Acoust Soc Am 2019; 145:3382. [PMID: 31255164 PMCID: PMC6557708 DOI: 10.1121/1.5110302] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/22/2018] [Revised: 05/03/2019] [Accepted: 05/10/2019] [Indexed: 05/28/2023]
Abstract
The present study investigated the degree to which perceptual adaptation to foreign-accented speech is specific to the regularities in pronunciation associated with a particular accent. Across experiments, the conditions under which generalization of learning did or did not occur were evaluated. In Experiment 1, listeners were trained on word-length utterances in Korean-accented English and tested with words produced by the same or a different set of Korean-accented speakers. Listeners performed better than untrained controls when tested with novel words from the same or different speakers. In Experiment 2, listeners were trained with Spanish-, Korean-, or mixed-accented speech and transcribed novel words produced by unfamiliar Korean- or Spanish-accented speakers at test. The findings revealed relative specificity of learning. Listeners trained and tested on the same variety of accented speech showed better transcription at test than those trained with a different accent or untrained controls. Performance after mixed-accent training was intermediate. Patterns of errors and analysis of acoustic properties for accented vowels suggested perceptual improvement for regularities arising from each accent, with learning dependent on the relative similarity of linguistic form within and across accents.
Affiliation(s)
- Jessica E D Alexander
- Department of Psychology, Centenary College of Louisiana, 2911 Centenary Boulevard, Shreveport, Louisiana 71105, USA
- Lynne C Nygaard
- Department of Psychology, Emory University, 36 Eagle Row, Atlanta, Georgia 30322, USA