1. Kapatsinski V, Bramlett AA, Idemaru K. What do you learn from a single cue? Dimensional reweighting and cue reassociation from experience with a newly unreliable phonetic cue. Cognition 2024; 249:105818. PMID: 38772253. DOI: 10.1016/j.cognition.2024.105818.
Abstract
In language comprehension, we use perceptual cues to infer meanings. Some of these cues reside on perceptual dimensions. For example, the difference between bear and pear is cued by a difference in voice onset time (VOT), which is a continuous perceptual dimension. The present paper asks whether, and when, experience with a single value on a dimension behaving unexpectedly is used by the learner to reweight the whole dimension. We show that learners reweight the whole VOT dimension when exposed to a single VOT value (e.g., 45 ms) and provided with feedback indicating that the speaker intended to produce a /b/ 50% of the time and a /p/ the other 50% of the time. Importantly, dimensional reweighting occurs only if 1) the 50/50 feedback is unexpected for the VOT value, and 2) there is another dimension that is predictive of feedback. When no predictive dimension is available, listeners reassociate the experienced VOT value with the more surprising outcome but do not downweight the entire VOT dimension. These results provide support for perceptual representations of speech sounds that combine cues and dimensions, for viewing perceptual learning in speech as a combination of error-driven cue reassociation and dimensional reweighting, and for considering dimensional reweighting to be reallocation of attention that occurs only when there is evidence that reallocating attention would improve prediction accuracy (Harmon, Z., Idemaru, K., & Kapatsinski, V. 2019. Learning mechanisms in cue reweighting. Cognition, 189, 76-88.).
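The paper's distinction between cue reassociation and dimensional reweighting rests on error-driven learning, in which associations compete to predict outcomes. A minimal sketch of that competition is the delta rule below; the cue encoding, learning rate, and trial counts are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def train_delta_rule(trials, lr=0.1, epochs=50):
    """Toy error-driven (delta-rule) learner: each cue-outcome association
    is adjusted in proportion to the prediction error on every trial."""
    n_cues = len(trials[0][0])
    w = np.zeros(n_cues)
    for _ in range(epochs):
        for cues, outcome in trials:
            x = np.asarray(cues, dtype=float)
            error = outcome - w @ x   # prediction error for this trial
            w += lr * error * x       # update only the active cues
    return w

# Hypothetical encoding: cue 0 = the single experienced VOT value (present on
# every trial); cue 1 = a second, predictive dimension (e.g., f0). Feedback is
# /p/ (1) on half the trials and /b/ (0) on the other half, but is perfectly
# predicted by cue 1.
trials = [((1, 1), 1), ((1, 0), 0)] * 20
w = train_delta_rule(trials)
# Cue competition shifts the association onto the predictive dimension,
# leaving the unreliable VOT value with little associative weight.
```

In this toy setup the weight on the uninformative VOT cue is driven toward zero while the predictive dimension absorbs the association, which is the qualitative pattern the authors describe as reassociation under cue competition.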
Affiliation(s)
- Vsevolod Kapatsinski
- University of Oregon, Department of Linguistics, 161 Straub Hall, University of Oregon, Eugene, OR 97403-1290, United States of America.
- Adam A Bramlett
- Carnegie Mellon University, Department of Modern Languages, 341 Posner Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213, United States of America.
- Kaori Idemaru
- University of Oregon, Department of East Asian Languages and Literatures, 114 Friendly Hall, University of Oregon, Eugene, OR 97403-1248, United States of America.
2. Nozari N, Martin RC. Is working memory domain-general or domain-specific? Trends Cogn Sci 2024:S1364-6613(24)00164-5. PMID: 39019705. DOI: 10.1016/j.tics.2024.06.006.
Abstract
Given the fundamental role of working memory (WM) in all domains of cognition, a central question has been whether WM is domain-general. However, the term 'domain-general' has been used in different, and sometimes misleading, ways. By reviewing recent evidence and biologically plausible models of WM, we show that the level of domain-generality varies substantially between three facets of WM: in terms of computations, WM is largely domain-general. In terms of neural correlates, it contains both domain-general and domain-specific elements. Finally, in terms of application, it is mostly domain-specific. This variance encourages a shift of focus towards uncovering domain-general computational principles and away from domain-general approaches to the analysis of individual differences and WM training, favoring newer perspectives, such as training-as-skill-learning.
Affiliation(s)
- Nazbanou Nozari
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA; Cognitive Science Program, Indiana University, Bloomington, IN, USA.
- Randi C Martin
- Department of Psychological Sciences, Rice University, Houston, TX, USA.
3. Murphy TK, Nozari N, Holt LL. Transfer of statistical learning from passive speech perception to speech production. Psychon Bull Rev 2024; 31:1193-1205. PMID: 37884779. PMCID: PMC11192850. DOI: 10.3758/s13423-023-02399-8.
Abstract
Communicating with a speaker with a different accent can affect one's own speech. Despite the strength of evidence for perception-production transfer in speech, the nature of transfer has remained elusive, with variable results regarding the acoustic properties that transfer between speakers and the characteristics of the speakers who exhibit transfer. The current study investigates perception-production transfer through the lens of statistical learning across passive exposure to speech. Participants experienced a short sequence of acoustically variable minimal pair (beer/pier) utterances conveying either an accent or typical American English acoustics, categorized a perceptually ambiguous test stimulus, and then repeated the test stimulus aloud. In the canonical condition, /b/-/p/ fundamental frequency (F0) and voice onset time (VOT) covaried according to typical English patterns. In the reverse condition, the F0 × VOT relationship was reversed to create an "accent" with speech input regularities atypical of American English. Replicating prior studies, F0 played less of a role in perceptual speech categorization in reverse compared with canonical statistical contexts. Critically, this down-weighting transferred to production: listeners systematically down-weighted F0 in their own speech productions in reverse compared with canonical contexts, an effect robust across male and female participants. Thus, the mapping of acoustics to speech categories is rapidly adjusted by short-term statistical learning across passive listening, and these adjustments transfer to influence listeners' own speech productions.
Affiliation(s)
- Timothy K Murphy
- Department of Psychology, Carnegie Mellon University, Baker Hall, Floor 3, Frew St, Pittsburgh, PA, 15213, USA.
- Center for the Neural Basis of Cognition, Pittsburgh, PA, 15213, USA.
- Nazbanou Nozari
- Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, 47405, USA.
- Lori L Holt
- Department of Psychology, University of Texas at Austin, Austin, TX, 78712, USA.
4. Xie X, Jaeger TF, Kurumada C. What we do (not) know about the mechanisms underlying adaptive speech perception: A computational framework and review. Cortex 2023; 166:377-424. PMID: 37506665. DOI: 10.1016/j.cortex.2023.05.003.
Abstract
Speech from unfamiliar talkers can be difficult to comprehend initially. These difficulties tend to dissipate with exposure, sometimes within minutes or less. Adaptivity in response to unfamiliar input is now considered a fundamental property of speech perception, and research over the past two decades has made substantial progress in identifying its characteristics. The mechanisms underlying adaptive speech perception, however, remain unknown. Past work has attributed facilitatory effects of exposure to any one of three qualitatively different hypothesized mechanisms: (1) low-level, pre-linguistic, signal normalization, (2) changes in/selection of linguistic representations, or (3) changes in post-perceptual decision-making. Direct comparisons of these hypotheses, or combinations thereof, have been lacking. We describe a general computational framework for adaptive speech perception (ASP) that-for the first time-implements all three mechanisms. We demonstrate how the framework can be used to derive predictions for experiments on perception from the acoustic properties of the stimuli. Using this approach, we find that-at the level of data analysis presently employed by most studies in the field-the signature results of influential experimental paradigms do not distinguish between the three mechanisms. This highlights the need for a change in research practices, so that future experiments provide more informative results. We recommend specific changes to experimental paradigms and data analysis. All data and code for this study are shared via OSF, including the R markdown document that this article is generated from, and an R library that implements the models we present.
Affiliation(s)
- Xin Xie
- Language Science, University of California, Irvine, USA.
- T Florian Jaeger
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA; Computer Science, University of Rochester, Rochester, NY, USA.
- Chigusa Kurumada
- Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA.
5. Hodson AJ, Shinn-Cunningham BG, Holt LL. Statistical learning across passive listening adjusts perceptual weights of speech input dimensions. Cognition 2023; 238:105473. PMID: 37210878. PMCID: PMC11380765. DOI: 10.1016/j.cognition.2023.105473.
Abstract
Statistical learning across passive exposure has been theoretically situated with unsupervised learning. However, when input statistics accumulate over established representations - like speech syllables, for example - there is the possibility that prediction derived from activation of rich, existing representations may support error-driven learning. Here, across five experiments, we present evidence for error-driven learning across passive speech listening. Young adults passively listened to a string of eight beer - pier speech tokens with distributional regularities following either a canonical American-English acoustic dimension correlation or a correlation reversed to create an accent. A sequence-final test stimulus assayed the perceptual weight - the effectiveness - of the secondary dimension in signaling category membership as a function of preceding sequence regularities. Perceptual weight flexibly adjusted according to the passively experienced regularities even when the preceding regularities shifted on a trial-by-trial basis. The findings align with a theoretical view that activation of established internal representations can support learning across statistical regularities via error-driven learning. At the broadest level, this suggests that not all statistical learning need be unsupervised. Moreover, these findings help to account for how cognitive systems may accommodate competing demands for flexibility and stability: instead of overwriting existing representations when short-term input distributions depart from the norms, the mapping from input to category representations may be dynamically - and rapidly - adjusted via error-driven learning from predictions derived from internal representations.
Affiliation(s)
- Alana J Hodson
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA.
- Lori L Holt
- Department of Psychology, Carnegie Mellon University, Pittsburgh, PA, USA; Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, USA.
6. Conover L. The direction of attention in second language phonological contrast learning. J Acoust Soc Am 2023; 153:3390. PMID: 37350624. DOI: 10.1121/10.0019714.
Abstract
This study attempted to describe why some individuals are more successful when learning to perceive the sounds of a second language by analyzing the role attention plays in perceptual learning. Fifty-seven monolingual English-speaking adults completed the study. The participants underwent a perceptual learning paradigm presenting a novel contrast: the voicing distinction between Thai /b/ and /p/. The experiment consisted of a 40-item pretest, a 480-trial learning phase, and a 40-item posttest. Approximately half of the participants (n = 30) were given explicit instruction to listen for the specific contrast prior to the learning phase; the other participants were not told the nature of the contrast. The Attention Network Test (ANT) from Fan, McCandliss, Sommer, Raz, and Posner [(2002). J. Cogn. Neurosci. 14(3), 340-347] was used to assess attentional networks. Generalized linear models and linear mixed-effects models (LME) were fit to predict the participants' posttest scores from ANT subscores, experimental group, and learning block (LME only). The results showed a correlation between attentional control and the ability to learn non-native phoneme contrasts regardless of instruction. In addition, there was a positive interaction between attentional control and the provision of explicit instruction, such that individuals with high attentional control learned better when they received explicit instruction prior to training.
Affiliation(s)
- Laura Conover
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida 33610, USA.
7. Hearing is believing: Lexically guided perceptual learning is graded to reflect the quantity of evidence in speech input. Cognition 2023; 235:105404. PMID: 36812836. DOI: 10.1016/j.cognition.2023.105404.
Abstract
There is wide variability in the acoustic patterns that are produced for a given linguistic message, including variability that is conditioned on who is speaking. Listeners solve this lack of invariance problem, at least in part, by dynamically modifying the mapping to speech sounds in response to structured variation in the input. Here we test a primary tenet of the ideal adapter framework of speech adaptation, which posits that perceptual learning reflects the incremental updating of cue-sound mappings to incorporate observed evidence with prior beliefs. Our investigation draws on the influential lexically guided perceptual learning paradigm. During an exposure phase, listeners heard a talker who produced fricative energy ambiguous between /ʃ/ and /s/. Lexical context differentially biased interpretation of the ambiguity as either /s/ or /ʃ/, and, across two behavioral experiments (n = 500), we manipulated the quantity of evidence and the consistency of evidence that was provided during exposure. Following exposure, listeners categorized tokens from an ashi - asi continuum to assess learning. The ideal adapter framework was formalized through computational simulations, which predicted that learning would be graded to reflect the quantity, but not the consistency, of the exposure input. These predictions were upheld in human listeners; the magnitude of the learning effect monotonically increased given exposure to four, 10, or 20 critical productions, and there was no evidence that learning differed given consistent versus inconsistent exposure. These results (1) provide support for a primary tenet of the ideal adapter framework, (2) establish quantity of evidence as a key determinant of adaptation in human listeners, and (3) provide critical evidence that lexically guided perceptual learning is not a binary outcome. In doing so, the current work provides foundational knowledge to support theoretical advances that consider perceptual learning as a graded outcome that is tightly linked to input statistics in the speech stream.
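The ideal adapter's key prediction here, that belief updating is graded in the quantity of evidence but insensitive to its ordering, can be sketched with a conjugate beta-binomial update. This is a minimal sketch under an assumed flat prior, not the paper's full simulation model.

```python
def posterior_s_bias(n_s, n_sh, alpha=1.0, beta=1.0):
    """Posterior mean probability that the talker's ambiguous fricative
    should be mapped to /s/, after n_s /s/-biased and n_sh /sh/-biased
    critical productions. Starts from a flat Beta(1, 1) prior, an
    illustrative assumption."""
    return (n_s + alpha) / (n_s + n_sh + alpha + beta)

# Learning is graded in the quantity of evidence (4, 10, or 20 critical
# productions, as in the experiments)...
shift_4 = posterior_s_bias(4, 0)
shift_10 = posterior_s_bias(10, 0)
shift_20 = posterior_s_bias(20, 0)
# ...but depends only on the counts, which are sufficient statistics: any
# consistent or inconsistent ordering of the same totals yields the same
# posterior, matching the reported null effect of consistency.
```

Because the counts are sufficient statistics in a conjugate model, the posterior shift grows monotonically with exposure quantity while the order of exposure is irrelevant, mirroring the pattern the simulations predicted and the listeners showed.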
8. Jasmin K, Tierney A, Obasih C, Holt L. Short-term perceptual reweighting in suprasegmental categorization. Psychon Bull Rev 2023; 30:373-382. PMID: 35915382. PMCID: PMC9971089. DOI: 10.3758/s13423-022-02146-5.
Abstract
Segmental speech units such as phonemes are described as multidimensional categories whose perception involves contributions from multiple acoustic input dimensions, and the relative perceptual weights of these dimensions respond dynamically to context. For example, when speech is altered to create an "accent" in which two acoustic dimensions are correlated in a manner opposite that of long-term experience, the dimension that carries less perceptual weight is down-weighted to contribute less in category decisions. It remains unclear, however, whether this short-term reweighting extends to perception of suprasegmental features that span multiple phonemes, syllables, or words, in part because it has remained debatable whether suprasegmental features are perceived categorically. Here, we investigated the relative contribution of two acoustic dimensions to word emphasis. Participants categorized instances of a two-word phrase pronounced with typical covariation of fundamental frequency (F0) and duration, and in the context of an artificial "accent" in which F0 and duration (established in prior research on English speech as "primary" and "secondary" dimensions, respectively) covaried atypically. When categorizing "accented" speech, listeners rapidly down-weighted the secondary dimension (duration). This result indicates that listeners continually track short-term regularities across speech input and dynamically adjust the weight of acoustic evidence for suprasegmental decisions. Thus, dimension-based statistical learning appears to be a widespread phenomenon in speech perception extending to both segmental and suprasegmental categorization.
Affiliation(s)
- Kyle Jasmin
- Department of Psychology, Wolfson Building, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK.
- Lori Holt
- Carnegie Mellon University, Pittsburgh, PA, USA.
9. Non-sensory influences on auditory learning and plasticity. J Assoc Res Otolaryngol 2022; 23:151-166. PMID: 35235100. PMCID: PMC8964851. DOI: 10.1007/s10162-022-00837-3. Open access.
Abstract
Distinguishing between regular and irregular heartbeats, conversing with speakers of different accents, and tuning a guitar: all rely on some form of auditory learning. What drives these experience-dependent changes? A growing body of evidence suggests an important role for non-sensory influences, including reward, task engagement, and social or linguistic context. This review is a collection of contributions that highlight how these non-sensory factors shape auditory plasticity and learning at the molecular, physiological, and behavioral levels. We begin by presenting evidence that reward signals from the dopaminergic midbrain act on cortico-subcortical networks to shape sound-evoked responses of auditory cortical neurons, facilitate auditory category learning, and modulate the long-term storage of new words and their meanings. We then discuss the role of task engagement in auditory perceptual learning and suggest that plasticity in top-down cortical networks mediates learning-related improvements in auditory cortical and perceptual sensitivity. Finally, we present data illustrating how social experience impacts sound-evoked activity in the auditory midbrain and forebrain and how the linguistic environment rapidly shapes speech perception. These findings, which are derived from both human and animal models, suggest that non-sensory influences are important regulators of auditory learning and plasticity and are often implemented by shared neural substrates. Application of these principles could improve clinical training strategies and inform the development of treatments that enhance auditory learning in individuals with communication disorders.
10. Zhang H, Wiener S, Holt LL. Adjustment of cue weighting in speech by speakers and listeners: Evidence from amplitude and duration modifications of Mandarin Chinese tone. J Acoust Soc Am 2022; 151:992. PMID: 35232077. PMCID: PMC8846952. DOI: 10.1121/10.0009378.
Abstract
Speech contrasts are signaled by multiple acoustic dimensions, but these dimensions are not equally diagnostic. Moreover, the relative diagnosticity, or weight, of acoustic dimensions in speech can shift in different communicative contexts for both speech perception and speech production. However, the literature remains unclear on whether, and if so how, talkers adjust speech to emphasize different acoustic dimensions in the context of changing communicative demands. Here, we examine the interplay of flexible cue weights in speech production and perception across amplitude and duration, secondary non-spectral acoustic dimensions for phonated Mandarin Chinese lexical tone, across natural speech and whispering, which eliminates fundamental frequency contour, the primary acoustic dimension. Phonated and whispered Mandarin productions from native talkers revealed enhancement of both duration and amplitude cues in whispered, compared to phonated speech. When nonspeech amplitude-modulated noises modeled these patterns of enhancement, identification of the noises as Mandarin lexical tone categories was more accurate than identification of noises modeling phonated speech amplitude and duration cues. Thus, speakers exaggerate secondary cues in whispered speech and listeners make use of this information. Yet, enhancement is not symmetric among the four Mandarin lexical tones, indicating possible constraints on the realization of this enhancement.
Affiliation(s)
- Hui Zhang
- Speech-Language-Hearing Center, School of Foreign Languages, Shanghai Jiao Tong University, 800 Dongchuan Road, Shanghai 200240, China.
- Seth Wiener
- Department of Modern Languages, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA.
- Lori L Holt
- Department of Psychology and Neuroscience Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania 15213, USA.
11. Nixon JS, Tomaschek F. Prediction and error in early infant speech learning: A speech acquisition model. Cognition 2021; 212:104697. PMID: 33798952. PMCID: PMC8173624. DOI: 10.1016/j.cognition.2021.104697.
Abstract
In the last two decades, statistical clustering models have emerged as a dominant model of how infants learn the sounds of their language. However, recent empirical and computational evidence suggests that purely statistical clustering methods may not be sufficient to explain speech sound acquisition. To model early development of speech perception, the present study used a two-layer network trained with Rescorla-Wagner learning equations, an implementation of discriminative, error-driven learning. The model contained no a priori linguistic units, such as phonemes or phonetic features. Instead, expectations about the upcoming acoustic speech signal were learned from the surrounding speech signal, with spectral components extracted from an audio recording of child-directed speech as both inputs and outputs of the model. To evaluate model performance, we simulated infant responses in the high-amplitude sucking paradigm using vowel and fricative pairs and continua. The simulations were able to discriminate vowel and consonant pairs and predicted the infant speech perception data. The model also showed the greatest amount of discrimination in the expected spectral frequencies. These results suggest that discriminative error-driven learning may provide a viable approach to modelling early infant speech sound acquisition.
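The two-layer Rescorla-Wagner architecture described in this abstract amounts to a single weight matrix mapping input acoustic cues to expected output components, updated by prediction error. The sketch below illustrates one such update step; the dimensions, learning rate, and cue vectors are hypothetical, whereas the actual model used spectral components extracted from child-directed speech as both inputs and outputs.

```python
import numpy as np

def rw_step(W, cues, outcomes, lr=0.01):
    """One Rescorla-Wagner update for a two-layer network: weights from the
    active cues move toward the observed outcomes in proportion to the
    prediction error (outcomes minus the current prediction)."""
    pred = cues @ W                        # expected outcome activations
    error = outcomes - pred                # discrepancy drives all learning
    return W + lr * np.outer(cues, error)  # only active cues are updated

# Toy example: 3 input components, 2 outcome components, and no a priori
# linguistic units such as phonemes. Repeated exposure to one cue-outcome
# pairing drives the prediction toward the experienced signal.
W = np.zeros((3, 2))
cues = np.array([1.0, 0.0, 1.0])
outcomes = np.array([1.0, 0.0])
for _ in range(200):
    W = rw_step(W, cues, outcomes)
pred = cues @ W  # prediction approaches the experienced outcome
```

Discrimination in the simulated sucking paradigm can then be read off the model's prediction error: familiar input produces small error, while a novel vowel or fricative produces a larger error signal.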
Affiliation(s)
- Jessie S Nixon
- Quantitative Linguistics Group, Eberhard Karls University of Tübingen, Germany.
- Fabian Tomaschek
- Quantitative Linguistics Group, Eberhard Karls University of Tübingen, Germany.
12. Idemaru K, Vaughn C. Perceptual tracking of distinct distributional regularities within a single voice. J Acoust Soc Am 2020; 148:EL427. PMID: 33379901. DOI: 10.1121/10.0002762.
Abstract
The speech signal is inherently variable and listeners need to recalibrate when local, short-term distributions of acoustic dimensions deviate from long-term representation. The present experiment investigated the specificity of this perceptual adjustment, addressing whether the perceptual system is capable of tracking differing simultaneous short-term acoustic distributions of the same speech categories, conditioned by context. The results indicated that instead of aggregating over the contextual variation, listeners tracked separate distributional statistics for instances of speech categories experienced in different phonetic/lexical contexts, suggesting that perceptual learning is not only influenced by distributional statistics, but also by external factors such as contextual information.
Affiliation(s)
- Kaori Idemaru
- Department of East Asian Languages and Literatures, 1248 University of Oregon, Eugene, Oregon 97403-1248, USA
- Charlotte Vaughn
- Department of Linguistics, 1290 University of Oregon, Eugene, Oregon 97403-1290, USA.