1
Flaherty MM. The role of long-term target and masker talker familiarity in children's speech-in-speech recognition. Front Psychol 2024; 15:1369195. [PMID: 38784624 PMCID: PMC11112701 DOI: 10.3389/fpsyg.2024.1369195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 04/23/2024] [Indexed: 05/25/2024]
Abstract
Objectives This study investigated the influence of long-term talker familiarity on speech-in-speech recognition in school-age children, with a specific emphasis on the role of familiarity with the mother's voice as either the target or masker speech. Design Open-set sentence recognition was measured adaptively in a two-talker masker. Target and masker sentences were recorded by the adult mothers of the child participants. Each child heard sentences spoken by three adult female voices during testing: their own mother's voice (familiar voice) and two unfamiliar adult female voices. Study sample Twenty-four school-age children (8-13 years) with normal hearing. Results When the target speech was spoken by a familiar talker (the mother), speech recognition was significantly better than when the target was unfamiliar. When the masker was spoken by the familiar talker, there was no difference in performance relative to the unfamiliar masker condition. Across all conditions, younger children required a more favorable signal-to-noise ratio than older children. Conclusion Implicit long-term familiarity with a talker consistently improves children's speech-in-speech recognition across the age range tested, specifically when the target talker is familiar. However, performance remains unaffected by masker talker familiarity. Additionally, while target familiarity is advantageous, it does not entirely eliminate children's increased susceptibility to competing speech.
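The abstract states only that sentence recognition was tracked adaptively in signal-to-noise ratio (SNR); it does not give the tracking rule. As a minimal sketch of how such a track yields a speech reception threshold, the Python below assumes a simple 1-down/1-up staircase (which converges near 50% correct), a fixed 2-dB step, and a threshold taken as the mean SNR at the last reversals. The function name `adaptive_track`, the starting SNR, the step size, and the reversal counts are hypothetical choices, not the study's parameters.

```python
from statistics import mean

def adaptive_track(respond, start_snr=10.0, step_db=2.0, n_reversals=8, discard=2, max_trials=200):
    """Hypothetical 1-down/1-up SNR staircase; returns mean SNR (dB) at the kept reversals.

    respond(snr) -> bool is True when the target sentence is repeated correctly.
    """
    snr = start_snr
    last_direction = None  # -1 = SNR going down (harder), +1 = going up (easier)
    reversals = []
    for _ in range(max_trials):
        direction = -1 if respond(snr) else +1  # harder after a correct trial, easier after an error
        if last_direction is not None and direction != last_direction:
            reversals.append(snr)               # the track changed direction at this SNR
            if len(reversals) == n_reversals:
                break
        last_direction = direction
        snr += direction * step_db
    if len(reversals) <= discard:
        raise ValueError("too few reversals to estimate a threshold")
    # The first reversals, reached while descending from the start level, are dropped.
    return mean(reversals[discard:])

# Deterministic stand-in listener: correct whenever the SNR is at least -4 dB.
print(adaptive_track(lambda snr: snr >= -4.0))
```

With this deterministic listener the track oscillates between -4 and -6 dB and returns -5.0. A real child's responses are probabilistic, so reversal points scatter around the level the rule converges on, which is why several reversals are averaged.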
Affiliation(s)
- Mary M. Flaherty
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign, IL, United States
2
Flaherty MM, Price R, Murgia S, Manukian E. Can Playing a Game Improve Children's Speech Recognition? A Preliminary Study of Implicit Talker Familiarity Effects. Am J Audiol 2023:1-16. [PMID: 38056473 DOI: 10.1044/2023_aja-23-00156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/08/2023]
Abstract
PURPOSE The goal was to evaluate whether implicit talker familiarization via an interactive computer game, designed for this study, could improve children's word recognition in classroom noise. It was hypothesized that, regardless of age, children would perform better when recognizing words spoken by the talker who was heard during the game they played. METHOD Using a one-group pretest-posttest experimental design, this study examined the impact of short-term implicit voice exposure on children's word recognition in classroom noise. Implicit voice familiarization occurred via an interactive computer game, played at home for 10 min a day for 5 days. In the game, children (8-12 years) heard one voice, intended to become the "familiar talker." Pre- and postfamiliarization, children identified words in prerecorded classroom noise. Four conditions were tested to evaluate talker familiarity and generalization effects. RESULTS Results demonstrated an 11% improvement when recognizing words spoken by the voice heard in the game ("familiar talker"). This was observed only for words that were heard in the game and did not generalize to unfamiliarized words. Before familiarization, younger children had poorer recognition than older children in all conditions; however, after familiarization, there was no effect of age on performance for familiarized stimuli. CONCLUSIONS Implicit short-term exposure to a talker has the potential to improve children's speech recognition. Therefore, leveraging talker familiarity through gameplay shows promise as a viable method for improving children's speech-in-noise recognition. However, given that improvements did not generalize to unfamiliarized words, careful consideration of exposure stimuli is necessary to optimize this approach.
Affiliation(s)
- Mary M Flaherty
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
- Rachael Price
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
- Department of Audiology, Children's Hospital of Philadelphia, PA
- Silvia Murgia
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
- Emma Manukian
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
3
Drown L, Philip B, Francis AL, Theodore RM. Revisiting the left ear advantage for phonetic cues to talker identification. J Acoust Soc Am 2022; 152:3107. [PMID: 36456295 PMCID: PMC9715276 DOI: 10.1121/10.0015093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 09/13/2022] [Accepted: 10/18/2022] [Indexed: 06/17/2023]
Abstract
Previous research suggests that learning to use a phonetic property [e.g., voice-onset-time (VOT)] for talker identity supports a left ear processing advantage. Specifically, listeners trained to identify two "talkers" who differed only in characteristic VOTs showed faster talker identification for stimuli presented to the left ear than for stimuli presented to the right ear, which is interpreted as evidence of hemispheric lateralization consistent with task demands. Experiment 1 (n = 97) aimed to replicate this finding and identify predictors of performance; experiment 2 (n = 79) aimed to replicate this finding under conditions that better facilitate observation of laterality effects. Listeners completed a talker identification task during pretest, training, and posttest phases. Inhibition, category identification, and auditory acuity were also assessed in experiment 1. Listeners learned to use VOT for talker identity, which was positively associated with auditory acuity. Talker identification was not influenced by ear of presentation, and Bayes factors indicated strong support for the null. These results suggest that talker-specific phonetic variation is not sufficient to induce a left ear advantage for talker identification; together with the extant literature, this instead suggests that hemispheric lateralization for talker-specific phonetic variation requires phonetic variation to be conditioned on talker differences in source characteristics.
Affiliation(s)
- Lee Drown
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269-1085, USA
- Betsy Philip
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269-1085, USA
- Alexander L Francis
- Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, Indiana 47907-2122, USA
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06269-1085, USA
4
Perceptual learning of multiple talkers: Determinants, characteristics, and limitations. Atten Percept Psychophys 2022; 84:2335-2359. [PMID: 36076119 DOI: 10.3758/s13414-022-02556-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 08/08/2022] [Indexed: 11/08/2022]
Abstract
Research suggests that listeners simultaneously update talker-specific generative models to reflect structured phonetic variation. Because past investigations exposed listeners to talkers of different genders, it is unknown whether adaptation is talker specific or rather linked to a broader sociophonetic class. Here, we test determinants of listeners' ability to update and apply talker-specific models for speech perception. In six experiments (n = 480), listeners were first exposed to the speech of two talkers who produced ambiguous fricative energy. The talkers' speech was interleaved during exposure, and lexical context differentially biased interpretation of the ambiguity as either /s/ or /ʃ/ for each talker. At test, listeners categorized tokens from ashi-asi continua, one for each talker. Across conditions and experiments, we manipulated exposure quantity, talker gender, blocked versus interleaved talker structure at test, and the degree to which fricative acoustics differed between talkers. When test was blocked by talker, learning was observed for different but not same gender talkers. When talkers were interleaved at test, learning was observed for both different and same gender talkers, which was attenuated when fricative acoustics were constant across talkers. There was no strong evidence to suggest that adaptation to multiple talkers required increased quantity of exposure beyond that required to adapt to a single talker. These results suggest that perceptual learning for speech is achieved via a mechanism that represents a context-dependent, cumulative integration of experience with speech input and identify critical constraints on listeners' ability to dynamically apply multiple generative models in mixed talker listening environments.
5
Colby S, Orena AJ. Recognizing Voices Through a Cochlear Implant: A Systematic Review of Voice Perception, Talker Discrimination, and Talker Identification. J Speech Lang Hear Res 2022; 65:3165-3194. [PMID: 35926089 PMCID: PMC9911123 DOI: 10.1044/2022_jslhr-21-00209] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/12/2021] [Revised: 02/02/2022] [Accepted: 05/03/2022] [Indexed: 06/15/2023]
Abstract
OBJECTIVE Some cochlear implant (CI) users report having difficulty accessing indexical information in the speech signal, presumably due to limitations in the transmission of fine spectrotemporal cues. The purpose of this review article was to systematically review and evaluate the existing research on talker processing in CI users. Specifically, we reviewed the performance of CI users in three types of talker- and voice-related tasks. We also examined the different factors (such as participant, hearing, and device characteristics) that might influence performance in these specific tasks. DESIGN We completed a systematic search of the literature with select key words using citation aggregation software to search Google Scholar. We included primary reports that tested (a) talker discrimination, (b) voice perception, and (c) talker identification. Each report must have had at least one group of participants with CIs. Each included study was also evaluated for quality of evidence. RESULTS The searches resulted in 1,561 references, which were first screened for inclusion and then evaluated in full. Forty-three studies examining talker discrimination, voice perception, and talker identification were included in the final review. Most studies were focused on postlingually deafened and implanted adult CI users, with fewer studies focused on prelingual implant users. In general, CI users performed above chance in these tasks. When there was a difference between groups, CI users performed less accurately than their normal-hearing (NH) peers. A subset of CI users reached the same level of performance as NH participants exposed to noise-vocoded stimuli. Some studies found that CI users and NH participants relied on different cues for talker perception. Within groups of CI users, there is moderate evidence for a bimodal benefit for talker processing, and there are mixed findings about the effects of hearing experience. 
CONCLUSIONS The current review highlights the challenges faced by CI users in tracking and recognizing voices and how they adapt to it. Although large variability exists, there is evidence that CI users can process indexical information from speech, though with less accuracy than their NH peers. Recent work has described some of the factors that might ease the challenges of talker processing in CI users. We conclude by suggesting some future avenues of research to optimize real-world speech outcomes.
Affiliation(s)
- Sarah Colby
- Department of Psychological and Brain Sciences, University of Iowa, Iowa City
- Adriel John Orena
- Department of Psychology, University of British Columbia, Vancouver, Canada
6
Zhang X, Cheng B, Zhang Y. The Role of Talker Variability in Nonnative Phonetic Learning: A Systematic Review and Meta-Analysis. J Speech Lang Hear Res 2021; 64:4802-4825. [PMID: 34763529 DOI: 10.1044/2021_jslhr-21-00181] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/13/2023]
Abstract
PURPOSE High-variability phonetic training (HVPT) has been found to be effective on adult second language (L2) learning, but results are mixed with regard to the benefit of multiple talkers over a single talker. This study provides a systematic review with meta-analysis to investigate the talker variability effect in nonnative phonetic learning and the factors moderating the effect. METHOD We collected studies with keyword search in major academic databases including EBSCO, ERIC, MEDLINE, ProQuest Dissertations & Theses, Elsevier, Scopus, Wiley Online Library, and Web of Science. We identified potential participant-, training-, and study-related moderators and conducted a random-effects model meta-analysis for each individual variable. RESULTS On the basis of 18 studies with a total of 549 participants, we obtained a small-level summary effect size (Hedges' g = 0.46, 95% confidence interval [CI; 0.08, 0.84]) for the immediate training outcomes, which was greatly reduced (g = -0.04, 95% CI [-0.46, 0.37]) after removal of outliers and correction for publication bias, whereas the effect size for immediate perceptual gains was nearly medium (g = 0.56, 95% CI [0.13, 1.00]) compared with the nonsignificant production gains. Critically, the summary effect sizes for generalizations to new talkers (g = 0.72, 95% CI [0.15, 1.29]) and for long-term retention (g = 1.09, 95% CI [0.39, 1.78]) were large. Moreover, the training program length and the talker presentation format were found to potentially moderate the immediate perceptual gains and generalization outcomes. CONCLUSIONS Our study presents the first meta-analysis on the role of talker variability in nonnative phonetic training, which demonstrates the heterogeneity and limitations of research on this topic. The results highlight the need for further investigation of the influential factors and underlying mechanisms for the presence or absence of talker variability effects.
Supplemental Material https://doi.org/10.23641/asha.16959388.
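The abstract reports summary effect sizes from random-effects models but does not name the estimator used. The sketch below assumes the common DerSimonian-Laird estimator of between-study variance (tau²) together with the standard small-sample correction that turns Cohen's d into Hedges' g. The helper names `hedges_g`, `random_effects`, and `PooledEffect` are hypothetical, and no data from the reviewed studies are used.

```python
import math
from dataclasses import dataclass

@dataclass
class PooledEffect:
    estimate: float
    ci_low: float
    ci_high: float
    tau2: float  # estimated between-study variance

def hedges_g(m1, m2, sd1, sd2, n1, n2):
    """Bias-corrected standardized mean difference (Hedges' g) and its sampling variance."""
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled                    # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)             # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return j * d, j**2 * var_d

def random_effects(effects, variances, z=1.96):
    """DerSimonian-Laird random-effects pooling of per-study effect sizes."""
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need at least two studies with matching variances")
    w = [1 / v for v in variances]                                 # fixed-effect weights
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c)                  # truncated at zero
    w_star = [1 / (v + tau2) for v in variances]                   # random-effects weights
    est = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return PooledEffect(est, est - z * se, est + z * se, tau2)
```

Because tau² is added to every study's sampling variance, heterogeneous studies receive more equal weights and a wider confidence interval than under a fixed-effect model. For two hypothetical studies with g = 0.0 and g = 1.0 and sampling variance 0.01 each, the sketch gives tau² = 0.49 and a pooled g of 0.5 with a 95% CI of roughly [-0.48, 1.48].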
Affiliation(s)
- Xiaojuan Zhang
- English Department & Language and Cognitive Neuroscience Lab, School of Foreign Studies, Xi'an Jiaotong University, China
- Bing Cheng
- English Department & Language and Cognitive Neuroscience Lab, School of Foreign Studies, Xi'an Jiaotong University, China
- Yang Zhang
- Department of Speech-Language-Hearing Sciences and Center for Neurobehavioral Development, University of Minnesota, Twin Cities, Minneapolis
7
Online pragmatic interpretations of scalar adjectives are affected by perceived speaker reliability. PLoS One 2021; 16:e0245130. [PMID: 33606683 PMCID: PMC7895354 DOI: 10.1371/journal.pone.0245130] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/02/2020] [Accepted: 12/22/2020] [Indexed: 11/23/2022]
Abstract
Linguistic communication requires understanding of words in relation to their context. Among various aspects of context, one that has received relatively little attention until recently is the speakers themselves. We asked whether comprehenders’ online language comprehension is affected by the perceived reliability with which a speaker formulates pragmatically well-formed utterances. In two eye-tracking experiments, we conceptually replicated and extended a seminal work by Grodner and Sedivy (2011). A between-participant manipulation was used to control reliability with which a speaker follows implicit pragmatic conventions (e.g., using a scalar adjective in accordance with contextual contrast). Experiment 1 replicated Grodner and Sedivy’s finding that contrastive inference in response to scalar adjectives was suspended when both the spoken input and the instructions provided evidence of the speaker’s (un)reliability: For speech from the reliable speaker, comprehenders exhibited the early fixations attributable to a contextually-situated, contrastive interpretation of a scalar adjective. In contrast, for speech from the unreliable speaker, comprehenders did not exhibit such early fixations. Experiment 2 provided novel evidence of the reliability effect in the absence of explicit instructions. In both experiments, the effects emerged in the earliest expected time window given the sentence structure of the stimuli. The results suggest that real-time interpretations of spoken language are optimized in the context of a speaker identity, characteristics of which are extrapolated across utterances.
8
A second chance for a first impression: Sensitivity to cumulative input statistics for lexically guided perceptual learning. Psychon Bull Rev 2021; 28:1003-1014. [PMID: 33443706 DOI: 10.3758/s13423-020-01840-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Accepted: 10/26/2020] [Indexed: 11/08/2022]
Abstract
Listeners use lexical knowledge to modify the mapping from acoustics to speech sounds, but the timecourse of experience that informs lexically guided perceptual learning is unknown. Some data suggest that learning is contingent on initial exposure to atypical productions, while other data suggest that learning reflects only the most recent exposure. Here we seek to reconcile these findings by assessing the type and timecourse of exposure that promote robust lexically guided perceptual learning. In three experiments, listeners (n = 560) heard 20 critical productions interspersed among 200 trials during an exposure phase and then categorized items from an ashi-asi continuum in a test phase. In Experiment 1, critical productions consisted of ambiguous fricatives embedded in either /s/- or /ʃ/-biasing contexts. Learning was observed; the /s/-bias group showed more asi responses compared to the /ʃ/-bias group. In Experiment 2, listeners heard ambiguous and clear productions in a consistent context. Order and lexical bias were manipulated between-subjects, and perceptual learning occurred regardless of the order in which the clear and ambiguous productions were heard. In Experiment 3, listeners heard ambiguous fricatives in both /s/- and /ʃ/-biasing contexts. Order differed between two exposure groups, and no difference between groups was observed at test. Moreover, the results showed a monotonic decrease in learning across experiments, in line with decreasing exposure to stable lexically biasing contexts, and were replicated across novel stimulus sets. In contrast to previous findings showing that either initial or most recent experience is critical for lexically guided perceptual learning, the current results suggest that perceptual learning reflects cumulative experience with a talker's input over time.
9
Abstract
Recent research demonstrates that the relationship between an acoustic dimension and speech categories is not static. Rather, it is influenced by the evolving distribution of dimensional regularity experienced across time, and specific to experienced individual sounds. Three studies examine the nature of this perceptual, dimension-based statistical learning of artificially accented [b] and [p] speech categories in online word recognition by testing generalization of learning across contexts, and testing the effect of a larger word list across which learning is induced. The results indicate that whereas learning of accented [b] and [p] generalizes across contexts, generalization to contexts not experienced in the accent is weaker even for the same speech categories [b] and [p] spoken by the same speaker. The results support a rich model of speech representation that is sensitive to context-dependent variation in the way the acoustic dimensions are related to speech categories.
10
Ganugapati D, Theodore RM. Structured phonetic variation facilitates talker identification. J Acoust Soc Am 2019; 145:EL469. [PMID: 31255121 DOI: 10.1121/1.5100166] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/16/2018] [Accepted: 04/12/2019] [Indexed: 06/09/2023]
Abstract
Listeners use talker-specific phonetic structure to facilitate language comprehension. This study tests whether sensitivity to talker-specific phonetic variation also facilitates talker identification. During training, two listener groups learned to associate talkers' voices with cartoon pseudo-faces. For one group, each talker produced characteristically different voice-onset-time values; for the other group, no talker-specific phonetic structure was present. After training, listeners were tested on talker identification for trained and novel words, which was improved for those who heard structured phonetic variation compared to those who did not. These findings suggest an additive benefit of talker-specific phonetic variation for talker identification beyond traditional indexical cues.
Affiliation(s)
- Divya Ganugapati
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Unit 1085, Storrs, Connecticut 06269-1085
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Unit 1085, Storrs, Connecticut 06269-1085
11
Babel M, McAuliffe M, Norton C, Senior B, Vaughn C. The Goldilocks Zone of Perceptual Learning. Phonetica 2019; 76:179-200. [PMID: 31112962 DOI: 10.1159/000494929] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/03/2017] [Accepted: 10/29/2018] [Indexed: 06/09/2023]
Abstract
BACKGROUND/AIMS Lexically guided perceptual learning in speech is the updating of linguistic categories based on novel input disambiguated by the structure provided in a recognized lexical item. We test the range of variation that allows for perceptual learning by presenting listeners with items that vary from subtle within-category variation to fully remapped cross-category variation. METHODS Experiment 1 uses a lexically guided perceptual learning paradigm with words containing noncanonical /s/ realizations from s/ʃ continua that correspond to "typical," "ambiguous," "atypical," and "remapped" steps. Perceptual learning is tested in an s/ʃ categorization task. Experiment 2 addresses listener sensitivity to variation in the exposure items using AX discrimination tasks. RESULTS Listeners in experiment 1 showed perceptual learning with the maximally ambiguous tokens. Performance of listeners in experiment 2 suggests that tokens which showed the most perceptual learning were not perceptually salient on their own. CONCLUSION These results demonstrate that perceptual learning is enhanced with maximally ambiguous stimuli. Excessively atypical pronunciations show attenuated perceptual learning, while typical pronunciations show no evidence for perceptual learning. AX discrimination illustrates that the maximally ambiguous stimuli are not perceptually unique. Together, these results suggest that perceptual learning relies on an interplay between confidence in phonetic and lexical predictions and category typicality.
Affiliation(s)
- Molly Babel
- University of British Columbia, Vancouver, British Columbia, Canada
- Carolyn Norton
- University of British Columbia, Vancouver, British Columbia, Canada
- Brianne Senior
- University of British Columbia, Vancouver, British Columbia, Canada
12
Assgari AA, Theodore RM, Stilp CE. Variability in talkers' fundamental frequencies shapes context effects in speech perception. J Acoust Soc Am 2019; 145:1443. [PMID: 31067942 DOI: 10.1121/1.5093638] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 04/16/2018] [Accepted: 02/22/2019] [Indexed: 06/09/2023]
Abstract
The perception of any given sound is influenced by surrounding sounds. When successive sounds differ in their spectral compositions, these differences may be perceptually magnified, resulting in spectral contrast effects (SCEs). For example, listeners are more likely to perceive /ɪ/ (low F1) following sentences with higher F1 frequencies; listeners are also more likely to perceive /ɛ/ (high F1) following sentences with lower F1 frequencies. Previous research showed that SCEs for vowel categorization were attenuated when sentence contexts were spoken by different talkers [Assgari and Stilp (2015). J. Acoust. Soc. Am. 138(5), 3023-3032], but the locus of this diminished contextual influence was not specified. Here, three experiments examined implications of variable talker acoustics for SCEs in the categorization of /ɪ/ and /ɛ/. The results showed that SCEs were smaller when the mean fundamental frequency (f0) of context sentences was highly variable across talkers compared to when mean f0 was more consistent, even when talker gender was held constant. In contrast, SCE magnitudes were not influenced by variability in mean F1. These findings suggest that talker variability attenuates SCEs due to diminished consistency of f0 as a contextual influence. Connections between these results and talker normalization are considered.
Affiliation(s)
- Ashley A Assgari
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut 06828, USA
- Christian E Stilp
- Department of Psychological and Brain Sciences, University of Louisville, Louisville, Kentucky 40292, USA
13
Distributional learning for speech reflects cumulative exposure to a talker's phonetic distributions. Psychon Bull Rev 2019; 26:985-992. [PMID: 30604404 DOI: 10.3758/s13423-018-1551-5] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]
Abstract
Efficient speech perception requires listeners to maintain an exquisite tension between stability of the language architecture and flexibility to accommodate variation in the input, such as that associated with individual talker differences in speech production. Achieving this tension can be guided by top-down learning mechanisms, wherein lexical information constrains interpretation of speech input, and by bottom-up learning mechanisms, in which distributional information in the speech signal is used to optimize the mapping to speech sound categories. An open question for theories of perceptual learning concerns the nature of the representations that are built for individual talkers: do these representations reflect long-term, global exposure to a talker or rather only short-term, local exposure? Recent research suggests that when lexical knowledge is used to resolve a talker's ambiguous productions, listeners disregard previous experience with a talker and instead rely on only recent experience, a finding that is contrary to predictions of Bayesian belief-updating accounts of perceptual adaptation. Here we use a distributional learning paradigm in which lexical information is not explicitly required to resolve ambiguous input to provide an additional test of global versus local exposure accounts. Listeners completed two blocks of phonetic categorization for stimuli that differed in voice-onset-time, a probabilistic cue to the voicing contrast in English stop consonants. In each block, two distributions were presented, one specifying /g/ and one specifying /k/. Across the two blocks, variance of the distributions was manipulated to be either narrow or wide. The critical manipulation was order of the two blocks; half of the listeners were first exposed to the narrow distributions followed by the wide distributions, with the order reversed for the other half of the listeners. 
The results showed that for earlier trials, the identification slope was steeper for the narrow-wide group compared to the wide-narrow group, but this difference was attenuated for later trials. The between-group convergence was driven by an asymmetry in learning between the two orders such that only those in the narrow-wide group showed slope movement during exposure, a pattern that was mirrored by computational simulations in which the distributional statistics of the present talker were integrated with prior experience with English. This pattern of results suggests that listeners did not disregard all prior experience with the talker, and instead used cumulative exposure to guide phonetic decisions, which raises the possibility that accommodating a talker's phonetic signature entails maintaining representations that reflect global experience.
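The global-versus-local contrast in this abstract can be made concrete with an idealized observer. Assuming two equal-variance Gaussian VOT categories with equal priors, the log posterior odds of /k/ over /g/ are linear in VOT, so the identification slope is (μ_k - μ_g)/σ²: narrower distributions give steeper slopes. The sketch below compares a cumulative belief (variance pooled over every block heard so far) with a local belief (only the current block). The function names, the category means (20 and 70 ms), and the variances (25 and 225 ms²) are hypothetical, equal token counts per block are assumed, and unlike the authors' simulations it ignores prior experience with English, so it does not reproduce the reported learning asymmetry.

```python
import math

def p_voiceless(vot_ms, mu_g, mu_k, var):
    """Posterior P(/k/ | VOT) for two equal-variance Gaussian categories with equal priors."""
    slope = (mu_k - mu_g) / var          # steeper when the categories are narrower
    boundary = (mu_g + mu_k) / 2         # 50% crossover lies midway between the means
    return 1 / (1 + math.exp(-slope * (vot_ms - boundary)))

def slopes_by_block(block_vars, mu_g=20.0, mu_k=70.0, cumulative=True):
    """Identification slope (per ms) implied after each exposure block.

    cumulative=True: the believed category variance pools every block so far
    (global account). cumulative=False: only the current block counts (local account).
    With fixed category means and equal token counts per block, the pooled
    variance is the plain average of the block variances.
    """
    slopes, seen = [], []
    for var in block_vars:
        seen.append(var)
        belief = sum(seen) / len(seen) if cumulative else var
        slopes.append((mu_k - mu_g) / belief)
    return slopes

NARROW, WIDE = 25.0, 225.0  # hypothetical within-category VOT variances (ms^2)
for label, order in [("narrow-wide", [NARROW, WIDE]), ("wide-narrow", [WIDE, NARROW])]:
    print(label,
          "cumulative:", [round(s, 3) for s in slopes_by_block(order)],
          "local:", [round(s, 3) for s in slopes_by_block(order, cumulative=False)])
```

Under the cumulative belief, the narrow-wide order starts steeper (2.0 versus about 0.22 per ms) and both orders converge on 0.4 after the second block, echoing the early difference and later convergence the abstract describes; the local belief instead swaps the two groups' slopes and never converges.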
14
Campbell JA, McSherry HL, Theodore RM. Contextual Influences on Phonetic Categorization in School-Aged Children. Front Commun (Lausanne) 2018; 3:35. [PMID: 31763339 PMCID: PMC6874108 DOI: 10.3389/fcomm.2018.00035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
Perceptual stability in adult listeners is supported by the ability to process acoustic-phonetic variation categorically and dynamically adjust category boundaries given systematic contextual influences. The current study examined the developmental trajectory of such flexibility. Adults and school-aged children (5-10 years of age) made voicing identification decisions to voice-onset-time (VOT) continua that differed in speaking rate and place of articulation. The results showed that both populations were sensitive to contextual influences; the voicing boundary was located at a longer VOT for the slow compared to the fast speaking rate continuum and for the velar compared to the labial continuum, and the magnitude of the displacement was slightly greater for the adults compared to the children. Moreover, the two populations differed in terms of the absolute location of the voicing boundaries and the categorization slopes, with slopes becoming more categorical as age increased. These results demonstrate that sensitivity to contextual influences on speech perception emerges early in development, but mature perceptual tuning requires extended experience.
Affiliation(s)
- Jean A. Campbell
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut, USA
- Heather L. McSherry
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut, USA
- Rachel M. Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, Storrs, Connecticut, USA
- Connecticut Institute of Brain and Cognitive Sciences, University of Connecticut, Storrs, Connecticut, USA
15
Drouin JR, Theodore RM. Lexically guided perceptual learning is robust to task-based changes in listening strategy. J Acoust Soc Am 2018; 144:1089. [PMID: 30180678 PMCID: PMC6117182 DOI: 10.1121/1.5047672] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/16/2018] [Revised: 06/29/2018] [Accepted: 07/02/2018] [Indexed: 05/28/2023]
Abstract
Listeners use lexical information to resolve ambiguity in the speech signal, resulting in the restructuring of speech sound categories. Recent findings suggest that lexically guided perceptual learning is attenuated when listeners use a perception-focused listening strategy (that directs attention towards surface variation) compared to when listeners use a comprehension-focused listening strategy (that directs attention towards higher-level linguistic information). However, previous investigations used the word position of the ambiguity to manipulate listening strategy, raising the possibility that attenuated learning reflected decreased strength of lexical recruitment instead of a perception-oriented listening strategy. The current work tests this hypothesis. Listeners completed an exposure phase followed by a test phase. During exposure, listeners heard an ambiguous fricative embedded in word-medial lexical contexts that supported realization of the ambiguity as /ʃ/. At test, listeners categorized members of an /ɑsi/-/ɑʃi/ continuum. Listening strategy was manipulated via exposure task (experiment 1) and explicit acknowledgement of the ambiguity (experiment 2). Compared to control participants, listeners who were exposed to the ambiguity showed more /ʃ/ responses at test; critically, the magnitude of learning did not differ across listening strategy conditions. These results suggest that given sufficient lexical context, lexically guided perceptual learning is robust to task-based changes in listening strategy.
Affiliation(s)
- Julia R Drouin
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Unit 1085, Storrs, Connecticut 06269-1085, USA
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Unit 1085, Storrs, Connecticut 06269-1085, USA
16
Tobin SJ, Nam H, Fowler CA. Phonetic drift in Spanish-English bilinguals: Experiment and a self-organizing model. J Phon 2017; 65:45-59. [PMID: 31346299 PMCID: PMC6657701 DOI: 10.1016/j.wocn.2017.05.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 06/09/2023]
Abstract
Studies of speech accommodation provide evidence for change in use of language structures beyond the critical/sensitive period. For example, Sancier and Fowler (1997) found changes in the voice-onset-times (VOTs) of both languages of a Portuguese-English bilingual as a function of her language context. Though accommodation has been studied widely within a monolingual context, it has received less attention in and between the languages of bilinguals. We tested whether these findings of phonetic accommodation (speech accommodation at the phonetic level) would generalize to a sample of Spanish-English bilinguals. We recorded participants reading Spanish and English sentences after 3-4 months in the US and after 2-4 weeks in a Spanish-speaking country and measured the VOTs of their voiceless plosives. Our statistical analyses show that participants' English VOTs drifted towards those of the ambient language, but their Spanish VOTs did not. We found considerable variation in the extent of individual participants' drift in English. Further analysis of our results suggested that native-likeness of L2 VOTs and extent of active language use predict the extent of drift. We provide a model based on principles of self-organizing dynamical systems to account for our Spanish-English phonetic drift findings and the Portuguese-English findings.
Affiliation(s)
- Stephen J. Tobin
- Department of Psychology, University of Connecticut, 406 Babbidge Road, Unit 1020, Storrs, CT 06269-1020, USA
- Haskins Laboratories, 300 George St., Suite 900, New Haven, CT 06511, USA
- Universität Potsdam, Department Linguistik, Haus 14, Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany
- Hosung Nam
- Haskins Laboratories, 300 George St., Suite 900, New Haven, CT 06511, USA
- Department of English Language and Literature, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 136-701, South Korea
- Carol A. Fowler
- Department of Psychology, University of Connecticut, 406 Babbidge Road, Unit 1020, Storrs, CT 06269-1020, USA
- Haskins Laboratories, 300 George St., Suite 900, New Haven, CT 06511, USA
17
Clayards M. Individual Talker and Token Covariation in the Production of Multiple Cues to Stop Voicing. Phonetica 2017; 75:1-23. [PMID: 28595176 DOI: 10.1159/000448809] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 07/29/2015] [Accepted: 07/30/2016] [Indexed: 05/24/2023]
Abstract
BACKGROUND/AIMS Previous research found that individual talkers have consistent differences in the production of segments impacting the perception of their speech by others. Speakers also produce multiple acoustic-phonetic cues to phonological contrasts. Less is known about how multiple cues covary within a phonetic category and across talkers. We examined differences in individual talkers across cues and whether token-by-token variability is a result of intrinsic factors or speaking style by examining within-category correlations. METHODS We examined correlations for 3 cues (voice onset time, VOT; talker-relative onset fundamental frequency, f0; and talker-relative following vowel duration) to word-initial labial stop voicing in English. RESULTS VOT for /b/ and /p/ productions and onset f0 for /b/ productions varied significantly by talker. Token-by-token within-category variation was largely limited to speaking rate effects. VOT and f0 were negatively correlated within category for /b/ productions after controlling for speaking rate and talker mean f0, but in the opposite direction expected for an intrinsic effect. Within-category talker means were correlated across VOT and vowel duration for /p/ productions. Some talkers produced more prototypical values than others, indicating systematic talker differences. CONCLUSION Relationships between cues are mediated more by categories and talkers than by intrinsic physiological relationships. Talker differences reflect systematic speaking style differences.
Affiliation(s)
- Meghan Clayards
- Department of Linguistics, School of Communication Sciences and Disorders, McGill University, Montreal, QC, Canada
18
Myers EB, Theodore RM. Voice-sensitive brain networks encode talker-specific phonetic detail. Brain Lang 2017; 165:33-44. [PMID: 27898342 PMCID: PMC5237402 DOI: 10.1016/j.bandl.2016.11.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/05/2016] [Revised: 09/13/2016] [Accepted: 11/04/2016] [Indexed: 05/09/2023]
Abstract
The speech stream simultaneously carries information about talker identity and linguistic content, and the same acoustic property (e.g., voice-onset-time, or VOT) may be used for both purposes. Separable neural networks for processing talker identity and phonetic content have been identified, but it is unclear how a singular acoustic property is parsed by the neural system for talker identification versus phonetic processing. In the current study, listeners were exposed to two talkers with characteristically different VOTs. Subsequently, brain activation was measured using fMRI as listeners performed a phonetic categorization task on these stimuli. Right temporoparietal regions previously implicated in talker identification showed sensitivity to the match between VOT variant and talker, whereas left posterior temporal regions showed sensitivity to the typicality of phonetic exemplars, regardless of talker typicality. Taken together, these results suggest that neural systems for voice recognition capture talker-specific phonetic variation.
Affiliation(s)
- Emily B Myers
- University of Connecticut, Department of Speech, Language, and Hearing Sciences, 850 Bolton Road, Unit 1085, Storrs, CT 06269-1085, United States; University of Connecticut, Department of Psychological Sciences, 406 Babbidge Road, Unit 1020, Storrs, CT 06269-1020, United States; Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT 06511, United States; Connecticut Institute for the Brain and Cognitive Sciences, 337 Mansfield Road, Unit 1272, Storrs, CT 06269-1085, United States.
- Rachel M Theodore
- University of Connecticut, Department of Speech, Language, and Hearing Sciences, 850 Bolton Road, Unit 1085, Storrs, CT 06269-1085, United States; Haskins Laboratories, 300 George Street, Suite 900, New Haven, CT 06511, United States; Connecticut Institute for the Brain and Cognitive Sciences, 337 Mansfield Road, Unit 1272, Storrs, CT 06269-1085, United States
19
Xie X, Theodore RM, Myers EB. More than a boundary shift: Perceptual adaptation to foreign-accented speech reshapes the internal structure of phonetic categories. J Exp Psychol Hum Percept Perform 2016; 43:206-217. [PMID: 27819457 DOI: 10.1037/xhp0000285] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 11/08/2022]
Abstract
The literature on perceptual learning for speech shows that listeners use lexical information to disambiguate phonetically ambiguous speech sounds and that they maintain this new mapping for later recognition of ambiguous sounds for a given talker. Evidence for this kind of perceptual reorganization has focused on phonetic category boundary shifts. Here, we asked whether listeners adjust both category boundaries and internal category structure in rapid adaptation to foreign accents. We investigated the perceptual learning of Mandarin-accented productions of word-final voiced stops in English. After exposure to a Mandarin speaker's productions, native-English listeners' adaptation to the talker was tested in 3 ways: a cross-modal priming task to assess spoken word recognition (Experiment 1), a category identification task to assess shifts in the phonetic boundary (Experiment 2), and a goodness rating task to assess internal category structure (Experiment 3). Following exposure, both category boundary and internal category structure were adjusted; moreover, these prelexical changes facilitated subsequent word recognition. Together, the results demonstrate that listeners' sensitivity to acoustic-phonetic detail in the accented input promoted a dynamic, comprehensive reorganization of their perceptual response as a consequence of exposure to the accented input. We suggest that an examination of internal category structure is important for a complete account of the mechanisms of perceptual learning.
Affiliation(s)
- Xin Xie
- Department of Speech, Language and Hearing Sciences, University of Connecticut
- Rachel M Theodore
- Department of Speech, Language and Hearing Sciences, University of Connecticut
- Emily B Myers
- Department of Speech, Language and Hearing Sciences, University of Connecticut
20
Drouin JR, Theodore RM, Myers EB. Lexically guided perceptual tuning of internal phonetic category structure. J Acoust Soc Am 2016; 140:EL307. [PMID: 27794292 PMCID: PMC6910001 DOI: 10.1121/1.4964468] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/26/2016] [Revised: 08/26/2016] [Accepted: 09/23/2016] [Indexed: 05/28/2023]
Abstract
Listeners use lexical information to retune the mapping between the acoustic signal and speech sound representations, resulting in changes to phonetic category boundaries. Other research shows that phonetic categories have a rich internal structure; within-category variation is represented in a graded fashion. The current work examined whether lexically informed perceptual learning promotes a comprehensive reorganization of internal category structure. The results showed a reorganization of internal structure for one but not both of the examined categories, which may reflect an attenuation of learning for distributions with extensive category overlap. This finding points towards potential input-driven constraints on lexically guided phonetic retuning.
Affiliation(s)
- Julia R Drouin
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Storrs, Connecticut 06269, USA
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Storrs, Connecticut 06269, USA
- Emily B Myers
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Storrs, Connecticut 06269, USA
21
Kadam MA, Orena AJ, Theodore RM, Polka L. Reading ability influences native and non-native voice recognition, even for unimpaired readers. J Acoust Soc Am 2016; 139:EL6-12. [PMID: 26827051 DOI: 10.1121/1.4937488] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 05/24/2023]
Abstract
Research suggests that phonological ability exerts a gradient influence on talker identification, including evidence that adults and children with reading disability show impaired talker recognition for native and non-native languages. The present study examined whether this relationship is also observed among unimpaired readers. Learning rate and generalization of learning in a talker identification task were examined in average and advanced readers who were tested in both native and non-native language conditions. The results indicate that even among unimpaired readers, phonological competence as captured by reading ability exerts a gradient influence on perceptual learning for talkers' voices.
Affiliation(s)
- Minal A Kadam
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Storrs, Connecticut 06269, USA
- Adriel John Orena
- School of Communication Sciences and Disorders, McGill University, 2001 McGill College, 8th Floor, Montreal, Quebec H3A 1G1, Canada
- Rachel M Theodore
- Department of Speech, Language, and Hearing Sciences, University of Connecticut, 850 Bolton Road, Storrs, Connecticut 06269, USA
- Linda Polka
- School of Communication Sciences and Disorders, McGill University, 2001 McGill College, 8th Floor, Montreal, Quebec H3A 1G1, Canada