1. Alispahic S, Pellicano E, Cutler A, Antoniou M. Multiple talker processing in autistic adult listeners. Sci Rep 2024;14:14698. PMID: 38926416; PMCID: PMC11208580; DOI: 10.1038/s41598-024-62429-w.
Abstract
Accommodating talker variability is a complex and multi-layered cognitive process. It involves shifting attention to the vocal characteristics of the talker as well as the linguistic content of their speech. Due to an interdependence between voice and phonological processing, multi-talker environments typically incur additional processing costs compared to single-talker environments. A failure or inability to efficiently distribute attention over multiple acoustic cues in the speech signal may have detrimental language learning consequences. Yet, no studies have examined effects of multi-talker processing in populations with atypical perceptual, social and language processing for communication, including autistic people. Employing a classic word-monitoring task, we investigated effects of talker variability in Australian English autistic (n = 24) and non-autistic (n = 28) adults. Listeners responded to target words (e.g., apple, duck, corn) in randomised sequences of words. Half of the sequences were spoken by a single talker and the other half by multiple talkers. Results revealed that autistic participants' sensitivity scores to accurately spotted target words did not differ from those of non-autistic participants, regardless of whether the words were spoken by a single talker or by multiple talkers. As expected, the non-autistic group showed the well-established processing cost associated with talker variability (e.g., slower response times). Remarkably, autistic listeners' response times did not differ across single- and multi-talker conditions, indicating that they did not show perceptual processing costs when accommodating talker variability. The present findings have implications for theories of autistic perception and speech and language processing.
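The two dependent measures in this abstract, sensitivity to targets and the response time cost of talker variability, are standard. As a generic illustration (made-up numbers, not the study's data or analysis pipeline), a short sketch:

```python
# Hedged sketch: d' sensitivity for target-word detection and the
# multi-talker response-time cost. All values are invented for illustration.
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Signal-detection sensitivity: z(hits) - z(false alarms)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

print(d_prime(0.90, 0.10))            # ~2.56 in a hypothetical condition
rt_single, rt_multi = 452.0, 488.0    # mean correct RTs in ms (fake values)
print(rt_multi - rt_single)           # talker-variability cost: 36 ms
```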
Affiliation(s)
- Samra Alispahic
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
- Elizabeth Pellicano
- Department of Educational Studies, Macquarie University, Sydney, Australia
- Department of Clinical, Educational and Health Psychology, University College London, London, UK
- Anne Cutler
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- ARC Centre of Excellence for the Dynamics of Language, Clayton, Australia
- Mark Antoniou
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
2. Bosen AK, Doria GM. Identifying Links Between Latent Memory and Speech Recognition Factors. Ear Hear 2024;45:351-369. PMID: 37882100; PMCID: PMC10922378; DOI: 10.1097/aud.0000000000001430.
Abstract
OBJECTIVES: The link between memory ability and speech recognition accuracy is often examined by correlating summary measures of performance across various tasks, but interpretation of such correlations critically depends on assumptions about how these measures map onto underlying factors of interest. The present work presents an alternative approach, wherein latent factor models are fit to trial-level data from multiple tasks to directly test hypotheses about the underlying structure of memory and the extent to which latent memory factors are associated with individual differences in speech recognition accuracy. Latent factor models with different numbers of factors were fit to the data and compared to one another to select the structures that best explained vocoded sentence recognition in a two-talker masker across a range of target-to-masker ratios, performance on three memory tasks, and the link between sentence recognition and memory.
DESIGN: Young adults with normal hearing (N = 52 for the memory tasks, of whom 21 also completed the sentence recognition task) completed three memory tasks and one sentence recognition task: reading span, auditory digit span, visual free recall of words, and recognition of 16-channel vocoded Perceptually Robust English Sentence Test Open-set (PRESTO) sentences in the presence of a two-talker masker at target-to-masker ratios between +10 and 0 dB. Correlations between summary measures of memory task performance and sentence recognition accuracy were calculated for comparison to prior work, and latent factor models were fit to trial-level data and compared against one another to identify the number of latent factors that best explains the data. Models with one or two latent factors were fit to the sentence recognition data, and models with one, two, or three latent factors were fit to the memory task data. Based on findings with these models, full models that linked one speech factor to one, two, or three memory factors were fit to the full data set. Models were compared via expected log pointwise predictive density (ELPD) and post hoc inspection of model parameters.
RESULTS: Summary measures were positively correlated across memory tasks and sentence recognition. Latent factor models revealed that sentence recognition accuracy was best explained by a single factor that varied across participants. Memory task performance was best explained by two latent factors, of which one was generally associated with performance on all three tasks and the other was specific to digit span recall accuracy at lists of six digits or more. When these models were combined, the general memory factor was closely related to the sentence recognition factor, whereas the factor specific to digit span had no apparent association with sentence recognition.
CONCLUSIONS: Comparison of latent factor models enables testing hypotheses about the underlying structure linking cognition and speech recognition. This approach showed that multiple memory tasks assess a common latent factor that is related to individual differences in sentence recognition, although performance on some tasks was associated with multiple factors. Thus, while these tasks provide some convergent assessment of common latent factors, caution is needed when interpreting what they tell us about speech recognition.
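The model-comparison workflow described above (fit latent factor models of increasing dimensionality to trial-level binary accuracy, then rank them by ELPD) can be sketched briefly. This is an illustrative reconstruction, not the authors' code: the synthetic data, priors, factor structure, and sampler settings are all assumptions.

```python
# Minimal sketch of comparing one- vs two-factor models of trial-level
# accuracy by ELPD-LOO, under assumed priors and synthetic data.
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(1)
n_subj, n_tasks, n_trials = 30, 3, 40
subj = np.repeat(np.arange(n_subj), n_tasks * n_trials)   # subject index per trial
task = np.tile(np.repeat(np.arange(n_tasks), n_trials), n_subj)
ability = rng.normal(0, 1, n_subj)                        # one true latent factor
correct = rng.binomial(1, 1 / (1 + np.exp(-(ability[subj] - 0.2 * task))))

def fit(n_factors):
    with pm.Model():
        theta = pm.Normal("theta", 0, 1, shape=(n_subj, n_factors))    # person factors
        load = pm.HalfNormal("load", 1.0, shape=(n_factors, n_tasks))  # task loadings
        beta = pm.Normal("beta", 0, 1, shape=n_tasks)                  # task difficulty
        eta = (theta[subj] * load[:, task].T).sum(axis=-1) - beta[task]
        pm.Bernoulli("y", logit_p=eta, observed=correct)
        return pm.sample(500, tune=500, chains=2, progressbar=False,
                         idata_kwargs={"log_likelihood": True})

fits = {f"{k}-factor": fit(k) for k in (1, 2)}
print(az.compare(fits))   # ranks candidate models by ELPD-LOO
```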
3. McLaughlin DJ, Colvett JS, Bugg JM, Van Engen KJ. Sequence effects and speech processing: cognitive load for speaker-switching within and across accents. Psychon Bull Rev 2024;31:176-186. PMID: 37442872; PMCID: PMC10867039; DOI: 10.3758/s13423-023-02322-1.
Abstract
Prior work in speech processing indicates that listening tasks with multiple speakers (as opposed to a single speaker) result in slower and less accurate processing. Notably, the trial-to-trial cognitive demands of switching between speakers or switching between accents have yet to be examined. We used pupillometry, a physiological index of cognitive load, to examine the demands of processing first (L1) and second (L2) language-accented speech when listening to sentences produced by the same speaker consecutively (no switch), a novel speaker of the same accent (within-accent switch), and a novel speaker with a different accent (across-accent switch). Inspired by research on sequential adjustments in cognitive control, we aimed to identify the cognitive demands of accommodating a novel speaker and accent by examining the trial-to-trial changes in pupil dilation during speech processing. Our results indicate that switching between speakers was more cognitively demanding than listening to the same speaker consecutively. Additionally, switching to a novel speaker with a different accent was more cognitively demanding than switching between speakers of the same accent. However, there was an asymmetry for across-accent switches, such that switching from an L1 to an L2 accent was more demanding than vice versa. Findings from the present study align with work examining multi-talker processing costs, and provide novel evidence that listeners dynamically adjust cognitive processing to accommodate speaker and accent variability. We discuss these novel findings in the context of an active control model and auditory streaming framework of speech processing.
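The sequence-effect logic of this design, classifying each trial by its relation to the immediately preceding one, is easy to make concrete. In the sketch below, the column names and pupil values are invented for illustration; this is not the authors' data or analysis code.

```python
# Hedged sketch: coding no-switch, within-accent, and across-accent trials
# from a trial sequence, then summarizing a cognitive-load proxy per type.
import pandas as pd

trials = pd.DataFrame({
    "speaker": ["s1", "s1", "s2", "s3", "s3", "s4"],
    "accent":  ["L1", "L1", "L1", "L2", "L2", "L1"],
    "pupil":   [0.11, 0.09, 0.14, 0.21, 0.12, 0.18],  # peak dilation, fake values
})
prev = trials.shift(1)

def switch_type(row, prev_row):
    if pd.isna(prev_row["speaker"]):
        return None                      # first trial has no predecessor
    if row["speaker"] == prev_row["speaker"]:
        return "no_switch"
    if row["accent"] == prev_row["accent"]:
        return "within_accent_switch"
    return "across_accent_switch"

trials["switch"] = [switch_type(trials.loc[i], prev.loc[i]) for i in trials.index]
print(trials.groupby("switch")["pupil"].mean())
```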
Affiliation(s)
- Drew J McLaughlin
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
- Basque Center on Cognition, Brain and Language, Paseo Mikeletegi 69, 20009 Donostia-San Sebastián, Gipuzkoa, Spain
- Jackson S Colvett
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
- Julie M Bugg
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
- Kristin J Van Engen
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St. Louis, MO, USA
4. Luthra S. Why are listeners hindered by talker variability? Psychon Bull Rev 2024;31:104-121. PMID: 37580454; PMCID: PMC10864679; DOI: 10.3758/s13423-023-02355-6.
Abstract
Though listeners readily recognize speech from a variety of talkers, accommodating talker variability comes at a cost: Myriad studies have shown that listeners are slower to recognize a spoken word when there is talker variability compared with when talker is held constant. This review focuses on two possible theoretical mechanisms for the emergence of these processing penalties. One view is that multitalker processing costs arise through a resource-demanding talker accommodation process, wherein listeners compare sensory representations against hypothesized perceptual candidates and error signals are used to adjust the acoustic-to-phonetic mapping (an active control process known as contextual tuning). An alternative proposal is that these processing costs arise because talker changes involve salient stimulus-level discontinuities that disrupt auditory attention. Some recent data suggest that multitalker processing costs may be driven by both mechanisms operating over different time scales. Fully evaluating this claim requires a foundational understanding of both talker accommodation and auditory streaming; this article provides a primer on each literature and also reviews several studies that have observed multitalker processing costs. The review closes by underscoring a need for comprehensive theories of speech perception that better integrate auditory attention and by highlighting important considerations for future research in this area.
Affiliation(s)
- Sahil Luthra
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, USA
5. McLaughlin DJ, Van Engen KJ. Exploring effects of social information on talker-independent accent adaptation. JASA Express Lett 2023;3:125201. PMID: 38059794; DOI: 10.1121/10.0022536.
Abstract
The present study examined whether race information about speakers can promote rapid and generalizable perceptual adaptation to second-language accent. First-language English listeners were presented with Cantonese-accented English sentences in speech-shaped noise during a training session with three intermixed talkers, followed by a test session with a novel (i.e., fourth) talker. Participants were assigned to view either three East Asian or three White faces during training, corresponding to each speaker. Results indicated no effect of the social priming manipulation on the training or test sessions, although both groups performed better at test than a control group.
Affiliation(s)
- Drew J McLaughlin
- Basque Center on Cognition, Brain and Language, Donostia-San Sebastián, Gipuzkoa 20018, Spain
- Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130, USA
- Kristin J Van Engen
- Department of Psychological & Brain Sciences, Washington University in St. Louis, St. Louis, Missouri 63130, USA
6. Crespo K, Vlach H, Kaushanskaya M. The effects of bilingualism on children's cross-situational word learning under different variability conditions. J Exp Child Psychol 2023;229:105621. PMID: 36689904; PMCID: PMC10088528; DOI: 10.1016/j.jecp.2022.105621.
Abstract
In the current study, we examined the separate and combined effects of exemplar and speaker variability on monolingual and bilingual children's cross-situational word learning performance. Results revealed that, regardless of linguistic background, children's word learning performance did not differ when the input varied in a single dimension (i.e., exemplars or speakers) compared with a condition with no variability. However, when performance in conditions that varied in a single dimension (i.e., exemplars or speakers) was compared with a condition that varied in multiple dimensions (i.e., exemplars and speakers), bilingual word learning advantages were observed: bilinguals were more likely than monolinguals to learn word-referent associations. Together, these results suggest that children can learn and generalize word-referent associations from input that varies in exemplars and speakers and that bilingualism may bolster learning under conditions of increased input variability.
Affiliation(s)
- Kimberly Crespo
- Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA 02215, USA
- Haley Vlach
- Department of Educational Psychology, University of Wisconsin-Madison, Madison, WI 53706, USA
- Margarita Kaushanskaya
- Department of Communication Sciences and Disorders, University of Wisconsin-Madison, Madison, WI 53706, USA
7. Kapadia AM, Tin JAA, Perrachione TK. Multiple sources of acoustic variation affect speech processing efficiency. J Acoust Soc Am 2023;153:209. PMID: 36732274; PMCID: PMC9836727; DOI: 10.1121/10.0016611.
Abstract
Phonetic variability across talkers imposes additional processing costs during speech perception, evident in performance decrements when listening to speech from multiple talkers. However, within-talker phonetic variation is a less well-understood source of variability in speech, and it is unknown how processing costs from within-talker variation compare to those from between-talker variation. Here, listeners performed a speeded word identification task in which three dimensions of variability were factorially manipulated: between-talker variability (single vs multiple talkers), within-talker variability (single vs multiple acoustically distinct recordings per word), and word-choice variability (two- vs six-word choices). All three sources of variability led to reduced speech processing efficiency. Between-talker variability affected both word-identification accuracy and response time, but within-talker variability affected only response time. Furthermore, between-talker variability, but not within-talker variability, had a greater impact when the target phonological contrasts were more similar. Together, these results suggest that natural between- and within-talker variability reflect two distinct magnitudes of common acoustic-phonetic variability: Both affect speech processing efficiency, but they appear to have qualitatively and quantitatively unique effects due to differences in their potential to obscure acoustic-phonemic correspondences across utterances.
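One way to see the factorial logic of this design is a linear model of response time with all main effects and interactions. The sketch below generates synthetic trials for the 2 × 2 × 2 crossing; the effect sizes and condition labels are assumptions, not the paper's analysis or data.

```python
# Hedged sketch: a 2 x 2 x 2 factorial RT analysis with synthetic data.
from itertools import product
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
levels = {"talker": ["single", "multi"],    # between-talker variability
          "token": ["single", "multi"],     # within-talker variability
          "choices": ["two", "six"]}        # word-choice variability
rows = []
for talker, token, choices in product(*levels.values()):
    for _ in range(50):                     # 50 fake trials per cell
        rt = (500 + 40 * (talker == "multi") + 15 * (token == "multi")
              + 25 * (choices == "six") + rng.normal(0, 30))
        rows.append((talker, token, choices, rt))

df = pd.DataFrame(rows, columns=["talker", "token", "choices", "rt"])
fit = smf.ols("rt ~ C(talker) * C(token) * C(choices)", data=df).fit()
print(fit.summary().tables[1])              # main effects and interactions
```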
Affiliation(s)
- Alexandra M Kapadia
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Jessica A A Tin
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
8. Perceptual learning of multiple talkers: Determinants, characteristics, and limitations. Atten Percept Psychophys 2022;84:2335-2359. PMID: 36076119; DOI: 10.3758/s13414-022-02556-6.
Abstract
Research suggests that listeners simultaneously update talker-specific generative models to reflect structured phonetic variation. Because past investigations exposed listeners to talkers of different genders, it is unknown whether adaptation is talker specific or rather linked to a broader sociophonetic class. Here, we test determinants of listeners' ability to update and apply talker-specific models for speech perception. In six experiments (n = 480), listeners were first exposed to the speech of two talkers who produced ambiguous fricative energy. The talkers' speech was interleaved during exposure, and lexical context differentially biased interpretation of the ambiguity as either /s/ or /ʃ/ for each talker. At test, listeners categorized tokens from ashi-asi continua, one for each talker. Across conditions and experiments, we manipulated exposure quantity, talker gender, blocked versus interleaved talker structure at test, and the degree to which fricative acoustics differed between talkers. When test was blocked by talker, learning was observed for different- but not same-gender talkers. When talkers were interleaved at test, learning was observed for both different- and same-gender talkers, but it was attenuated when fricative acoustics were constant across talkers. There was no strong evidence to suggest that adaptation to multiple talkers required more exposure than adaptation to a single talker. These results suggest that perceptual learning for speech is achieved via a mechanism that represents a context-dependent, cumulative integration of experience with speech input, and they identify critical constraints on listeners' ability to dynamically apply multiple generative models in mixed-talker listening environments.
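Talker-specific learning of the kind tested here is conventionally quantified by fitting a logistic psychometric function to each talker's categorization responses along the continuum and comparing category boundaries. The sketch below is generic, with invented response proportions; it is not the study's data or code.

```python
# Hedged sketch: estimating a category boundary on a 7-step fricative
# continuum by fitting a logistic psychometric function (fake responses).
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    return 1 / (1 + np.exp(-k * (x - x0)))   # x0 is the category boundary

steps = np.arange(1, 8, dtype=float)          # continuum steps 1..7
prop_s = np.array([0.05, 0.10, 0.20, 0.55, 0.80, 0.95, 1.00])  # fake /s/ rates
(boundary, slope), _ = curve_fit(logistic, steps, prop_s, p0=[4.0, 1.0])
print(f"category boundary at step {boundary:.2f}")  # boundary shifts index learning
```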
9. Distinct mechanisms for talker adaptation operate in parallel on different timescales. Psychon Bull Rev 2021;29:627-634. PMID: 34731443; DOI: 10.3758/s13423-021-02019-3.
Abstract
The mapping between speech acoustics and phonemic representations is highly variable across talkers, and listeners are slower to recognize words when listening to multiple talkers compared with a single talker. Listeners' speech processing efficiency in mixed-talker settings improves when given time to reorient their attention to each new talker. However, it remains unknown how much time is needed to fully reorient attention to a new talker in mixed-talker settings so that speech processing becomes as efficient as when listening to a single talker. In this study, we examined how speech processing efficiency improves in mixed-talker settings as a function of the duration of continuous speech from a talker. In single-talker and mixed-talker conditions, listeners identified target words either in isolation or preceded by a carrier vowel of parametrically varying durations from 300 to 1,500 ms. Listeners' word identification was significantly slower in every mixed-talker condition compared with the corresponding single-talker condition. The costs associated with processing mixed-talker speech declined significantly as the duration of the speech carrier increased from 0 to 600 ms. However, increasing the carrier duration beyond 600 ms did not achieve further reduction in talker variability-related processing costs. These results suggest that two parallel mechanisms support processing talker variability: A stimulus-driven mechanism that operates on short timescales to reorient attention to new auditory sources, and a top-down mechanism that operates over longer timescales to allocate the cognitive resources needed to accommodate uncertainty in acoustic-phonemic correspondences during contexts where speech may come from multiple talkers.
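The finding that costs fall with carrier duration and then level off near 600 ms is the kind of pattern a two-segment ("broken-stick") fit can capture. In the sketch below, only the general shape comes from the abstract; the cost values are invented.

```python
# Hedged sketch: locating the plateau in processing cost vs. carrier
# duration with a two-segment fit (synthetic cost values).
import numpy as np
from scipy.optimize import curve_fit

def broken_stick(t, c0, slope, knot):
    # cost declines linearly until the knot, then stays flat
    return c0 + slope * np.minimum(t, knot)

dur = np.array([0, 300, 600, 900, 1200, 1500], dtype=float)  # carrier ms
cost = np.array([95, 60, 32, 33, 31, 34], dtype=float)       # fake RT costs (ms)
(c0, slope, knot), _ = curve_fit(broken_stick, dur, cost, p0=[90.0, -0.1, 500.0])
print(f"estimated plateau onset near {knot:.0f} ms")
```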
10. Lim SJ, Carter YD, Njoroge JM, Shinn-Cunningham BG, Perrachione TK. Talker discontinuity disrupts attention to speech: Evidence from EEG and pupillometry. Brain Lang 2021;221:104996. PMID: 34358924; PMCID: PMC8515637; DOI: 10.1016/j.bandl.2021.104996.
Abstract
Speech is processed less efficiently from discontinuous, mixed talkers than from one consistent talker, but little is known about the neural mechanisms for processing talker variability. Here, we measured psychophysiological responses to talker variability using electroencephalography (EEG) and pupillometry while listeners performed a delayed-recall digit span task. Listeners heard and recalled seven-digit sequences with both talker (single- vs. mixed-talker digits) and temporal (0- vs. 500-ms inter-digit intervals) discontinuities. Talker discontinuity reduced serial recall accuracy. Both talker and temporal discontinuities elicited P3a-like neural evoked responses, while rapid processing of mixed-talkers' speech led to increased phasic pupil dilation. Furthermore, mixed-talkers' speech produced less alpha oscillatory power during working memory maintenance, but not during speech encoding. Overall, these results are consistent with an auditory attention and streaming framework in which talker discontinuity leads to involuntary, stimulus-driven attentional reorientation to novel speech sources, resulting in the processing interference classically associated with talker variability.
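Alpha oscillatory power during maintenance, one of the measures reported here, is conventionally computed as band-limited spectral power. The sketch below shows one common recipe (Welch's method on a maintenance-window segment); the sampling rate, window, and signal are assumptions, not the study's recording parameters.

```python
# Hedged sketch: alpha-band (8-12 Hz) power from one EEG channel during
# a maintenance window, using Welch's method on a synthetic signal.
import numpy as np
from scipy.signal import welch

fs = 250                                   # sampling rate in Hz (assumed)
eeg = np.random.default_rng(2).standard_normal(fs * 3)  # 3-s fake segment
f, psd = welch(eeg, fs=fs, nperseg=fs)     # 1-Hz frequency resolution
alpha_power = psd[(f >= 8) & (f <= 12)].mean()
print(alpha_power)  # compare across single- vs mixed-talker trials
```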
Affiliation(s)
- Sung-Joo Lim
- Department of Speech, Language, and Hearing Sciences, Boston University, United States
- Yaminah D Carter
- Department of Speech, Language, and Hearing Sciences, Boston University, United States
- J Michelle Njoroge
- Department of Speech, Language, and Hearing Sciences, Boston University, United States
- Tyler K Perrachione
- Department of Speech, Language, and Hearing Sciences, Boston University, United States
11. Xu J, Abdel Rahman R, Sommer W. Who speaks next? Adaptations to speaker identity in processing spoken sentences. Psychophysiology 2021;59:e13948. PMID: 34587288; DOI: 10.1111/psyp.13948.
Abstract
When listening to a speaker, we need to adapt to her individual speaking characteristics, such as error proneness and accent. The present study investigated two aspects of adaptation to speaker identity while processing spoken sentences in multi-speaker situations: the effect of speaker sequence across sentences and the effect of learning speaker-specific error probability. Spoken sentences were presented, cued, and accompanied by one of three portraits that were labeled as the speakers' faces. In Block 1, speaker-specific probabilities of syntax errors were 10%, 50%, or 90%; in Block 2 they were uniformly 50%. In both blocks, speech errors elicited P600 effects in the scalp-recorded ERP. We found a speaker sequence effect only in Block 1: the P600 to target words was larger after speaker switches than after speaker repetitions, independent of sentence correctness. In Block 1, listeners were also more accurate in judging the correctness of sentences spoken by speakers with lower error proportions. No speaker-specific differences in target word P600 or accuracy were found in Block 2. When speakers differ in error proneness, listeners seem to flexibly adapt their speech processing for the upcoming sentence, through attention reorientation and resource reallocation if the speaker is about to change, and through proactive maintenance of neural resources if the speaker remains the same.
Affiliation(s)
- Jue Xu
- Institut für Psychologie, Humboldt-Universität zu Berlin, Berlin, Germany
- Rasha Abdel Rahman
- Institut für Psychologie, Humboldt-Universität zu Berlin, Berlin, Germany
- Werner Sommer
- Institut für Psychologie, Humboldt-Universität zu Berlin, Berlin, Germany