1. Alispahic S, Pellicano E, Cutler A, Antoniou M. Multiple talker processing in autistic adult listeners. Sci Rep 2024; 14:14698. [PMID: 38926416] [PMCID: PMC11208580] [DOI: 10.1038/s41598-024-62429-w]
Abstract
Accommodating talker variability is a complex and multi-layered cognitive process. It involves shifting attention to the vocal characteristics of the talker as well as the linguistic content of their speech. Due to an interdependence between voice and phonological processing, multi-talker environments typically incur additional processing costs compared to single-talker environments. A failure or inability to efficiently distribute attention over multiple acoustic cues in the speech signal may have detrimental language learning consequences. Yet, no studies have examined effects of multi-talker processing in populations with atypical perceptual, social and language processing for communication, including autistic people. Employing a classic word-monitoring task, we investigated effects of talker variability in Australian English autistic (n = 24) and non-autistic (n = 28) adults. Listeners responded to target words (e.g., apple, duck, corn) in randomised sequences of words. Half of the sequences were spoken by a single talker and the other half by multiple talkers. Results revealed that autistic participants' sensitivity scores for accurately spotted target words did not differ from those of non-autistic participants, regardless of whether the words were spoken by a single talker or by multiple talkers. As expected, the non-autistic group showed the well-established processing cost associated with talker variability (i.e., slower response times). Remarkably, autistic listeners' response times did not differ across single- and multi-talker conditions, indicating that they did not incur a perceptual processing cost when accommodating talker variability. The present findings have implications for theories of autistic perception and of speech and language processing.
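The sensitivity scores referred to in this abstract are, in word-monitoring tasks of this kind, conventionally computed as d′ from signal detection theory, contrasting hit and false-alarm rates. A minimal illustrative sketch follows; the counts and the log-linear correction are assumptions for illustration, not details taken from the study:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total) keeps
    the rates away from 0 and 1, where the z-transform would be infinite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical listener: 18 of 20 targets spotted, 2 false alarms on 38 foils
sensitivity = d_prime(hits=18, misses=2, false_alarms=2, correct_rejections=36)
```

Equal hit and false-alarm rates yield d′ = 0. The study's finding was that this accuracy measure did not differ between groups, whereas the response-time cost of talker variability appeared only in the non-autistic group.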
Affiliation(s)
- Samra Alispahic
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia.
- Elizabeth Pellicano
- Department of Educational Studies, Macquarie University, Sydney, Australia
- Department of Clinical, Educational and Health Psychology, University College London, London, UK
- Anne Cutler
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
- Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
- ARC Centre of Excellence for the Dynamics of Language, Clayton, Australia
- Mark Antoniou
- The MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Sydney, NSW, Australia
2. Keur-Huizinga L, Kramer SE, de Geus EJC, Zekveld AA. A Multimodal Approach to Measuring Listening Effort: A Systematic Review on the Effects of Auditory Task Demand on Physiological Measures and Their Relationship. Ear Hear 2024:00003446-990000000-00297. [PMID: 38880960] [DOI: 10.1097/aud.0000000000001508]
Abstract
OBJECTIVES Listening effort is the mental effort required to perceive an auditory stimulus, for example in noisy environments. Prolonged increased listening effort, for example due to impaired hearing ability, may increase the risk of health complications. It is therefore important to identify valid and sensitive measures of listening effort. Physiological measures have been shown to be sensitive to auditory task demand manipulations and are considered to reflect changes in listening effort. Such measures include pupil dilation, alpha power, skin conductance level, and heart rate variability. The aim of the current systematic review was to provide an overview of listening-effort studies that used multiple physiological measures. The two main questions were: (1) what is the effect of changes in auditory task demand on simultaneously acquired physiological measures from various modalities? and (2) what is the relationship between the responses in these physiological measures? DESIGN Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, relevant articles were sought in PubMed, PsycInfo, and Web of Science and by examining the references of included articles. Search iterations with different combinations of psychophysiological measures were performed in conjunction with listening effort-related search terms. Quality was assessed using the Appraisal Tool for Cross-Sectional Studies. RESULTS A total of 297 articles were identified from the three databases, of which 27 were included. One additional article was identified from reference lists. Of the 28 included articles, 16 included an analysis of the relationship between the physiological measures. The overall quality of the included studies was reasonable. CONCLUSIONS The included studies showed that most of the physiological measures show either no effect of auditory task demand manipulations or a consistent effect in the expected direction. For example, pupil dilation increased, pre-ejection period decreased, and skin conductance level increased with increasing auditory task demand. Most of the relationships between the responses of these physiological measures were nonsignificant or weak. The physiological measures varied in their sensitivity to auditory task demand manipulations. One identified knowledge gap was that the included studies mostly used tasks with high performance levels, resulting in an underrepresentation of physiological changes at lower performance levels. This makes it difficult to capture how the physiological responses behave across the full psychometric curve. Our results support the Framework for Understanding Effortful Listening and the need for a multimodal approach to listening effort. We furthermore discuss focus points for future studies.
Affiliation(s)
- Laura Keur-Huizinga
- Amsterdam UMC location Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Sophia E Kramer
- Amsterdam UMC location Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
- Eco J C de Geus
- Department of Biological Psychology, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Adriana A Zekveld
- Amsterdam UMC location Vrije Universiteit Amsterdam, Otolaryngology-Head and Neck Surgery, Ear & Hearing, Amsterdam Public Health Research Institute, Amsterdam, The Netherlands
3. McLaughlin DJ, Colvett JS, Bugg JM, Van Engen KJ. Sequence effects and speech processing: cognitive load for speaker-switching within and across accents. Psychon Bull Rev 2024; 31:176-186. [PMID: 37442872] [PMCID: PMC10867039] [DOI: 10.3758/s13423-023-02322-1]
Abstract
Prior work in speech processing indicates that listening tasks with multiple speakers (as opposed to a single speaker) result in slower and less accurate processing. Notably, the trial-to-trial cognitive demands of switching between speakers or switching between accents have yet to be examined. We used pupillometry, a physiological index of cognitive load, to examine the demands of processing first (L1) and second (L2) language-accented speech when listening to sentences produced by the same speaker consecutively (no switch), a novel speaker of the same accent (within-accent switch), and a novel speaker with a different accent (across-accent switch). Inspired by research on sequential adjustments in cognitive control, we aimed to identify the cognitive demands of accommodating a novel speaker and accent by examining the trial-to-trial changes in pupil dilation during speech processing. Our results indicate that switching between speakers was more cognitively demanding than listening to the same speaker consecutively. Additionally, switching to a novel speaker with a different accent was more cognitively demanding than switching between speakers of the same accent. However, there was an asymmetry for across-accent switches, such that switching from an L1 to an L2 accent was more demanding than vice versa. Findings from the present study align with work examining multi-talker processing costs, and provide novel evidence that listeners dynamically adjust cognitive processing to accommodate speaker and accent variability. We discuss these novel findings in the context of an active control model and auditory streaming framework of speech processing.
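Pupillometric load indices of the kind used in this study are typically expressed as event-related dilation relative to a pre-trial baseline. The sketch below shows subtractive baseline correction in its simplest form; the trace values and window are invented for illustration, and the authors' actual preprocessing pipeline is not specified in the abstract:

```python
def baseline_correct(trace, baseline_window):
    """Subtract the mean pupil size in a pre-stimulus window from every
    sample, so the trace expresses dilation relative to trial onset."""
    start, stop = baseline_window
    baseline = sum(trace[start:stop]) / (stop - start)
    return [sample - baseline for sample in trace]

# Hypothetical pupil trace (arbitrary units); first three samples = baseline
trace = [4.0, 4.1, 3.9, 4.3, 4.6, 4.8]
dilation = baseline_correct(trace, (0, 3))
```

Condition comparisons (e.g., no switch vs. across-accent switch) are then made on the peak or mean of the corrected traces, trial by trial.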
Affiliation(s)
- Drew J McLaughlin
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St Louis, MO, USA.
- Basque Center on Cognition, Brain and Language, Paseo Mikeletegi, 69, 20009, Donostia-San Sebastián, Gipuzkoa, Spain.
- Jackson S Colvett
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St Louis, MO, USA
- Julie M Bugg
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St Louis, MO, USA
- Kristin J Van Engen
- Department of Psychological and Brain Sciences, Washington University in St. Louis, St Louis, MO, USA
4. Luthra S. Why are listeners hindered by talker variability? Psychon Bull Rev 2024; 31:104-121. [PMID: 37580454] [PMCID: PMC10864679] [DOI: 10.3758/s13423-023-02355-6]
Abstract
Though listeners readily recognize speech from a variety of talkers, accommodating talker variability comes at a cost: Myriad studies have shown that listeners are slower to recognize a spoken word when there is talker variability compared with when talker is held constant. This review focuses on two possible theoretical mechanisms for the emergence of these processing penalties. One view is that multitalker processing costs arise through a resource-demanding talker accommodation process, wherein listeners compare sensory representations against hypothesized perceptual candidates and error signals are used to adjust the acoustic-to-phonetic mapping (an active control process known as contextual tuning). An alternative proposal is that these processing costs arise because talker changes involve salient stimulus-level discontinuities that disrupt auditory attention. Some recent data suggest that multitalker processing costs may be driven by both mechanisms operating over different time scales. Fully evaluating this claim requires a foundational understanding of both talker accommodation and auditory streaming; this article provides a primer on each literature and also reviews several studies that have observed multitalker processing costs. The review closes by underscoring a need for comprehensive theories of speech perception that better integrate auditory attention and by highlighting important considerations for future research in this area.
Affiliation(s)
- Sahil Luthra
- Department of Psychology, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, 15213, USA.
5. Winn MB. Time Scales and Moments of Listening Effort Revealed in Pupillometry. Semin Hear 2023; 44:106-123. [PMID: 37122881] [PMCID: PMC10147502] [DOI: 10.1055/s-0043-1767741]
Abstract
This article offers a collection of observations that highlight the value of time course data in pupillometry and points out ways in which these observations create deeper understanding of listening effort. The main message is that listening effort should be considered on a moment-to-moment basis rather than as a singular amount. A review of various studies and the reanalysis of data reveal distinct signatures of effort before a stimulus, during a stimulus, in the moments after a stimulus, and changes over whole experimental testing sessions. Collectively these observations motivate questions that extend beyond the "amount" of effort, toward understanding how long the effort lasts, and how precisely someone can allocate effort at specific points in time or reduce effort at other times. Apparent disagreements between studies are reconsidered as informative lessons about stimulus selection and the nature of pupil dilation as a reflection of decision making rather than the difficulty of sensory encoding.
Affiliation(s)
- Matthew B. Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota
6. Park JJ, Baek SC, Suh MW, Choi J, Kim SJ, Lim Y. The effect of topic familiarity and volatility of auditory scene on selective auditory attention. Hear Res 2023; 433:108770. [PMID: 37104990] [DOI: 10.1016/j.heares.2023.108770]
Abstract
Selective auditory attention has been shown to modulate the cortical representation of speech. This effect has been well documented in acoustically challenging environments. However, the influence of top-down factors, in particular topic familiarity, on this process remains unclear, despite evidence that semantic information can promote speech-in-noise perception. Apart from individual features forming a static listening condition, dynamic and irregular changes of auditory scenes (volatile listening environments) have been less studied. To address these gaps, we explored the influence of topic familiarity and volatile listening on the selective auditory attention process during dichotic listening using electroencephalography. When stories with unfamiliar topics were presented, participants' comprehension was severely degraded. However, their cortical activity selectively tracked the speech of the target story well. This implies that topic familiarity hardly influences the speech-tracking neural index, possibly when bottom-up information is sufficient. However, when the listening environment was volatile and listeners had to re-engage with new speech whenever the auditory scene changed, the neural correlates of the attended speech were degraded. In particular, the cortical response to the attended speech and the spatial asymmetry of the response between left- and right-directed attention were significantly attenuated around 100-200 ms after speech onset. These findings suggest that volatile listening environments can adversely affect the modulatory effect of selective attention, possibly by hampering proper attention due to increased perceptual load.
Affiliation(s)
- Jonghwa Jeonglok Park
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
- Seung-Cheol Baek
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Research Group Neurocognition of Music and Language, Max Planck Institute for Empirical Aesthetics, Grüneburgweg 14, Frankfurt am Main 60322, Germany
- Myung-Whan Suh
- Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital, Seoul 03080, South Korea
- Jongsuk Choi
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of AI Robotics, KIST School, Korea University of Science and Technology, Seoul 02792, South Korea
- Sung June Kim
- Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, Seoul 08826, South Korea
- Yoonseob Lim
- Center for Intelligent & Interactive Robotics, Artificial Intelligence and Robot Institute, Korea Institute of Science and Technology, Seoul 02792, South Korea; Department of HY-KIST Bio-convergence, Hanyang University, Seoul 04763, South Korea.
7. Noyce AL, Kwasa JAC, Shinn-Cunningham BG. Defining attention from an auditory perspective. Wiley Interdiscip Rev Cogn Sci 2023; 14:e1610. [PMID: 35642475] [PMCID: PMC9712589] [DOI: 10.1002/wcs.1610]
Abstract
Attention prioritizes certain information at the expense of other information in ways that are similar across vision, audition, and other sensory modalities. It influences how-and even what-information is represented and processed, affecting brain activity at every level. Much of the core research into cognitive and neural mechanisms of attention has used visual tasks. However, the same top-down, object-based, and bottom-up attentional processes shape auditory perception, largely through the same underlying, cognitive networks. This article is categorized under: Psychology > Attention.
8. Qin Z, Jin R, Zhang C. The Effects of Training Variability and Pitch Aptitude on the Overnight Consolidation of Lexical Tones. J Speech Lang Hear Res 2022; 65:3377-3391. [PMID: 36018578] [DOI: 10.1044/2022_jslhr-22-00058]
Abstract
PURPOSE Although variability of training materials has the potential to benefit the learning of lexical tones, the benefit is contingent on an individual's pitch aptitude. Previous studies did not segregate immediate learning from consolidation after an overnight interval, and little is known about how pitch aptitude differences affect consolidation. This study examined whether pitch aptitude predicts overnight consolidation of Cantonese level tones through high-variability (HV) and low-variability (LV) training. METHOD Two groups of Mandarin-speaking participants were first assessed in terms of pitch threshold and tone discrimination, which tapped into different aspects of pitch aptitude. They then received Cantonese level tone identification training in either an HV or an LV condition. The participants were trained in the evening, were tested after training, and returned after 24 hr for an overnight consolidation assessment. RESULTS The results indicate that pitch aptitude, measured through pitch threshold, may have predicted overnight consolidation and training progress in the HV group but not in the LV group. In the HV group, compared with high-aptitude learners, low-aptitude learners benefited temporarily from training variability but did not consolidate the tonal knowledge as much as their high-aptitude counterparts did after 24 hr. CONCLUSIONS The findings suggest that individual learners had difficulty learning nonnative tones by virtue of memory consolidation. Higher pitch aptitude (pitch threshold) may protect against the decay of learned tones and facilitate tone consolidation. The findings imply that the early emergence of tonal representation is a dynamic process among nonnative speakers exposed to training variability. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.20539998.
Affiliation(s)
- Zhen Qin
- Division of Humanities, The Hong Kong University of Science and Technology, Kowloon
- Rui Jin
- Division of Humanities, The Hong Kong University of Science and Technology, Kowloon
- Caicai Zhang
- Research Centre for Language, Cognition, and Neuroscience, Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hung Hom
9. Implicit and explicit learning in talker identification. Atten Percept Psychophys 2022; 84:2002-2015. [PMID: 35534783] [PMCID: PMC10081569] [DOI: 10.3758/s13414-022-02500-8]
Abstract
In the real world, listeners seem to implicitly learn talkers' vocal identities during interactions that prioritize attending to the content of talkers' speech. In contrast, most laboratory experiments of talker identification employ training paradigms that require listeners to explicitly practice identifying voices. Here, we investigated whether listeners become familiar with talkers' vocal identities during initial exposures that do not involve explicit talker identification. Participants were assigned to one of three exposure tasks, in which they heard identical stimuli but were differentially required to attend to the talkers' vocal identity or to the verbal content of their speech: (1) matching the talker to a concurrent visual cue (talker-matching); (2) discriminating whether the talker was the same as the prior trial (talker 1-back); or (3) discriminating whether speech content matched the previous trial (verbal 1-back). All participants were then tested on their ability to learn to identify talkers from novel speech content. Critically, we manipulated whether the talkers during this post-test differed from those heard during training. Compared to learning to identify novel talkers, listeners were significantly more accurate learning to identify the talkers they had previously been exposed to in the talker-matching and verbal 1-back tasks, but not the talker 1-back task. The correlation between talker identification test performance and exposure task performance was also greater when the talkers were the same in both tasks. These results suggest that listeners learn talkers' vocal identity implicitly during speech perception, even if they are not explicitly attending to the talkers' identity.
10. Liang W, Brown CA, Shinn-Cunningham BG. Cat-astrophic effects of sudden interruptions on spatial auditory attention. J Acoust Soc Am 2022; 151:3219. [PMID: 35649920] [PMCID: PMC9113758] [DOI: 10.1121/10.0010453]
Abstract
Salient interruptions draw attention involuntarily. Here, we explored whether this effect depends on the spatial and temporal relationships between a target stream and interrupter. In a series of online experiments, listeners focused spatial attention on a target stream of spoken syllables in the presence of an otherwise identical distractor stream from the opposite hemifield. On some random trials, an interrupter (a cat "MEOW") occurred. Experiment 1 established that the interrupter, which occurred randomly in 25% of the trials in the hemifield opposite the target, degraded target recall. Moreover, a majority of participants exhibited this degradation for the first target syllable, which finished before the interrupter began. Experiment 2 showed that the effect of an interrupter was similar whether it occurred in the opposite or the same hemifield as the target. Experiment 3 found that the interrupter degraded performance slightly if it occurred before the target stream began but had no effect if it began after the target stream ended. Experiment 4 showed decreased interruption effects when the interruption frequency increased (50% of the trials). These results demonstrate that a salient interrupter disrupts recall of a target stream, regardless of its direction, especially if it occurs during a target stream.
Affiliation(s)
- Wusheng Liang
- Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
- Christopher A Brown
- Department of Communication Science and Disorders, The University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA
11. Distinct mechanisms for talker adaptation operate in parallel on different timescales. Psychon Bull Rev 2021; 29:627-634. [PMID: 34731443] [DOI: 10.3758/s13423-021-02019-3]
Abstract
The mapping between speech acoustics and phonemic representations is highly variable across talkers, and listeners are slower to recognize words when listening to multiple talkers compared with a single talker. Listeners' speech processing efficiency in mixed-talker settings improves when given time to reorient their attention to each new talker. However, it remains unknown how much time is needed to fully reorient attention to a new talker in mixed-talker settings so that speech processing becomes as efficient as when listening to a single talker. In this study, we examined how speech processing efficiency improves in mixed-talker settings as a function of the duration of continuous speech from a talker. In single-talker and mixed-talker conditions, listeners identified target words either in isolation or preceded by a carrier vowel of parametrically varying durations from 300 to 1,500 ms. Listeners' word identification was significantly slower in every mixed-talker condition compared with the corresponding single-talker condition. The costs associated with processing mixed-talker speech declined significantly as the duration of the speech carrier increased from 0 to 600 ms. However, increasing the carrier duration beyond 600 ms did not achieve further reduction in talker variability-related processing costs. These results suggest that two parallel mechanisms support processing talker variability: A stimulus-driven mechanism that operates on short timescales to reorient attention to new auditory sources, and a top-down mechanism that operates over longer timescales to allocate the cognitive resources needed to accommodate uncertainty in acoustic-phonemic correspondences during contexts where speech may come from multiple talkers.