1. Johns MA, Calloway RC, Karunathilake IMD, Decruy LP, Anderson S, Simon JZ, Kuchinsky SE. Attention Mobilization as a Modulator of Listening Effort: Evidence From Pupillometry. Trends Hear 2024; 28:23312165241245240. PMID: 38613337; PMCID: PMC11015766; DOI: 10.1177/23312165241245240.
Abstract
Listening to speech in noise can require substantial mental effort, even among younger normal-hearing adults. The task-evoked pupil response (TEPR) has been shown to track the increased effort exerted to recognize words or sentences in increasing noise. However, few studies have examined the trajectory of listening effort across longer, more natural, stretches of speech, or the extent to which expectations about upcoming listening difficulty modulate the TEPR. Seventeen younger normal-hearing adults listened to 60-s-long audiobook passages, repeated three times in a row, at two different signal-to-noise ratios (SNRs) while pupil size was recorded. There was a significant interaction between SNR, repetition, and baseline pupil size on sustained listening effort. At lower baseline pupil sizes, potentially reflecting lower attention mobilization, TEPRs were more sustained in the harder SNR condition, particularly when attention mobilization remained low by the third presentation. At intermediate baseline pupil sizes, differences between conditions were largely absent, suggesting these listeners had optimally mobilized their attention for both SNRs. Lastly, at higher baseline pupil sizes, potentially reflecting overmobilization of attention, the effect of SNR was initially reversed for the second and third presentations: participants initially appeared to disengage in the harder SNR condition, resulting in reduced TEPRs that recovered in the second half of the story. Together, these findings suggest that the unfolding of listening effort over time depends critically on the extent to which individuals have successfully mobilized their attention in anticipation of difficult listening conditions.
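For readers who want to prototype this kind of analysis, below is a minimal sketch of how a task-evoked pupil response might be baseline-corrected in Python. The one-second pre-stimulus baseline window, sampling rate, and array names are illustrative assumptions, not details of the authors' pipeline.

```python
import numpy as np

def tepr(pupil_trace, fs, baseline_dur=1.0):
    """Subtractive baseline correction of a pupil trace.

    pupil_trace : 1-D array of pupil-size samples; the first
        `baseline_dur` seconds are treated as the pre-stimulus baseline.
    fs : sampling rate in Hz.
    Returns (baseline, baseline-corrected trace after stimulus onset).
    """
    n_base = int(baseline_dur * fs)
    baseline = pupil_trace[:n_base].mean()
    return baseline, pupil_trace[n_base:] - baseline

# Illustrative use: a 61-s trace at 60 Hz (1-s baseline + 60-s passage)
fs = 60
trace = 3.5 + 0.1 * np.random.randn(61 * fs)  # placeholder pupil data
base, response = tepr(trace, fs)
```

Analyses like the one reported here would then relate the baseline value to the shape of the corrected response over the passage; that modeling step (e.g., growth-curve analysis) is beyond this sketch.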
Affiliation(s)
- M. A. Johns
  - Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
- R. C. Calloway
  - Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
- I. M. D. Karunathilake
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
- L. P. Decruy
  - Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
- S. Anderson
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, MD 20742, USA
- J. Z. Simon
  - Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
  - Department of Biology, University of Maryland, College Park, MD 20742, USA
- S. E. Kuchinsky
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, MD 20742, USA
  - National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD 20889, USA
2. Roverud E, Villard S, Kidd G. Strength of target source segregation cues affects the outcome of speech-on-speech masking experiments. J Acoust Soc Am 2023; 153:2780. PMID: 37140176; PMCID: PMC10319449; DOI: 10.1121/10.0019307.
Abstract
In speech-on-speech listening experiments, some means for designating which talker is the "target" must be provided for the listener to perform better than chance. However, the relative strength of the segregation variables designating the target could affect the results of the experiment. Here, we examine the interaction of two source segregation variables-spatial separation and talker gender differences-and demonstrate that the relative strengths of these cues may affect the interpretation of the results. Participants listened to sentence pairs spoken by different-gender target and masker talkers, presented naturally or vocoded (degrading gender cues), either colocated or spatially separated. Target and masker words were temporally interleaved to eliminate energetic masking in either an every-other-word or randomized order of presentation. Results showed that the order of interleaving had no effect on recall performance. For natural speech with strong talker gender cues, spatial separation of sources yielded no improvement in performance. For vocoded speech with degraded talker gender cues, performance improved significantly with spatial separation of sources. These findings reveal that listeners may shift among target source segregation cues contingent on cue viability. Finally, performance was poor when the target was designated after stimulus presentation, indicating strong reliance on the cues.
Affiliation(s)
- Elin Roverud
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Sarah Villard
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Gerald Kidd
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
3. Johns MA, Calloway RC, Phillips I, Karuzis VP, Dutta K, Smith E, Shamma SA, Goupell MJ, Kuchinsky SE. Performance on stochastic figure-ground perception varies with individual differences in speech-in-noise recognition and working memory capacity. J Acoust Soc Am 2023; 153:286. PMID: 36732241; PMCID: PMC9851714; DOI: 10.1121/10.0016756.
Abstract
Speech recognition in noisy environments can be challenging and requires listeners to accurately segregate a target speaker from irrelevant background noise. Stochastic figure-ground (SFG) tasks, in which temporally coherent inharmonic pure tones must be identified from a background, have been used to probe the non-linguistic auditory stream segregation processes important for speech-in-noise processing. However, little is known about the relationship between performance on SFG tasks and speech-in-noise tasks, or about the individual differences that may modulate such relationships. In this study, 37 younger normal-hearing adults performed an SFG task with target figure chords consisting of four, six, eight, or ten temporally coherent tones amongst a background of randomly varying tones. Stimuli were designed to be spectrally and temporally flat. An increased number of temporally coherent tones resulted in higher accuracy and faster reaction times (RTs). For ten target tones, faster RTs were associated with better scores on the Quick Speech-in-Noise task. Individual differences in working memory capacity and self-reported musicianship further modulated these relationships. Overall, results demonstrate that the SFG task could serve as an assessment of auditory stream segregation accuracy and RT that is sensitive to individual differences in cognitive and auditory abilities, even among younger normal-hearing adults.
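The figure-ground stimuli described here can be approximated in a few lines of code. The sketch below generates a simplified SFG stimulus in which a fixed set of "figure" frequencies repeats across chords (temporal coherence) while background frequencies are redrawn on every chord. All parameters (chord duration, tone counts, frequency range) are illustrative assumptions, and the published stimuli additionally included onset ramps and spectro-temporal flattening not modeled here.

```python
import numpy as np

def sfg_stimulus(n_chords=20, chord_dur=0.05, n_bg=10, n_fig=4,
                 fs=44100, fmin=200.0, fmax=7000.0, seed=0):
    """Simplified stochastic figure-ground stimulus: each chord contains
    n_bg random-frequency background tones plus n_fig 'figure' tones
    whose frequencies stay fixed across chords."""
    rng = np.random.default_rng(seed)
    n = int(chord_dur * fs)
    t = np.arange(n) / fs
    # Figure frequencies drawn once, log-uniformly, and reused every chord
    fig_freqs = np.exp(rng.uniform(np.log(fmin), np.log(fmax), n_fig))
    chords = []
    for _ in range(n_chords):
        bg = np.exp(rng.uniform(np.log(fmin), np.log(fmax), n_bg))
        freqs = np.concatenate([fig_freqs, bg])
        chord = np.sum(np.sin(2 * np.pi * freqs[:, None] * t), axis=0)
        chords.append(chord / len(freqs))  # crude level normalization
    return np.concatenate(chords)

stim = sfg_stimulus(n_fig=10)  # e.g., the ten-coherent-tone condition
```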
Affiliation(s)
- Michael A Johns
  - Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
- Regina C Calloway
  - Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
- Ian Phillips
  - Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland 20889, USA
- Valerie P Karuzis
  - Applied Research Laboratory of Intelligence and Security, University of Maryland, College Park, Maryland 20742, USA
- Kelsey Dutta
  - Institute for Systems Research, University of Maryland, College Park, Maryland 20742, USA
- Ed Smith
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Shihab A Shamma
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, Maryland 20742, USA
- Matthew J Goupell
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Stefanie E Kuchinsky
  - Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland 20889, USA
4. Cho AY, Kidd G. Auditory motion as a cue for source segregation and selection in a "cocktail party" listening environment. J Acoust Soc Am 2022; 152:1684. PMID: 36182296; PMCID: PMC9489258; DOI: 10.1121/10.0013990.
Abstract
Source motion was examined as a cue for segregating concurrent speech or noise sources. In two different headphone-based tasks-motion detection (MD) and speech-on-speech masking (SI)-one source among three was designated as the target only by imposing sinusoidal variation in azimuth during the stimulus presentation. For MD, the listener was asked which of the three concurrent sources was in motion during the trial. For SI, the listener was asked to report the words spoken by the moving speech source. MD performance improved as the amplitude of the sinusoidal motion (i.e., displacement in azimuth) increased over the range of values tested (±5° to ±30°) for both modulated noise and speech targets, with better performance found for speech. SI performance also improved as the amplitude of target motion increased. Furthermore, SI performance improved as word position progressed throughout the sentence. Performance on the MD task was correlated with performance on the SI task across individual subjects. For the SI conditions tested here, these findings are consistent with the proposition that listeners first detect the moving target source, then focus attention on the target location as the target sentence unfolds.
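As a rough illustration of how sinusoidal motion in azimuth might be imposed on a source, the sketch below modulates a constant-power stereo pan with a sinusoidal azimuth trajectory. The panning stand-in is an assumption made purely for demonstration; the study's headphone presentation would have used proper spatialization (e.g., HRTF rendering) rather than simple panning, and all parameter values are illustrative.

```python
import numpy as np

def moving_source(x, fs, amp_deg=30.0, rate_hz=0.5):
    """Impose sinusoidal motion in azimuth on a mono signal via
    constant-power stereo panning (a crude stand-in for HRTFs)."""
    t = np.arange(len(x)) / fs
    az = np.deg2rad(amp_deg * np.sin(2 * np.pi * rate_hz * t))  # +/- amp_deg
    # Map azimuth (-90..+90 deg) onto a pan angle between 0 and pi/2
    pan = (az + np.pi / 2) / 2
    left, right = np.cos(pan) * x, np.sin(pan) * x
    return np.stack([left, right], axis=-1)  # (n_samples, 2) stereo

fs = 44100
sig = np.random.randn(fs)                 # 1 s of noise as a placeholder source
stereo = moving_source(sig, fs, amp_deg=30.0)  # the largest excursion tested
```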
Affiliation(s)
- Adrian Y Cho
  - Speech and Hearing Bioscience and Technology Program, Harvard University, Cambridge, Massachusetts 02138, USA
- Gerald Kidd
  - Department of Speech, Language, and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
5. Huet MP, Micheyl C, Gaudrain E, Parizet E. Vocal and semantic cues for the segregation of long concurrent speech stimuli in diotic and dichotic listening: The Long-SWoRD test. J Acoust Soc Am 2022; 151:1557. PMID: 35364949; DOI: 10.1121/10.0007225.
Abstract
It is not always easy to follow a conversation in a noisy environment. To distinguish between two speakers, a listener must mobilize many perceptual and cognitive processes to maintain attention on a target voice and avoid shifting attention to the background noise. The development of an intelligibility task with long stimuli, the Long-SWoRD test, is introduced. This protocol allows participants to fully benefit from cognitive resources, such as semantic knowledge, to separate two talkers in a realistic listening environment. Moreover, this task also provides the experimenters with a means to infer fluctuations in auditory selective attention. Two experiments document the performance of normal-hearing listeners in situations where the perceptual separability of the competing voices ranges from easy to hard, using a combination of voice and binaural cues. The results show a strong effect of voice differences when the voices are presented diotically. In addition, analyzing the influence of the semantic context on the pattern of responses indicates that semantic information induces a response bias whether the competing voices are distinguishable or indistinguishable from one another.
Affiliation(s)
- Moïra-Phoebé Huet
  - Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
- Etienne Gaudrain
  - Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Centre National de la Recherche Scientifique UMR5292, Institut National de la Santé et de la Recherche Médicale U1028, Université Claude Bernard Lyon 1, Université de Lyon, Centre Hospitalier Le Vinatier, Neurocampus, 95 boulevard Pinel, Bron Cedex, 69675, France
- Etienne Parizet
  - Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
6.
Abstract
Identification of speech from a "target" talker was measured in a speech-on-speech masking task with two simultaneous "masker" talkers. The overall level of each talker was either fixed or randomized throughout each stimulus presentation to investigate the effectiveness of level as a cue for segregating competing talkers and attending to the target. Experimental manipulations included varying the level difference between talkers and imposing three types of target level uncertainty: 1) fixed target level across trials, 2) random target level across trials, or 3) random target levels on a word-by-word basis within a trial. When the target level was predictable, performance was better than in corresponding conditions in which the target level was uncertain. Masker confusions were consistent with a high degree of informational masking (IM). Furthermore, evidence was found for "tuning" in level and a level "release" from IM. These findings suggest that conforming to listener expectations about relative level, in addition to cues signaling talker identity, facilitates segregating, and maintaining focus of attention on, a specific talker in multiple-talker communication situations.
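A minimal sketch of the word-by-word level randomization described above: each word token is scaled by an independent random gain drawn from a uniform dB range. The 20-dB range, the function name, and the placeholder tokens are assumptions for illustration, not the study's actual parameters.

```python
import numpy as np

def randomize_word_levels(words, range_db=20.0, per_word=True, seed=None):
    """Scale each word waveform by a random gain drawn uniformly from
    +/- range_db/2 dB. With per_word=False, a single draw is applied to
    all words, emulating trial-by-trial (rather than word-by-word)
    level uncertainty."""
    rng = np.random.default_rng(seed)
    n_draws = len(words) if per_word else 1
    gains_db = rng.uniform(-range_db / 2, range_db / 2, n_draws)
    gains = 10.0 ** (gains_db / 20.0)
    if not per_word:
        gains = np.repeat(gains, len(words))
    return [g * w for g, w in zip(gains, words)]

words = [np.random.randn(8000) for _ in range(5)]  # placeholder word tokens
jittered = randomize_word_levels(words, range_db=20.0, seed=1)
```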
Affiliation(s)
- Andrew J Byrne
  - Department of Speech, Language, & Hearing Sciences, Boston University, Boston, MA, USA
- Christopher Conroy
  - Department of Speech, Language, & Hearing Sciences, Boston University, Boston, MA, USA
- Gerald Kidd
  - Department of Speech, Language, & Hearing Sciences, Boston University, Boston, MA, USA
  - Department of Otolaryngology, Head-Neck Surgery, Medical University of South Carolina, Charleston, SC, USA
7. Calcus A, Schoof T, Rosen S, Shinn-Cunningham B, Souza P. Switching Streams Across Ears to Evaluate Informational Masking of Speech-on-Speech. Ear Hear 2021; 41:208-216. PMID: 31107365; PMCID: PMC6856419; DOI: 10.1097/aud.0000000000000741.
Abstract
Objectives: This study aimed to evaluate the informational component of speech-on-speech masking. Speech perception in the presence of a competing talker involves not only informational masking (IM) but also a number of masking processes involving interaction of masker and target energy in the auditory periphery. Such peripherally generated masking can be eliminated by presenting the target and masker in opposite ears (dichotically). However, this also reduces IM by providing listeners with lateralization cues that support spatial release from masking (SRM). In tonal sequences, IM can be isolated by rapidly switching the lateralization of dichotic target and masker streams across the ears, presumably producing ambiguous spatial percepts that interfere with SRM. However, it is not clear whether this technique works with speech materials.
Design: Speech reception thresholds (SRTs) were measured in 17 young normal-hearing adults for sentences produced by a female talker in the presence of a competing male talker under three different conditions: diotic (target and masker in both ears), dichotic, and dichotic with the target and masker streams switching across the ears. Because switching rate and signal coherence were expected to influence the amount of IM observed, these two factors varied across conditions. When switches occurred, they were either at word boundaries or periodic (every 116 msec), and either with or without a brief gap (84 msec) at every switch point. In addition, SRTs were measured in a quiet condition to rule out audibility as a limiting factor.
Results: SRTs were poorer for the four switching dichotic conditions than for the nonswitching dichotic condition, but better than for the diotic condition. Periodic switches without gaps resulted in the worst SRTs compared to the other switch conditions, thus maximizing IM.
Conclusions: These findings suggest that periodically switching the target and masker streams across the ears (without gaps) was the most effective at disrupting SRM. Thus, this approach can be used in experiments that seek a relatively pure measure of IM, and could be readily extended to translational research.
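The switching manipulation can be sketched as follows: target and masker swap ears every 116 ms, optionally with an 84-ms silent gap at each switch point. This is a simplified reading of the design (the gap here silences the tail of each segment rather than lengthening the stimulus), with placeholder signals standing in for the recorded sentences.

```python
import numpy as np

def switch_streams(target, masker, fs, period_s=0.116, gap_s=0.0):
    """Periodically swap which ear carries the target and which carries
    the masker, optionally leaving a silent gap at each switch point."""
    n = min(len(target), len(masker))
    left, right = np.zeros(n), np.zeros(n)
    hop, gap = int(period_s * fs), int(gap_s * fs)
    for k, start in enumerate(range(0, n, hop)):
        stop = min(start + hop, n)
        # Keep the segment's head; its last `gap` samples stay silent
        seg = slice(start, stop - gap if stop - gap > start else stop)
        if k % 2 == 0:
            left[seg], right[seg] = target[seg], masker[seg]
        else:
            left[seg], right[seg] = masker[seg], target[seg]
    return np.stack([left, right], axis=-1)

fs = 16000
tgt, msk = np.random.randn(fs), np.random.randn(fs)  # placeholder signals
dichotic_switching = switch_streams(tgt, msk, fs, period_s=0.116, gap_s=0.084)
```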
Affiliation(s)
- Axelle Calcus
  - UCL Speech, Hearing and Phonetic Sciences, 2 Wakefield Street, London WC1N 1PF, United Kingdom
  - Laboratoire des Systèmes Perceptifs, Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL University, CNRS, 75005 Paris, France
- Tim Schoof
  - UCL Speech, Hearing and Phonetic Sciences, 2 Wakefield Street, London WC1N 1PF, United Kingdom
- Stuart Rosen
  - UCL Speech, Hearing and Phonetic Sciences, 2 Wakefield Street, London WC1N 1PF, United Kingdom
- Pamela Souza
  - Department of Communication Sciences and Disorders, Knowles Hearing Center, Northwestern University, 2240 Campus Drive, Evanston, Illinois 60208, USA
8. Jett B, Buss E, Best V, Oleson J, Calandruccio L. Does Sentence-Level Coarticulation Affect Speech Recognition in Noise or a Speech Masker? J Speech Lang Hear Res 2021; 64:1390-1403. PMID: 33784185; PMCID: PMC8608179; DOI: 10.1044/2021_jslhr-20-00450.
Abstract
Purpose: Three experiments were conducted to better understand the role of between-word coarticulation in masked speech recognition. Specifically, we explored whether naturally coarticulated sentences supported better masked speech recognition as compared to sentences derived from individually spoken concatenated words. We hypothesized that sentence recognition thresholds (SRTs) would be similar for coarticulated and concatenated sentences in a noise masker but would be better for coarticulated sentences in a speech masker.
Method: Sixty young adults participated (n = 20 per experiment). An adaptive tracking procedure was used to estimate SRTs in the presence of noise or two-talker speech maskers. Targets in Experiments 1 and 2 were matrix-style sentences, while targets in Experiment 3 were semantically meaningful sentences. All experiments included coarticulated and concatenated targets; Experiments 2 and 3 included a third target type, concatenated keyword-intensity-matched (KIM) sentences, in which the words were concatenated but individually scaled to replicate the intensity contours of the coarticulated sentences.
Results: Regression analyses evaluated the main effects of target type, masker type, and their interaction. Across all three experiments, effects of target type were small (< 2 dB). In Experiment 1, SRTs were slightly poorer for coarticulated than concatenated sentences. In Experiment 2, coarticulation facilitated speech recognition compared to the concatenated KIM condition. When listeners had access to semantic context (Experiment 3), a coarticulation benefit was observed in noise but not in the speech masker.
Conclusions: Overall, differences between SRTs for sentences with and without between-word coarticulation were small. Beneficial effects of coarticulation were only observed relative to the concatenated KIM targets; for unscaled concatenated targets, it appeared that consistent audibility across the sentence offset any benefit of coarticulation. Contrary to our hypothesis, effects of coarticulation generally were not more pronounced in speech maskers than in noise maskers.
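The concatenated KIM condition can be illustrated with a per-word RMS-matching sketch: each individually recorded word is rescaled so its RMS level equals that of the corresponding word in the coarticulated production. Reading "intensity contours" as per-word RMS is an assumption on our part, and the placeholder arrays stand in for recorded audio.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a waveform."""
    return np.sqrt(np.mean(x ** 2))

def kim_sentence(concat_words, coart_words):
    """Build a concatenated keyword-intensity-matched (KIM) sentence:
    scale each individually recorded word so its RMS matches the RMS of
    the corresponding word in the naturally coarticulated sentence."""
    scaled = [w * (rms(c) / rms(w)) for w, c in zip(concat_words, coart_words)]
    return np.concatenate(scaled)

# Placeholder word tokens standing in for recorded audio
concat = [np.random.randn(6000) for _ in range(4)]
coart = [0.5 * np.random.randn(6000) for _ in range(4)]
kim = kim_sentence(concat, coart)
```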
Affiliation(s)
- Brandi Jett
  - Department of Psychological Sciences, Case Western Reserve University, Cleveland, OH
- Emily Buss
  - Department of Otolaryngology/Head and Neck Surgery, University of North Carolina at Chapel Hill
- Virginia Best
  - Department of Speech, Language and Hearing Sciences, Boston University, MA
- Jacob Oleson
  - Department of Biostatistics, University of Iowa, Iowa City
- Lauren Calandruccio
  - Department of Psychological Sciences, Case Western Reserve University, Cleveland, OH
9. Wang X, Xu L. Speech perception in noise: Masking and unmasking. J Otol 2021; 16:109-119. PMID: 33777124; PMCID: PMC7985001; DOI: 10.1016/j.joto.2020.12.001.
Abstract
Speech perception is essential for daily communication. Background noise and concurrent talkers, however, can make it challenging for listeners to track the target speech (i.e., the cocktail party problem). The present study reviews and compares existing findings on speech perception and unmasking in cocktail-party listening environments in English and Mandarin Chinese. The review starts with an introduction followed by related concepts of auditory masking. The next two sections review factors that release speech perception from masking in English and Mandarin Chinese, respectively. The last section presents an overall summary of the findings, with comparisons between the two languages. Future research directions are also discussed, with attention to how the literature on the reviewed topics differs between the two languages.
Affiliation(s)
- Xianhui Wang
  - Communication Sciences and Disorders, Ohio University, Athens, OH, 45701, USA
- Li Xu
  - Communication Sciences and Disorders, Ohio University, Athens, OH, 45701, USA
10. Kidd G, Jennings TR, Byrne AJ. Enhancing the perceptual segregation and localization of sound sources with a triple beamformer. J Acoust Soc Am 2020; 148:3598. PMID: 33379918; PMCID: PMC8097713; DOI: 10.1121/10.0002779.
Abstract
A triple beamformer was developed to exploit the capabilities of the binaural auditory system. The goal was to enhance the perceptual segregation of spatially separated sound sources while preserving source localization. The triple beamformer comprised a variant of a standard single-channel beamformer that routes the primary beam output, focused on the target source location, to both ears. The triple beam algorithm adds two supplementary beams, with the left-focused beam routed only to the left ear and the right-focused beam routed only to the right ear. The rationale for the approach is that the triple beam processing supports sound source segregation under high informational masking (IM) conditions. Furthermore, the exaggerated interaural level differences produced by the triple beam are well suited for categories of listeners (e.g., bilateral cochlear implant users) who receive limited benefit from interaural time differences. Performance with the triple beamformer was compared to normal binaural hearing (simulated using a Knowles Electronics Manikin for Acoustic Research, G.R.A.S. Sound and Vibration, Holte, Denmark) and to that obtained with a single-channel beamformer. Source localization in azimuth and masked speech identification for multiple masker locations were measured for all three algorithms. Taking both localization and speech intelligibility into account, the triple beam algorithm was considered to be advantageous under high-IM listening conditions.
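A minimal sketch of the triple-beam idea, assuming a simple far-field delay-and-sum beamformer on a linear microphone array: a target-steered beam feeds both ears, while left- and right-steered beams feed only their corresponding ears. The array geometry, steering angles, and the delay-and-sum beamformer itself are illustrative assumptions rather than the published algorithm.

```python
import numpy as np

C = 343.0  # speed of sound (m/s)

def delay_and_sum(mic_sigs, mic_x, az_deg, fs):
    """Frequency-domain delay-and-sum beam for a linear array steered
    to azimuth az_deg (far-field assumption, broadside = 0 deg)."""
    n = mic_sigs.shape[1]
    freqs = np.fft.rfftfreq(n, 1 / fs)
    delays = mic_x * np.sin(np.deg2rad(az_deg)) / C   # per-mic delays (s)
    spec = np.fft.rfft(mic_sigs, axis=1)
    steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(np.mean(spec * steer, axis=0), n)

def triple_beam(mic_sigs, mic_x, fs, target_az=0.0, side_az=60.0):
    """Triple-beam sketch: target beam to both ears, left-steered beam
    to the left ear only, right-steered beam to the right ear only."""
    center = delay_and_sum(mic_sigs, mic_x, target_az, fs)
    left_b = delay_and_sum(mic_sigs, mic_x, -side_az, fs)
    right_b = delay_and_sum(mic_sigs, mic_x, +side_az, fs)
    return np.stack([center + left_b, center + right_b], axis=-1)

fs, n_mics = 16000, 4
mic_x = np.linspace(-0.05, 0.05, n_mics)   # microphone positions (m)
mics = np.random.randn(n_mics, fs)         # placeholder array recordings
binaural = triple_beam(mics, mic_x, fs)
```

Routing the side beams to single ears is what produces the exaggerated interaural level differences the abstract describes; the center beam preserves the target at both ears.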
Affiliation(s)
- Gerald Kidd
  - Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Todd R Jennings
  - Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Andrew J Byrne
  - Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
11. Song J, Martin L, Iverson P. Auditory neural tracking and lexical processing of speech in noise: Masker type, spatial location, and language experience. J Acoust Soc Am 2020; 148:253. PMID: 32752786; DOI: 10.1121/10.0001477.
Abstract
The present study investigated how single-talker and babble maskers affect auditory and lexical processing during native (L1) and non-native (L2) speech recognition. Electroencephalogram (EEG) recordings were made while L1 and L2 (Korean) English speakers listened to sentences in the presence of single-talker and babble maskers that were colocated or spatially separated from the target. The predictability of the sentences was manipulated to measure lexical-semantic processing (N400), and selective auditory processing of the target was assessed using neural tracking measures. The results demonstrate that intelligible single-talker maskers cause listeners to attend more to the semantic content of the targets (i.e., greater context-related N400 changes) than when targets are in babble, and that listeners track the acoustics of the target less accurately with single-talker maskers. L1 and L2 listeners both modulated their processing in this way, although L2 listeners had more difficulty with the materials overall (i.e., lower behavioral accuracy, less context-related N400 variation, more listening effort). The results demonstrate that auditory and lexical processing can be simultaneously assessed within a naturalistic speech listening task, and listeners can adjust lexical processing to more strongly track the meaning of a sentence in order to help ignore competing lexical content.
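As a crude stand-in for the neural tracking measures mentioned above, the sketch below correlates an EEG channel with the broadband speech amplitude envelope across a range of lags. Published analyses of this kind typically use regularized temporal response functions or stimulus reconstruction rather than this simple cross-correlation; the sampling rates and placeholder arrays are assumptions for illustration.

```python
import numpy as np
from scipy.signal import hilbert, resample

def envelope_tracking(eeg, audio, fs_eeg, max_lag_s=0.3):
    """Simple neural-tracking index: correlation between an EEG channel
    and the speech amplitude envelope at lags of 0..max_lag_s."""
    env = np.abs(hilbert(audio))      # broadband amplitude envelope
    env = resample(env, len(eeg))     # align envelope to the EEG rate
    env = env - env.mean()
    e = eeg - eeg.mean()
    lags = np.arange(int(max_lag_s * fs_eeg))
    r = [np.corrcoef(e[lag:], env[:len(env) - lag])[0, 1] for lag in lags]
    return lags / fs_eeg, np.array(r)

fs_eeg, fs_audio = 128, 16000
eeg = np.random.randn(10 * fs_eeg)       # placeholder EEG channel (10 s)
audio = np.random.randn(10 * fs_audio)   # placeholder target speech
lag_s, corr = envelope_tracking(eeg, audio, fs_eeg)
```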
Affiliation(s)
- Jieun Song
  - Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
- Luke Martin
  - Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
- Paul Iverson
  - Department of Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
12. Wang Y, Zhang J, Zou J, Luo H, Ding N. Prior Knowledge Guides Speech Segregation in Human Auditory Cortex. Cereb Cortex 2020; 29:1561-1571. PMID: 29788144; DOI: 10.1093/cercor/bhy052.
Abstract
Segregating concurrent sound streams is a computationally challenging task that requires integrating bottom-up acoustic cues (e.g., pitch) and top-down prior knowledge about sound streams. In a multi-talker environment, the brain can segregate different speakers in about 100 ms in auditory cortex. Here, we used magnetoencephalographic (MEG) recordings to investigate the temporal and spatial signatures of how the brain utilizes prior knowledge to segregate two speech streams from the same speaker, which can hardly be separated based on bottom-up acoustic cues. In a primed condition, the participants know the target speech stream in advance, while in an unprimed condition no such prior knowledge is available. Neural encoding of each speech stream is characterized by the MEG responses tracking the speech envelope. We demonstrate an effect in bilateral superior temporal gyrus and superior temporal sulcus that is much stronger in the primed condition than in the unprimed condition. Priming effects are observed at about 100 ms latency and last more than 600 ms. Interestingly, prior knowledge about the target stream facilitates speech segregation mainly by suppressing the neural tracking of the non-target speech stream. In sum, prior knowledge leads to reliable speech segregation in auditory cortex, even in the absence of reliable bottom-up speech segregation cues.
Affiliation(s)
- Yuanye Wang
  - School of Psychological and Cognitive Sciences, Peking University, Beijing, China
  - McGovern Institute for Brain Research, Peking University, Beijing, China
  - Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Jianfeng Zhang
  - College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Jiajie Zou
  - College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Huan Luo
  - School of Psychological and Cognitive Sciences, Peking University, Beijing, China
  - McGovern Institute for Brain Research, Peking University, Beijing, China
  - Beijing Key Laboratory of Behavior and Mental Health, Peking University, Beijing, China
- Nai Ding
  - College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Hangzhou, Zhejiang, China
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, Zhejiang, China
  - State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou, Zhejiang, China
  - Interdisciplinary Center for Social Sciences, Zhejiang University, Hangzhou, Zhejiang, China
13. Summers RJ, Roberts B. Informational masking of speech by acoustically similar intelligible and unintelligible interferers. J Acoust Soc Am 2020; 147:1113. PMID: 32113320; DOI: 10.1121/10.0000688.
Abstract
Masking experienced when target speech is accompanied by a single interfering voice is often primarily informational masking (IM). IM is generally greater when the interferer is intelligible than when it is not (e.g., speech from an unfamiliar language), but the relative contributions of acoustic-phonetic and linguistic interference are often difficult to assess owing to acoustic differences between interferers (e.g., different talkers). Three-formant analogues (F1+F2+F3) of natural sentences were used as targets and interferers. Targets were presented monaurally either alone or accompanied contralaterally by interferers from another sentence (F0 = 4 semitones higher); a target-to-masker ratio (TMR) between ears of 0, 6, or 12 dB was used. Interferers were either intelligible or rendered unintelligible by delaying F2 and advancing F3 by 150 ms relative to F1, a manipulation designed to minimize spectro-temporal differences between corresponding interferers. Target-sentence intelligibility (keywords correct) was 67% when presented alone, but fell considerably when an unintelligible interferer was present (49%) and significantly further when the interferer was intelligible (41%). Changes in TMR produced neither a significant main effect nor an interaction with interferer type. Interference with acoustic-phonetic processing of the target can explain much of the impact on intelligibility, but linguistic factors-particularly interferer intrusions-also make an important contribution to IM.
Affiliation(s)
- Robert J Summers
  - Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Brian Roberts
  - Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
14. Zou J, Feng J, Xu T, Jin P, Luo C, Zhang J, Pan X, Chen F, Zheng J, Ding N. Auditory and language contributions to neural encoding of speech features in noisy environments. Neuroimage 2019; 192:66-75. DOI: 10.1016/j.neuroimage.2019.02.047.
15. Villard S, Kidd G. Effects of Acquired Aphasia on the Recognition of Speech Under Energetic and Informational Masking Conditions. Trends Hear 2019; 23:2331216519884480. PMID: 31694486; PMCID: PMC7000861; DOI: 10.1177/2331216519884480.
Abstract
Persons with aphasia (PWA) often report difficulty understanding spoken language in noisy environments that require listeners to identify and selectively attend to target speech while ignoring competing background sounds or “maskers.” This study compared the performance of PWA and age-matched healthy controls (HC) on a masked speech identification task and examined the consequences of different types of masking on performance. Twelve PWA and 12 age-matched HC completed a speech identification task comprising three conditions designed to differentiate between the effects of energetic and informational masking on receptive speech processing. The target and masker speech materials were taken from a closed-set matrix-style corpus, and a forced-choice word identification task was used. Target and maskers were spatially separated from one another in order to simulate real-world listening environments and allow listeners to make use of binaural cues for source segregation. Individualized frequency-specific gain was applied to compensate for the effects of hearing loss. Although both groups showed similar susceptibility to the effects of energetic masking, PWA were more susceptible than age-matched HC to the effects of informational masking. Results indicate that this increased susceptibility cannot be attributed to age, hearing loss, or comprehension deficits and is therefore a consequence of acquired cognitive-linguistic impairments associated with aphasia. This finding suggests that aphasia may result in increased difficulty segregating target speech from masker speech, which in turn may have implications for the ability of PWA to comprehend target speech in multitalker environments, such as restaurants, family gatherings, and other everyday situations.
Affiliation(s)
- Sarah Villard
  - Department of Speech, Language & Hearing Sciences, Boston University, MA, USA
- Gerald Kidd
  - Department of Speech, Language & Hearing Sciences, Boston University, MA, USA
16. Kidd G, Mason CR, Best V, Roverud E, Swaminathan J, Jennings T, Clayton K, Colburn HS. Determining the energetic and informational components of speech-on-speech masking in listeners with sensorineural hearing loss. J Acoust Soc Am 2019; 145:440. PMID: 30710924; PMCID: PMC6347574; DOI: 10.1121/1.5087555.
Abstract
The ability to identify the words spoken by one talker masked by two or four competing talkers was tested in young-adult listeners with sensorineural hearing loss (SNHL). In a reference/baseline condition, masking speech was colocated with target speech, target and masker talkers were female, and the masker was intelligible. Three comparison conditions included replacing female masker talkers with males, time-reversal of masker speech, and spatial separation of sources. All three variables produced significant release from masking. To emulate energetic masking (EM), stimuli were subjected to ideal time-frequency segregation retaining only the time-frequency units where target energy exceeded masker energy. Subjects were then tested with these resynthesized "glimpsed stimuli." For either two or four maskers, thresholds only varied about 3 dB across conditions suggesting that EM was roughly equal. Compared to normal-hearing listeners from an earlier study [Kidd, Mason, Swaminathan, Roverud, Clayton, and Best, J. Acoust. Soc. Am. 140, 132-144 (2016)], SNHL listeners demonstrated both greater energetic and informational masking as well as higher glimpsed thresholds. Individual differences were correlated across masking release conditions suggesting that listeners could be categorized according to their general ability to solve the task. Overall, both peripheral and central factors appear to contribute to the higher thresholds for SNHL listeners.
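The ideal time-frequency segregation (ITFS) processing described here has a compact expression in code: compute target and masker spectrograms, keep only the mixture's time-frequency units where the target-to-masker ratio exceeds a local criterion, and resynthesize the retained "glimpses." The sketch below is a minimal illustration of that idea; the 0-dB local criterion and the STFT parameters are assumptions, not necessarily those used in the studies.

```python
import numpy as np
from scipy.signal import stft, istft

def itfs_glimpses(target, masker, fs, lc_db=0.0, nperseg=512):
    """Ideal time-frequency segregation sketch: retain only the T-F
    units of the mixture where target energy exceeds masker energy by
    at least lc_db, then resynthesize the glimpsed stimulus."""
    _, _, T = stft(target, fs, nperseg=nperseg)
    _, _, M = stft(masker, fs, nperseg=nperseg)
    tmr_db = 20 * np.log10(np.abs(T) + 1e-12) - 20 * np.log10(np.abs(M) + 1e-12)
    mask = tmr_db > lc_db                      # ideal binary mask
    _, _, X = stft(target + masker, fs, nperseg=nperseg)
    _, glimpsed = istft(X * mask, fs, nperseg=nperseg)
    return glimpsed

fs = 16000
tgt, msk = np.random.randn(2 * fs), np.random.randn(2 * fs)  # placeholders
out = itfs_glimpses(tgt, msk, fs)
```

Because the mask is computed from the clean target and masker signals (which the experimenter has), the retained energy is by construction target-dominated, which is why thresholds for such stimuli are taken to index energetic masking alone.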
Affiliation(s)
- Gerald Kidd
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Christine R Mason
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Virginia Best
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Elin Roverud
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Jayaganesh Swaminathan
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Todd Jennings
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- Kameron Clayton
  - Department of Speech, Language and Hearing Sciences, Boston University, Boston, Massachusetts 02215, USA
- H Steven Colburn
  - Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, USA
17. Calandruccio L, Buss E, Bencheck P, Jett B. Does the semantic content or syntactic regularity of masker speech affect speech-on-speech recognition? J Acoust Soc Am 2018; 144:3289. PMID: 30599661; PMCID: PMC6786886; DOI: 10.1121/1.5081679.
Abstract
Speech-on-speech recognition differs substantially across stimuli, but it is unclear what role linguistic features of the masker play in this variability. The linguistic similarity hypothesis suggests that similarity between the sentence-level semantic content of the target and masker speech increases masking. Sentence recognition in a two-talker masker was evaluated with respect to the semantic content and syntactic structure of the masker (experiment 1) and the linguistic similarity of the target and masker (experiment 2). Target and masker sentences were semantically meaningful or anomalous. Masker syntax was varied or the same across sentences. When other linguistic features of the masker were controlled, variability in syntactic structure across masker tokens was only relevant when the masker was played continuously (as opposed to gated); when played continuously, sentence-recognition thresholds were poorer with variable than with consistent masker syntax, but this effect was small (0.5 dB). When the syntactic structure of the masker was held constant, semantic meaningfulness of the masker did not increase masking, and at times performance was better for the meaningful than for the anomalous masker. These data indicate that the sentence-level semantic content of masker speech does not influence speech-on-speech masking. Further, no evidence was found that similarity between the sentence-level semantic content of the target and masker increases masking.
Affiliation(s)
- Lauren Calandruccio
  - Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Emily Buss
  - Department of Head/Neck Surgery and Otolaryngology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA
- Penelope Bencheck
  - Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Brandi Jett
  - Department of Psychological Sciences, Case Western Reserve University, Cleveland, Ohio 44106, USA
18. Roberts B, Summers RJ. Informational masking of speech by time-varying competitors: Effects of frequency region and number of interfering formants. J Acoust Soc Am 2018; 143:891. PMID: 29495741; DOI: 10.1121/1.5023476.
Abstract
This study explored the extent to which informational masking of speech depends on the frequency region and number of extraneous formants in an interferer. Target formants-monotonized three-formant (F1+F2+F3) analogues of natural sentences-were presented monaurally, with target ear assigned randomly on each trial. Interferers were presented contralaterally. In experiment 1, single-formant interferers were created using the time-reversed F2 frequency contour and constant amplitude, root-mean-square (RMS)-matched to F2. Interferer center frequency was matched to that of F1, F2, or F3, while maintaining the extent of formant-frequency variation (depth) on a log scale. Adding an interferer lowered intelligibility; the effect of frequency region was small and broadly tuned around F2. In experiment 2, interferers comprised either one formant (F1, the most intense) or all three, created using the time-reversed frequency contours of the corresponding targets and RMS-matched constant amplitudes. Interferer formant-frequency variation was scaled to 0%, 50%, or 100% of the original depth. Increasing the depth of formant-frequency variation and number of formants in the interferer had independent and additive effects. These findings suggest that the impact on intelligibility depends primarily on the overall extent of frequency variation in each interfering formant (up to ∼100% depth) and the number of extraneous formants.
Affiliation(s)
- Brian Roberts
  - Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
- Robert J Summers
  - Psychology, School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
19. Farris HE, Ryan MJ. Schema vs. primitive perceptual grouping: the relative weighting of sequential vs. spatial cues during an auditory grouping task in frogs. J Comp Physiol A Neuroethol Sens Neural Behav Physiol 2017; 203:175-182. PMID: 28197725; PMCID: PMC10084916; DOI: 10.1007/s00359-017-1149-9.
Abstract
Perceptually, grouping sounds based on their sources is critical for communication. This is especially true in túngara frog breeding aggregations, where multiple males produce overlapping calls that consist of an FM 'whine' followed by harmonic bursts called 'chucks'. Phonotactic females use at least two cues to group whines and chucks: whine-chuck spatial separation and sequence. Spatial separation is a primitive cue, whereas sequence is schema-based, as chuck production is morphologically constrained to follow whines, meaning that males cannot produce the components simultaneously. When one cue is available, females perceptually group whines and chucks using relative comparisons: components with the smallest spatial separation or those closest to the natural sequence are more likely grouped. By simultaneously varying the temporal sequence and spatial separation of a single whine and two chucks, this study measured between-cue perceptual weighting during a specific grouping task. Results show that whine-chuck spatial separation is a stronger grouping cue than temporal sequence, as grouping is more likely for stimuli with smaller spatial separation and non-natural sequence than those with larger spatial separation and natural sequence. Compared to the schema-based whine-chuck sequence, we propose that spatial cues have less variance, potentially explaining their preferred use when grouping during directional behavioral responses.
Affiliation(s)
- Hamilton E Farris
  - Neuroscience Center, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
  - Department of Cell Biology and Anatomy, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
  - Department of Otorhinolaryngology, Louisiana State University Health Sciences Center, New Orleans, LA, 70112, USA
- Michael J Ryan
  - Department of Integrative Biology, University of Texas, 1 University Station C0930, Austin, TX, 78712, USA
  - Smithsonian Tropical Research Institute, Balboa, Panama
20. Kidd G, Colburn HS. Informational Masking in Speech Recognition. In: Springer Handbook of Auditory Research. 2017. DOI: 10.1007/978-3-319-51662-2_4.
21. Helfer KS, Merchant GR, Freyman RL. Aging and the effect of target-masker alignment. J Acoust Soc Am 2016; 140:3844. PMID: 27908027; PMCID: PMC5392104; DOI: 10.1121/1.4967297.
Abstract
Similarity between target and competing speech messages plays a large role in how easy or difficult it is to understand messages of interest. Much research on informational masking has used highly aligned target and masking utterances that are very similar semantically and syntactically. However, listeners rarely encounter situations in real life where they must understand one sentence in the presence of another (or more than one) highly aligned, syntactically similar competing sentence(s). The purpose of the present study was to examine the effect of syntactic/semantic similarity of target and masking speech in different spatial conditions among younger, middle-aged, and older adults. The results of this experiment indicate that differences in speech recognition between older and younger participants were largest when the masker surrounded the target and was more similar to the target, especially at more adverse signal-to-noise ratios. Differences among listeners and the effect of similarity were much less robust, and all listeners were relatively resistant to masking, when maskers were located on one side of the target message. The present results suggest that previous studies using highly aligned stimuli may have overestimated age-related speech recognition problems.
Affiliation(s)
- Karen S Helfer
  - Department of Communication Disorders, University of Massachusetts Amherst, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
- Gabrielle R Merchant
  - Department of Communication Disorders, University of Massachusetts Amherst, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
- Richard L Freyman
  - Department of Communication Disorders, University of Massachusetts Amherst, 358 North Pleasant Street, Amherst, Massachusetts 01003, USA
22. Kidd G, Mason CR, Swaminathan J, Roverud E, Clayton KK, Best V. Determining the energetic and informational components of speech-on-speech masking. J Acoust Soc Am 2016; 140:132. PMID: 27475139; PMCID: PMC5392100; DOI: 10.1121/1.4954748.
Abstract
Identification of target speech was studied under masked conditions consisting of two or four independent speech maskers. In the reference conditions, the maskers were colocated with the target, the masker talkers were the same sex as the target, and the masker speech was intelligible. The comparison conditions, intended to provide release from masking, included different-sex target and masker talkers, time-reversal of the masker speech, and spatial separation of the maskers from the target. Significant release from masking was found for all comparison conditions. To determine whether these reductions in masking could be attributed to differences in energetic masking, ideal time-frequency segregation (ITFS) processing was applied so that the time-frequency units where the masker energy dominated the target energy were removed. The remaining target-dominated "glimpses" were reassembled as the stimulus. Speech reception thresholds measured using these resynthesized ITFS-processed stimuli were the same for the reference and comparison conditions supporting the conclusion that the amount of energetic masking across conditions was the same. These results indicated that the large release from masking found under all comparison conditions was due primarily to a reduction in informational masking. Furthermore, the large individual differences observed generally were correlated across the three masking release conditions.
Affiliation(s)
- Gerald Kidd
  - Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Christine R Mason
  - Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Jayaganesh Swaminathan
  - Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Elin Roverud
  - Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Kameron K Clayton
  - Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
- Virginia Best
  - Department of Speech, Language and Hearing Sciences and Hearing Research Center, Boston University, 635 Commonwealth Avenue, Boston, Massachusetts 02215, USA
23. Meister H, Schreitmüller S, Ortmann M, Rählmann S, Walger M. Effects of Hearing Loss and Cognitive Load on Speech Recognition with Competing Talkers. Front Psychol 2016; 7:301. PMID: 26973585; PMCID: PMC4777916; DOI: 10.3389/fpsyg.2016.00301.
Abstract
Everyday communication frequently comprises situations with more than one talker speaking at a time. These situations are challenging since they pose high attentional and memory demands, placing cognitive load on the listener. Hearing impairment additionally exacerbates communication problems under these circumstances. We examined the effects of hearing loss and attention tasks on speech recognition with competing talkers in older adults with and without hearing impairment. We hypothesized that hearing loss would affect word identification, talker separation, and word recall, and that the difficulties experienced by the hearing-impaired listeners would be especially pronounced in a task with high attentional and memory demands. Two listener groups, closely matched for age and neuropsychological profile but differing in hearing acuity, were examined regarding their speech recognition with competing talkers in two different tasks. One task required repeating back words from one target talker (1TT) while ignoring the competing talker, whereas the other required repeating back words from both talkers (2TT). The competing talkers differed with respect to their voice characteristics. Moreover, sentences with either low or high context were used in order to consider linguistic properties. Compared to their normal-hearing peers, listeners with hearing loss showed limited speech recognition in both tasks. Their difficulties were especially pronounced in the more demanding 2TT task. To shed light on the underlying mechanisms, different error sources, namely misunderstanding, confusing, or omitting words, were investigated. Misunderstanding and omitting words were more frequent in the hearing-impaired than in the normal-hearing listeners. In line with common speech perception models, it is suggested that these effects are related to impaired object formation and taxed working memory capacity (WMC). In a post-hoc analysis, the listeners were further separated with respect to their WMC. Higher WMC appeared to act as a compensatory mechanism against the adverse effects of hearing loss, especially for low-context speech.
Affiliation(s)
- Hartmut Meister
  - Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany
- Stefan Schreitmüller
  - Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany
- Magdalene Ortmann
  - Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany
- Sebastian Rählmann
  - Jean-Uhrmacher-Institute for Clinical ENT-Research, University of Cologne, Cologne, Germany
- Martin Walger
  - Clinic of Otorhinolaryngology, Head and Neck Surgery, University of Cologne, Cologne, Germany