1
Warnke L, de Ruiter JP. Top-down effect of dialogue coherence on perceived speaker identity. Sci Rep 2023;13:3458. PMID: 36859459; PMCID: PMC9977839; DOI: 10.1038/s41598-023-30435-z.
Abstract
A key mechanism in the comprehension of conversation is listeners' ability to recognize who is speaking and when a speaker switch occurs. Some authors suggest that speaker change detection is accomplished through bottom-up mechanisms in which listeners draw on changes in the acoustic features of the auditory signal. Other accounts propose that speaker change detection involves drawing on top-down linguistic representations to identify who is speaking. The present study investigates these hypotheses experimentally by manipulating the pragmatic coherence of conversational utterances. In experiment 1, participants listened to pairs of utterances and had to indicate whether they heard the same or different speakers. All utterances were spoken by the same speaker; nevertheless, when two segments of conversation would make sense for different speakers to say, listeners reported hearing different speakers. In experiment 2, we removed pragmatic information from the same stimuli by scrambling word order while leaving acoustic information intact. In contrast to experiment 1, results from the second experiment indicate no difference between our experimental conditions. We interpret these results as a top-down effect of pragmatic expectations: knowledge of conversational structure at least partially determines a listener's perception of speaker changes in conversation.
Affiliation(s)
- Lena Warnke
- Department of Psychology, Tufts University, Medford, MA, USA.
- Jan P. de Ruiter
- Department of Psychology, Tufts University, Medford, MA, USA
- Department of Computer Science, Tufts University, Medford, MA, USA
2
Flaherty MM, Buss E, Libert K. Effects of Target and Masker Fundamental Frequency Contour Depth on School-Age Children's Speech Recognition in a Two-Talker Masker. J Speech Lang Hear Res 2023;66:400-414. PMID: 36580582; DOI: 10.1044/2022_jslhr-22-00207.
Abstract
PURPOSE: Maturation of the ability to recognize target speech in the presence of a two-talker speech masker extends into early adolescence. This study evaluated whether children benefit from differences in fundamental frequency (f0) contour depth between the target and masker speech, a cue that has been shown to improve recognition in adults.
METHOD: Speech stimuli were recorded from talkers using three speaking styles, with f0 contour depths that were Flat, Normal, or Exaggerated. Targets were open-set, declarative sentences produced by a female talker, and maskers were two streams of concatenated sentences produced by a second female talker. Listeners were children (ages 5-17 years) and adults (ages 18-24 years) with normal hearing. Each listener was tested in one of the three masker styles paired with all three target styles. Speech recognition thresholds (SRTs) corresponding to 50% correct were estimated by fitting psychometric functions to adaptive track data.
RESULTS: For adults, performance did not differ significantly across conditions with matched speaking styles. A mismatch benefit was observed when combining Flat targets with the Exaggerated masker and Exaggerated targets with the Flat masker, and for both Flat and Exaggerated targets paired with the Normal masker. For children, there was a significant effect of age in all conditions. Flat targets in the Flat masker were associated with lower SRTs than the other two matched conditions, and a mismatch benefit was observed for young children only when the target f0 contour was less variable than the masker f0 contour.
CONCLUSIONS: Whereas child-directed speech often has exaggerated pitch contours, young children were better able to recognize speech with less variable f0. Age effects were observed in the benefit of mismatched speaking styles for some conditions, which could be related to differences in baseline SRTs rather than differences in segregation abilities.
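The thresholding step described in the method, estimating the SRT at 50% correct by fitting a psychometric function to adaptive-track data, can be sketched as follows. This is a generic illustration with invented trial data and a crude grid-search maximum-likelihood fit, not the authors' analysis code:

```python
import math

def psychometric(snr, midpoint, slope):
    # Logistic psychometric function: proportion correct vs. SNR in dB.
    return 1.0 / (1.0 + math.exp(-slope * (snr - midpoint)))

def fit_srt(trials):
    # trials: (snr_dB, correct) pairs from an adaptive track.
    # Maximum-likelihood grid search over midpoint and slope; the fitted
    # midpoint is the SNR giving 50% correct, i.e. the SRT.
    best_params, best_ll = None, -math.inf
    for m10 in range(-200, 201):      # midpoints: -20.0 to +20.0 dB
        for s10 in range(1, 31):      # slopes: 0.1 to 3.0 per dB
            m, s = m10 / 10.0, s10 / 10.0
            ll = 0.0
            for snr, correct in trials:
                p = min(max(psychometric(snr, m, s), 1e-9), 1.0 - 1e-9)
                ll += math.log(p if correct else 1.0 - p)
            if ll > best_ll:
                best_params, best_ll = (m, s), ll
    return best_params[0]

# Invented adaptive-track data clustering around a true SRT near -4 dB
track = [(-10, False), (-8, False), (-6, False), (-5, False), (-4, True),
         (-4, False), (-3, True), (-2, True), (-1, True), (0, True), (2, True)]
srt = fit_srt(track)
```

In practice such fits typically also include lapse and guess-rate parameters and use a dedicated optimizer; the grid search here only conveys the idea.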
Affiliation(s)
- Mary M Flaherty
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, The University of North Carolina at Chapel Hill
- Kelsey Libert
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, Champaign
3
Kegler M, Weissbart H, Reichenbach T. The neural response at the fundamental frequency of speech is modulated by word-level acoustic and linguistic information. Front Neurosci 2022;16:915744. PMID: 35942153; PMCID: PMC9355803; DOI: 10.3389/fnins.2022.915744.
Abstract
Spoken language comprehension requires rapid and continuous integration of information, from lower-level acoustic to higher-level linguistic features. Much of this processing occurs in the cerebral cortex, whose neural activity exhibits, for instance, correlates of predictive processing emerging at delays of a few hundred milliseconds. However, the auditory pathways are also characterized by extensive feedback loops from higher-level cortical areas to lower-level ones, as well as to subcortical structures. Early neural activity can therefore be influenced by higher-level cognitive processes, but it remains unclear whether such feedback contributes to linguistic processing. Here, we investigated early speech-evoked neural activity that emerges at the fundamental frequency. We analyzed EEG recordings obtained while subjects listened to a story read by a single speaker. We identified a response tracking the speaker's fundamental frequency at a delay of 11 ms, while another response, elicited by the high-frequency modulation of the envelope of higher harmonics, exhibited a larger magnitude and a longer latency of about 18 ms, with an additional significant component at around 40 ms. Notably, while the earlier components of the response likely originate from subcortical structures, the latter presumably involve contributions from cortical regions. Subsequently, we determined the magnitude of these early neural responses for each individual word in the story. We then quantified the context-independent frequency of each word and used a language model to compute context-dependent word surprisal and precision. Word surprisal represented how predictable a word is given the previous context, and word precision reflected the confidence about predicting the next word from the past context. We found that the word-level neural responses at the fundamental frequency were predominantly influenced by acoustic features: the average fundamental frequency and its variability. Among the linguistic features, only context-independent word frequency showed a weak but significant modulation of the neural response to the high-frequency envelope modulation. Our results show that the early neural response at the fundamental frequency is already influenced by both acoustic and linguistic information, suggesting top-down modulation of this neural response.
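Word surprisal, as used in this study, is the negative log probability of a word given its context. A minimal sketch with invented toy probabilities (the study used a trained language model, not a hand-written table):

```python
import math

def surprisal(p_word_given_context):
    # Surprisal in bits: how unexpected a word is given the preceding context.
    return -math.log2(p_word_given_context)

# Invented next-word distribution after a context like "the dog chased the ..."
next_word_probs = {"cat": 0.5, "ball": 0.25, "mailman": 0.125, "theorem": 0.0001}

s_cat = surprisal(next_word_probs["cat"])          # predictable word, low surprisal
s_theorem = surprisal(next_word_probs["theorem"])  # implausible word, high surprisal
```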
Affiliation(s)
- Mikolaj Kegler
- Department of Bioengineering, Centre for Neurotechnology, Imperial College London, London, United Kingdom
- Hugo Weissbart
- Donders Centre for Cognitive Neuroimaging, Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, Netherlands
- Tobias Reichenbach
- Department of Bioengineering, Centre for Neurotechnology, Imperial College London, London, United Kingdom
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany
- Correspondence: Tobias Reichenbach
4
Jiang J, Johnson JCS, Requena-Komuro MC, Benhamou E, Sivasathiaseelan H, Sheppard DL, Volkmer A, Crutch SJ, Hardy CJD, Warren JD. Phonemic restoration in Alzheimer's disease and semantic dementia: a preliminary investigation. Brain Commun 2022;4:fcac118. PMID: 35611314; PMCID: PMC9123842; DOI: 10.1093/braincomms/fcac118.
Abstract
Phonemic restoration, the perception of speech sounds that are actually missing, is a fundamental perceptual process that 'repairs' interrupted spoken messages during noisy everyday listening. As a dynamic, integrative process, phonemic restoration is potentially affected by neurodegenerative pathologies, but this has not been clarified. Here, we studied this phenomenon in 5 patients with typical Alzheimer's disease and 4 patients with semantic dementia, relative to 22 age-matched healthy controls. Participants heard isolated sounds, spoken real words, and pseudowords in which noise bursts either overlaid a consonant or replaced it; a tendency to hear replaced (missing) speech sounds as present signified phonemic restoration. All groups perceived isolated noises normally and showed phonemic restoration of real words, most marked in Alzheimer's patients. For pseudowords, healthy controls showed no phonemic restoration, while Alzheimer's patients showed marked suppression of phonemic restoration and patients with semantic dementia, in contrast, showed phonemic restoration comparable to real words. Our findings provide the first evidence that phonemic restoration is preserved or even enhanced in neurodegenerative diseases, with distinct syndromic profiles that may reflect the relative integrity of bottom-up phonological representation and top-down lexical disambiguation mechanisms in different diseases. This work has theoretical implications for predictive coding models of language and neurodegenerative disease, and for understanding cognitive 'repair' processes in dementia. Future research should expand on these preliminary observations with larger cohorts.
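The stimulus manipulation described, noise bursts either overlaying a consonant or replacing it, can be sketched on a waveform array. This is purely illustrative; the sample values, segment boundaries, and function names are invented:

```python
import random

def make_noise(n, amp=0.5, seed=0):
    # Deterministic noise burst of n samples for reproducibility.
    rng = random.Random(seed)
    return [rng.uniform(-amp, amp) for _ in range(n)]

def overlay_noise(signal, start, end):
    # Noise is added on top of the consonant: speech cues remain in the mix.
    noise = make_noise(end - start)
    return signal[:start] + [s + n for s, n in zip(signal[start:end], noise)] + signal[end:]

def replace_with_noise(signal, start, end):
    # The consonant is excised and substituted with noise: no speech cues remain.
    noise = make_noise(end - start)
    return signal[:start] + noise + signal[end:]

speech = [0.1] * 100              # stand-in for a spoken word's samples
overlaid = overlay_noise(speech, 40, 60)
replaced = replace_with_noise(speech, 40, 60)
```

Restoration is inferred when listeners report the replaced segment as intact speech, just as they (correctly) do for the overlaid segment.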
Affiliation(s)
- Jessica Jiang
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
- Jeremy C. S. Johnson
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
- Maï-Carmen Requena-Komuro
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
- Elia Benhamou
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
- Harri Sivasathiaseelan
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
- Damion L. Sheppard
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
- Anna Volkmer
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
- Sebastian J. Crutch
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
- Chris J. D. Hardy
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
- Jason D. Warren
- Dementia Research Centre, Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London WC1N 3AR, UK
5
Huet MP, Micheyl C, Gaudrain E, Parizet E. Vocal and semantic cues for the segregation of long concurrent speech stimuli in diotic and dichotic listening: The Long-SWoRD test. J Acoust Soc Am 2022;151:1557. PMID: 35364949; DOI: 10.1121/10.0007225.
Abstract
It is not always easy to follow a conversation in a noisy environment. To distinguish between two speakers, a listener must mobilize many perceptual and cognitive processes to maintain attention on a target voice and avoid shifting attention to the background noise. The development of an intelligibility task with long stimuli, the Long-SWoRD test, is introduced. This protocol allows participants to fully benefit from cognitive resources, such as semantic knowledge, to separate two talkers in a realistic listening environment. Moreover, this task also provides the experimenters with a means to infer fluctuations in auditory selective attention. Two experiments document the performance of normal-hearing listeners in situations where the perceptual separability of the competing voices ranges from easy to hard, using a combination of voice and binaural cues. The results show a strong effect of voice differences when the voices are presented diotically. In addition, analyzing the influence of the semantic context on the pattern of responses indicates that semantic information induces a response bias both when the competing voices are distinguishable and when they are not.
Affiliation(s)
- Moïra-Phoebé Huet
- Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
- Etienne Gaudrain
- Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Centre National de la Recherche Scientifique UMR5292, Institut National de la Santé et de la Recherche Médicale U1028, Université Claude Bernard Lyon 1, Université de Lyon, Centre Hospitalier Le Vinatier, Neurocampus, 95 boulevard Pinel, Bron Cedex, 69675, France
- Etienne Parizet
- Laboratory of Vibration and Acoustics, National Institute of Applied Sciences, University of Lyon, 20 Avenue Albert Einstein, Villeurbanne, 69100, France
6
Wasiuk PA, Lavandier M, Buss E, Oleson J, Calandruccio L. The effect of fundamental frequency contour similarity on multi-talker listening in older and younger adults. J Acoust Soc Am 2020;148:3527. PMID: 33379934; PMCID: PMC7863686; DOI: 10.1121/10.0002661.
Abstract
Older adults with hearing loss have greater difficulty recognizing target speech in multi-talker environments than young adults with normal hearing, especially when target and masker speech streams are perceptually similar. A difference in fundamental frequency (f0) contour depth is an effective stream segregation cue for young adults with normal hearing. This study examined whether older adults with varying degrees of sensorineural hearing loss are able to utilize differences in target/masker f0 contour depth to improve speech recognition in multi-talker listening. Speech recognition thresholds (SRTs) were measured for speech mixtures composed of target/masker streams with flat, normal, and exaggerated speaking styles, in which f0 contour depth systematically varied. Computational modeling estimated differences in energetic masking across listening conditions. Young adults had lower SRTs than older adults, a result that was partially explained by differences in audibility predicted by the model. However, audibility differences did not explain why young adults benefited from mismatched target/masker f0 contour depth while, in most conditions, older adults did not. A reduced ability to use segregation cues (differences in target/masker f0 contour depth) and deficits in grouping speech with variable f0 contours likely contribute to the difficulties experienced by older adults in challenging acoustic environments.
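One common way to vary f0 contour depth, consistent with the flat/normal/exaggerated styles described above, is to scale the contour's excursions around its mean on a log-frequency scale. A hypothetical sketch of that idea (the contour values and scaling factors are invented, and the study's own stimuli came from recorded speaking styles rather than this kind of resynthesis):

```python
import math

def scale_contour_depth(f0_track, k):
    # Scale f0 excursions around the geometric-mean f0 by a factor k:
    # k = 0 flattens the contour, k = 1 leaves it unchanged, k > 1 exaggerates it.
    log_f0 = [math.log(f) for f in f0_track]
    mean = sum(log_f0) / len(log_f0)
    return [math.exp(mean + k * (lf - mean)) for lf in log_f0]

contour = [180.0, 220.0, 200.0, 160.0]            # invented f0 track in Hz
flat = scale_contour_depth(contour, 0.0)          # collapses to the mean f0
exaggerated = scale_contour_depth(contour, 2.0)   # excursions doubled in log-Hz
```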
Affiliation(s)
- Peter A Wasiuk
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
- Mathieu Lavandier
- Univ. Lyon, ENTPE, Laboratoire Génie Civil et Bâtiment, Rue M. Audin, Vaulx-en-Velin Cedex, 69518, France
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, University of North Carolina, CB#7070, Chapel Hill, North Carolina 27599, USA
- Jacob Oleson
- Department of Biostatistics, N300 CPHB, University of Iowa, 145 North Riverside Drive, Iowa City, Iowa 52242-2007, USA
- Lauren Calandruccio
- Department of Psychological Sciences, 11635 Euclid Avenue, Case Western Reserve University, Cleveland, Ohio 44106, USA
7
Flaherty MM, Buss E, Leibold LJ. Developmental Effects in Children's Ability to Benefit From F0 Differences Between Target and Masker Speech. Ear Hear 2019;40:927-937. PMID: 30334835; PMCID: PMC6467703; DOI: 10.1097/AUD.0000000000000673.
Abstract
OBJECTIVES: The objectives of this study were to (1) evaluate the extent to which school-age children benefit from fundamental frequency (F0) differences between target words and competing two-talker speech, and (2) assess whether this benefit changes with age. It was predicted that while children would be more susceptible to speech-in-speech masking compared to adults, they would benefit from differences in F0 between target and masker speech. A second experiment was conducted to evaluate the relationship between frequency discrimination thresholds and the ability to benefit from target/masker differences in F0.
DESIGN: Listeners were children (5 to 15 years) and adults (20 to 36 years) with normal hearing. In the first experiment, speech reception thresholds (SRTs) for disyllabic words were measured in a continuous, 60-dB SPL two-talker speech masker. The same male talker produced both the target and masker speech (average F0 = 120 Hz). The level of the target words was adaptively varied to estimate the level associated with 71% correct identification. The procedure was a four-alternative forced choice with a picture-pointing response. Target words either had the same mean F0 as the masker or had it shifted up by 3, 6, or 9 semitones. To determine the benefit of target/masker F0 separation on word recognition, masking release was computed by subtracting thresholds in each shifted-F0 condition from the threshold in the unshifted-F0 condition. In the second experiment, frequency discrimination thresholds were collected for a subset of listeners to determine whether sensitivity to F0 differences would be predictive of SRTs. The standard was the syllable /ba/ with an F0 of 250 Hz; the target stimuli had a higher F0. Discrimination thresholds were measured using a three-alternative, three-interval forced choice procedure.
RESULTS: Younger children (5 to 12 years) had significantly poorer SRTs than older children (13 to 15 years) and adults in the unshifted-F0 condition. The benefit of F0 separations generally increased with increasing child age and magnitude of target/masker F0 separation. For 5- to 7-year-olds, there was a small benefit of F0 separation in the 9-semitone condition only. For 8- to 12-year-olds, there was a benefit from both 6- and 9-semitone separations, but to a lesser degree than what was observed for older children (13 to 15 years) and adults, who showed a substantial benefit in the 6- and 9-semitone conditions. Examination of individual data found that children younger than 7 years of age did not benefit from any of the F0 separations tested. Results for the frequency discrimination task indicated that, while there was a trend for improved thresholds with increasing age, these thresholds were not predictive of the ability to use F0 differences in the speech-in-speech recognition task after controlling for age.
CONCLUSIONS: The overall pattern of results suggests that children's ability to benefit from F0 differences in speech-in-speech recognition follows a prolonged developmental trajectory. Younger children are less able to capitalize on differences in F0 between target and masker speech. The extent to which individual children benefitted from target/masker F0 differences was not associated with their frequency discrimination thresholds.
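The 3-, 6-, and 9-semitone target F0 shifts follow the standard relation that one semitone corresponds to a frequency ratio of 2^(1/12). A quick sketch of that computation (not the authors' resynthesis code):

```python
def shift_f0(f0_hz, semitones):
    # Shift a fundamental frequency by a signed number of semitones:
    # each semitone multiplies frequency by 2**(1/12).
    return f0_hz * 2.0 ** (semitones / 12.0)

base = 120.0  # mean F0 of the talker in the study, in Hz
shifted = {n: round(shift_f0(base, n), 1) for n in (3, 6, 9)}
# A 9-semitone shift raises a 120 Hz F0 to roughly 202 Hz.
```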
Affiliation(s)
- Mary M. Flaherty
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska, USA
- Emily Buss
- Department of Otolaryngology/Head and Neck Surgery, School of Medicine, University of North Carolina, Chapel Hill, North Carolina, USA
- Lori J. Leibold
- Center for Hearing Research, Boys Town National Research Hospital, Omaha, Nebraska, USA
8
Tamati TN, Janse E, Başkent D. Perceptual Discrimination of Speaking Style Under Cochlear Implant Simulation. Ear Hear 2019;40:63-76. PMID: 29742545; PMCID: PMC6319584; DOI: 10.1097/AUD.0000000000000591.
Abstract
OBJECTIVES: Real-life, adverse listening conditions involve a great deal of speech variability, including variability in speaking style. Depending on the speaking context, talkers may use a more casual, reduced speaking style or a more formal, careful speaking style. Attending to the fine-grained acoustic-phonetic details characterizing different speaking styles facilitates the perception of the speaking style used by the talker. These acoustic-phonetic cues are poorly encoded in cochlear implants (CIs), potentially rendering the discrimination of speaking style difficult. As a first step toward characterizing CI perception of real-life speech forms, the present study investigated the perception of different speaking styles in normal-hearing (NH) listeners with and without CI simulation.
DESIGN: The discrimination of three speaking styles (conversational reduced speech, speech from retold stories, and carefully read speech) was assessed using a speaking style discrimination task in two experiments. NH listeners classified sentence-length utterances, produced in one of the three styles, as either formal (careful) or informal (conversational). Utterances were presented with unmodified speaking rates in experiment 1 (31 NH, young adult Dutch speakers) and with modified speaking rates set to the average rate across all utterances in experiment 2 (28 NH, young adult Dutch speakers). In both experiments, acoustic noise-vocoder simulations of CIs were used to produce 12-channel (CI-12) and 4-channel (CI-4) vocoder simulation conditions, in addition to a no-simulation condition without CI simulation.
RESULTS: In both experiments, NH listeners were able to reliably discriminate the speaking styles without CI simulation, but this ability was reduced under CI simulation. In experiment 1, participants showed poor discrimination of speaking styles under CI simulation. Listeners used speaking rate as a cue to make their judgments, even though it was not a reliable cue to speaking style in the study materials. In experiment 2, without differences in speaking rate among speaking styles, listeners showed better discrimination of speaking styles under CI simulation, using additional cues to complete the task.
CONCLUSIONS: The findings demonstrate that perceiving differences among three speaking styles under CI simulation is difficult because some important cues to speaking style are not fully available in these conditions. While cues like speaking rate are available, this information alone may not always be a reliable indicator of a particular speaking style. Other speaking style cues, such as degraded acoustic-phonetic information and variability in speaking rate within an utterance, may be available but less salient. However, as in experiment 2, listeners' perception of speaking styles may improve if they are constrained or trained to use these additional cues, which were more reliable in the context of the present study. Taken together, these results suggest that dealing with speech variability in real-life listening conditions may be a challenge for CI users.
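The noise-vocoder CI simulation referred to above divides speech into frequency bands, extracts each band's temporal envelope, and uses the envelopes to modulate band-limited noise. A crude FFT-based sketch under our own parameter assumptions (channel edges, envelope smoothing), not the simulation used in the study:

```python
import numpy as np

def noise_vocode(signal, fs, n_channels, f_lo=100.0, f_hi=8000.0, env_cut=50.0, seed=0):
    # Crude FFT-based noise vocoder: band-split, extract envelopes, refill with noise.
    rng = np.random.default_rng(seed)
    n = len(signal)
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced channel edges
    spec = np.fft.rfft(signal)
    win = max(1, int(fs / env_cut))                   # moving-average envelope smoother
    kernel = np.ones(win) / win
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spec * mask, n)           # band-limited speech
        env = np.convolve(np.abs(band), kernel, mode="same")
        carrier = np.fft.irfft(np.fft.rfft(rng.standard_normal(n)) * mask, n)
        out += env * carrier                          # envelope-modulated noise band
    return out

fs = 16000
t = np.arange(fs) / fs
# A 440 Hz carrier with slow amplitude modulation as a stand-in for speech
speech_like = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
sim_4ch = noise_vocode(speech_like, fs, n_channels=4)
```

With 12 channels more spectral detail survives than with 4, which is the contrast the CI-12 and CI-4 conditions exploit; real research vocoders use proper bandpass filters and Hilbert or rectify-and-lowpass envelope extraction.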
Affiliation(s)
- Terrin N. Tamati
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, The Netherlands
- Esther Janse
- Centre for Language Studies, Radboud University Nijmegen, Nijmegen, The Netherlands
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, The Netherlands
9
Deroche MLD, Gracco VL. Segregation of voices with single or double fundamental frequencies. J Acoust Soc Am 2019;145:847. PMID: 30823786; DOI: 10.1121/1.5090107.
Abstract
In cocktail-party situations, listeners can use the fundamental frequency (F0) of a voice to segregate it from competitors, but other cues in speech could also help, such as co-modulation of envelopes across frequency or more complex cues related to the semantic/syntactic content of the utterances. For simplicity, this (non-pitch) form of grouping is referred to as "articulatory." A new type of speech with two steady F0s was created to examine how these two forms of segregation compete: articulatory grouping would bind the partials of a double-F0 source together, whereas harmonic segregation would tend to split them into two subsets. In experiment 1, maskers were two same-male sentences. Speech reception thresholds were high in this task (in the vicinity of 0 dB), and harmonic segregation behaved as though double-F0 stimuli were two independent sources. This was not the case in experiment 2, where maskers were speech-shaped complexes (buzzes). First, double-F0 targets were immune to the masking of a single-F0 buzz matching one of the two target F0s. Second, double-F0 buzzes were particularly effective at masking a single-F0 target matching one of the two buzz F0s. In conclusion, the strength of F0-based segregation appears to depend on whether the masker is speech or not.
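A double-F0 complex of the kind motivating these experiments can be sketched by summing two harmonic series. This is a generic illustration; the study's actual stimuli were resynthesized speech and speech-shaped buzzes, not plain equal-amplitude tone complexes, and the F0 values here are invented:

```python
import math

def harmonic_complex(f0, n_harmonics, duration, fs):
    # Sum of equal-amplitude harmonics (f0, 2*f0, ...) of one fundamental.
    n = int(duration * fs)
    return [sum(math.sin(2 * math.pi * f0 * h * i / fs)
                for h in range(1, n_harmonics + 1))
            for i in range(n)]

fs, dur = 16000, 0.05
single = harmonic_complex(100.0, 10, dur, fs)   # one steady F0 at 100 Hz
double = [a + b for a, b in zip(harmonic_complex(100.0, 10, dur, fs),
                                harmonic_complex(126.0, 10, dur, fs))]  # two steady F0s
```

The question the paper poses is whether such a double-F0 source is heard as one bound object (articulatory grouping) or split into two harmonic subsets (harmonic segregation).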
Affiliation(s)
- Mickael L D Deroche
- Centre for Research on Brain, Language and Music, McGill University, 3640 rue de la Montagne, Montreal, H3G 2A8, Canada
- Vincent L Gracco
- Haskins Laboratories, 300 George Street, New Haven, Connecticut 06511, USA
10
El Boghdady N, Gaudrain E, Başkent D. Does good perception of vocal characteristics relate to better speech-on-speech intelligibility for cochlear implant users? J Acoust Soc Am 2019;145:417. PMID: 30710943; DOI: 10.1121/1.5087693.
Abstract
Differences in voice pitch (F0) and vocal tract length (VTL) improve intelligibility of speech masked by a background talker (speech-on-speech; SoS) for normal-hearing (NH) listeners. Cochlear implant (CI) users, who are less sensitive to these two voice cues compared to NH listeners, experience difficulties in SoS perception. Three research questions were addressed: (1) whether increasing the F0 and VTL difference (ΔF0; ΔVTL) between two competing talkers benefits CI users in SoS intelligibility and comprehension, (2) whether this benefit is related to their F0 and VTL sensitivity, and (3) whether their overall SoS intelligibility and comprehension are related to their F0 and VTL sensitivity. Results showed: (1) CI users did not benefit in SoS perception from increasing ΔF0 and ΔVTL; increasing ΔVTL had a slightly detrimental effect on SoS intelligibility and comprehension. Results also showed: (2) the effect from increasing ΔF0 on SoS intelligibility was correlated with F0 sensitivity, while the effect from increasing ΔVTL on SoS comprehension was correlated with VTL sensitivity. Finally, (3) the sensitivity to both F0 and VTL, and not only one of them, was found to be correlated with overall SoS performance, elucidating important aspects of voice perception that should be optimized through future coding strategies.
Affiliation(s)
- Nawal El Boghdady
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Etienne Gaudrain
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
11
Shafiro V, Fogerty D, Smith K, Sheft S. Perceptual Organization of Interrupted Speech and Text. J Speech Lang Hear Res 2018;61:2578-2588. PMID: 30458532; PMCID: PMC6428238; DOI: 10.1044/2018_jslhr-h-17-0477.
Abstract
PURPOSE: Visual recognition of interrupted text may predict speech intelligibility under adverse listening conditions. This study investigated the nature of the linguistic information and perceptual processes underlying this relationship.
METHOD: To directly compare the perceptual organization of interrupted speech and text, we examined the recognition of spoken and printed sentences interrupted at different rates in 14 adults with normal hearing. The interruption method approximated deletion and retention of rate-specific linguistic information (0.5-64 Hz) in speech by substituting either white space or silent intervals for text or speech in the original sentences.
RESULTS: A similar U-shaped pattern of cross-rate variation in performance was observed in both modalities, with minima at 2 Hz. However, at the highest and lowest interruption rates, recognition accuracy was greater for text than speech, whereas the reverse was observed at middle rates. An analysis of word duration and the frequency of word sampling across interruption rates suggested that the location of the function minima was influenced by perceptual reconstruction of whole words. Overall, the findings indicate a high degree of similarity in the perceptual organization of interrupted speech and text.
CONCLUSION: The observed rate-specific variation in the perception of speech and text may potentially affect the degree to which recognition accuracy in one modality is predictive of the other.
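The interruption method, substituting silence (or white space) for signal at a fixed rate, amounts to square-wave gating. A sketch assuming a 50% duty cycle (the study's exact gating parameters may differ):

```python
def interrupt(signal, fs, rate_hz):
    # Zero out alternate half-cycles of a square wave at rate_hz,
    # retaining 50% of the samples.
    period = fs / rate_hz
    return [s if (i % period) < period / 2 else 0.0 for i, s in enumerate(signal)]

fs = 16000
speech = [1.0] * fs                      # stand-in for one second of speech
gated_2hz = interrupt(speech, fs, 2.0)   # 2 Hz interruption: 250 ms on, 250 ms off
```

At low rates whole words are deleted, at high rates each word is sampled many times; this difference in what survives the gating is central to the U-shaped results above.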
Affiliation(s)
- Valeriy Shafiro, Department of Communication Disorders & Sciences, Rush University Medical Center, Chicago, IL
- Daniel Fogerty, Department of Communication Sciences & Disorders, University of South Carolina, Columbia
- Kimberly Smith, Department of Speech Pathology & Audiology, University of South Alabama, Mobile
- Stanley Sheft, Department of Communication Disorders & Sciences, Rush University Medical Center, Chicago, IL
12
Kreitewolf J, Mathias SR, Trapeau R, Obleser J, Schönwiesner M. Perceptual grouping in the cocktail party: Contributions of voice-feature continuity. The Journal of the Acoustical Society of America 2018; 144:2178. PMID: 30404485; DOI: 10.1121/1.5058684
Abstract
Cocktail parties pose a difficult yet solvable problem for the auditory system. Previous work has shown that the cocktail-party problem is considerably easier when all sounds in the target stream are spoken by the same talker (the voice-continuity benefit). The present study investigated the contributions of two of the most salient voice features-glottal-pulse rate (GPR) and vocal-tract length (VTL)-to the voice-continuity benefit. Twenty young, normal-hearing listeners participated in two experiments. On each trial, listeners heard concurrent sequences of spoken digits from three different spatial locations and reported the digits coming from a target location. Critically, across conditions, GPR and VTL either remained constant or varied across target digits. Additionally, across experiments, the target location either remained constant (Experiment 1) or varied (Experiment 2) within a trial. In Experiment 1, listeners benefited from continuity in either voice feature, but VTL continuity was more helpful than GPR continuity. In Experiment 2, spatial discontinuity greatly hindered listeners' abilities to exploit continuity in GPR and VTL. The present results suggest that selective attention benefits from continuity in target voice features and that VTL and GPR play different roles for perceptual grouping and stream segregation in the cocktail party.
Affiliation(s)
- Jens Kreitewolf, International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
- Samuel R Mathias, Neurocognition, Neurocomputation and Neurogenetics (n3) Division, Yale University School of Medicine, 40 Temple Street, New Haven, Connecticut 06511, USA
- Régis Trapeau, International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
- Jonas Obleser, Department of Psychology, University of Lübeck, Maria-Goeppert-Straße 9a, D-23562 Lübeck, Germany
- Marc Schönwiesner, International Laboratory for Brain, Music and Sound Research (BRAMS), Department of Psychology, Université de Montréal, Pavillon 1420 Boulevard Mont-Royal, Outremont, Quebec, H2V 4P3, Canada
13
Oganyan M, Wright R, Herschensohn J. The role of the root in auditory word recognition of Hebrew. Cortex 2018; 116:286-293. PMID: 30037635; DOI: 10.1016/j.cortex.2018.06.010
Abstract
Evidence from visual word recognition has shown that the root morpheme plays a particularly important role in the recognition of nouns in templatic languages [e.g., Velan & Frost, 2009 (Hebrew); Perea, Abu Mallouh, & Carreiras, 2010 (Arabic)]. Letter-transposition studies in masked priming have proved a useful tool for investigating letter-position flexibility in the visual domain. Because of the linear nature of the auditory signal, such manipulation is not possible for spoken words. In this study, we used a novel application of the phonemic restoration paradigm to explore the role of morphology in auditory word recognition. In two separate experiments, we show that the root plays an important role in the auditory recognition of Hebrew nouns, with words whose root sounds were masked being especially difficult to recover. This study provides additional evidence for the privileged role of the root in Semitic lexical access and its function in morphological decomposition.
Affiliation(s)
- Marina Oganyan, Linguistics, University of Washington, Seattle, United States
- Richard Wright, Linguistics, University of Washington, Seattle, United States
14
van Knijff EC, Coene M, Govaerts PJ. Speech understanding in noise in elderly adults: the effect of inhibitory control and syntactic complexity. International Journal of Language & Communication Disorders 2018; 53:628-642. PMID: 29446191; DOI: 10.1111/1460-6984.12376
Abstract
BACKGROUND Previous research has suggested that speech perception in elderly adults is influenced not only by age-related hearing loss (presbycusis) but also by declines in cognitive abilities, by background noise and by the syntactic complexity of the message. AIMS To gain further insight into the influence of these cognitive, acoustic and linguistic factors on speech perception in elderly adults by investigating inhibitory control as a listener characteristic, and background noise type and syntactic complexity as input characteristics. METHODS & PROCEDURES Phoneme identification was measured in different noise conditions and in different linguistic contexts (single words, sentences with varying syntactic complexity). Additionally, inhibitory control was measured using a visual stimulus-response matching task. Fifty-one adults participated in this study: elderly adults with age-related hearing loss (n = 9), elderly adults with normal hearing (n = 17), and a control group of normal-hearing younger adults (n = 25). OUTCOMES & RESULTS The analysis revealed that elderly adults with normal hearing and with hearing loss were less likely to successfully identify phonemes in single words than younger normal-hearing controls. In the context of sentences, only elderly adults with hearing loss had lower odds of correct phoneme perception than the control group. Additionally, in elderly adults with hearing loss, phoneme-in-sentence perception was linked to age-related declines in inhibitory control. In all participants, phoneme identification in sentences was influenced by both noise type and syntactic complexity. CONCLUSIONS & IMPLICATIONS Inhibitory control and syntactic complexity might play a significant role in speech perception, especially in elderly listeners. These factors might also influence the results of clinical assessments of speech perception. Testing procedures thus need to be selected, and their results interpreted, with these influences in mind.
Affiliation(s)
- Eline C van Knijff, Language and Hearing Center Amsterdam, Vrije Universiteit Amsterdam, the Netherlands
- Martine Coene, Language and Hearing Center Amsterdam, Vrije Universiteit Amsterdam, the Netherlands; The Eargroup, Antwerp, Belgium
- Paul J Govaerts, Language and Hearing Center Amsterdam, Vrije Universiteit Amsterdam, the Netherlands; The Eargroup, Antwerp, Belgium
15
Clarke J, Kazanoğlu D, Başkent D, Gaudrain E. Effect of F0 contours on top-down repair of interrupted speech. The Journal of the Acoustical Society of America 2017; 142:EL7. PMID: 28764445; DOI: 10.1121/1.4990398
Abstract
Top-down repair of interrupted speech can be influenced by bottom-up acoustic cues such as voice pitch (F0). This study investigated the role of dynamic pitch information, i.e., F0 contours, in top-down repair of speech. Intelligibility of sentences interrupted with silence or noise was measured in five F0-contour conditions (inverted, flat, original, and exaggerated by factors of 1.5 and 1.75). The main hypothesis was that manipulating F0 contours would impair the linking of successive segments of interrupted speech and thus negatively affect top-down repair. Intelligibility of interrupted speech was impaired only by misleading dynamic information (inverted F0 contours); top-down repair itself was not affected by any of the F0-contour manipulations.
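The contour conditions in this entry (inverted, flat, original, exaggerated) all amount to re-scaling an F0 contour's excursions around its mean on a log-frequency scale. The sketch below is a hypothetical illustration of that arithmetic only; the study resynthesized actual speech, which this `warp_f0` helper does not model:

```python
import numpy as np

def warp_f0(f0_hz, factor):
    """Re-scale an F0 contour's excursions around its log-mean.
    factor = 1 keeps the original contour, 0 flattens it, -1 inverts it,
    and 1.5 / 1.75 exaggerate it, mirroring the study's five conditions."""
    log_f0 = np.log2(f0_hz)
    mean = log_f0.mean()
    return 2.0 ** (mean + factor * (log_f0 - mean))

contour = np.array([180.0, 200.0, 220.0, 200.0, 180.0])  # toy rise-fall contour (Hz)
flat = warp_f0(contour, 0.0)        # monotone at the mean pitch
inverted = warp_f0(contour, -1.0)   # misleading dynamic information
```

Working in log2 units keeps the manipulation symmetric in musical intervals, so flattening and inverting both preserve the contour's geometric-mean pitch.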
Affiliation(s)
- Jeanne Clarke, Department of Otorhinolaryngology/Head and Neck Surgery, University of Groningen, University Medical Center Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
- Deniz Kazanoğlu, Department of Otorhinolaryngology/Head and Neck Surgery, University of Groningen, University Medical Center Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
- Deniz Başkent, Department of Otorhinolaryngology/Head and Neck Surgery, University of Groningen, University Medical Center Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
- Etienne Gaudrain, Department of Otorhinolaryngology/Head and Neck Surgery, University of Groningen, University Medical Center Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
16
Carroll R, Ruigendijk E. ERP responses to processing prosodic phrasing of sentences in amplitude modulated noise. Neuropsychologia 2016; 82:91-103. PMID: 26776233; DOI: 10.1016/j.neuropsychologia.2016.01.014
Abstract
Intonation phrase boundaries (IPBs) were hypothesized to be especially difficult to process in the presence of an amplitude modulated noise masker because of a potential rhythmic competition. In an event-related potential study, IPBs were presented in silence, stationary, and amplitude modulated noise. We elicited centro-parietal Closure Positive Shifts (CPS) in 23 young adults with normal hearing at IPBs in all acoustic conditions, albeit with some differences. CPS peak amplitudes were highest in stationary noise, followed by modulated noise, and lowest in silence. Both noise types elicited CPS delays, slightly more so in stationary compared to amplitude modulated noise. These data suggest that amplitude modulation is not tantamount to a rhythmic competitor for prosodic phrasing but rather supports an assumed speech perception benefit due to local release from masking. The duration of CPS time windows was, however, not only longer in noise compared to silence, but also longer for amplitude modulated compared to stationary noise. This is interpreted as support for additional processing load associated with amplitude modulation for the CPS component. Taken together, processing prosodic phrasing of sentences in amplitude modulated noise seems to involve the same issues that have been observed for the perception and processing of segmental information that are related to lexical items presented in noise: a benefit from local release from masking, even for prosodic cues, and a detrimental additional processing load that is associated with either stream segregation or signal reconstruction.
Affiliation(s)
- Rebecca Carroll, Cluster of Excellence 'Hearing4all', University of Oldenburg, Germany; Institute of Dutch Studies, University of Oldenburg, Ammerländer Heerstraße 114-118, 26111 Oldenburg, Germany
- Esther Ruigendijk, Cluster of Excellence 'Hearing4all', University of Oldenburg, Germany; Institute of Dutch Studies, University of Oldenburg, Ammerländer Heerstraße 114-118, 26111 Oldenburg, Germany
17
Clarke J, Başkent D, Gaudrain E. Pitch and spectral resolution: A systematic comparison of bottom-up cues for top-down repair of degraded speech. The Journal of the Acoustical Society of America 2016; 139:395-405. PMID: 26827034; DOI: 10.1121/1.4939962
Abstract
The brain is capable of restoring missing parts of speech, a top-down repair mechanism that enhances speech understanding in noisy environments. This enhancement can be quantified using the phonemic restoration paradigm, i.e., the improvement in intelligibility when silent interruptions of interrupted speech are filled with noise. Benefit from top-down repair of speech differs between cochlear implant (CI) users and normal-hearing (NH) listeners. This difference could be due to poorer spectral resolution and/or weaker pitch cues inherent to CI-transmitted speech. In CIs, those two degradations cannot be teased apart because spectral degradation leads to weaker pitch representation. A vocoding method was developed to evaluate independently the roles of pitch and spectral resolution for restoration in NH individuals. Sentences were resynthesized with different spectral resolutions and with either the original pitch cues retained or discarded entirely. The addition of pitch significantly improved restoration only at six-band spectral resolution. However, overall intelligibility of interrupted speech was improved both with the addition of pitch and with the increase in spectral resolution. This improvement may be due to better discrimination of speech segments from the filler noise, better grouping of speech segments together, and/or better bottom-up cues available in the speech segments.
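To make "spectral resolution" concrete, here is a deliberately crude FFT-domain noise vocoder: it splits the input into log-spaced channels and modulates band-limited noise with each channel's envelope. This is a hypothetical sketch, not the study's vocoding method (which could additionally retain or discard the original pitch cues, something this toy does not model):

```python
import numpy as np

def noise_vocode(signal, fs, n_bands, f_lo=100.0, f_hi=7000.0, seed=0):
    """Crude FFT-domain noise vocoder: split `signal` into `n_bands`
    log-spaced channels, take each channel's (unsmoothed) magnitude as an
    envelope, and use it to modulate band-limited noise. More bands means
    finer spectral resolution."""
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    spec = np.fft.rfft(signal)
    noise_spec = np.fft.rfft(rng.standard_normal(len(signal)))
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (freqs >= lo) & (freqs < hi)
        channel = np.fft.irfft(np.where(band, spec, 0), n=len(signal))
        carrier = np.fft.irfft(np.where(band, noise_spec, 0), n=len(signal))
        out += np.abs(channel) * carrier        # envelope x noise carrier
    return out

fs = 16000
tone = np.sin(2 * np.pi * 500 * np.arange(fs) / fs)
vocoded = noise_vocode(tone, fs, n_bands=6)     # six-band condition
```

A real vocoder would low-pass-filter the envelope rather than use the raw magnitude, but the band count as the knob for spectral resolution is the point being illustrated.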
Affiliation(s)
- Jeanne Clarke, Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
- Deniz Başkent, Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
- Etienne Gaudrain, Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
18
The effect of visual cues on top-down restoration of temporally interrupted speech, with and without further degradations. Hear Res 2015; 328:24-33. PMID: 26117407; DOI: 10.1016/j.heares.2015.06.013
Abstract
In complex listening situations, cognitive restoration mechanisms are commonly used to enhance perception of degraded speech with inaudible segments. Profoundly hearing-impaired people with a cochlear implant (CI) show less benefit from such mechanisms. However, both normal-hearing (NH) listeners and CI users benefit from visual speech cues in these listening situations. In this study we investigated whether an accompanying video of the speaker can enhance the intelligibility of interrupted sentences and the phonemic restoration benefit, measured as an increase in intelligibility when the silent intervals are filled with noise. Similar to previous studies, a restoration benefit was observed with interrupted speech without spectral degradations (Experiment 1), but was absent in acoustic simulations of CIs (Experiment 2) and was present again in simulations of electric-acoustic stimulation (Experiment 3). In all experiments, the additional speech information provided by the complementary visual cues led to overall higher intelligibility; however, these cues did not influence the occurrence or extent of the phonemic restoration benefit of filler noise. The results imply that visual cues do not show a synergistic effect with the filler noise, as adding them increased the intelligibility of interrupted sentences equally with or without the filler noise.
19
The verbal transformation effect and the perceptual organization of speech: influence of formant transitions and F0-contour continuity. Hear Res 2015; 323:22-31. PMID: 25620314; DOI: 10.1016/j.heares.2015.01.007
Abstract
This study explored the role of formant transitions and F0-contour continuity in binding together speech sounds into a coherent stream. Listening to a repeating recorded word produces verbal transformations to different forms; stream segregation contributes to this effect and so it can be used to measure changes in perceptual coherence. In experiment 1, monosyllables with strong formant transitions between the initial consonant and following vowel were monotonized; each monosyllable was paired with a weak-transitions counterpart. Further stimuli were derived by replacing the consonant-vowel transitions with samples from adjacent steady portions. Each stimulus was concatenated into a 3-min-long sequence. Listeners only reported more forms in the transitions-removed condition for strong-transitions words, for which formant-frequency discontinuities were substantial. In experiment 2, the F0 contour of all-voiced monosyllables was shaped to follow a rising or falling pattern, spanning one octave. Consecutive tokens either had the same contour, giving an abrupt F0 change between each token, or alternated, giving a continuous contour. Discontinuous sequences caused more transformations and forms, and shorter times to the first transformation. Overall, these findings support the notion that continuity cues provided by formant transitions and the F0 contour play an important role in maintaining the perceptual coherence of speech.
20
Fuller CD, Gaudrain E, Clarke JN, Galvin JJ, Fu QJ, Free RH, Başkent D. Gender categorization is abnormal in cochlear implant users. J Assoc Res Otolaryngol 2014; 15:1037-48. PMID: 25172111; DOI: 10.1007/s10162-014-0483-7
Abstract
In normal hearing (NH), the perception of the gender of a speaker is strongly affected by two anatomically related vocal characteristics: the fundamental frequency (F0), related to vocal pitch, and the vocal tract length (VTL), related to the height of the speaker. Previous studies on gender categorization in cochlear implant (CI) users found that performance was variable, with few CI users performing at the level of NH listeners. Data collected with recorded speech produced by multiple talkers suggests that CI users might rely more on F0 and less on VTL than NH listeners. However, because VTL cannot be accurately estimated from recordings, it is difficult to know how VTL contributes to gender categorization. In the present study, speech was synthesized to systematically vary F0, VTL, or both. Gender categorization was measured in CI users, as well as in NH participants listening to unprocessed (only synthesized) and vocoded (and synthesized) speech. Perceptual weights for F0 and VTL were derived from the performance data. With unprocessed speech, NH listeners used both cues (normalized perceptual weight: F0 = 3.76, VTL = 5.56). With vocoded speech, NH listeners still made use of both cues but less efficiently (normalized perceptual weight: F0 = 1.68, VTL = 0.63). CI users relied almost exclusively on F0 while VTL perception was profoundly impaired (normalized perceptual weight: F0 = 6.88, VTL = 0.59). As a result, CI users' gender categorization was abnormal compared to NH listeners. Future CI signal processing should aim to improve the transmission of both F0 cues and VTL cues, as a normal gender categorization may benefit speech understanding in competing talker situations.
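Deriving perceptual weights from categorization responses, as this abstract describes, can be illustrated with a toy logistic model fit by gradient ascent. This is not the paper's analysis; the `cue_weights` helper, the learning-rate settings, and all the listener data below are synthetic and hypothetical:

```python
import numpy as np

def cue_weights(f0, vtl, responses, lr=0.5, n_iter=5000):
    """Fit P(male) = sigmoid(w0 + w_f0*f0 + w_vtl*vtl) to binary gender
    responses by gradient ascent on the log-likelihood, returning the two
    cue weights. A larger |w| means that cue drives categorization more."""
    X = np.column_stack([np.ones_like(f0), f0, vtl])
    w = np.zeros(3)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (responses - p) / len(responses)
    return w[1], w[2]

# Synthetic listener who, like the CI users reported here, relies almost
# entirely on F0 (true generating weights: F0 = 3.0, VTL = 0.5; arbitrary).
rng = np.random.default_rng(1)
f0 = rng.uniform(-1, 1, 1000)
vtl = rng.uniform(-1, 1, 1000)
p_male = 1.0 / (1.0 + np.exp(-(3.0 * f0 + 0.5 * vtl)))
resp = (rng.uniform(size=1000) < p_male).astype(float)
w_f0, w_vtl = cue_weights(f0, vtl, resp)
```

With this synthetic listener the fitted F0 weight comes out much larger than the VTL weight, the qualitative pattern the abstract reports for CI users.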
Affiliation(s)
- Christina D Fuller, Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, P.O. Box 30.001, BB21, 9700 RB, Groningen, The Netherlands