1
Ueda K, Hashimoto M, Takeichi H, Wakamiya K. Interrupted mosaic speech revisited: Gain and loss in intelligibility by stretching. The Journal of the Acoustical Society of America 2024; 155:1767-1779. [PMID: 38441439 DOI: 10.1121/10.0025132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 02/16/2024] [Indexed: 03/07/2024]
Abstract
Our previous investigation on the effect of stretching spectrotemporally degraded and temporally interrupted speech stimuli showed remarkable intelligibility gains [Ueda, Takeichi, and Wakamiya (2022). J. Acoust. Soc. Am. 152(2), 970-980]. In this previous study, however, gap durations and temporal resolution were confounded. In the current investigation, we therefore observed the intelligibility of so-called mosaic speech while dissociating the effects of interruption and temporal resolution. The intelligibility of mosaic speech (20 frequency bands and 20 ms segment duration) declined from 95% to 78% and 33% by interrupting it with 20 and 80 ms gaps. Intelligibility improved, however, to 92% and 54% (14% and 21% gains for 20 and 80 ms gaps, respectively) by stretching mosaic segments to fill silent gaps (n = 21). By contrast, the intelligibility was impoverished to a minimum of 9% (7% loss) when stretching stimuli interrupted with 160 ms gaps. Explanations based on auditory grouping, modulation unmasking, or phonemic restoration may account for the intelligibility improvement by stretching, but not for the loss. The probability summation model accounted for "U"-shaped intelligibility curves and the gain and loss of intelligibility, suggesting that perceptual unit length and speech rate may affect the intelligibility of spectrotemporally degraded speech stimuli.
Affiliation(s)
- Kazuo Ueda
  - Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Masashi Hashimoto
  - Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
  - Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Wakamiya
  - Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan

2
Ueda K, Doan LLD, Takeichi H. Checkerboard and interrupted speech: Intelligibility contrasts related to factor-analysis-based frequency bands. The Journal of the Acoustical Society of America 2023; 154:2010-2020. [PMID: 37782122 DOI: 10.1121/10.0021165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 09/08/2023] [Indexed: 10/03/2023]
Abstract
It has been shown that the intelligibility of checkerboard speech stimuli, in which speech signals were periodically interrupted in time and frequency, drastically varied according to the combination of the number of frequency bands (2-20) and segment duration (20-320 ms). However, the effects of the number of frequency bands between 4 and 20 and the frequency division parameters on intelligibility have been largely unknown. Here, we show that speech intelligibility was lowest in four-band checkerboard speech stimuli, except for the 320-ms segment duration. Then, temporally interrupted speech stimuli and eight-band checkerboard speech stimuli came in this order (N = 19 and 20). At the same time, U-shaped intelligibility curves were observed for four-band and possibly eight-band checkerboard speech stimuli. Furthermore, different parameters of frequency division resulted in small but significant intelligibility differences at the 160- and 320-ms segment duration in four-band checkerboard speech stimuli. These results suggest that factor-analysis-based four frequency bands, representing groups of critical bands correlating with each other in speech power fluctuations, work as speech cue channels essential for speech perception. Moreover, a probability summation model for perceptual units, consisting of a sub-unit process and a supra-unit process that receives outputs of the speech cue channels, may account for the U-shaped intelligibility curves.
Affiliation(s)
- Kazuo Ueda
  - Department of Acoustic Design, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Linh Le Dieu Doan
  - Human Science Course, Graduate School of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
  - Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information R&D and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan

3
Gao T, Pan Q, Zhou J, Wang H, Tao L, Kwan HK. A Novel Attention-Guided Generative Adversarial Network for Whisper-to-Normal Speech Conversion. Cognit Comput 2023. [DOI: 10.1007/s12559-023-10108-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]

4
Burleson AM, Souza PE. Cognitive and linguistic abilities and perceptual restoration of missing speech: Evidence from online assessment. Front Psychol 2022; 13:1059192. [PMID: 36571056 PMCID: PMC9773209 DOI: 10.3389/fpsyg.2022.1059192] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/01/2022] [Accepted: 11/23/2022] [Indexed: 12/13/2022] Open
Abstract
When speech is clear, speech understanding is a relatively simple and automatic process. However, when the acoustic signal is degraded, top-down cognitive and linguistic abilities, such as working memory capacity, lexical knowledge (i.e., vocabulary), inhibitory control, and processing speed can often support speech understanding. This study examined whether listeners aged 22-63 (mean age 42 years) with better cognitive and linguistic abilities would be better able to perceptually restore missing speech information than those with poorer scores. Additionally, the role of context and everyday speech was investigated using high-context, low-context, and realistic speech corpora to explore these effects. Sixty-three adult participants with self-reported normal hearing completed a short cognitive and linguistic battery before listening to sentences interrupted by silent gaps or noise bursts. Results indicated that working memory was the most reliable predictor of perceptual restoration ability, followed by lexical knowledge, and inhibitory control and processing speed. Generally, silent gap conditions were related to and predicted by a broader range of cognitive abilities, whereas noise burst conditions were related to working memory capacity and inhibitory control. These findings suggest that higher-order cognitive and linguistic abilities facilitate the top-down restoration of missing speech information and contribute to individual variability in perceptual restoration.

5
Ueda K, Takeichi H, Wakamiya K. Auditory grouping is necessary to understand interrupted mosaic speech stimuli. The Journal of the Acoustical Society of America 2022; 152:970. [PMID: 36050149 PMCID: PMC9553289 DOI: 10.1121/10.0013425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2022] [Revised: 07/13/2022] [Accepted: 07/21/2022] [Indexed: 06/15/2023]
Abstract
The intelligibility of interrupted speech stimuli has been known to be almost perfect when segment duration is shorter than 80 ms, which means that the interrupted segments are perceptually organized into a coherent stream under this condition. However, why listeners can successfully group the interrupted segments into a coherent stream has been largely unknown. Here, we show that the intelligibility for mosaic speech in which original speech was segmented in frequency and time and noise-vocoded with the average power in each unit was largely reduced by periodical interruption. At the same time, the intelligibility could be recovered by promoting auditory grouping of the interrupted segments by stretching the segments up to 40 ms and reducing the gaps, provided that the number of frequency bands was enough ( ≥ 4) and the original segment duration was equal to or less than 40 ms. The interruption was devastating for mosaic speech stimuli, very likely because the deprivation of periodicity and temporal fine structure with mosaicking prevented successful auditory grouping for the interrupted segments.
Affiliation(s)
- Kazuo Ueda
  - Department of Human Science, Faculty of Design/Research Center for Applied Perceptual Science/Research and Development Center for Five-Sense Devices, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
- Hiroshige Takeichi
  - Open Systems Information Science Team, Advanced Data Science Project (ADSP), RIKEN Information Research and Development and Strategy Headquarters (R-IH), RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kohei Wakamiya
  - Department of Communication Design Science, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan

6
Jaekel BN, Weinstein S, Newman RS, Goupell MJ. Impacts of signal processing factors on perceptual restoration in cochlear-implant users. The Journal of the Acoustical Society of America 2022; 151:2898. [PMID: 35649892 PMCID: PMC9054268 DOI: 10.1121/10.0010258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Cochlear-implant (CI) users have previously demonstrated perceptual restoration, or successful repair of noise-interrupted speech, using the interrupted sentences paradigm [Bhargava, Gaudrain, and Başkent (2014). "Top-down restoration of speech in cochlear-implant users," Hear. Res. 309, 113-123]. The perceptual restoration effect was defined experimentally as higher speech understanding scores with noise-burst interrupted sentences compared to silent-gap interrupted sentences. For the perceptual restoration illusion to occur, it is often necessary for the masking or interrupting noise bursts to have a higher intensity than the adjacent speech signal to be perceived as a plausible masker. Thus, signal processing factors like noise reduction algorithms and automatic gain control could have a negative impact on speech repair in this population. Surprisingly, evidence that participants with cochlear implants experienced the perceptual restoration illusion was not observed across the two planned experiments. A separate experiment, which aimed to provide a close replication of previous work on perceptual restoration in CI users, also found no consistent evidence of perceptual restoration, contrasting the original study's previously reported findings. Typical speech repair of interrupted sentences was not observed in the present work's sample of CI users, and signal-processing factors did not appear to affect speech repair.
Affiliation(s)
- Brittany N Jaekel
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Sarah Weinstein
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Rochelle S Newman
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Matthew J Goupell
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA

7
Koelewijn T, Gaudrain E, Tamati T, Başkent D. The effects of lexical content, acoustic and linguistic variability, and vocoding on voice cue perception. The Journal of the Acoustical Society of America 2021; 150:1620. [PMID: 34598602 DOI: 10.1121/10.0005938] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/21/2021] [Accepted: 08/02/2021] [Indexed: 06/13/2023]
Abstract
Perceptual differences in voice cues, such as fundamental frequency (F0) and vocal tract length (VTL), can facilitate speech understanding in challenging conditions. Yet, we hypothesized that in the presence of spectrotemporal signal degradations, as imposed by cochlear implants (CIs) and vocoders, acoustic cues that overlap for voice perception and phonemic categorization could be mistaken for one another, leading to a strong interaction between linguistic and indexical (talker-specific) content. Fifteen normal-hearing participants performed an odd-one-out adaptive task measuring just-noticeable differences (JNDs) in F0 and VTL. Items used were words (lexical content) or time-reversed words (no lexical content). The use of lexical content was either promoted (by using variable items across comparison intervals) or not (fixed item). Finally, stimuli were presented without or with vocoding. Results showed that JNDs for both F0 and VTL were significantly smaller (better) for non-vocoded compared with vocoded speech and for fixed compared with variable items. Lexical content (forward vs reversed) affected VTL JNDs in the variable item condition, but F0 JNDs only in the non-vocoded, fixed condition. In conclusion, lexical content had a positive top-down effect on VTL perception when acoustic and linguistic variability was present but not on F0 perception. Lexical advantage persisted in the most degraded conditions and vocoding even enhanced the effect of item variability, suggesting that linguistic content could support compensation for poor voice perception in CI users.
Affiliation(s)
- Thomas Koelewijn
  - Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Etienne Gaudrain
  - CNRS Unité Mixte de Recherche 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Institut National de la Santé et de la Recherche Médicale, UMRS 1028, Université Claude Bernard Lyon 1, Université de Lyon, Lyon, France
- Terrin Tamati
  - Department of Otolaryngology-Head & Neck Surgery, The Ohio State University Wexner Medical Center, The Ohio State University, Columbus, Ohio, USA
- Deniz Başkent
  - Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands

8
Jaekel BN, Weinstein S, Newman RS, Goupell MJ. Access to semantic cues does not lead to perceptual restoration of interrupted speech in cochlear-implant users. The Journal of the Acoustical Society of America 2021; 149:1488. [PMID: 33765790 PMCID: PMC7935498 DOI: 10.1121/10.0003573] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/30/2020] [Revised: 02/01/2021] [Accepted: 02/04/2021] [Indexed: 05/19/2023]
Abstract
Cochlear-implant (CI) users experience less success in understanding speech in noisy, real-world listening environments than normal-hearing (NH) listeners. Perceptual restoration is one method NH listeners use to repair noise-interrupted speech. Whereas previous work has reported that CI users can use perceptual restoration in certain cases, they failed to do so under listening conditions in which NH listeners can successfully restore. Providing increased opportunities to use top-down linguistic knowledge is one possible method to increase perceptual restoration use in CI users. This work tested perceptual restoration abilities in 18 CI users and varied whether a semantic cue (presented visually) was available prior to the target sentence (presented auditorily). Results showed that whereas access to a semantic cue generally improved performance with interrupted speech, CI users failed to perceptually restore speech regardless of the semantic cue availability. The lack of restoration in this population directly contradicts previous work in this field and raises questions of whether restoration is possible in CI users. One reason for speech-in-noise understanding difficulty in CI users could be that they are unable to use tools like restoration to process noise-interrupted speech effectively.
Affiliation(s)
- Brittany N Jaekel
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Sarah Weinstein
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Rochelle S Newman
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Matthew J Goupell
  - Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA

9
Auditory processing in children: Role of working memory and lexical ability in auditory closure. PLoS One 2020; 15:e0240534. [PMID: 33147602 PMCID: PMC7641369 DOI: 10.1371/journal.pone.0240534] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/13/2020] [Accepted: 09/28/2020] [Indexed: 11/19/2022] Open
Abstract
We examined the relationship between cognitive-linguistic mechanisms and auditory closure ability in children. Sixty-seven school-age children recognized isolated words and keywords in sentences that were interrupted at a rate of 2.5 Hz and 5 Hz. In essence, children were given only 50% of speech information and asked to repeat the complete word or sentence. Children’s working memory capacity (WMC), attention, lexical knowledge, and retrieval from long-term memory (LTM) abilities were also measured to model their role in auditory closure ability. Overall, recognition of monosyllabic words and lexically easy multisyllabic words was significantly better at 2.5 Hz interruption rate than 5 Hz. Recognition of lexically hard multisyllabic words and keywords in sentences was better at 5 Hz relative to 2.5 Hz. Based on the best fit generalized “logistic” linear mixed effects models, there was a significant interaction between WMC and lexical difficulty of words. WMC was positively related only to recognition of lexically easy words. Lexical knowledge was found to be crucial for recognition of words and sentences, regardless of interruption rate. In addition, LTM retrieval ability was significantly associated with sentence recognition. These results suggest that lexical knowledge and the ability to retrieve information from LTM are crucial for children’s speech recognition in adverse listening situations. Study findings make a compelling case for the assessment and intervention of lexical knowledge and retrieval abilities in children with listening difficulties.
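As a rough illustration of the interruption described in this abstract, the sketch below gates a signal on and off at a given rate so that half of the samples survive. The function names `interrupt` and `kept_fraction`, the square-wave gate that starts in the "on" phase, and the 8 kHz placeholder signal are assumptions made for illustration; they are not details taken from the study.

```python
def interrupt(samples, sample_rate, rate_hz, duty=0.5):
    """Zero out the 'off' part of every interruption cycle (square-wave gate)."""
    period = sample_rate / rate_hz   # samples per on/off cycle
    on_len = period * duty           # samples kept at the start of each cycle
    return [x if (i % period) < on_len else 0.0 for i, x in enumerate(samples)]


def kept_fraction(gated):
    """Fraction of samples that survived the gate."""
    return sum(1 for x in gated if x != 0.0) / len(gated)


# Hypothetical placeholder: 2 s of a constant signal sampled at 8 kHz.
signal = [1.0] * 16000
slow = interrupt(signal, 8000, 2.5)   # 2.5 Hz: 200 ms on, 200 ms off
fast = interrupt(signal, 8000, 5.0)   # 5 Hz: 100 ms on, 100 ms off
print(kept_fraction(slow), kept_fraction(fast))
```

Both rates keep the same proportion of the signal; only the length of each surviving chunk changes, which is the variable the abstract contrasts.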

10
Vijayasarathy S, Barman A. Relationship between Speech Perception in Noise and Phonemic Restoration of Speech in Noise in Individuals with Normal Hearing. J Audiol Otol 2020; 24:167-173. [PMID: 32829626 PMCID: PMC7575917 DOI: 10.7874/jao.2019.00472] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2019] [Accepted: 06/10/2020] [Indexed: 11/24/2022] Open
Abstract
Background and Objectives: Top-down restoration of distorted speech, tapped as phonemic restoration of speech in noise, may be a useful tool to understand the robustness of perception in adverse listening situations. However, the relationship between phonemic restoration and speech perception in noise is not empirically clear.
Subjects and Methods: Twenty adults (40-55 years) with normal audiometric findings took part in the study. Sentence perception in noise was measured at various signal-to-noise ratios (SNRs) to estimate the SNR yielding a 50% score. Performance was also measured for sentences interrupted with silence and for those interrupted by speech noise at -10, -5, 0, and 5 dB SNRs. The score in the silence-interruption condition was subtracted from the score in the noise-interruption condition to determine the phonemic restoration magnitude.
Results: Fairly robust improvements in speech intelligibility were found when the sentences were interrupted with speech noise instead of silence. Improvement with increasing noise levels was non-monotonic and reached a maximum at -10 dB SNR. A significant correlation was found between speech perception in noise performance and phonemic restoration of sentences interrupted with -10 dB SNR speech noise.
Conclusions: It is possible that perception of speech in noise is associated with top-down processing of speech, tapped as phonemic restoration of interrupted speech. More research with a larger sample size is indicated since the restoration is affected by the type of speech material and noise used, age, working memory, and linguistic proficiency, and has a large individual variability.
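The restoration magnitude described in this abstract is a simple difference score, which the following sketch spells out. The function name `restoration_magnitude` and every score below are hypothetical placeholders chosen for illustration; none of the numbers are data from the study.

```python
def restoration_magnitude(noise_score, silence_score):
    """Noise-interrupted score minus silence-interrupted score (percentage points)."""
    return noise_score - silence_score


# Hypothetical placeholder scores (percent correct), NOT data from the study.
silence_score = 50.0
noise_scores = {-10: 62.0, -5: 58.0, 0: 55.0, 5: 53.0}   # keyed by SNR in dB

magnitudes = {snr: restoration_magnitude(score, silence_score)
              for snr, score in noise_scores.items()}
best_snr = max(magnitudes, key=magnitudes.get)   # SNR with the largest benefit
print(magnitudes, best_snr)
```

A positive value means the listener did better with noise-filled gaps than with silent gaps, which is how the abstract defines a restoration benefit.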
Affiliation(s)
- Srikar Vijayasarathy
  - Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, University of Mysore-Mysuru, Karnataka, India
- Animesh Barman
  - Department of Audiology, All India Institute of Speech and Hearing, Manasagangothri, University of Mysore-Mysuru, Karnataka, India

11
Gaudrain E, Başkent D. Discrimination of Voice Pitch and Vocal-Tract Length in Cochlear Implant Users. Ear Hear 2018; 39:226-237. [PMID: 28799983 PMCID: PMC5839701 DOI: 10.1097/aud.0000000000000480] [Citation(s) in RCA: 73] [Impact Index Per Article: 14.6] [Received: 01/09/2017] [Accepted: 06/29/2017] [Indexed: 12/02/2022]
Abstract
OBJECTIVES: When listening to two competing speakers, normal-hearing (NH) listeners can take advantage of voice differences between the speakers. Users of cochlear implants (CIs) have difficulty in perceiving speech on speech. Previous literature has indicated sensitivity to voice pitch (related to fundamental frequency, F0) to be poor among implant users, while sensitivity to vocal-tract length (VTL; related to the height of the speaker and formant frequencies), the other principal voice characteristic, has not been directly investigated in CIs. A few recent studies evaluated F0 and VTL perception indirectly, through voice gender categorization, which relies on perception of both voice cues. These studies revealed that, contrary to prior literature, CI users seem to rely exclusively on F0 while not utilizing VTL to perform this task. The objective of the present study was to directly and systematically assess raw sensitivity to F0 and VTL differences in CI users to define the extent of the deficit in voice perception.
DESIGN: The just-noticeable differences (JNDs) for F0 and VTL were measured in 11 CI listeners using triplets of consonant-vowel syllables in an adaptive three-alternative forced choice method.
RESULTS: The results showed that while NH listeners had average JNDs of 1.95 and 1.73 semitones (st) for F0 and VTL, respectively, CI listeners showed JNDs of 9.19 and 7.19 st. These JNDs correspond to differences of 70% in F0 and 52% in VTL. For comparison to the natural range of voices in the population, the F0 JND in CIs remains smaller than the typical male-female F0 difference. However, the average VTL JND in CIs is about twice as large as the typical male-female VTL difference.
CONCLUSIONS: These findings, thus, directly confirm that CI listeners do not seem to have sufficient access to VTL cues, likely as a result of limited spectral resolution, and, hence, that CI listeners' voice perception deficit goes beyond poor perception of F0. These results provide a potential common explanation not only for a number of deficits observed in CI listeners, such as voice identification and gender categorization, but also for competing speech perception.
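The abstract reports JNDs both in semitones and as percentage differences. A minimal sketch of that conversion follows, assuming the standard equal-tempered relation in which 12 semitones correspond to a doubling; the function name `semitones_to_percent` is hypothetical. With the rounded JNDs quoted above it gives roughly 70% for 9.19 st and a little under 52% for 7.19 st, so the second value matches the reported figure only approximately.

```python
def semitones_to_percent(st):
    """Relative difference, in percent, for an interval of `st` semitones."""
    ratio = 2 ** (st / 12)        # 12 semitones = one octave = a factor of 2
    return (ratio - 1) * 100


for label, jnd_st in [("F0", 9.19), ("VTL", 7.19)]:
    print(f"{label}: {jnd_st} st is about {semitones_to_percent(jnd_st):.1f}%")
```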
Affiliation(s)
- Etienne Gaudrain
  - University of Groningen, University Medical Center Groningen, Department of Otorhinolaryngology-Head and Neck Surgery, Groningen, The Netherlands; CNRS UMR 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Université Lyon, Lyon, France; and Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, The Netherlands
- Deniz Başkent
  - University of Groningen, University Medical Center Groningen, Department of Otorhinolaryngology-Head and Neck Surgery, Groningen, The Netherlands; CNRS UMR 5292, Lyon Neuroscience Research Center, Auditory Cognition and Psychoacoustics, Université Lyon, Lyon, France; and Research School of Behavioral and Cognitive Neurosciences, Graduate School of Medical Sciences, University of Groningen, Groningen, The Netherlands

12
Amichetti NM, Atagi E, Kong YY, Wingfield A. Linguistic Context Versus Semantic Competition in Word Recognition by Younger and Older Adults With Cochlear Implants. Ear Hear 2018; 39:101-109. [PMID: 28700448 PMCID: PMC5741484 DOI: 10.1097/aud.0000000000000469] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/13/2022]
Abstract
OBJECTIVES: The increasing numbers of older adults now receiving cochlear implants raises the question of how the novel signal produced by cochlear implants may interact with cognitive aging in the recognition of words heard spoken within a linguistic context. The objective of this study was to pit the facilitative effects of a constraining linguistic context against a potential age-sensitive negative effect of response competition on effectiveness of word recognition.
DESIGN: Younger (n = 8; mean age = 22.5 years) and older (n = 8; mean age = 67.5 years) adult implant recipients heard 20 target words as the final words in sentences that manipulated the target word's probability of occurrence within the sentence context. Data from published norms were also used to measure response entropy, calculated as the total number of different responses and the probability distribution of the responses suggested by the sentence context. Sentence-final words were presented to participants using a word-onset gating paradigm, in which a target word was presented with increasing amounts of its onset duration in 50 msec increments until the word was correctly identified.
RESULTS: Results showed that for both younger and older adult implant users, the amount of word-onset information needed for correct recognition of sentence-final words was inversely proportional to their likelihood of occurrence within the sentence context, with older adults gaining differential advantage from the contextual constraints offered by a sentence context. On the negative side, older adults' word recognition was differentially hampered by high response entropy, with this effect being driven primarily by the number of competing responses that might also fit the sentence context.
CONCLUSIONS: Consistent with previous research with normal-hearing younger and older adults, the present results showed older adult implant users' recognition of spoken words to be highly sensitive to linguistic context. This sensitivity, however, also resulted in a greater degree of interference from other words that might also be activated by the context, with negative effects on ease of word recognition. These results are consistent with an age-related inhibition deficit extending to the domain of semantic constraints on word recognition.
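The abstract describes response entropy as depending on how many different responses a sentence context invites and on their probability distribution. The sketch below assumes this is Shannon entropy in bits computed over normed completions, which the abstract does not state explicitly; the function name `response_entropy` and the example completions are hypothetical.

```python
import math
from collections import Counter


def response_entropy(responses):
    """Shannon entropy (bits) of the distribution of sentence completions."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical completion norms for two sentence contexts.
constraining = ["door"] * 9 + ["window"]                 # one dominant response
open_ended = ["dog", "cat", "car", "boy", "man", "bus"]  # many competitors
print(response_entropy(constraining), response_entropy(open_ended))
```

Under this assumption a context with many equally likely completions scores higher than one dominated by a single response, which is the kind of competition the abstract links to poorer recognition in older adults.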
Affiliation(s)
- Nicole M. Amichetti
  - Volen National Center for Complex Systems, Brandeis University, Waltham, MA, USA
- Eriko Atagi
  - Volen National Center for Complex Systems, Brandeis University, Waltham, MA, USA
  - Department of Communication Sciences and Disorders, Northeastern University, Boston, MA, USA
- Ying-Yee Kong
  - Department of Communication Sciences and Disorders, Northeastern University, Boston, MA, USA
- Arthur Wingfield
  - Volen National Center for Complex Systems, Brandeis University, Waltham, MA, USA

13
Probabilistic Modeling of Speech in Spectral Domain using Maximum Likelihood Estimation. Symmetry (Basel) 2018. [DOI: 10.3390/sym10120750] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/16/2022] Open
Abstract
The performance of many speech processing algorithms depends on modeling speech signals using appropriate probability distributions. Various distributions such as the Gamma distribution, Gaussian distribution, Generalized Gaussian distribution, Laplace distribution as well as multivariate Gaussian and Laplace distributions have been proposed in the literature to model different segment lengths of speech, typically below 200 ms in different domains. In this paper, we attempted to fit Laplace and Gaussian distributions to obtain a statistical model of speech short-time Fourier transform coefficients with high spectral resolution (segment length >500 ms) and low spectral resolution (segment length <10 ms). Distribution fitting of Laplace and Gaussian distributions was performed using maximum-likelihood estimation. It was found that speech short-time Fourier transform coefficients with high spectral resolution can be modeled using Laplace distribution. For low spectral resolution, neither the Laplace nor Gaussian distribution provided a good fit. Spectral domain modeling of speech with different depths of spectral resolution is useful in understanding the perceptual stability of hearing which is necessary for the design of digital hearing aids.
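To make the fitting procedure in this abstract concrete, here is a minimal sketch of maximum-likelihood fitting for the two candidate distributions, using the closed-form estimators (mean and biased variance for the Gaussian, median and mean absolute deviation for the Laplace) and comparing the maximized log-likelihoods. It runs on synthetic samples rather than short-time Fourier transform coefficients, and the function names are hypothetical; it is not the authors' implementation.

```python
import math
import random
import statistics


def gaussian_max_loglik(xs):
    """Log-likelihood at the Gaussian MLE (sample mean, biased variance)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def laplace_max_loglik(xs):
    """Log-likelihood at the Laplace MLE (median, mean absolute deviation)."""
    n = len(xs)
    loc = statistics.median(xs)
    scale = sum(abs(x - loc) for x in xs) / n
    return -n * (math.log(2 * scale) + 1)


def better_fit(xs):
    """Name of the distribution with the higher maximized log-likelihood."""
    if laplace_max_loglik(xs) > gaussian_max_loglik(xs):
        return "laplace"
    return "gaussian"


rng = random.Random(0)
# The difference of two unit exponentials follows a Laplace(0, 1) distribution.
laplace_like = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(5000)]
gaussian_like = [rng.gauss(0.0, 1.0) for _ in range(5000)]
print(better_fit(laplace_like), better_fit(gaussian_like))
```

Because both models have two free parameters, comparing maximized log-likelihoods directly is a reasonable first check; a goodness-of-fit test would still be needed to conclude, as the abstract does for low spectral resolution, that neither distribution fits well.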

14
Nourski KV, Steinschneider M, Rhone AE, Kovach CK, Kawasaki H, Howard MA. Differential responses to spectrally degraded speech within human auditory cortex: An intracranial electrophysiology study. Hear Res 2018; 371:53-65. [PMID: 30500619 DOI: 10.1016/j.heares.2018.11.009] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/13/2018] [Revised: 11/15/2018] [Accepted: 11/19/2018] [Indexed: 12/28/2022]
Abstract
Understanding cortical processing of spectrally degraded speech in normal-hearing subjects may provide insights into how sound information is processed by cochlear implant (CI) users. This study investigated electrocorticographic (ECoG) responses to noise-vocoded speech and related these responses to behavioral performance in a phonemic identification task. Subjects were neurosurgical patients undergoing chronic invasive monitoring for medically refractory epilepsy. Stimuli were utterances /aba/ and /ada/, spectrally degraded using a noise vocoder (1-4 bands). ECoG responses were obtained from Heschl's gyrus (HG) and superior temporal gyrus (STG), and were examined within the high gamma frequency range (70-150 Hz). All subjects performed at chance accuracy with speech degraded to 1 and 2 spectral bands, and at or near ceiling for clear speech. Inter-subject variability was observed in the 3- and 4-band conditions. High gamma responses in posteromedial HG (auditory core cortex) were similar for all vocoded conditions and clear speech. A progressive preference for clear speech emerged in anterolateral segments of HG, regardless of behavioral performance. On the lateral STG, responses to all vocoded stimuli were larger in subjects with better task performance. In contrast, both behavioral and neural responses to clear speech were comparable across subjects regardless of their ability to identify degraded stimuli. Findings highlight differences in representation of spectrally degraded speech across cortical areas and their relationship to perception. The results are in agreement with prior non-invasive results. The data provide insight into the neural mechanisms associated with variability in perception of degraded speech and potentially into sources of such variability in CI users.
Affiliation(s)
- Kirill V Nourski
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA; Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA, USA.
- Mitchell Steinschneider
- Departments of Neurology and Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
- Ariane E Rhone
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA
- Hiroto Kawasaki
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA
- Matthew A Howard
- Department of Neurosurgery, The University of Iowa, Iowa City, IA, USA; Iowa Neuroscience Institute, The University of Iowa, Iowa City, IA, USA; Pappajohn Biomedical Institute, The University of Iowa, Iowa City, IA, USA
15
Jaekel BN, Newman RS, Goupell MJ. Age effects on perceptual restoration of degraded interrupted sentences. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2018; 143:84. [PMID: 29390768 PMCID: PMC5758365 DOI: 10.1121/1.5016968] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 05/16/2023]
Abstract
Adult cochlear-implant (CI) users show small or non-existent perceptual restoration effects when listening to interrupted speech. Perceptual restoration is believed to be a top-down mechanism that enhances speech perception in adverse listening conditions, and appears to be particularly utilized by older normal-hearing participants. Whether older normal-hearing participants can derive any restoration benefits from degraded speech (as would be presented through a CI speech processor) is the focus of this study. Two groups of normal-hearing participants (younger: age ≤30 yrs; older: age ≥60 yrs) were tested for perceptual restoration effects in the context of interrupted sentences. Speech signal degradations were controlled by manipulating parameters of a noise vocoder and were used to analyze effects of spectral resolution and noise burst spectral content on perceptual restoration. Older normal-hearing participants generally showed larger and more consistent perceptual restoration benefits for vocoded speech than did younger normal-hearing participants, even in the lowest spectral resolution conditions. Reduced restoration in CI users thus may be caused by factors like noise reduction strategies or small dynamic ranges rather than an interaction of aging effects and low spectral resolution.
Affiliation(s)
- Brittany N Jaekel
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Rochelle S Newman
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Matthew J Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
16
Nagaraj NK, Magimairaj BM. Role of working memory and lexical knowledge in perceptual restoration of interrupted speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:3756. [PMID: 29289104 DOI: 10.1121/1.5018429] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 06/07/2023]
Abstract
The role of working memory (WM) capacity and lexical knowledge in perceptual restoration (PR) of missing speech was investigated using the interrupted speech perception paradigm. Speech identification ability, which indexed PR, was measured using low-context sentences periodically interrupted at 1.5 Hz. PR was measured for silent gated, low-frequency speech noise filled, and low-frequency fine-structure and envelope filled interrupted conditions. WM capacity was measured using verbal and visuospatial span tasks. Lexical knowledge was assessed using both receptive vocabulary and meaning from context tests. Results showed that PR was better for speech noise filled condition than other conditions tested. Both receptive vocabulary and verbal WM capacity explained unique variance in PR for the speech noise filled condition, but were unrelated to performance in the silent gated condition. It was only receptive vocabulary that uniquely predicted PR for fine-structure and envelope filled conditions. These findings suggest that the contribution of lexical knowledge and verbal WM during PR depends crucially on the information content that replaced the silent intervals. When perceptual continuity was partially restored by filler speech noise, both lexical knowledge and verbal WM capacity facilitated PR. Importantly, for fine-structure and envelope filled interrupted conditions, lexical knowledge was crucial for PR.
Affiliation(s)
- Naveen K Nagaraj
- Cognitive Hearing Science Lab, University of Arkansas for Medical Sciences and University of Arkansas at Little Rock, Little Rock, Arkansas 72204, USA
- Beula M Magimairaj
- Cognition and Language Lab, Communication Sciences and Disorders, University of Central Arkansas, Conway, Arkansas 72035, USA
17
Clarke J, Kazanoğlu D, Başkent D, Gaudrain E. Effect of F0 contours on top-down repair of interrupted speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2017; 142:EL7. [PMID: 28764445 DOI: 10.1121/1.4990398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
Top-down repair of interrupted speech can be influenced by bottom-up acoustic cues such as voice pitch (F0). This study aims to investigate the role of the dynamic information of pitch, i.e., F0 contours, in top-down repair of speech. Intelligibility of sentences interrupted with silence or noise was measured in five F0 contour conditions (inverted, flat, original, exaggerated with a factor of 1.5 and 1.75). The main hypothesis was that manipulating F0 contours would impair linking successive segments of interrupted speech and thus negatively affect top-down repair. Intelligibility of interrupted speech was impaired only by misleading dynamic information (inverted F0 contours). The top-down repair of interrupted speech was not affected by any F0 contours manipulation.
Affiliation(s)
- Jeanne Clarke
- Department of Otorhinolaryngology/Head and Neck Surgery, University of Groningen, University Medical Center Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
- Deniz Kazanoğlu
- Department of Otorhinolaryngology/Head and Neck Surgery, University of Groningen, University Medical Center Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University of Groningen, University Medical Center Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
- Etienne Gaudrain
- Department of Otorhinolaryngology/Head and Neck Surgery, University of Groningen, University Medical Center Groningen, P.O. Box 30.001, BB21, 9700 RB Groningen, The Netherlands
18
Başkent D, Clarke J, Pals C, Benard MR, Bhargava P, Saija J, Sarampalis A, Wagner A, Gaudrain E. Cognitive Compensation of Speech Perception With Hearing Impairment, Cochlear Implants, and Aging. Trends Hear 2016. [PMCID: PMC5056620 DOI: 10.1177/2331216516670279] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 12/02/2022]
Abstract
External degradations in incoming speech reduce understanding, and hearing impairment further compounds the problem. While cognitive mechanisms alleviate some of the difficulties, their effectiveness may change with age. In our research, reviewed here, we investigated cognitive compensation with hearing impairment, cochlear implants, and aging, via (a) phonemic restoration as a measure of top-down filling of missing speech, (b) listening effort and response times as a measure of increased cognitive processing, and (c) visual world paradigm and eye gazing as a measure of the use of context and its time course. Our results indicate that between speech degradations and their cognitive compensation, there is a fine balance that seems to vary greatly across individuals. Hearing impairment or inadequate hearing device settings may limit compensation benefits. Cochlear implants seem to allow the effective use of sentential context, but likely at the cost of delayed processing. Linguistic and lexical knowledge, which play an important role in compensation, may be successfully employed in advanced age, as some compensatory mechanisms seem to be preserved. These findings indicate that cognitive compensation in hearing impairment can be highly complicated—not always absent, but also not easily predicted by speech intelligibility tests only.
Affiliation(s)
- Deniz Başkent
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Netherlands
- Graduate School of Medical Sciences, University of Groningen, Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Netherlands
- Jeanne Clarke
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Netherlands
- Graduate School of Medical Sciences, University of Groningen, Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Netherlands
- Carina Pals
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Netherlands
- Graduate School of Medical Sciences, University of Groningen, Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Netherlands
- Michel R. Benard
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Netherlands
- Pento Speech and Hearing Center Zwolle, Zwolle, Netherlands
- Pranesh Bhargava
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Netherlands
- Graduate School of Medical Sciences, University of Groningen, Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Netherlands
- Jefta Saija
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Netherlands
- Graduate School of Medical Sciences, University of Groningen, Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Netherlands
- Anastasios Sarampalis
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Netherlands
- Department of Psychology, University of Groningen, Netherlands
- Anita Wagner
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Netherlands
- Graduate School of Medical Sciences, University of Groningen, Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Netherlands
- Etienne Gaudrain
- Department of Otorhinolaryngology/Head and Neck Surgery, University Medical Center Groningen, University of Groningen, Netherlands
- Graduate School of Medical Sciences, University of Groningen, Netherlands
- Research School of Behavioural and Cognitive Neurosciences, University of Groningen, Netherlands
- Auditory Cognition and Psychoacoustics, CNRS, Lyon Neuroscience Research Center, Lyon, France
19
Vermeulen A, Verschuur C. Robustness against distortion of fundamental frequency cues in simulated electro-acoustic hearing. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2016; 140:229. [PMID: 27475149 DOI: 10.1121/1.4954752] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Speech recognition by cochlear implant users can be improved by adding an audible low frequency acoustic signal to electrical hearing; the resulting improvement is deemed "electro-acoustic stimulation (EAS) benefit." However, a crucial low frequency cue, fundamental frequency (F0), can be distorted via the impaired auditory system. In order to understand how F0 distortions may affect EAS benefit, normal-hearing listeners were presented monaurally with vocoded speech (frequencies >250 Hz) and an acoustical signal (frequencies <250 Hz) with differing manipulations of the F0 signal, specifically: a pure tone with the correct mean F0 but with smaller variations around this mean, or a narrowband of white noise centered around F0, at varying bandwidths; a pure tone down-shifted in frequency by 50 Hz but keeping overall frequency modulations. Speech-recognition thresholds improved when tones with reduced frequency modulation were presented, and improved significantly for noise bands maintaining F0 information. A down-shifted tone, or only a tone to indicate voicing, showed no EAS benefit. These results confirm that the presence of the target's F0 is beneficial for EAS hearing in a noisy environment, and they indicate that the benefit is robust to F0 distortion, as long as the mean F0 and frequency modulations of F0 are preserved.
Affiliation(s)
- Arthur Vermeulen
- Hearing and Balance Centre, Institute of Sound and Vibration Research, University of Southampton, Highfield, Southampton SO17 1BJ, United Kingdom
- Carl Verschuur
- University of Southampton, Auditory Implant Service, Highfield, Southampton SO17 1BJ, United Kingdom