1. Shahidi LK, Collins LM, Mainsah BO. Objective intelligibility measurement of reverberant vocoded speech for normal-hearing listeners: Towards facilitating the development of speech enhancement algorithms for cochlear implants. J Acoust Soc Am 2024;155:2151-2168. PMID: 38501923; PMCID: PMC10959555; DOI: 10.1121/10.0025285
Abstract
Cochlear implant (CI) recipients often struggle to understand speech in reverberant environments. Speech enhancement algorithms could restore speech perception for CI listeners by removing reverberant artifacts from the CI stimulation pattern. Listening studies, either with cochlear-implant recipients or normal-hearing (NH) listeners using a CI acoustic model, provide a benchmark for speech intelligibility improvements conferred by the enhancement algorithm but are costly and time consuming. To reduce the associated costs during algorithm development, speech intelligibility could be estimated offline using objective intelligibility measures. Previous evaluations of objective measures that considered CIs primarily assessed the combined impact of noise and reverberation and employed highly accurate enhancement algorithms. To facilitate the development of enhancement algorithms, we evaluate twelve objective measures in reverberant-only conditions characterized by a gradual reduction of reverberant artifacts, simulating the performance of an enhancement algorithm during development. Measures are validated against the performance of NH listeners using a CI acoustic model. To enhance compatibility with reverberant CI-processed signals, measure performance was assessed after modifying the reference signal and spectral filterbank. Measures leveraging the speech-to-reverberant ratio, cepstral distance and, after modifying the reference or filterbank, envelope correlation are strong predictors of intelligibility for reverberant CI-processed speech.
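The envelope-correlation family of measures discussed above can be illustrated with a minimal sketch: correlate a clean band envelope against its degraded counterpart. This is a toy, not STOI or any specific measure evaluated in the study; the band-envelope values below are invented, and real measures work on short-time segments across a filterbank.

```python
# Toy envelope-correlation intelligibility cue: Pearson correlation between
# a clean and a degraded band envelope (illustrative; real measures such as
# STOI add segmentation, a specific filterbank, and clipping steps).
import math

def envelope_correlation(clean_env, degraded_env):
    """Pearson correlation between clean and degraded band envelopes."""
    n = len(clean_env)
    mc = sum(clean_env) / n
    md = sum(degraded_env) / n
    num = sum((c - mc) * (d - md) for c, d in zip(clean_env, degraded_env))
    den = math.sqrt(sum((c - mc) ** 2 for c in clean_env) *
                    sum((d - md) ** 2 for d in degraded_env))
    return num / den if den else 0.0

clean = [0.1, 0.8, 0.3, 0.9, 0.2]
reverberant = [0.2, 0.7, 0.5, 0.8, 0.5]   # envelope smeared by reverberation
print(round(envelope_correlation(clean, reverberant), 3))  # -> 0.917
```

A reverberation-smeared envelope correlates less than perfectly with the clean one; an enhancement algorithm that removes reverberant artifacts should push the correlation back toward 1.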
Affiliation(s)
- Lidea K Shahidi
- Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27701, USA
- Leslie M Collins
- Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27701, USA
- Boyla O Mainsah
- Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27701, USA
2. Fleming JT, Winn MB. Strategic perceptual weighting of acoustic cues for word stress in listeners with cochlear implants, acoustic hearing, or simulated bimodal hearing. J Acoust Soc Am 2022;152:1300. PMID: 36182279; PMCID: PMC9439712; DOI: 10.1121/10.0013890
Abstract
Perception of word stress is an important aspect of recognizing speech, guiding the listener toward candidate words based on the perceived stress pattern. Cochlear implant (CI) signal processing is likely to disrupt some of the available cues for word stress, particularly vowel quality and pitch contour changes. In this study, we used a cue weighting paradigm to investigate differences in stress cue weighting patterns between participants listening with CIs and those with normal hearing (NH). We found that participants with CIs gave less weight to frequency-based pitch and vowel quality cues than NH listeners but compensated by upweighting vowel duration and intensity cues. Nonetheless, CI listeners' stress judgments were also significantly influenced by vowel quality and pitch, and they modulated their usage of these cues depending on the specific word pair in a manner similar to NH participants. In a series of separate online experiments with NH listeners, we simulated aspects of bimodal hearing by combining low-pass filtered speech with a vocoded signal. In these conditions, participants upweighted pitch and vowel quality cues relative to a fully vocoded control condition, suggesting that bimodal listening holds promise for restoring the stress cue weighting patterns exhibited by listeners with NH.
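A minimal sketch of how perceptual weights can be estimated in this kind of cue-weighting paradigm: with two binary cues, a cue's weight is the average change in "stress on first syllable" responses when that cue flips. The response proportions below are invented for illustration, and the study's actual analysis may differ.

```python
# Toy cue-weight estimate for a two-cue trading paradigm.
# p[(pitch, duration)] -> proportion of "initial stress" responses,
# where each cue takes level 0 (low) or 1 (high). Hypothetical data.
def cue_weight(p, cue):
    """Average response change when `cue` (0 = pitch, 1 = duration) flips."""
    if cue == 0:
        return ((p[(1, 0)] - p[(0, 0)]) + (p[(1, 1)] - p[(0, 1)])) / 2
    return ((p[(0, 1)] - p[(0, 0)]) + (p[(1, 1)] - p[(1, 0)])) / 2

props = {(0, 0): 0.10, (0, 1): 0.40, (1, 0): 0.60, (1, 1): 0.90}
print(cue_weight(props, 0), cue_weight(props, 1))  # pitch vs duration weight
```

Here flipping pitch moves responses by 0.5 on average and flipping duration by 0.3, so this hypothetical listener weights pitch more heavily; a CI listener's pattern would show the reverse trade-off described above.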
Affiliation(s)
- Justin T Fleming
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Matthew B Winn
- Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
3. Svirsky MA, Capach NH, Neukam JD, Azadpour M, Sagi E, Hight AE, Glassman EK, Lavender A, Seward KP, Miller MK, Ding N, Tan CT, Fitzgerald MB. Valid Acoustic Models of Cochlear Implants: One Size Does Not Fit All. Otol Neurotol 2021;42:S2-S10. PMID: 34766938; PMCID: PMC8691967; DOI: 10.1097/mao.0000000000003373
Abstract
HYPOTHESIS: This study tests the hypothesis that it is possible to find tone or noise vocoders that sound similar to a cochlear implant (CI) and yield similar speech perception scores. This would validate the use of such vocoders as acoustic models of CIs. We further hypothesize that valid acoustic models will require a personalized amount of frequency mismatch between input filters and output tones or noise bands.
BACKGROUND: Noise or tone vocoders have been used as acoustic models of CIs in hundreds of publications but have never been convincingly validated.
METHODS: Acoustic models were evaluated by single-sided deaf CI users who compared what they heard with the CI in one ear to what they heard with the acoustic model in the other ear. We evaluated frequency-matched models (both all-channel and 6-channel models, both tone and noise vocoders) as well as self-selected models that included an individualized level of frequency mismatch.
RESULTS: Self-selected acoustic models resulted in similar levels of speech perception and similar perceptual quality as the CI. These models also matched the CI in terms of perceived intelligibility, harshness, and pleasantness.
CONCLUSION: Valid acoustic models of CIs exist, but they differ from the models most widely used in the literature. Individual amounts of frequency mismatch may be required to optimize the validity of the model. This may be related to the basalward frequency mismatch experienced by postlingually deaf patients after cochlear implantation.
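The basalward mismatch mentioned in the conclusion can be made concrete with the Greenwood place-frequency map: a shallowly inserted electrode sits at a cochlear place tuned higher than the analysis band routed to it. The constants below are Greenwood's standard human parameters; the electrode position and analysis frequency are hypothetical illustrations, not values from the study.

```python
# Greenwood place-frequency map (human parameters A = 165.4, a = 2.1 with
# position expressed as a fraction of cochlear length from the apex).
def greenwood_hz(frac_from_apex):
    """Characteristic frequency at a relative distance (0-1) from the apex."""
    return 165.4 * (10 ** (2.1 * frac_from_apex) - 1.0)

place_hz = greenwood_hz(0.6)   # hypothetical electrode 60 % from the apex
analysis_hz = 1000.0           # hypothetical centre of the band routed there
print(round(place_hz), "Hz place frequency vs", analysis_hz, "Hz input band")
```

With these made-up numbers the electrode's place frequency is well above the analysis band it carries, which is the kind of mismatch a "self-selected" acoustic model lets the listener compensate for.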
Affiliation(s)
- Mario A Svirsky
- New York University
- Department of Otolaryngology Head and Neck Surgery, New York University Grossman School of Medicine, New York, New York
- Neuroscience Institute, New York University School of Medicine
- Nicole Hope Capach
- New York University
- Department of Otolaryngology Head and Neck Surgery, New York University Grossman School of Medicine, New York, New York
- Jonathan D Neukam
- New York University
- Department of Otolaryngology Head and Neck Surgery, New York University Grossman School of Medicine, New York, New York
- Mahan Azadpour
- New York University
- Department of Otolaryngology Head and Neck Surgery, New York University Grossman School of Medicine, New York, New York
- Elad Sagi
- New York University
- Department of Otolaryngology Head and Neck Surgery, New York University Grossman School of Medicine, New York, New York
- Ariel Edward Hight
- New York University
- Department of Otolaryngology Head and Neck Surgery, New York University Grossman School of Medicine, New York, New York
- Keena P Seward
- New York University
- 3L Therapy Solutions, LLC, Beltsville, Maryland
- Margaret K Miller
- New York University
- Human Auditory Development Lab, Boys Town National Research Hospital, Omaha, Nebraska, USA
- Nai Ding
- New York University
- College of Biomedical Engineering and Instrument Sciences, Zhejiang University, Zhejiang, China
- Chin-Tuan Tan
- New York University
- Erik Jonsson School of Engineering and Computer Science
- Department of Speech and Hearing, School of Behavioral and Brain Sciences, University of Texas at Dallas, Richardson, Texas
- Matthew B Fitzgerald
- New York University
- Department of Otolaryngology Head and Neck Surgery, Stanford University, Stanford, California, USA
4. Huang EHH, Wu CM, Lin HC. Combination and Comparison of Sound Coding Strategies Using Cochlear Implant Simulation With Mandarin Speech. IEEE Trans Neural Syst Rehabil Eng 2021;29:2407-2416. PMID: 34767509; DOI: 10.1109/tnsre.2021.3128064
Abstract
Three cochlear implant (CI) sound coding strategies were combined in the same signal processing path and compared for speech intelligibility with vocoded Mandarin sentences. The three CI coding strategies, biologically-inspired hearing aid algorithm (BioAid), envelope enhancement (EE), and fundamental frequency modulation (F0mod), were combined with the advanced combination encoder (ACE) strategy. Hence, four singular coding strategies and four combinational coding strategies were derived. Mandarin sentences with speech-shaped noise were processed using these coding strategies. Speech understanding of vocoded Mandarin sentences was evaluated using short-time objective intelligibility (STOI) and subjective sentence recognition tests with normal-hearing listeners. For signal-to-noise ratios at 5 dB or above, the EE strategy had slightly higher average scores in both STOI and listening tests compared to ACE. The addition of EE to BioAid slightly increased the mean scores for BioAid+EE, which was the combination strategy with the highest scores in both objective and subjective speech intelligibility. The benefits of BioAid, F0mod, and the four combinational coding strategies were not observed in CI simulation. The findings of this study may be useful for the future design of coding strategies and related studies with Mandarin.
5. Lamping W, Goehring T, Marozeau J, Carlyon RP. The effect of a coding strategy that removes temporally masked pulses on speech perception by cochlear implant users. Hear Res 2020;391:107969. PMID: 32320925; PMCID: PMC7116331; DOI: 10.1016/j.heares.2020.107969
Abstract
Speech recognition in noisy environments remains a challenge for cochlear implant (CI) recipients. Unwanted charge interactions between current pulses, both within and between electrode channels, are likely to impair performance. Here we investigate the effect of reducing the number of current pulses on speech perception. This was achieved by implementing a psychoacoustic temporal-masking model where current pulses in each channel were passed through a temporal integrator to identify and remove pulses that were less likely to be perceived by the recipient. The decision criterion of the temporal integrator was varied to control the percentage of pulses removed in each condition. In experiment 1, speech in quiet was processed with a standard Continuous Interleaved Sampling (CIS) strategy and with 25, 50 and 75% of pulses removed. In experiment 2, performance was measured for speech in noise with the CIS reference and with 50 and 75% of pulses removed. Speech intelligibility in quiet revealed no significant difference between reference and test conditions. For speech in noise, results showed a significant improvement of 2.4 dB when removing 50% of pulses, and performance was not significantly different between the reference and when 75% of pulses were removed. Further, by reducing the overall number of current pulses by 25, 50, and 75% but accounting for the increase in charge necessary to compensate for the decrease in loudness, estimated average power savings of 21.15, 40.95, and 63.45%, respectively, could be achieved for this set of listeners. In conclusion, removing temporally masked pulses may improve speech perception in noise and result in substantial power savings.
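The masking-based pulse-removal idea can be sketched with a leaky integrator per channel: recent stimulation sets a decaying masker level, and a pulse below a criterion fraction of that level is removed. The decay constant and criterion below are invented for illustration, not the model's fitted values.

```python
# Toy temporal-masking pulse remover for one channel: a pulse is kept only
# if it exceeds `criterion` times the current (decaying) masker level.
# decay and criterion are illustrative, not the paper's parameters.
def remove_masked_pulses(pulses, decay=0.7, criterion=0.5):
    """pulses: per-frame current amplitudes for one channel.
    Returns the train with temporally masked pulses zeroed."""
    masker = 0.0
    kept = []
    for p in pulses:
        if p > criterion * masker:            # audible above the masker
            kept.append(p)
            masker = max(masker * decay, p)   # the kept pulse sets the masker
        else:
            kept.append(0.0)                  # temporally masked -> removed
            masker *= decay
    return kept

train = [1.0, 0.3, 0.2, 0.9, 0.1, 0.05]
out = remove_masked_pulses(train)
removed = sum(1 for p in out if p == 0.0)
print(out, f"{100 * removed / len(train):.0f}% of pulses removed")
```

Raising the criterion removes more pulses, which is how the study swept the 25/50/75% removal conditions; each dropped pulse also saves stimulation charge, the source of the reported power savings.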
Affiliation(s)
- Wiebke Lamping
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800, Kgs. Lyngby, Denmark; Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, United Kingdom.
- Tobias Goehring
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, United Kingdom
- Jeremy Marozeau
- Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800, Kgs. Lyngby, Denmark
- Robert P Carlyon
- Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, United Kingdom
6. Winn MB. Accommodation of gender-related phonetic differences by listeners with cochlear implants and in a variety of vocoder simulations. J Acoust Soc Am 2020;147:174. PMID: 32006986; PMCID: PMC7341679; DOI: 10.1121/10.0000566
Abstract
Speech perception requires accommodation of a wide range of acoustic variability across talkers. A classic example is the perception of "sh" and "s" fricative sounds, which are categorized according to spectral details of the consonant itself, and also by the context of the voice producing it. Because women's and men's voices occupy different frequency ranges, a listener is required to make a corresponding adjustment of acoustic-phonetic category space for these phonemes when hearing different talkers. This pattern is commonplace in everyday speech communication, and yet might not be captured in accuracy scores for whole words, especially when word lists are spoken by a single talker. Phonetic accommodation for fricatives "s" and "sh" was measured in 20 cochlear implant (CI) users and in a variety of vocoder simulations, including those with noise carriers with and without peak picking, simulated spread of excitation, and pulsatile carriers. CI listeners showed strong phonetic accommodation as a group. Each vocoder produced phonetic accommodation except the 8-channel noise vocoder, despite its historically good match with CI users in word intelligibility. Phonetic accommodation is largely independent of linguistic factors and thus might offer information complementary to speech intelligibility tests which are partially affected by language processing.
Affiliation(s)
- Matthew B Winn
- Department of Speech & Hearing Sciences, University of Minnesota, 164 Pillsbury Drive Southeast, Minneapolis, Minnesota 55455, USA
7. Perception of noise-vocoded tone complexes: A time domain analysis based on an auditory filterbank model. Hear Res 2018;367:1-16. PMID: 30005269; DOI: 10.1016/j.heares.2018.07.003
Abstract
When a wideband harmonic tone complex (wHTC) is passed through a noise vocoder, the resulting sounds can have spectra with large peak-to-valley ratios, but little or no periodicity strength in the autocorrelation functions. We measured judgments of pitch strength for normal-hearing listeners for noise-vocoded wideband harmonic tone complexes (NV-wHTCs) relative to standard and anchor stimuli. The standard was a 1-channel NV-wHTC and the anchor was either the unprocessed wHTC or an infinitely-iterated rippled noise (IIRN). The magnitude judgment functions obtained with the IIRN anchor show variability that suggests different listening strategies among individuals. To gain insight into possible listening strategies, test stimuli were analyzed at the output of an auditory filterbank model based on gammatone filters. The weak periodicity strengths of NV-wHTCs observed in the stimulus autocorrelation functions are augmented at the output of the gammatone filterbank model. Six analytical models of pitch strength were evaluated based on summary correlograms obtained from the gammatone filterbank. The results of the filterbank analysis suggest that, contrary to the weak or absent periodicity strengths in the stimulus domain, temporal cues contribute to pitch strength perception of noise-vocoded harmonic stimuli such that listeners' judgments of pitch strength reflect a nonlinear, weighted average of the temporal information between the fine structure and the envelope.
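The periodicity-strength quantity at the heart of this analysis can be sketched as a normalized autocorrelation at a candidate period: near 1 at the true period of a strongly periodic waveform and smaller elsewhere. This is illustrative only; the study's summary correlograms operate on gammatone filterbank outputs, not the raw stimulus as here.

```python
# Toy periodicity strength: autocorrelation at `lag`, normalized by the
# zero-lag energy over the same samples. Illustrative stimulus-domain sketch.
import math

def periodicity_strength(x, lag):
    """Normalized autocorrelation of sequence x at integer lag."""
    n = len(x) - lag
    num = sum(x[i] * x[i + lag] for i in range(n))
    den = sum(v * v for v in x[:n])
    return num / den if den else 0.0

tone = [math.sin(2 * math.pi * i / 20) for i in range(200)]   # period 20
print(round(periodicity_strength(tone, 20), 3),    # at the true period
      round(periodicity_strength(tone, 5), 3))     # a quarter period off
```

For a pure tone the value at the true period is 1 and at a quarter period near 0; for an NV-wHTC the stimulus-domain value stays weak, which is why the paper looks at filterbank outputs instead.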
8. Investigating the use of a Gammatone filterbank for a cochlear implant coding strategy. J Neurosci Methods 2016;277:63-74. PMID: 27939961; PMCID: PMC5270640; DOI: 10.1016/j.jneumeth.2016.12.004
Abstract
Highlights
- A Gammatone filterbank has the potential to better resolve the harmonics of complex tones.
- The total delay of a Gammatone filterbank can be made smaller than that of an FFT filterbank with the same frequency resolution at low frequencies.
- Melody contour identification improved with longer frame size or higher frequency resolution.

Background: Contemporary speech processing strategies in cochlear implants (CIs), such as the Advanced Combination Encoder (ACE), use a standard Fast Fourier Transform (FFT) filterbank to extract envelopes. The assignment of the FFT bins to approximate the frequency resolution of the basilar membrane is only partly based on physiology, since the bins are distributed linearly below 1000 Hz and logarithmically above 1000 Hz.
New method: A Gammatone filterbank, which provides a closer approximation to the bandwidths of filters in the human auditory system, could replace the standard FFT filterbank in the ACE strategy. An infinite impulse response (IIR) all-pole design of the Gammatone filterbank was compared to the FFT filterbank with 128-, 256- and 512-point resolutions, and the effect of the frequency boundaries of the filters was also investigated.
Results: Melodic contour identification (MCI) and just-noticeable-difference (JND) experiments, both involving synthetic clarinet notes in octaves 3 and 4, were conducted with six normal-hearing (NH) participants using noise-vocoded stimuli; ten CI recipients performed only the MCI experiment. The MCI results for both NH and CI subjects showed a significant effect of the filterbank on the percentage of correct responses.
Comparison with existing methods: The Gammatone filterbank can better resolve the harmonics of the tested synthetic clarinet notes, which led to better performance in the MCI experiment.
Conclusions: The total delay of the Gammatone filterbank can be made smaller than the delay of the FFT filterbank with the same frequency resolution at low frequencies.
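For reference, the gammatone filter discussed above has a simple closed-form impulse response. The sketch below uses the standard 4th-order form with the Glasberg-and-Moore ERB bandwidth (textbook constants); it does not reproduce the paper's IIR all-pole design.

```python
# 4th-order gammatone impulse response with ERB bandwidth (Glasberg & Moore):
# g(t) = t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t)
import math

def erb(fc_hz):
    """Equivalent rectangular bandwidth (Hz) at centre frequency fc."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, dur_s=0.025, order=4, b=1.019):
    """Sampled gammatone impulse response, normalized to unit peak."""
    n_samp = int(dur_s * fs_hz)
    g = []
    for i in range(n_samp):
        t = i / fs_hz
        g.append(t ** (order - 1)
                 * math.exp(-2 * math.pi * b * erb(fc_hz) * t)
                 * math.cos(2 * math.pi * fc_hz * t))
    peak = max(abs(x) for x in g) or 1.0
    return [x / peak for x in g]

ir = gammatone_ir(fc_hz=1000.0, fs_hz=16000)
print(len(ir), round(max(ir), 3))
```

Because the ERB at low centre frequencies is narrow, low-frequency gammatone channels ring longer, which is where the delay trade-off against an FFT filterbank, noted in the conclusions, comes from.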
9. Staisloff HE, Lee DH, Aronoff JM. Perceptually aligning apical frequency regions leads to more binaural fusion of speech in a cochlear implant simulation. Hear Res 2016;337:59-64. PMID: 27208791; DOI: 10.1016/j.heares.2016.05.002
Abstract
For bilateral cochlear implant users, the left and right arrays are typically not physically aligned, resulting in a degradation of binaural fusion, which can be detrimental to binaural abilities. Perceptually aligning the two arrays can be accomplished by disabling electrodes in one ear that do not have a perceptually corresponding electrode in the other side. However, disabling electrodes at the edges of the array will cause compression of the input frequency range into a smaller cochlear extent, which may result in reduced spectral resolution. An alternative approach to overcome this mismatch would be to only align one edge of the array. By aligning either only the apical or basal end of the arrays, fewer electrodes would be disabled, potentially causing less reduction in spectral resolution. The goal of this study was to determine the relative effect of aligning either the basal or apical end of the electrode with regards to binaural fusion. A vocoder was used to simulate cochlear implant listening conditions in normal hearing listeners. Speech signals were vocoded such that the two ears were either predominantly aligned at only the basal or apical end of the simulated arrays. The experiment was then repeated with a spectrally inverted vocoder to determine whether the detrimental effects on fusion were related to the spectral-temporal characteristics of the stimuli or the location in the cochlea where the misalignment occurred. In Experiment 1, aligning the basal portion of the simulated arrays led to significantly less binaural fusion than aligning the apical portions of the simulated array. However, when the input was spectrally inverted, aligning the apical portion of the simulated array led to significantly less binaural fusion than aligning the basal portions of the simulated arrays. 
These results suggest that, for speech, with its predominantly low frequency spectral-temporal modulations, it is more important to perceptually align the apical portion of the array to better preserve binaural fusion. By partially aligning these arrays, cochlear implant users could potentially increase their ability to fuse speech sounds presented to the two ears while maximizing spectral resolution.
Affiliation(s)
- Hannah E Staisloff
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, 901 S. 6th St, Champaign, IL 61820, USA.
- Daniel H Lee
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, 901 S. 6th St, Champaign, IL 61820, USA.
- Justin M Aronoff
- Department of Speech and Hearing Science, University of Illinois at Urbana-Champaign, 901 S. 6th St, Champaign, IL 61820, USA.
10. The Intelligibility of Interrupted Speech: Cochlear Implant Users and Normal Hearing Listeners. J Assoc Res Otolaryngol 2016;17:475-491. PMID: 27090115; PMCID: PMC5023536; DOI: 10.1007/s10162-016-0565-9
Abstract
Compared with normal-hearing listeners, cochlear implant (CI) users display a loss of intelligibility of speech interrupted by silence or noise, possibly due to reduced ability to integrate and restore speech glimpses across silence or noise intervals. The present study was conducted to establish the extent of the deficit typical CI users have in understanding interrupted high-context sentences as a function of a range of interruption rates (1.5 to 24 Hz) and duty cycles (50 and 75 %). Further, factors such as reduced signal quality of CI signal transmission and advanced age, as well as potentially lower speech intelligibility of CI users even in the absence of the interruption manipulation, were explored by presenting young, as well as age-matched, normal-hearing (NH) listeners with full-spectrum and vocoded speech (eight-channel and speech intelligibility baseline performance matched). While the actual CI users had more difficulties in understanding interrupted speech and taking advantage of faster interruption rates and increased duty cycle than the eight-channel noise-band vocoded listeners, their performance was similar to the matched noise-band vocoded listeners. These results suggest that while loss of spectro-temporal resolution indeed plays an important role in reduced intelligibility of interrupted speech, these factors alone cannot entirely explain the deficit. Other factors associated with real CIs, such as aging or failure in transmission of essential speech cues, seem to additionally contribute to poor intelligibility of interrupted speech.
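The interruption manipulation itself is straightforward to sketch: gate the signal with a periodic on/off square wave at a given rate and duty cycle. The rate and duty cycle below fall in the study's ranges (1.5-24 Hz; 50 and 75%); the signal and sampling rate are dummies.

```python
# Periodic interruption gate: zero out samples during the "off" phase of
# each cycle. rate_hz = interruptions per second, duty = fraction kept.
def gate_signal(signal, fs_hz, rate_hz, duty):
    """Return `signal` with the off-phase of each interruption cycle zeroed."""
    period = fs_hz / rate_hz              # samples per interruption cycle
    out = []
    for i, x in enumerate(signal):
        phase = (i % period) / period     # position within the current cycle
        out.append(x if phase < duty else 0.0)
    return out

fs = 100
sig = [1.0] * fs                          # one second of a dummy signal
gated = gate_signal(sig, fs, rate_hz=2, duty=0.5)
print(sum(1 for x in gated if x != 0.0))  # half the samples survive
```

Increasing the duty cycle from 0.5 to 0.75 preserves more glimpses per cycle, the manipulation that NH listeners exploit more readily than CI users in the results above.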
11. El Boghdady N, Kegel A, Lai WK, Dillier N. A neural-based vocoder implementation for evaluating cochlear implant coding strategies. Hear Res 2016;333:136-149. PMID: 26775182; DOI: 10.1016/j.heares.2016.01.005
Abstract
Most simulations of cochlear implant (CI) coding strategies rely on standard vocoders that are based on purely signal processing techniques. However, these models neither account for various biophysical phenomena, such as neural stochasticity and refractoriness, nor for effects of electrical stimulation, such as spectral smearing as a function of stimulus intensity. In this paper, a neural model that accounts for stochastic firing, parasitic spread of excitation across neuron populations, and neuronal refractoriness, was developed and augmented as a preprocessing stage for a standard 22-channel noise-band vocoder. This model was used to subjectively and objectively assess consonant discrimination in commercial and experimental coding strategies. Stimuli consisting of consonant-vowel (CV) and vowel-consonant-vowel (VCV) tokens were processed by either the Advanced Combination Encoder (ACE) or the Excitability Controlled Coding (ECC) strategies, and later resynthesized to audio using the aforementioned vocoder model. Baseline performance was measured using unprocessed versions of the speech tokens. Behavioural responses were collected from seven normal hearing (NH) volunteers, while EEG data were recorded from five NH participants. Psychophysical results indicate that while there may be a difference in consonant perception between the two tested coding strategies, mismatch negativity (MMN) waveforms do not show any marked trends in CV or VCV contrast discrimination.
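Two of the biophysical ingredients named above, stochastic firing and refractoriness, can be caricatured in a few lines: a neuron fires probabilistically against a noisy threshold and is then absolutely refractory for a fixed number of frames. This one-neuron toy is invented for illustration; it is not the paper's population model, and all parameters are arbitrary.

```python
# Toy spiking front-end: stochastic threshold crossing plus absolute
# refractoriness. All parameters are illustrative, not the paper's model.
import random

def simulate_spikes(pulses, threshold=0.5, noise=0.1, refractory=2, seed=1):
    """Binary spike train for one neuron driven by per-frame pulse amplitudes."""
    rng = random.Random(seed)
    spikes, dead = [], 0
    for p in pulses:
        if dead > 0:                      # absolute refractory period
            spikes.append(0)
            dead -= 1
        elif p + rng.gauss(0.0, noise) > threshold:   # stochastic firing
            spikes.append(1)
            dead = refractory
        else:
            spikes.append(0)
    return spikes

print(simulate_spikes([0.9] * 10))
```

With the noise term set to zero the model is deterministic and fires every `refractory + 1` frames; with noise it misses or adds spikes trial to trial, the stochasticity a purely signal-processing vocoder cannot reproduce.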
Affiliation(s)
- Nawal El Boghdady
- Institute for Neuroinformatics (INI), Universität Zürich (UZH)/ ETH Zürich (ETHZ), Zürich, Switzerland.
- Andrea Kegel
- Laboratory of Experimental Audiology, ENT Department, Universitätsspital Zürich (USZ), Zürich, Switzerland
- Wai Kong Lai
- Laboratory of Experimental Audiology, ENT Department, Universitätsspital Zürich (USZ), Zürich, Switzerland
- Norbert Dillier
- Laboratory of Experimental Audiology, ENT Department, Universitätsspital Zürich (USZ), Zürich, Switzerland
12. Aguiar DE, Taylor NE, Li J, Gazanfari DK, Talavage TM, Laflen JB, Neuberger H, Svirsky MA. Information theoretic evaluation of a noiseband-based cochlear implant simulator. Hear Res 2015;333:185-193. PMID: 26409068; DOI: 10.1016/j.heares.2015.09.008
Abstract
Noise-band vocoders are often used to simulate the signal processing algorithms used in cochlear implants (CIs), producing acoustic stimuli that may be presented to normal hearing (NH) subjects. Such evaluations may circumvent the heterogeneity of CI user populations, achieving greater experimental control than when testing on CI subjects. However, it remains an open question whether advancements in algorithms developed on NH subjects using a simulator will necessarily improve performance in CI users. This study assessed the similarity in vowel identification of CI subjects and NH subjects using an 8-channel noise-band vocoder simulator configured to match input and output frequencies or to mimic output after a basalward shift of input frequencies. Under each stimulus condition, NH subjects performed the task both with and without feedback/training. Similarity of NH subjects to CI users was evaluated using correct identification rates and information theoretic approaches. Feedback/training produced higher rates of correct identification, as expected, but also resulted in error patterns that were closer to those of the CI users. Further evaluation remains necessary to determine how patterns of confusion at the token level are affected by the various parameters in CI simulators, providing insight into how a true CI simulation may be developed to facilitate more rapid prototyping and testing of novel CI signal processing and electrical stimulation strategies.
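The information-theoretic comparison of error patterns can be illustrated with transmitted information computed from a stimulus-response confusion matrix, a standard Miller-and-Nicely-style analysis. The count matrices below are invented for illustration, not study data.

```python
# Transmitted information T(x;y) in bits, from a confusion-count matrix
# counts[i][j] = number of times stimulus i elicited response j.
import math

def transmitted_information(counts):
    """Mutual information between stimulus and response, in bits."""
    n = sum(sum(row) for row in counts)
    pi = [sum(row) / n for row in counts]                    # stimulus probs
    pj = [sum(counts[i][j] for i in range(len(counts))) / n
          for j in range(len(counts[0]))]                    # response probs
    t = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c:
                pij = c / n
                t += pij * math.log2(pij / (pi[i] * pj[j]))
    return t

perfect = [[10, 0], [0, 10]]   # two vowels, never confused
chance = [[5, 5], [5, 5]]      # responses unrelated to the stimulus
print(transmitted_information(perfect), transmitted_information(chance))
```

Unlike percent correct, this quantity is sensitive to the *structure* of the confusions, so two listeners with equal scores but different error patterns, e.g. an NH simulator user versus a CI user, can be told apart.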
Affiliation(s)
- Daniel E Aguiar
- School of Electrical & Computer Engineering, Purdue University, West Lafayette, IN, USA
- N Ellen Taylor
- Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, USA
- Jing Li
- School of Electrical & Computer Engineering, Purdue University, West Lafayette, IN, USA
- Daniel K Gazanfari
- School of Electrical & Computer Engineering, Purdue University, West Lafayette, IN, USA
- Thomas M Talavage
- School of Electrical & Computer Engineering, Purdue University, West Lafayette, IN, USA; Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN, USA.
- J Brandon Laflen
- School of Electrical & Computer Engineering, Purdue University, West Lafayette, IN, USA
- Heidi Neuberger
- DeVault Otologic Research Laboratory, Department of Otolaryngology/Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, IN, USA
- Mario A Svirsky
- DeVault Otologic Research Laboratory, Department of Otolaryngology/Head and Neck Surgery, Indiana University School of Medicine, Indianapolis, IN, USA; Department of Otolaryngology-Head & Neck Surgery, New York University School of Medicine, New York, NY, USA
13. Won JH, Jones GL, Moon IJ, Rubinstein JT. Spectral and temporal analysis of simulated dead regions in cochlear implants. J Assoc Res Otolaryngol 2015;16:285-307. PMID: 25740402; DOI: 10.1007/s10162-014-0502-8
Abstract
A cochlear implant (CI) electrode in a "cochlear dead region" will excite neighboring neural populations. In previous research that simulated such dead regions, stimulus information in the simulated dead region was either added to the immediately adjacent frequency regions or dropped entirely. There was little difference in speech perception ability between the two conditions. This may imply that there is little benefit to ensuring that stimulus information on an electrode in a suspected cochlear dead region is transmitted. Alternatively, performance may be enhanced by a broader frequency redistribution, rather than adding stimuli from the dead region to the edges. In the current experiments, cochlear dead regions were introduced by excluding selected CI electrodes or vocoder noise-bands. Participants were assessed for speech understanding as well as spectral and temporal sensitivities as a function of the size of simulated dead regions. In one set of tests, the normal input frequency range of the sound processor was distributed among the active electrodes in bands with approximately logarithmic spacing ("redistributed" maps); in the remaining tests, information in simulated dead regions was dropped ("dropped" maps). Word recognition and Schroeder-phase discrimination performance, which require both spectral and temporal sensitivities, decreased as the size of simulated dead regions increased, but the redistributed and dropped remappings showed similar performance in these two tasks. Psychoacoustic experiments showed that the near match in word scores may reflect a tradeoff between spectral and temporal sensitivity: spectral-ripple discrimination was substantially degraded in the redistributed condition relative to the dropped condition while performance in a temporal modulation detection task degraded in the dropped condition but remained constant in the redistributed condition.
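The "redistributed" remapping described above can be sketched as re-spacing the full analysis range approximately logarithmically over however many channels remain active, whereas a "dropped" map simply discards the dead channels' bands. The 188-7938 Hz span below is a typical CI processor default used purely for illustration; the study's exact band edges are not reproduced.

```python
# Logarithmically spaced band edges: n_channels + 1 edges covering
# [f_lo, f_hi] with a constant frequency ratio per channel.
def log_spaced_edges(f_lo, f_hi, n_channels):
    """Return n_channels + 1 logarithmically spaced band-edge frequencies."""
    ratio = (f_hi / f_lo) ** (1.0 / n_channels)
    return [f_lo * ratio ** k for k in range(n_channels + 1)]

full = log_spaced_edges(188.0, 7938.0, 22)           # all 22 channels active
redistributed = log_spaced_edges(188.0, 7938.0, 18)  # 4 channels disabled
print(len(full), len(redistributed))
```

The redistributed map keeps the whole input range but widens every band, trading spectral resolution for completeness, exactly the trade-off the ripple-discrimination and modulation-detection results above tease apart.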
Affiliation(s)
- Jong Ho Won
- Virginia Merrill Bloedel Hearing Research Center, Department of Otolaryngology-Head and Neck Surgery, University of Washington, Seattle, WA, 98195, USA
14
Mesnildrey Q, Macherey O. Simulating the dual-peak excitation pattern produced by bipolar stimulation of a cochlear implant: effects on speech intelligibility. Hear Res 2014; 319:32-47. [PMID: 25449010 DOI: 10.1016/j.heares.2014.11.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/19/2014] [Revised: 10/28/2014] [Accepted: 11/05/2014] [Indexed: 10/24/2022]
Abstract
Several electrophysiological and psychophysical studies have shown that the spatial excitation pattern produced by bipolar stimulation of a cochlear implant (CI) can have a dual-peak shape. The perceptual effects of this dual-peak shape were investigated using noise-vocoded CI simulations in which synthesis filters were designed to simulate the spread of neural activity produced by various electrode configurations, as predicted by a simple cochlear model. Experiments 1 and 2 tested speech recognition in the presence of a concurrent speech masker for various sets of single-peak and dual-peak synthesis filters and different numbers of channels. Similar to results obtained with real CIs, both monopolar (MP, single-peak) and bipolar (BP + 1, dual-peak) simulations showed a plateau of performance above 8 channels. The benefit of increasing the number of channels was also lower for BP + 1 than for MP. This shows that channel interactions in BP + 1 become especially deleterious for speech intelligibility when a simulated electrode acts both as an active and as a return electrode for different channels, because envelope information from two different analysis bands is conveyed to the same spectral location. Experiment 3 shows that these channel interactions are even stronger in the wide BP configuration (BP + 5), likely because the interfering speech envelopes are less correlated than in narrow BP + 1. Although the exact effects of dual- or multi-peak excitation in real CIs remain to be determined, this series of experiments suggests that multipolar stimulation strategies, such as bipolar or tripolar, should be controlled to avoid neural excitation in the vicinity of the return electrodes.
Affiliation(s)
- Quentin Mesnildrey
- LMA-CNRS, UPR 7051, Aix-Marseille Univ., Centrale Marseille, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France.
- Olivier Macherey
- LMA-CNRS, UPR 7051, Aix-Marseille Univ., Centrale Marseille, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
15
van de Velde DJ, Dritsakis G, Frijns JHM, van Heuven VJ, Schiller NO. The effect of spectral smearing on the identification of pure F0 intonation contours in vocoder simulations of cochlear implants. Cochlear Implants Int 2014; 16:77-87. [DOI: 10.1179/1754762814y.0000000086] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/31/2022]
16
Lazard DS, Marozeau J, McDermott HJ. The sound sensation of apical electric stimulation in cochlear implant recipients with contralateral residual hearing. PLoS One 2012; 7:e38687. [PMID: 22723876 PMCID: PMC3378545 DOI: 10.1371/journal.pone.0038687] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2012] [Accepted: 05/09/2012] [Indexed: 11/19/2022] Open
Abstract
Background Studies using vocoders as acoustic simulators of cochlear implants have generally focused on simulation of speech understanding, gender recognition, or music appreciation. The aim of the present experiment was to study the auditory sensation perceived by cochlear implant (CI) recipients with steady electrical stimulation on the most-apical electrode. Methodology/Principal Findings Five unilateral CI users with contralateral residual hearing were asked to vary the parameters of an acoustic signal played to the non-implanted ear, in order to match its sensation to that of the electric stimulus. They also provided a rating of similarity between each acoustic sound they selected and the electric stimulus. On average across subjects, the sound rated as most similar was a complex signal with a concentration of energy around 523 Hz. This sound was inharmonic in 3 out of 5 subjects with a moderate, progressive increase in the spacing between the frequency components. Conclusions/Significance For these subjects, the sound sensation created by steady electric stimulation on the most-apical electrode was neither a white noise nor a pure tone, but a complex signal with a progressive increase in the spacing between the frequency components in 3 out of 5 subjects. Knowing whether the inharmonic nature of the sound was related to the fact that the non-implanted ear was impaired has to be explored in single-sided deafened patients with a contralateral CI. These results may be used in the future to better understand peripheral and central auditory processing in relation to cochlear implants.
17
Carroll J, Tiaden S, Zeng FG. Fundamental frequency is critical to speech perception in noise in combined acoustic and electric hearing. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 130:2054-62. [PMID: 21973360 PMCID: PMC3206909 DOI: 10.1121/1.3631563] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/28/2011] [Revised: 05/11/2011] [Accepted: 08/05/2011] [Indexed: 05/25/2023]
Abstract
Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In CI simulations, the signal-to-noise ratio was varied to estimate the level required for 50% correct responses. Simulation results showed that the F0 cue contributes to a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing.
Affiliation(s)
- Jeff Carroll
- Hearing and Speech Research Laboratory, Department of Biomedical Engineering, University of California, Irvine, California 92697-5320, USA
18
Kwon BJ, Perry TT, Olmstead VL. Effects of stimulation configurations on place pitch discrimination in cochlear implants. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:3818-3826. [PMID: 21682405 PMCID: PMC3135145 DOI: 10.1121/1.3586786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/08/2010] [Revised: 03/17/2011] [Accepted: 04/13/2011] [Indexed: 05/30/2023]
Abstract
The present study aimed to examine the effect of electrode configuration, specifically monopolar (MP) or bipolar (BP) stimulation, on place pitch discrimination in cochlear implants (CIs). Twelve subjects implanted with the Nucleus Freedom device were presented with various pairs of stimulation across the electrode array, with varying degrees of distance between stimulation sites, and asked to judge the higher of the two in pitch. Each pair was presented either in the same mode or in different modes of stimulation for the within-mode or across-mode condition, respectively, at least 20 times. The result of the within-mode condition revealed that subjects, on average, were able to discriminate pitches significantly better in MP than in BP, with the sensitivity index (d') for adjacent channels of 1.2 for MP and 0.8 for BP. The result of the across-mode condition revealed that while individual variability existed, there was a strong tendency for CI subjects to perceive a higher pitch in BP stimulation than in MP for a similar site of stimulation. In other words, an MP channel needed to be shifted in a basal direction by as much as two electrodes on average to elicit a pitch comparable to that of a BP channel.
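As a rough illustration of the sensitivity index d' reported in this abstract, the standard equal-variance Gaussian signal-detection model can be computed as below. This is a generic sketch of the d' statistic, not the authors' analysis code; the 80%-correct example value is hypothetical.

```python
from statistics import NormalDist

def dprime_2afc(prop_correct):
    """d' from proportion correct in a two-interval forced-choice task
    ("which of the two stimuli is higher in pitch?"), assuming the
    equal-variance Gaussian model: d' = sqrt(2) * z(Pc)."""
    return 2 ** 0.5 * NormalDist().inv_cdf(prop_correct)

def dprime_yes_no(hit_rate, fa_rate):
    """d' from hit and false-alarm rates in a yes/no task:
    d' = z(H) - z(F)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)
```

For example, about 80% correct on adjacent channels corresponds to d' of roughly 1.19, in the range reported here for monopolar stimulation.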
Affiliation(s)
- Bomjun J Kwon
- Department of Otolaryngology-Head and Neck Surgery, Eye and Ear Institute, The Ohio State University, Columbus, Ohio 43212, USA.
19
Effects of age on F0 discrimination and intonation perception in simulated electric and electroacoustic hearing. Ear Hear 2011; 32:75-83. [PMID: 20739892 DOI: 10.1097/aud.0b013e3181eccfe9] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
OBJECTIVES Recent research suggests that older listeners may have difficulty processing information related to the fundamental frequency (F0) of voiced speech. In this study, the focus was on the mechanisms that may underlie this reduced ability. We examined whether increased age resulted in decreased ability to perceive F0 using fine-structure cues provided by the harmonic structure of voiced speech sounds or cues provided by high-rate envelope fluctuations (periodicity). DESIGN Younger listeners with normal hearing and older listeners with normal to near-normal hearing completed two tasks of F0 perception. In the first task (steady state F0), the fundamental frequency difference limen (F0DL) was measured adaptively for synthetic vowel stimuli. In the second task (time-varying F0), listeners relied on variations in F0 to judge intonation of synthetic diphthongs. For both tasks, three processing conditions were created: eight-channel vocoding that preserved periodicity cues to F0; a simulated electroacoustic stimulation condition, which consisted of high-frequency vocoder processing combined with a low-pass-filtered portion, and offered both periodicity and fine-structure cues to F0; and an unprocessed condition. RESULTS F0 difference limens for steady state vowel sounds and the ability to discern rising and falling intonations were significantly worse in the older subjects compared with the younger subjects. For both older and younger listeners, scores were lowest for the vocoded condition, and there was no difference in scores between the unprocessed and electroacoustic simulation conditions. CONCLUSIONS Older listeners had difficulty using periodicity cues to obtain information related to talker fundamental frequency. However, performance was improved by combining periodicity cues with (low frequency) acoustic information, and that strategy should be considered in individuals who are appropriate candidates for such processing. For cochlear implant candidates, this effect might be achieved by partial electrode insertion providing acoustic stimulation in the low frequencies or by the combination of a traditional implant in one ear and a hearing aid in the opposite ear.
20
Strydom T, Hanekom JJ. An analysis of the effects of electrical field interaction with an acoustic model of cochlear implants. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:2213-2226. [PMID: 21476676 DOI: 10.1121/1.3518761] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Electrical field interaction caused by current spread in a cochlear implant was modeled in an explicit way in an acoustic model (the SPREAD model) presented to six listeners with normal hearing. The typical processing of cochlear implants was modeled more closely than in traditional acoustic models by careful selection of parameters related to current spread or parameters that could amplify the electrical field interactions caused by current spread. These parameters were the insertion depth, electrode spacing, electrical dynamic range, and dynamic range compression function. The hypothesis was that current spread could account for the asymptote in performance in speech intelligibility experiments observed at around seven stimulation channels in a number of cochlear implant studies. Speech intelligibility for sentences, vowels, and consonants at three noise levels (SNR of +15 dB, +10 dB, and +5 dB) was measured as a function of the number of spectral channels (4, 7, and 16). The SPREAD model appears to explain the asymptote in speech intelligibility at seven channels for all noise levels for all speech material used in this study. It is shown that the compressive amplitude mapping used in cochlear implants can have a detrimental effect on the number of effective channels.
Affiliation(s)
- Trudie Strydom
- Department of Electrical, Electronic and Computer Engineering, University of Pretoria, Pretoria, Gauteng 0002, South Africa
21
Abstract
OBJECTIVES This study was designed to determine what acoustic elements are associated with musical perception ability in cochlear implant (CI) users and to understand how acoustic elements, which are important to good speech perception, contribute to music perception in CI users. It was hypothesized that the variability in the performance of music and speech perception may be related to differences in the sensitivity to specific acoustic features such as spectral changes or temporal modulations, or both. DESIGN A battery of hearing tasks was administered to 42 CI listeners. The Clinical Assessment of Music Perception was used, which evaluates complex-tone pitch-direction discrimination, melody recognition, and timbre recognition. To investigate spectral and temporal processing, spectral-ripple discrimination and Schroeder-phase discrimination abilities were evaluated. Speech perception ability in quiet and noise was also evaluated. Relationships between Clinical Assessment of Music Perception subtest scores, spectral-ripple discrimination thresholds, Schroeder-phase discrimination scores, and speech recognition scores were assessed. RESULTS Spectral-ripple discrimination was shown to correlate with all three aspects of music perception studied. Schroeder-phase discrimination was generally not predictive of music perception outcomes. Music perception ability was significantly correlated with speech perception ability. Nearly half of the variance in melody and timbre recognition was predicted jointly by spectral-ripple and pitch-direction discrimination thresholds. Similar results were observed on speech recognition as well. CONCLUSIONS This study suggests that spectral-ripple discrimination is significantly associated with music perception in CI users. A previous report showed that spectral-ripple discrimination is significantly correlated with speech recognition in quiet and in noise. This study showed that speech recognition and music perception are also related to one another. Spectral-ripple discrimination ability seems to reflect a wide range of hearing abilities in CI users. The results suggest that materially improving spectral resolution could provide significant benefits in music and speech perception outcomes in CI users.
22
Pierzycki RH, Seeber BU. Indications for temporal fine structure contribution to co-modulation masking release. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:3614-3624. [PMID: 21218893 DOI: 10.1121/1.3500673] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
The contribution of temporal fine structure (TFS) information to co-modulation masking release (CMR) was examined by comparing CMR obtained with unprocessed or vocoded stimuli. Tone thresholds were measured in the presence of a sinusoidally amplitude-modulated on-frequency band (OFB) of noise and zero, two, or four flanking bands (FBs) of noise whose envelopes were either co- or anti-modulated with the OFB envelope. Vocoding replaced the TFS of the tone and masker with unrelated TFS of noise or sinusoidal carriers. Maximum CMR of 11 dB was found as the difference between the co- and anti-modulated conditions for unprocessed stimuli. After vocoding, tone thresholds increased by 7 dB, and CMR was reduced to about 4 dB but remained significant. The magnitude of CMR was similar for both the sine and the noise vocoder. Co-modulation improved detection in the vocoded condition despite the absence of the tone-masker TFS interactions; thus CMR appears to be a robust mechanism based on across-frequency processing. TFS information appears to contribute to across-channel CMR since the magnitude of CMR was significantly reduced after vocoding. Since CMR was evidenced despite vocoding, it is hoped that co-modulation would also improve detection in cochlear-implant listening.
Affiliation(s)
- Robert H Pierzycki
- MRC Institute of Hearing Research, University Park, Nottingham NG7 2RD, United Kingdom.
23
What breaks a melody: perceiving F0 and intensity sequences with a cochlear implant. Hear Res 2010; 269:34-41. [PMID: 20674733 DOI: 10.1016/j.heares.2010.07.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 03/17/2010] [Revised: 07/16/2010] [Accepted: 07/20/2010] [Indexed: 11/21/2022]
Abstract
Pitch perception has been extensively studied using discrimination tasks on pairs of single sounds. When comparing pitch discrimination performance for normal-hearing (NH) and cochlear implant (CI) listeners, it usually appears that CI users have relatively poor pitch discrimination. Tasks involving pitch sequences, such as melody perception or auditory scene analysis, are also usually difficult for CI users. However, it is unclear whether the issue with pitch sequences is a consequence of sound discriminability, or if an impairment exists for sequence processing per se. Here, we compared sequence processing abilities across stimulus dimensions (fundamental frequency and intensity) and listener groups (NH, CI, and NH listeners presented with noise-vocoded sequences). The sequence elements were first matched in discriminability, for each listener and dimension. Participants were then presented with pairs of sequences, consisting of up to four elements varying on a single dimension, and they performed a same/different task. In agreement with a previous study (Cousineau et al., 2009), fundamental frequency sequences were processed more accurately than intensity sequences by NH listeners. However, this was not the case for CI listeners, nor for NH listeners presented with noise-vocoded sequences. Intensity sequence processing was, nonetheless, equally accurate in the three groups. These results show that the reduced pitch cues received by CI listeners do not only elevate thresholds, as previously documented, but also affect pitch sequence processing above threshold. We suggest that efficient sequence processing for pitch requires the resolution of individual harmonics in the auditory periphery, which is not achieved with the current generation of implants.
24
Souza P, Rosen S. Effects of envelope bandwidth on the intelligibility of sine- and noise-vocoded speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:792-805. [PMID: 19640044 PMCID: PMC2730710 DOI: 10.1121/1.3158835] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2023]
Abstract
The choice of processing parameters for vocoded signals may have an important effect on the availability of various auditory features. Experiment 1 varied envelope cutoff frequency (30 and 300 Hz), carrier type (sine and noise), and number of bands (2-5) for vocoded speech presented to normal-hearing listeners. Performance was better with a high cutoff for sine-vocoding, with no effect of cutoff for noise-vocoding. With a low cutoff, performance was better for noise-vocoding than for sine-vocoding. With a high cutoff, performance was better for sine-vocoding. Experiment 2 measured perceptibility of cues to voice pitch variations. A noise carrier combined with a high cutoff allowed intonation to be perceived to some degree but performance was best in high-cutoff sine conditions. A low cutoff led to poorest performance, regardless of carrier. Experiment 3 tested the relative contributions of co-modulation across bands and spectral density to improved performance with a sine carrier and high cutoff. Co-modulation across bands had no effect so it appears that sidebands providing a denser spectrum improved performance. These results indicate that carrier type in combination with envelope cutoff can alter the available cues in vocoded speech, factors which must be considered in interpreting results with vocoded signals.
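The parameters manipulated in this experiment (carrier type, envelope cutoff, number of bands) can be sketched as a minimal channel vocoder. This is an illustrative simplification under stated assumptions, not the study's exact implementation: brick-wall FFT filters stand in for the analysis filters, and the 100-5000 Hz log-spaced band layout is hypothetical.

```python
import numpy as np

def _fft_filter(x, fs, lo, hi):
    """Brick-wall band-pass via FFT (a simplification of real analysis filters)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, len(x))

def vocode(x, fs, n_bands=4, carrier="noise", env_cut=30.0,
           f_lo=100.0, f_hi=5000.0, seed=0):
    """Channel-vocoder sketch: in each log-spaced band, the rectified and
    low-passed envelope (cutoff env_cut Hz) modulates a noise or sine carrier;
    the modulated bands are summed."""
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = _fft_filter(x, fs, lo, hi)
        env = _fft_filter(np.abs(band), fs, 0.0, env_cut)  # rectify + low-pass
        env = np.maximum(env, 0.0)
        if carrier == "noise":
            c = _fft_filter(rng.standard_normal(len(x)), fs, lo, hi)
        else:  # sine carrier at the band's geometric-mean frequency
            c = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)
        out += env * c
    return out
```

Raising env_cut from 30 to 300 Hz lets periodicity fluctuations (e.g. voice pitch) through to the carriers, which is the manipulation at issue in this experiment.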
Affiliation(s)
- Pamela Souza
- Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd Street, Seattle, WA 98105, USA
25
Luo X, Fu QJ, Wu HP, Hsu CJ. Concurrent-vowel and tone recognition by Mandarin-speaking cochlear implant users. Hear Res 2009; 256:75-84. [PMID: 19595753 DOI: 10.1016/j.heares.2009.07.001] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/11/2008] [Revised: 06/25/2009] [Accepted: 07/08/2009] [Indexed: 12/31/2022]
Abstract
In Mandarin Chinese, tonal patterns are lexically meaningful. In a multi-talker environment, competing tones may create interference in addition to competing vowels and consonants. The present study measured Mandarin-speaking cochlear implant (CI) users' ability to recognize concurrent vowels, tones, and syllables in a concurrent-syllable recognition test. Concurrent syllables were constructed by summing either one Chinese syllable each from one male and one female talker or two syllables from the same male talker. Each talker produced 16 different syllables (4 vowels combined with 4 tones); all syllables were normalized to have the same overall duration and amplitude. Both single- and concurrent-syllable recognition were measured in 4 adolescent and 4 adult CI subjects, using their clinically assigned speech processors. The results showed no significant difference in performance between the adolescent and adult CI subjects. With single syllables, mean vowel recognition was 90% correct, while tone and syllable recognition were only 63% and 57% correct, respectively. With concurrent syllables, vowel, tone, and syllable recognition scores dropped by 40-60 percentage points. Concurrent-syllable performance was significantly correlated with single-syllable performance. Concurrent-vowel and syllable recognition were not significantly different between the same- and different-talker conditions, while concurrent-tone recognition was significantly better with the same-talker condition. Vowel and tone recognition were better when concurrent syllables contained the same vowels or tones, respectively. Across the different vowel pairs, tone recognition was less variable than vowel recognition; across the different tone pairs, vowel recognition was less variable than tone recognition. The present results suggest that interference between concurrent tones may contribute to Mandarin-speaking CI users' susceptibility to competing-talker backgrounds.
Affiliation(s)
- Xin Luo
- Department of Speech, Language, and Hearing Sciences, Purdue University, 500 Oval Drive, West Lafayette, IN 47907, USA.
26
Yuan M, Lee T, Yuen KCP, Soli SD, van Hasselt CA, Tong MCF. Cantonese tone recognition with enhanced temporal periodicity cues. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:327-337. [PMID: 19603889 DOI: 10.1121/1.3117447] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/28/2023]
Abstract
This study investigated the contributions of temporal periodicity cues and the effectiveness of enhancing these cues for Cantonese tone recognition in noise. A multichannel noise-excited vocoder was used to simulate speech processing in cochlear implants. Ten normal-hearing listeners were tested. Temporal envelope and periodicity cues (TEPCs) below 500 Hz were extracted from four frequency bands: 60-500, 500-1000, 1000-2000, and 2000-4000 Hz. The test stimuli were obtained by combining TEPC-modulated noise signals from individual bands. For periodicity enhancement, temporal fluctuations in the range 20-500 Hz were replaced by a sinusoid with frequency equal to the fundamental frequency of original speech. Tone identification experiments were carried out using disyllabic word carriers. Results showed that TEPCs from the two high-frequency bands were more important for tone identification than TEPCs from the low-frequency bands. The use of periodicity-enhanced TEPCs led to consistent improvement of tone identification accuracy. The improvement was more significant at low signal-to-noise ratios, and more noticeable for female than for male voices. Analysis of error distributions showed that the enhancement method reduced tone identification errors and did not show any negative effect on the recognition of segmental structures.
Affiliation(s)
- Meng Yuan
- Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong.
27
Luo X, Fu QJ. Concurrent-vowel and tone recognitions in acoustic and simulated electric hearing. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 125:3223-3233. [PMID: 19425665 PMCID: PMC2806442 DOI: 10.1121/1.3106534] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/16/2008] [Revised: 02/27/2009] [Accepted: 03/05/2009] [Indexed: 05/27/2023]
Abstract
Because of the poor spectral resolution in cochlear implants (CIs), fundamental frequency (F0) cues are not well preserved. Chinese-speaking CI users may have great difficulty understanding speech produced by competing talkers, due to conflicting tones. In this study, normal-hearing listeners' concurrent Chinese syllable recognition was measured with unprocessed speech and CI simulations. Concurrent syllables were constructed by summing two vowels from a male talker (with identical mean F0's) or one vowel from each of a male and a female talker (with a relatively large F0 separation). CI signal processing was simulated using four- and eight-channel noise-band vocoders; the degraded spectral resolution may limit listeners' ability to utilize talker and/or tone differences. The results showed that concurrent speech recognition was significantly poorer with the CI simulations than with unprocessed speech. There were significant interactions between the talker and speech-processing conditions, e.g., better tone and syllable recognitions with the male-female condition for unprocessed speech, and with the male-male condition for eight-channel speech. With the CI simulations, competing tones interfered with concurrent-tone and syllable recognitions, but not vowel recognition. Given limited pitch cues, subjects were unable to use F0 differences between talkers or tones for concurrent Chinese syllable recognition.
Affiliation(s)
- Xin Luo
- Communication and Auditory Neuroscience, House Ear Institute, 2100 West Third Street, Los Angeles, California 90057, USA.
28
Verschuur C. Modeling the effect of channel number and interaction on consonant recognition in a cochlear implant peak-picking strategy. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 125:1723-1736. [PMID: 19275329 DOI: 10.1121/1.3075554] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]
Abstract
Difficulties in speech recognition experienced by cochlear implant users may be attributed both to information loss caused by signal processing and to information loss associated with the interface between the electrode array and auditory nervous system, including cross-channel interaction. The objective of the work reported here was to attempt to partial out the relative contribution of these different factors to consonant recognition. This was achieved by comparing patterns of consonant feature recognition as a function of channel number and presence/absence of background noise in users of the Nucleus 24 device with normal-hearing subjects listening to acoustic models that mimicked processing of that device. Additionally, in the acoustic model experiment, a simulation of cross-channel spread of excitation, or "channel interaction," was varied. Results showed that acoustic model experiments were highly correlated with patterns of performance in better-performing cochlear implant users. Deficits to consonant recognition in this subgroup could be attributed to cochlear implant processing, whereas channel interaction played a much smaller role in determining performance errors. The study also showed that large changes to channel number in the Advanced Combination Encoder signal processing strategy led to no substantial changes in performance.
Affiliation(s)
- Carl Verschuur
- Hearing and Balance Centre, Institute of Sound and Vibration Research, University of Southampton, Highfield, Southampton, United Kingdom
29
Gaudrain E, Grimault N, Healy EW, Béra JC. Streaming of vowel sequences based on fundamental frequency in a cochlear-implant simulation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 124:3076-87. [PMID: 19045793 PMCID: PMC2677355 DOI: 10.1121/1.2988289] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/17/2007] [Revised: 08/21/2008] [Accepted: 08/22/2008] [Indexed: 05/27/2023]
Abstract
Cochlear-implant (CI) users often have difficulties perceiving speech in noisy environments. Although this problem likely involves auditory scene analysis, few studies have examined sequential segregation in CI listening situations. The present study aims to assess the possible role of fundamental frequency (F(0)) cues for the segregation of vowel sequences, using a noise-excited envelope vocoder that simulates certain aspects of CI stimulation. Obligatory streaming was evaluated using an order-naming task in two experiments involving normal-hearing subjects. In the first experiment, it was found that streaming did not occur based on F(0) cues when natural-duration vowels were processed to reduce spectral cues using the vocoder. In the second experiment, shorter duration vowels were used to enhance streaming. Under these conditions, F(0)-related streaming appeared even when vowels were processed to reduce spectral cues. However, the observed segregation could not be convincingly attributed to temporal periodicity cues. A subsequent analysis of the stimuli revealed that an F(0)-related spectral cue could have elicited the observed segregation. Thus, streaming under conditions of severely reduced spectral cues, such as those associated with CIs, may potentially occur as a result of this particular cue.
Affiliation(s)
- Etienne Gaudrain
- Neurosciences Sensorielles, Comportement, Cognition, CNRS UMR 5020, Université Lyon 1, 50 Avenue Tony Garnier, 69366 Lyon Cedex 07, France
30
Kong YY, Carlyon RP. Improved speech recognition in noise in simulated binaurally combined acoustic and electric stimulation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2007; 121:3717-27. [PMID: 17552722 DOI: 10.1121/1.2717408] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Indexed: 05/15/2023]
Abstract
Speech recognition in noise improves with combined acoustic and electric stimulation compared to electric stimulation alone [Kong et al., J. Acoust. Soc. Am. 117, 1351-1361 (2005)]. Here the contribution of fundamental frequency (F0) and low-frequency phonetic cues to speech recognition in combined hearing was investigated. Normal-hearing listeners heard vocoded speech in one ear and low-pass (LP) filtered speech in the other. Three listening conditions (vocode-alone, LP-alone, combined) were investigated. Target speech (average F0=120 Hz) was mixed with a time-reversed masker (average F0=172 Hz) at three signal-to-noise ratios (SNRs). LP speech aided performance at all SNRs. Low-frequency phonetic cues were then removed by replacing the LP speech with a LP equal-amplitude harmonic complex, frequency and amplitude modulated by the F0 and temporal envelope of voiced segments of the target. The combined hearing advantage disappeared at 10 and 15 dB SNR, but persisted at 5 dB SNR. A similar finding occurred when, additionally, F0 contour cues were removed. These results are consistent with a role for low-frequency phonetic cues, but not with a combination of F0 information between the two ears. The enhanced performance at 5 dB SNR with F0 contour cues absent suggests that voicing or glimpsing cues may be responsible for the combined hearing benefit.
Affiliation(s)
- Ying-Yee Kong
- MRC Cognition and Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 2EF, United Kingdom.
31
Kong YY, Zeng FG. Temporal and spectral cues in Mandarin tone recognition. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2006; 120:2830-40. [PMID: 17139741 DOI: 10.1121/1.2346009] [Citation(s) in RCA: 102] [Impact Index Per Article: 5.7] [Indexed: 05/12/2023]
Abstract
This study evaluates the relative contributions of envelope and fine structure cues in both temporal and spectral domains to Mandarin tone recognition in quiet and in noise. Four sets of stimuli were created. Noise-excited vocoder speech was used to evaluate the temporal envelope. Frequency modulation was then added to evaluate the temporal fine structure. Whispered speech was used to evaluate the spectral envelope. Finally, equal-amplitude harmonics were used to evaluate the spectral fine structure. Results showed that normal-hearing listeners achieved nearly perfect tone recognition with either spectral or temporal fine structure in quiet, but only 70%-80% correct with the envelope cues. With the temporal envelope, 32 spectral bands were needed to achieve performance similar to that obtained with the original stimuli, but only four bands were necessary with the additional temporal fine structure. Envelope cues were more susceptible to noise than fine structure cues, with the envelope cues producing significantly lower performance in noise. These findings suggest that tonal pattern recognition is a robust process that can make use of both spectral and temporal cues. Unlike speech recognition, the fine structure is more important than the envelope for tone recognition in both temporal and spectral domains, particularly in noise.
Affiliation(s)
- Ying-Yee Kong
- Hearing and Speech Research Laboratory, Department of Cognitive Sciences, University of California-Irvine, Irvine, CA 92697, USA.