1. Goldsworthy RL, Bissmeyer SRS, Camarena A. Advantages of Pulse Rate Compared to Modulation Frequency for Temporal Pitch Perception in Cochlear Implant Users. J Assoc Res Otolaryngol 2022; 23:137-150. [PMID: 34981263; PMCID: PMC8782986; DOI: 10.1007/s10162-021-00828-w]
Abstract
Most cochlear implants encode the fundamental frequency of periodic sounds by amplitude modulation of constant-rate pulsatile stimulation. Pitch perception provided by such stimulation strategies is markedly poor. Two experiments are reported here that consider potential advantages of pulse rate compared to modulation frequency for providing stimulation timing cues for pitch. The first experiment examines beat frequency distortion that occurs when modulating constant-rate pulsatile stimulation. This distortion has been reported previously, but the results presented here indicate that it occurs at higher stimulation rates than earlier reports suggested. The second experiment examines pitch resolution as provided by pulse rate compared to modulation frequency. The results indicate that pitch discrimination is better with pulse rate than with modulation frequency. The advantage was large for rates near what has been suggested as the upper limit of temporal pitch perception conveyed by cochlear implants. The results are relevant to sound processing design for cochlear implants, particularly for algorithms that encode fundamental frequency into deep envelope modulations or into precisely timed pulsatile stimulation.
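The beat-frequency distortion examined in the first experiment arises because the per-pulse amplitudes of a modulated constant-rate pulse train are, in effect, samples of the modulator taken at the pulse rate. A minimal numerical sketch of that sampling effect (rates and helper names such as `modulated_pulse_train` are illustrative, not the stimuli used in the study):

```python
import numpy as np

def modulated_pulse_train(rate_hz, mod_hz, dur_s=1.0):
    """Sample a sinusoidal modulator at the pulse times of a
    constant-rate pulse train; return times and per-pulse amplitudes."""
    t = np.arange(0.0, dur_s, 1.0 / rate_hz)              # pulse onset times
    amps = 0.5 * (1.0 + np.sin(2.0 * np.pi * mod_hz * t)) # modulator per pulse
    return t, amps

def dominant_envelope_hz(amps, rate_hz):
    """Frequency of the strongest component in the per-pulse envelope."""
    spec = np.abs(np.fft.rfft(amps - amps.mean()))
    freqs = np.fft.rfftfreq(len(amps), d=1.0 / rate_hz)
    return freqs[np.argmax(spec)]

# A 600-Hz modulator on a 1000-pps carrier is undersampled pulse-to-pulse,
# so the envelope carries a distortion product at 1000 - 600 = 400 Hz.
_, amps = modulated_pulse_train(1000, 600)
print(dominant_envelope_hz(amps, 1000))  # 400.0
```

A 150-Hz modulator on the same carrier comes back at 150 Hz; the distortion products grow as the modulation frequency approaches the pulse rate, which is the regime the experiment probes.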
Affiliation(s)
- Raymond L Goldsworthy: Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Susan R S Bissmeyer: Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Department of Biomedical Engineering, Viterbi School of Engineering, University of Southern California, Los Angeles, CA, USA
- Andres Camarena: Auditory Research Center, Caruso Department of Otolaryngology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA; Neuroscience Graduate Program, University of Southern California, Los Angeles, CA, USA
2. Goehring T, Arenberg JG, Carlyon RP. Using Spectral Blurring to Assess Effects of Channel Interaction on Speech-in-Noise Perception with Cochlear Implants. J Assoc Res Otolaryngol 2020; 21:353-371. [PMID: 32519088; PMCID: PMC7445227; DOI: 10.1007/s10162-020-00758-z]
Abstract
Cochlear implant (CI) listeners struggle to understand speech in background noise. Interactions between electrode channels due to current spread increase the masking of speech by noise and lead to difficulties with speech perception. Strategies that reduce channel interaction therefore have the potential to improve speech-in-noise perception by CI listeners, but previous results have been mixed. We investigated the effects of channel interaction on speech-in-noise perception and its association with spectro-temporal acuity in a listening study with 12 experienced CI users. Instead of attempting to reduce channel interaction, we introduced spectral blurring to simulate some of the effects of channel interaction by adjusting the overlap between electrode channels at the input level of the analysis filters or at the output by using several simultaneously stimulated electrodes per channel. We measured speech reception thresholds in noise as a function of the amount of blurring applied to either all 15 electrode channels or to 5 evenly spaced channels. Performance remained roughly constant as the amount of blurring applied to all channels increased up to some knee point, above which it deteriorated. This knee point differed across listeners in a way that correlated with performance on a non-speech spectro-temporal task, and is proposed here as an individual measure of channel interaction. Surprisingly, even extreme amounts of blurring applied to 5 channels did not affect performance. The effects on speech perception in noise were similar for blurring at the input and at the output of the CI. The results are in line with the assumption that experienced CI users can make use of a limited number of effective channels of information and tolerate some deviations from their everyday settings when identifying speech in the presence of a masker. Furthermore, by showing that blurring or deactivating one-third of the electrodes did not harm speech-in-noise performance, these findings may explain the mixed results obtained with strategies that optimized or deactivated a small number of electrodes evenly distributed along the array.
Affiliation(s)
- Tobias Goehring: Cambridge Hearing Group, Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
- Julie G Arenberg: Massachusetts Eye and Ear, Harvard Medical School, 243 Charles St, Boston, MA, 02114, USA
- Robert P Carlyon: Cambridge Hearing Group, Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
3. Winn MB. Accommodation of gender-related phonetic differences by listeners with cochlear implants and in a variety of vocoder simulations. J Acoust Soc Am 2020; 147:174. [PMID: 32006986; PMCID: PMC7341679; DOI: 10.1121/10.0000566]
Abstract
Speech perception requires accommodation of a wide range of acoustic variability across talkers. A classic example is the perception of "sh" and "s" fricative sounds, which are categorized according to spectral details of the consonant itself, and also by the context of the voice producing it. Because women's and men's voices occupy different frequency ranges, a listener is required to make a corresponding adjustment of acoustic-phonetic category space for these phonemes when hearing different talkers. This pattern is commonplace in everyday speech communication, and yet might not be captured in accuracy scores for whole words, especially when word lists are spoken by a single talker. Phonetic accommodation for fricatives "s" and "sh" was measured in 20 cochlear implant (CI) users and in a variety of vocoder simulations, including those with noise carriers with and without peak picking, simulated spread of excitation, and pulsatile carriers. CI listeners showed strong phonetic accommodation as a group. Each vocoder produced phonetic accommodation except the 8-channel noise vocoder, despite its historically good match with CI users in word intelligibility. Phonetic accommodation is largely independent of linguistic factors and thus might offer information complementary to speech intelligibility tests which are partially affected by language processing.
Affiliation(s)
- Matthew B Winn: Department of Speech & Hearing Sciences, University of Minnesota, 164 Pillsbury Drive Southeast, Minneapolis, Minnesota 55455, USA
4. Gianakas SP, Winn MB. Lexical bias in word recognition by cochlear implant listeners. J Acoust Soc Am 2019; 146:3373. [PMID: 31795696; PMCID: PMC6948217; DOI: 10.1121/1.5132938]
Abstract
When hearing an ambiguous speech sound, listeners tend to perceive it as a phoneme that would complete a real word rather than one that would complete a nonsense word. For example, a sound that could be heard as either /b/ or /ɡ/ is perceived as /b/ when followed by "_ack" but as /ɡ/ when followed by "_ap." Because the target sound is acoustically identical in both environments, this effect demonstrates the influence of top-down lexical processing in speech perception. Degradations in the auditory signal were hypothesized to render speech stimuli more ambiguous and therefore promote increased lexical bias. Stimuli included three speech continua whose distinguishing spectral cues changed at different speeds: stop formant transitions (fast), fricative spectra (medium), and vowel formants (slow). Stimuli were presented to listeners with cochlear implants (CIs) and to listeners with normal hearing, either with clear spectral quality or with varying amounts of spectral degradation using a noise vocoder. Results indicated an increased lexical bias effect with degraded speech and for CI listeners, for whom the effect size was related to segment duration. This method can probe an individual's reliance on top-down processing even at the level of simple lexical/phonetic perception.
Affiliation(s)
- Steven P Gianakas: Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive SE, Minneapolis, Minnesota 55455, USA
- Matthew B Winn: Department of Speech-Language-Hearing Sciences, University of Minnesota, 164 Pillsbury Drive SE, Minneapolis, Minnesota 55455, USA
5. Goehring T, Archer-Boyd A, Deeks JM, Arenberg JG, Carlyon RP. A Site-Selection Strategy Based on Polarity Sensitivity for Cochlear Implants: Effects on Spectro-Temporal Resolution and Speech Perception. J Assoc Res Otolaryngol 2019; 20:431-448. [PMID: 31161338; PMCID: PMC6646483; DOI: 10.1007/s10162-019-00724-4]
Abstract
Thresholds of asymmetric pulses presented to cochlear implant (CI) listeners depend on polarity in a way that differs across subjects and electrodes. It has been suggested that lower thresholds for cathodic-dominant compared to anodic-dominant pulses reflect good local neural health. We evaluated the hypothesis that this polarity effect (PE) can be used in a site-selection strategy to improve speech perception and spectro-temporal resolution. Detection thresholds were measured in eight users of Advanced Bionics CIs for 80-pps, triphasic, monopolar pulse trains where the central high-amplitude phase was either anodic or cathodic. Two experimental MAPs were then generated for each subject by deactivating the five electrodes with either the highest or the lowest PE magnitudes (cathodic minus anodic threshold). Performance with the two experimental MAPs was evaluated using two spectro-temporal tests (Spectro-Temporal Ripple for Investigating Processor EffectivenesS (STRIPES; Archer-Boyd et al. in J Acoust Soc Am 144:2983-2997, 2018) and Spectral-Temporally Modulated Ripple Test (SMRT; Aronoff and Landsberger in J Acoust Soc Am 134:EL217-EL222, 2013)) and with speech recognition in quiet and in noise. Performance was also measured with an experimental MAP that used all electrodes, similar to the subjects' clinical MAP. The PE varied strongly across subjects and electrodes, with substantial magnitudes relative to the electrical dynamic range. There were no significant differences in performance between the three MAPs at group level, but there were significant effects at subject level (not all of which were in the hypothesized direction), consistent with previous reports of a large variability in CI users' performance and in the potential benefit of site-selection strategies. The STRIPES but not the SMRT test successfully predicted which strategy produced the best speech-in-noise performance on a subject-by-subject basis. The average PE across electrodes correlated significantly with subject age, duration of deafness, and speech perception scores, consistent with a relationship between PE and neural health. These findings motivate further investigations into site-specific measures of neural health and their application to CI processing strategies.
Affiliation(s)
- Tobias Goehring: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
- Alan Archer-Boyd: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
- John M Deeks: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
- Julie G Arenberg: Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd St., Seattle, WA, 98105, USA
- Robert P Carlyon: Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
6. Goldsworthy RL. Temporal envelope cues and simulations of cochlear implant signal processing. Speech Commun 2019; 109:24-33. [PMID: 39104946; PMCID: PMC11299890; DOI: 10.1016/j.specom.2019.03.003]
Abstract
Conventional signal processing implemented on clinical cochlear implant (CI) sound processors is based on envelope signals extracted from overlapping frequency regions. Conventional strategies do not encode temporal envelope or temporal fine-structure cues with high fidelity. In contrast, several research strategies have been developed recently to enhance the encoding of temporal envelope and fine-structure cues. The present study examines the salience of temporal envelope cues when encoded into vocoder representations of CI signal processing. Normal-hearing listeners were evaluated on measures of speech reception, speech quality ratings, and spatial hearing when listening to vocoder representations of CI signal processing. Conventional vocoder techniques using envelope signals with noise- or tone-excited reconstruction were evaluated in comparison to a novel approach based on impulse-response reconstruction. A variation of this impulse-response approach was based on a research strategy, the Fundamentally Asynchronous Stimulus Timing (FAST) algorithm, designed to improve temporal precision of envelope cues. The results indicate that the introduced impulse-response approach, combined with the FAST algorithm, produces similar results on speech reception measures as the conventional vocoder approaches, while providing significantly better sound quality and spatial hearing outcomes. This novel approach for simulating how temporal envelope cues are encoded into CI stimulation has potential for examining diverse aspects of hearing, particularly musical pitch perception and spatial hearing.
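For readers unfamiliar with the conventional vocoder baseline this study compares against, a minimal noise vocoder can be sketched as follows. This is not the impulse-response or FAST approach introduced in the study; channel count, band edges, filter order, and the name `noise_vocode` are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
    """Band-split the input, extract each band's temporal envelope,
    and re-impose the envelopes on band-limited noise carriers."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced band edges
    noise = np.random.default_rng(0).standard_normal(len(x))
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(2, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)                     # analysis band
        env = np.abs(hilbert(band))                    # temporal envelope
        out += env * sosfiltfilt(sos, noise)           # modulate noise carrier
    return out
```

Only the band envelopes survive this processing; each band's temporal fine structure is replaced by that of the noise carrier, which is the information loss that motivates alternative reconstruction methods.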
7. Steinmetzger K, Rosen S. The role of envelope periodicity in the perception of masked speech with simulated and real cochlear implants. J Acoust Soc Am 2018; 144:885. [PMID: 30180719; DOI: 10.1121/1.5049584]
Abstract
In normal hearing, complex tones with pitch-related periodic envelope modulations are far less effective maskers of speech than aperiodic noise. Here, it is shown that this masker-periodicity benefit is diminished in noise-vocoder simulations of cochlear implants (CIs) and further reduced with real CIs. Nevertheless, both listener groups still benefitted significantly from masker periodicity, despite the lack of salient spectral pitch cues. The main reason for the smaller effect observed in CI users is thought to be an even stronger channel interaction than in the CI simulations, which smears out the random envelope modulations that are characteristic for aperiodic sounds. In contrast, neither interferers that were amplitude-modulated at a rate of 10 Hz nor maskers with envelopes specifically designed to reveal the target speech enabled a masking release in CI users. Hence, even at the high signal-to-noise ratios at which they were tested, CI users can still exploit pitch cues transmitted by the temporal envelope of a non-speech masker, whereas slow amplitude modulations of the masker envelope are no longer helpful.
Affiliation(s)
- Kurt Steinmetzger: Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
- Stuart Rosen: Speech, Hearing and Phonetic Sciences, University College London, Chandler House, 2 Wakefield Street, London, WC1N 1PF, United Kingdom
8. Mesnildrey Q, Hilkhuysen G, Macherey O. Pulse-spreading harmonic complex as an alternative carrier for vocoder simulations of cochlear implants. J Acoust Soc Am 2016; 139:986-91. [PMID: 26936577; DOI: 10.1121/1.4941451]
Abstract
Noise- and sine-carrier vocoders are often used to acoustically simulate the information transmitted by a cochlear implant (CI). However, sine-waves fail to mimic the broad spread of excitation produced by a CI and noise-bands contain intrinsic modulations that are absent in CIs. The present study proposes pulse-spreading harmonic complexes (PSHCs) as an alternative acoustic carrier in vocoders. Sentence-in-noise recognition was measured in 12 normal-hearing subjects for noise-, sine-, and PSHC-vocoders. Consistent with the amount of intrinsic modulations present in each vocoder condition, the average speech reception threshold obtained with the PSHC-vocoder was higher than with sine-vocoding but lower than with noise-vocoding.
Affiliation(s)
- Quentin Mesnildrey: LMA-CNRS, UPR 7051, Aix-Marseille Université, Centrale Marseille, 4 impasse Nikola TESLA CS 40006, F-13453, Marseille Cedex 13, France
- Gaston Hilkhuysen: LMA-CNRS, UPR 7051, Aix-Marseille Université, Centrale Marseille, 4 impasse Nikola TESLA CS 40006, F-13453, Marseille Cedex 13, France
- Olivier Macherey: LMA-CNRS, UPR 7051, Aix-Marseille Université, Centrale Marseille, 4 impasse Nikola TESLA CS 40006, F-13453, Marseille Cedex 13, France
9. Schubotz W, Brand T, Kollmeier B, Ewert SD. The Influence of High-Frequency Envelope Information on Low-Frequency Vowel Identification in Noise. PLoS One 2016; 11:e0145610. [PMID: 26730702; PMCID: PMC4701218; DOI: 10.1371/journal.pone.0145610]
Abstract
Vowel identification in noise using consonant-vowel-consonant (CVC) logatomes was used to investigate a possible interplay of speech information from different frequency regions. It was hypothesized that the periodicity conveyed by the temporal envelope of a high frequency stimulus can enhance the use of the information carried by auditory channels in the low-frequency region that share the same periodicity. It was further hypothesized that this acts as a strobe-like mechanism and would increase the signal-to-noise ratio for the voiced parts of the CVCs. In a first experiment, different high-frequency cues were provided to test this hypothesis, whereas a second experiment examined more closely the role of amplitude modulations and intact phase information within the high-frequency region (4–8 kHz). CVCs were either natural or vocoded speech (both limited to a low-pass cutoff-frequency of 2.5 kHz) and were presented in stationary 3-kHz low-pass filtered masking noise. The experimental results did not support the hypothesized use of periodicity information for aiding low-frequency perception.
Affiliation(s)
- Wiebke Schubotz: Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Thomas Brand: Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Birger Kollmeier: Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
- Stephan D. Ewert: Medizinische Physik and Cluster of Excellence Hearing4all, Universität Oldenburg, Oldenburg, Germany
10. Apoux F, Youngdahl CL, Yoho SE, Healy EW. Dual-carrier processing to convey temporal fine structure cues: Implications for cochlear implants. J Acoust Soc Am 2015; 138:1469-80. [PMID: 26428784; PMCID: PMC4575322; DOI: 10.1121/1.4928136]
Abstract
Speech intelligibility in noise can be degraded by using vocoder processing to alter the temporal fine structure (TFS). Here it is argued that this degradation is not attributable to the loss of speech information potentially present in the TFS. Instead it is proposed that the degradation results from the loss of sound-source segregation information when two or more carriers (i.e., TFS) are substituted with only one as a consequence of vocoder processing. To demonstrate this segregation role, vocoder processing involving two carriers, one for the target and one for the background, was implemented. Because this approach does not preserve the speech TFS, it may be assumed that any improvement in intelligibility can only be a consequence of the preserved carrier duality and associated segregation cues. Three experiments were conducted using this "dual-carrier" approach. All experiments showed substantial sentence intelligibility in noise improvements compared to traditional single-carrier conditions. In several conditions, the improvement was so substantial that intelligibility approximated that for unprocessed speech in noise. A foreseeable and potentially promising implication for the dual-carrier approach involves implementation into cochlear implant speech processors, where it may provide the TFS cues necessary to segregate speech from noise.
Affiliation(s)
- Frédéric Apoux: Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Carla L Youngdahl: Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Sarah E Yoho: Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Eric W Healy: Speech Psychoacoustics Laboratory, Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
11. Mesnildrey Q, Macherey O. Simulating the dual-peak excitation pattern produced by bipolar stimulation of a cochlear implant: effects on speech intelligibility. Hear Res 2014; 319:32-47. [PMID: 25449010; DOI: 10.1016/j.heares.2014.11.001]
Abstract
Several electrophysiological and psychophysical studies have shown that the spatial excitation pattern produced by bipolar stimulation of a cochlear implant (CI) can have a dual-peak shape. The perceptual effects of this dual-peak shape were investigated using noise-vocoded CI simulations in which synthesis filters were designed to simulate the spread of neural activity produced by various electrode configurations, as predicted by a simple cochlear model. Experiments 1 and 2 tested speech recognition in the presence of a concurrent speech masker for various sets of single-peak and dual-peak synthesis filters and different numbers of channels. Similar to results obtained with real CIs, both monopolar (MP, single-peak) and bipolar (BP + 1, dual-peak) simulations showed a plateau of performance above 8 channels. The benefit of increasing the number of channels was also lower for BP + 1 than for MP. This shows that channel interactions in BP + 1 become especially deleterious for speech intelligibility when a simulated electrode acts both as an active and as a return electrode for different channels, because envelope information from two different analysis bands is conveyed to the same spectral location. Experiment 3 shows that these channel interactions are even stronger in the wide BP configuration (BP + 5), likely because the interfering speech envelopes are less correlated than in narrow BP + 1. Although the exact effects of dual- or multi-peak excitation in real CIs remain to be determined, this series of experiments suggests that multipolar stimulation strategies, such as bipolar or tripolar, should be controlled to avoid neural excitation in the vicinity of the return electrodes.
Affiliation(s)
- Quentin Mesnildrey: LMA-CNRS, UPR 7051, Aix-Marseille Univ., Centrale Marseille, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
- Olivier Macherey: LMA-CNRS, UPR 7051, Aix-Marseille Univ., Centrale Marseille, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
12. Churchill TH, Kan A, Goupell MJ, Ihlefeld A, Litovsky RY. Speech perception in noise with a harmonic complex excited vocoder. J Assoc Res Otolaryngol 2014; 15:265-78. [PMID: 24448721; DOI: 10.1007/s10162-013-0435-7]
Abstract
A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. By studying how normal hearing (NH) listeners are able to understand degraded speech signals with a vocoder, the parameters that best simulate electric hearing and factors that might contribute to the NH-CI performance difference may be better understood. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that the speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
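The phase-dispersion effect at the heart of this study (dispersed component phases flatten a harmonic complex's intrinsic envelope) is easy to verify numerically. The sketch below uses illustrative parameters rather than the study's stimuli, and uses crest factor as a simple stand-in for envelope peakiness:

```python
import numpy as np

def harmonic_complex(f0, n_harm, fs, dur_s, phases):
    """Sum of equal-amplitude harmonics of f0 with given starting phases."""
    t = np.arange(0.0, dur_s, 1.0 / fs)
    x = np.zeros(len(t))
    for k in range(1, n_harm + 1):
        x += np.cos(2.0 * np.pi * k * f0 * t + phases[k - 1])
    return x / n_harm

def crest_factor(x):
    """Peak-to-RMS ratio: high for peaky, pulse-like waveforms."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

rng = np.random.default_rng(1)
# Cosine-phase components align once per f0 period, giving a pulse-like
# envelope; randomly dispersed phases spread that energy in time.
peaky = harmonic_complex(100, 40, 16000, 0.5, np.zeros(40))
flat = harmonic_complex(100, 40, 16000, 0.5, rng.uniform(0, 2 * np.pi, 40))
```

Here `crest_factor(peaky)` comes out well above `crest_factor(flat)`, consistent with the study's observation that maximally dispersed starting phases reduce the carrier's intrinsic temporal envelope fluctuations.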
Affiliation(s)
- Tyler H Churchill: Waisman Center, University of Wisconsin-Madison, 1500 Highland Avenue #521, Madison, WI, 53705, USA
13. Goldsworthy RL, Shannon RV. Training improves cochlear implant rate discrimination on a psychophysical task. J Acoust Soc Am 2014; 135:334-341. [PMID: 24437773; PMCID: PMC3985914; DOI: 10.1121/1.4835735]
Abstract
The purpose of this study was to determine the extent to which cochlear implant (CI) rate discrimination can be improved through training. Six adult CI users took part in a study that included 32 h of training and assessment on rate discrimination measures. Rate difference limens (DLs) were measured from 110 to 3520 Hz in octave steps using 500 ms biphasic pulse trains; the target and standard stimuli were loudness-balanced with the target always at an adaptively lower rate. DLs were measured at four electrode positions corresponding to basal, mid-basal, mid-apical, and apical locations. Procedural variations were implemented to determine if rate discrimination was impacted by random variations in stimulus amplitude or by amplitude modulation. DLs improved by more than a factor of 2 across subjects, electrodes, and standard rates. Factor analysis indicated that the effect of training was comparable for all electrodes and standard rates tested. Neither level roving nor amplitude modulation had a significant effect on rate DLs. In conclusion, the results demonstrate that training can significantly improve CI rate discrimination on a psychophysical task.
Affiliation(s)
- Raymond L Goldsworthy: Sensimetrics Corporation, Research & Development, 14 Summer Street, Suite 305, Malden, Massachusetts 02148
- Robert V Shannon: House Research Institute, Communications and Auditory Neurosciences, 2100 West 3rd Street, Los Angeles, California 90057
14. Sohoglu E, Peelle JE, Carlyon RP, Davis MH. Top-down influences of written text on perceived clarity of degraded speech. J Exp Psychol Hum Percept Perform 2013; 40:186-99. [PMID: 23750966; PMCID: PMC3906796; DOI: 10.1037/a0033206]
Abstract
An unresolved question is how the reported clarity of degraded speech is enhanced when listeners have prior knowledge of speech content. One account of this phenomenon proposes top-down modulation of early acoustic processing by higher-level linguistic knowledge. Alternative, strictly bottom-up accounts argue that acoustic information and higher-level knowledge are combined at a late decision stage without modulating early acoustic processing. Here we tested top-down and bottom-up accounts using written text to manipulate listeners’ knowledge of speech content. The effect of written text on the reported clarity of noise-vocoded speech was most pronounced when text was presented before (rather than after) speech (Experiment 1). Fine-grained manipulation of the onset asynchrony between text and speech revealed that this effect declined when text was presented more than 120 ms after speech onset (Experiment 2). Finally, the influence of written text was found to arise from phonological (rather than lexical) correspondence between text and speech (Experiment 3). These results suggest that prior knowledge effects are time-limited by the duration of auditory echoic memory for degraded speech, consistent with top-down modulation of early acoustic processing by linguistic knowledge.
15
Modulation frequency discrimination with modulated and unmodulated interference in normal hearing and in cochlear-implant users. J Assoc Res Otolaryngol 2013; 14:591-601. [PMID: 23632651 DOI: 10.1007/s10162-013-0391-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2012] [Accepted: 04/08/2013] [Indexed: 10/26/2022] Open
Abstract
Differences in fundamental frequency (F0) provide an important cue for segregating simultaneous sounds. Cochlear implants (CIs) transmit F0 information primarily through the periodicity of the temporal envelope of the electrical pulse trains. Successful segregation of sounds with different F0s requires the ability to process multiple F0s simultaneously, but it is unknown whether CI users have this ability. This study measured modulation frequency discrimination thresholds for half-wave rectified sinusoidal envelopes modulated at 115 Hz in CI users and normal-hearing (NH) listeners. The target modulation was presented in isolation or in the presence of an interferer. Discrimination thresholds were strongly affected by the presence of an interferer, even when it was unmodulated and spectrally remote. Interferer modulation increased interference and often led to very high discrimination thresholds, especially when the interfering modulation frequency was lower than that of the target. Introducing a temporal offset between the interferer and the target led to at best modest improvements in performance in CI users and NH listeners. The results suggest no fundamental difference between acoustic and electric hearing in processing single or multiple envelope-based F0s, but confirm that differences in F0 are unlikely to provide a robust cue for perceptual segregation in CI users.
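The half-wave rectified sinusoidal envelopes used as stimuli in this study can be illustrated with a short sketch (a hypothetical reconstruction, not the authors' stimulus code; the Gaussian noise carrier, duration, and sampling rate are assumptions):

```python
import numpy as np

def hwr_sine_envelope(mod_hz, dur_s, fs):
    """Half-wave rectified sinusoidal envelope: max(sin, 0)."""
    t = np.arange(int(dur_s * fs)) / fs
    return np.maximum(np.sin(2 * np.pi * mod_hz * t), 0.0)

def modulated_noise(mod_hz=115.0, dur_s=0.5, fs=44100, seed=0):
    """Apply the envelope to a Gaussian noise carrier, giving a
    stimulus whose temporal envelope repeats at mod_hz."""
    rng = np.random.default_rng(seed)
    env = hwr_sine_envelope(mod_hz, dur_s, fs)
    return env * rng.standard_normal(env.size)

stim = modulated_noise()  # 115-Hz target modulation, as in the study
```

The interferer conditions of the study would correspond to presenting a second such signal, modulated or unmodulated, in a different spectral region.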
16
Hervais-Adelman AG, Carlyon RP, Johnsrude IS, Davis MH. Brain regions recruited for the effortful comprehension of noise-vocoded words. Lang Cogn Process 2012. [DOI: 10.1080/01690965.2012.662280] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]
17
Lazard DS, Marozeau J, McDermott HJ. The sound sensation of apical electric stimulation in cochlear implant recipients with contralateral residual hearing. PLoS One 2012; 7:e38687. [PMID: 22723876 PMCID: PMC3378545 DOI: 10.1371/journal.pone.0038687] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2012] [Accepted: 05/09/2012] [Indexed: 11/19/2022] Open
Abstract
Background Studies using vocoders as acoustic simulators of cochlear implants have generally focused on simulation of speech understanding, gender recognition, or music appreciation. The aim of the present experiment was to study the auditory sensation perceived by cochlear implant (CI) recipients with steady electrical stimulation on the most-apical electrode. Methodology/Principal Findings Five unilateral CI users with contralateral residual hearing were asked to vary the parameters of an acoustic signal played to the non-implanted ear, in order to match its sensation to that of the electric stimulus. They also provided a rating of similarity between each acoustic sound they selected and the electric stimulus. On average across subjects, the sound rated as most similar was a complex signal with a concentration of energy around 523 Hz. This sound was inharmonic in 3 out of 5 subjects with a moderate, progressive increase in the spacing between the frequency components. Conclusions/Significance For these subjects, the sound sensation created by steady electric stimulation on the most-apical electrode was neither a white noise nor a pure tone, but a complex signal with a progressive increase in the spacing between the frequency components in 3 out of 5 subjects. Knowing whether the inharmonic nature of the sound was related to the fact that the non-implanted ear was impaired has to be explored in single-sided deafened patients with a contralateral CI. These results may be used in the future to better understand peripheral and central auditory processing in relation to cochlear implants.
18
Szenkovits G, Peelle JE, Norris D, Davis MH. Individual differences in premotor and motor recruitment during speech perception. Neuropsychologia 2012; 50:1380-92. [DOI: 10.1016/j.neuropsychologia.2012.02.023] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2011] [Revised: 12/13/2011] [Accepted: 02/25/2012] [Indexed: 10/28/2022]
19
Chen J, Li H, Li L, Wu X, Moore BCJ. Informational masking of speech produced by speech-like sounds without linguistic content. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2012; 131:2914-26. [PMID: 22501069 DOI: 10.1121/1.3688510] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/22/2023]
Abstract
This study investigated whether speech-like maskers without linguistic content produce informational masking of speech. The target stimuli were nonsense Chinese Mandarin sentences. In experiment I, the masker contained harmonics the fundamental frequency (F0) of which was sinusoidally modulated and the mean F0 of which was varied. The magnitude of informational masking was evaluated by measuring the change in intelligibility (releasing effect) produced by inducing a perceived spatial separation of the target speech and masker via the precedence effect. The releasing effect was small and was only clear when the target and masker had the same mean F0, suggesting that informational masking was small. Performance with the harmonic maskers was better than with a steady speech-shaped noise (SSN) masker. In experiments II and III, the maskers were speech-like synthesized signals, alternating between segments with harmonic structure and segments composed of SSN. Performance was much worse than for experiment I, and worse than when an SSN masker was used, suggesting that substantial informational masking occurred. The similarity of the F0 contours of the target and masker had little effect. The informational masking effect was not influenced by whether or not the noise-like segments of the masker were synchronous with the unvoiced segments of the target speech.
Affiliation(s)
- Jing Chen
- Department of Machine Intelligence, Speech and Hearing Research Center, and Key Laboratory of Machine Perception, Ministry of Education, Peking University, Beijing 100871, People's Republic of China
20
Strydom T, Hanekom JJ. The performance of different synthesis signals in acoustic models of cochlear implants. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:920-933. [PMID: 21361449 DOI: 10.1121/1.3518760] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Synthesis (carrier) signals in acoustic models embody assumptions about perception of auditory electric stimulation. This study compared speech intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing, using synthesis signals which were representative of signals used previously in acoustic models as well as two new ones. Performance of the synthesis signals was determined in terms of correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants) using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth that corresponded to a linearly varying excitation width of 0.4 mm at the apical to 8 mm at the basal channels.
Affiliation(s)
- Trudie Strydom
- Department of Electrical, Electronic, and Computer Engineering, University of Pretoria, Pretoria 0002, South Africa
21
Summers RJ, Bailey PJ, Roberts B. Effects of differences in fundamental frequency on across-formant grouping in speech perception. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:3667-3677. [PMID: 21218899 DOI: 10.1121/1.3505119] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
In an isolated syllable, a formant will tend to be segregated perceptually if its fundamental frequency (F0) differs from that of the other formants. This study explored whether similar results are found for sentences, and specifically whether differences in F0 (ΔF0) also influence across-formant grouping in circumstances where the exclusion or inclusion of the manipulated formant critically determines speech intelligibility. Three-formant (F1 + F2 + F3) analogues of almost continuously voiced natural sentences were synthesized using a monotonous glottal source (F0 = 150 Hz). Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3; F2), where F2C is a competitor for F2 that listeners must resist to optimize recognition. Competitors were created using time-reversed frequency and amplitude contours of F2, and F0 was manipulated (ΔF0 = ± 8, ± 2, or 0 semitones relative to the other formants). Adding F2C typically reduced intelligibility, and this reduction was greatest when ΔF0 = 0. There was an additional effect of absolute F0 for F2C, such that competitor efficacy was greater for higher F0s. However, competitor efficacy was not due to energetic masking of F3 by F2C. The results are consistent with the proposal that a grouping "primitive" based on common F0 influences the fusion and segregation of concurrent formants in sentence perception.
Affiliation(s)
- Robert J Summers
- School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
22
Kwon BJ. Effects of electrode separation between speech and noise signals on consonant identification in cochlear implants. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:3258-3267. [PMID: 20000939 PMCID: PMC2803724 DOI: 10.1121/1.3257200] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/18/2007] [Revised: 08/31/2009] [Accepted: 09/25/2009] [Indexed: 05/26/2023]
Abstract
The aim of the present study was to examine cochlear implant (CI) users' perceptual segregation of speech from background noise with differing degrees of electrode separation between speech and noise. Eleven users of the nucleus CI system were tested on consonant identification using an experimental processing scheme called "multi-stream processing" in which speech and noise stimuli were processed separately and interleaved. Speech was presented to either ten (every other electrode) or six electrodes (every fourth electrode). Noise was routed to either the same (the "overlapped" condition) or a different set of electrodes (the "interlaced" condition), where speech and noise electrodes were separated by one- and two-electrode spacings for ten- and six-electrode presentations, respectively. Results indicated a small but significant improvement in consonant recognition (5%-10%) in the interlaced condition with a two-electrode spacing (approximately 1.1 mm) in two subjects. It appears that the results were influenced by peripheral channel interactions, partially accounting for individual variability. Although the overall effect was small and observed from a small number of subjects, the present study demonstrated that CI users' performance on segregating the target from the background might be improved if these sounds were presented with sufficient peripheral separation.
Affiliation(s)
- Bom Jun Kwon
- Department of Communication Sciences and Disorders, University of Utah, 390 S 1530 E, Salt Lake City, Utah 84112, USA.
23
Cooper HR, Roberts B. Simultaneous grouping in cochlear implant listeners: can abrupt changes in level be used to segregate components from a complex tone? J Assoc Res Otolaryngol 2009; 11:89-100. [PMID: 19826870 DOI: 10.1007/s10162-009-0190-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2009] [Accepted: 09/21/2009] [Indexed: 12/01/2022] Open
Abstract
A sudden increase in the amplitude of a component often causes its segregation from a complex tone, and shorter rise times enhance this effect. We explored whether this also occurs in implant listeners (n = 8). Condition 1 used a 3.5-s "complex tone" comprising concurrent stimulation on five electrodes distributed across the array of the Nucleus CI24 implant. For each listener, the baseline stimulus level on each electrode was set at 50% of the dynamic range (DR). Two 1-s increments of 12.5%, 25%, or 50% DR were introduced in succession on adjacent electrodes within the "inner" three of those activated. Both increments had rise and fall times of 30 and 970 ms or vice versa. Listeners reported which increment was higher in pitch. Some listeners performed above chance for all increment sizes, but only for 50% increments did all listeners perform above chance. No significant effect of rise time was found. Condition 2 replaced amplitude increments with decrements. Only three listeners performed above chance even for 50% decrements. One exceptional listener performed well for 50% decrements with fall and rise times of 970 and 30 ms but around chance for fall and rise times of 30 and 970 ms, indicating successful discrimination based on a sudden rise back to baseline stimulation. Overall, the results suggest that implant listeners can use amplitude changes against a constant background to pick out components from a complex, but generally these must be large compared with those required in normal hearing. For increments, performance depended mainly on above-baseline stimulation of the target electrodes, not rise time. With one exception, performance for decrements was typically very poor.
Affiliation(s)
- Huw R Cooper
- Psychology, School of Life and Health Sciences, Aston University, Birmingham, B4 7ET, UK.
24
Oxenham AJ, Simonson AM. Masking release for low- and high-pass-filtered speech in the presence of noise and single-talker interference. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 125:457-68. [PMID: 19173431 PMCID: PMC2677273 DOI: 10.1121/1.3021299] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/10/2023]
Abstract
Speech intelligibility was measured for sentences presented in spectrally matched steady noise, single-talker interference, or speech-modulated noise. The stimuli were unfiltered or were low-pass (LP) (1200 Hz cutoff) or high-pass (HP) (1500 Hz cutoff) filtered. The cutoff frequencies were selected to produce equal performance in both LP and HP conditions in steady noise and to limit access to the temporal fine structure of resolved harmonics in the HP conditions. Masking release, or the improvement in performance between the steady noise and single-talker interference, was substantial with no filtering. Under LP and HP filtering, masking release was roughly equal but was much less than in unfiltered conditions. When the average F0 of the interferer was shifted lower than that of the target, similar increases in masking release were observed under LP and HP filtering. Similar LP and HP results were also obtained for the speech-modulated-noise masker. The findings are not consistent with the idea that pitch conveyed by the temporal fine structure of low-order harmonics plays a crucial role in masking release. Instead, any reduction in speech redundancy, or manipulation that increases the target-to-masker ratio necessary for intelligibility to beyond around 0 dB, may result in reduced masking release.
Affiliation(s)
- Andrew J Oxenham
- Department of Psychology, University of Minnesota, Minneapolis, Minnesota 55455, USA.
25
Oxenham AJ. Pitch perception and auditory stream segregation: implications for hearing loss and cochlear implants. Trends Amplif 2008; 12:316-31. [PMID: 18974203 PMCID: PMC2901529 DOI: 10.1177/1084713808325881] [Citation(s) in RCA: 140] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
Pitch is important for speech and music perception, and may also play a crucial role in our ability to segregate sounds that arrive from different sources. This article reviews some basic aspects of pitch coding in the normal auditory system and explores the implications for pitch perception in people with hearing impairments and cochlear implants. Data from normal-hearing listeners suggest that the low-frequency, low-numbered harmonics within complex tones are of prime importance in pitch perception and in the perceptual segregation of competing sounds. The poorer frequency selectivity experienced by many hearing-impaired listeners leads to less access to individual harmonics, and the coding schemes currently employed in cochlear implants provide little or no representation of individual harmonics. These deficits in the coding of harmonic sounds may underlie some of the difficulties experienced by people with hearing loss and cochlear implants, and may point to future areas where sound representation in auditory prostheses could be improved.
26
Abstract
A common complaint among listeners with hearing loss (HL) is that they have difficulty communicating in common social settings. This article reviews how normal-hearing listeners cope in such settings, especially how they focus attention on a source of interest. Results of experiments with normal-hearing listeners suggest that the ability to selectively attend depends on the ability to analyze the acoustic scene and to form perceptual auditory objects properly. Unfortunately, sound features important for auditory object formation may not be robustly encoded in the auditory periphery of HL listeners. In turn, impaired auditory object formation may interfere with the ability to filter out competing sound sources. Peripheral degradations are also likely to reduce the salience of higher-order auditory cues such as location, pitch, and timbre, which enable normal-hearing listeners to select a desired sound source out of a sound mixture. Degraded peripheral processing is also likely to increase the time required to form auditory objects and focus selective attention so that listeners with HL lose the ability to switch attention rapidly (a skill that is particularly important when trying to participate in a lively conversation). Finally, peripheral deficits may interfere with strategies that normal-hearing listeners employ in complex acoustic settings, including the use of memory to fill in bits of the conversation that are missed. Thus, peripheral hearing deficits are likely to cause a number of interrelated problems that challenge the ability of HL listeners to communicate in social settings requiring selective attention.
Affiliation(s)
- Barbara G Shinn-Cunningham
- Hearing Research Center, Departments of Cognitive and Neural Systems and Biomedical Engineering, Boston University, Boston, MA 02421, USA.
27
Gaudrain E, Grimault N, Healy EW, Béra JC. Streaming of vowel sequences based on fundamental frequency in a cochlear-implant simulation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 124:3076-87. [PMID: 19045793 PMCID: PMC2677355 DOI: 10.1121/1.2988289] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/17/2007] [Revised: 08/21/2008] [Accepted: 08/22/2008] [Indexed: 05/27/2023]
Abstract
Cochlear-implant (CI) users often have difficulties perceiving speech in noisy environments. Although this problem likely involves auditory scene analysis, few studies have examined sequential segregation in CI listening situations. The present study aims to assess the possible role of fundamental frequency (F(0)) cues for the segregation of vowel sequences, using a noise-excited envelope vocoder that simulates certain aspects of CI stimulation. Obligatory streaming was evaluated using an order-naming task in two experiments involving normal-hearing subjects. In the first experiment, it was found that streaming did not occur based on F(0) cues when natural-duration vowels were processed to reduce spectral cues using the vocoder. In the second experiment, shorter duration vowels were used to enhance streaming. Under these conditions, F(0)-related streaming appeared even when vowels were processed to reduce spectral cues. However, the observed segregation could not be convincingly attributed to temporal periodicity cues. A subsequent analysis of the stimuli revealed that an F(0)-related spectral cue could have elicited the observed segregation. Thus, streaming under conditions of severely reduced spectral cues, such as those associated with CIs, may potentially occur as a result of this particular cue.
Affiliation(s)
- Etienne Gaudrain
- Neurosciences Sensorielles, Comportement, Cognition, CNRS UMR 5020, Universite Lyon 1, 50 Avenue Tony Garnier, 69366 Lyon Cedex 07, France
28
Stone MA, Füllgrabe C, Moore BCJ. Benefit of high-rate envelope cues in vocoder processing: effect of number of channels and spectral region. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2008; 124:2272-82. [PMID: 19062865 DOI: 10.1121/1.2968678] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2023]
Abstract
In cochlear implants, or vocoder simulations of cochlear implants, the transmission of envelope cues at high rates (related to voice fundamental frequency, f0) may be limited by the widths of the filters used to form the channels and/or by the cutoff frequency, f(lp), of the low-pass filters used for envelope extraction. The effect of varying f(lp) in tone and noise vocoders was investigated for channel numbers, N, from 6 to 18. As N increased, the widths of the channels decreased. The value of f(lp) was 45 Hz (envelope or "E" filter), or 180 Hz (pitch or "P" filter). The following combinations of cutoff frequencies were used for channels below and above 1500 Hz, respectively: EE, PE, EP, and PP. Results from a competing-talker task showed that the tone vocoder led to better intelligibility than the noise vocoder. The PP condition led to the best intelligibility and the EE condition to the worst. For N=6, intelligibility was better for condition PE than for condition EP. For N=18, the reverse was true. The results indicate that the channel bandwidths can compromise the transmission of f0-related envelope information, and suggest that vocoder simulations of cochlear-implant processing have limitations.
Affiliation(s)
- Michael A Stone
- Department of Experimental Psychology, University of Cambridge, Downing Street, Cambridge CB2 3EB, United Kingdom.
29
Larsen E, Cedolin L, Delgutte B. Pitch representations in the auditory nerve: two concurrent complex tones. J Neurophysiol 2008; 100:1301-19. [PMID: 18632887 PMCID: PMC2544468 DOI: 10.1152/jn.01361.2007] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022] Open
Abstract
Pitch differences between concurrent sounds are important cues used in auditory scene analysis and also play a major role in music perception. To investigate the neural codes underlying these perceptual abilities, we recorded from single fibers in the cat auditory nerve in response to two concurrent harmonic complex tones with missing fundamentals and equal-amplitude harmonics. We investigated the efficacy of rate-place and interspike-interval codes to represent both pitches of the two tones, which had fundamental frequency (F0) ratios of 15/14 or 11/9. We relied on the principle of scaling invariance in cochlear mechanics to infer the spatiotemporal response patterns to a given stimulus from a series of measurements made in a single fiber as a function of F0. Templates created by a peripheral auditory model were used to estimate the F0s of double complex tones from the inferred distribution of firing rate along the tonotopic axis. This rate-place representation was accurate for F0s ≳ 900 Hz. Surprisingly, rate-based F0 estimates were accurate even when the two-tone mixture contained no resolved harmonics, so long as some harmonics were resolved prior to mixing. We also extended methods used previously for single complex tones to estimate the F0s of concurrent complex tones from interspike-interval distributions pooled over the tonotopic axis. The interval-based representation was accurate for F0s ≲ 900 Hz, where the two-tone mixture contained no resolved harmonics. Together, the rate-place and interval-based representations allow accurate pitch perception for concurrent sounds over the entire range of human voice and cat vocalizations.
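The pooled-interval analysis described above can be sketched schematically: collect all-order interspike intervals across fibers, histogram them, and take the most common interval in the plausible pitch-period range as the period estimate. This is a simplified illustration run on synthetic spike trains, not the authors' analysis code; the bin width and search range are assumptions:

```python
import numpy as np

def pooled_interval_f0(spike_times, bin_s=1e-4, fmin=80.0, fmax=500.0):
    """Estimate F0 from an all-order interspike-interval histogram
    pooled over fibers: the most common interval within the assumed
    pitch-period range [1/fmax, 1/fmin] is taken as the period."""
    intervals = []
    for st in spike_times:          # one array of spike times per fiber
        st = np.sort(np.asarray(st))
        d = st[None, :] - st[:, None]
        intervals.append(d[d > 0])  # all-order positive intervals
    iv = np.concatenate(intervals)
    lo, hi = 1.0 / fmax, 1.0 / fmin
    iv = iv[(iv >= lo) & (iv <= hi)]
    edges = np.arange(lo, hi + bin_s, bin_s)
    counts, edges = np.histogram(iv, edges)
    k = np.argmax(counts)
    period = 0.5 * (edges[k] + edges[k + 1])  # center of the peak bin
    return 1.0 / period
```

With phase-locked fibers that skip cycles at random, the interval histogram still peaks at the stimulus period, which is the property this representation exploits.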
Affiliation(s)
- Erik Larsen
- Eaton-Peabody Laboratory, Massachusetts Eye and Ear Infirmary, Boston, MA, USA
30
Kong YY, Carlyon RP. Improved speech recognition in noise in simulated binaurally combined acoustic and electric stimulation. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2007; 121:3717-27. [PMID: 17552722 DOI: 10.1121/1.2717408] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/15/2023]
Abstract
Speech recognition in noise improves with combined acoustic and electric stimulation compared to electric stimulation alone [Kong et al., J. Acoust. Soc. Am. 117, 1351-1361 (2005)]. Here the contribution of fundamental frequency (F0) and low-frequency phonetic cues to speech recognition in combined hearing was investigated. Normal-hearing listeners heard vocoded speech in one ear and low-pass (LP) filtered speech in the other. Three listening conditions (vocode-alone, LP-alone, combined) were investigated. Target speech (average F0=120 Hz) was mixed with a time-reversed masker (average F0=172 Hz) at three signal-to-noise ratios (SNRs). LP speech aided performance at all SNRs. Low-frequency phonetic cues were then removed by replacing the LP speech with a LP equal-amplitude harmonic complex, frequency and amplitude modulated by the F0 and temporal envelope of voiced segments of the target. The combined hearing advantage disappeared at 10 and 15 dB SNR, but persisted at 5 dB SNR. A similar finding occurred when, additionally, F0 contour cues were removed. These results are consistent with a role for low-frequency phonetic cues, but not with a combination of F0 information between the two ears. The enhanced performance at 5 dB SNR with F0 contour cues absent suggests that voicing or glimpsing cues may be responsible for the combined hearing benefit.
Affiliation(s)
- Ying-Yee Kong
- MRC-Cognition & Brain Sciences Unit, 15 Chaucer Road, Cambridge CB2 2EF, United Kingdom.
31
Carlyon RP, Long CJ, Deeks JM, McKay CM. Concurrent sound segregation in electric and acoustic hearing. J Assoc Res Otolaryngol 2007; 8:119-33. [PMID: 17216383 PMCID: PMC2538412 DOI: 10.1007/s10162-006-0068-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2006] [Accepted: 12/04/2006] [Indexed: 10/23/2022] Open
Abstract
We investigated potential cues to sound segregation by cochlear implant (CI) and normal-hearing (NH) listeners. In each presentation interval of experiment 1a, CI listeners heard a mixture of four pulse trains applied concurrently to separate electrodes, preceded by a "probe" applied to a single electrode. In one of these two intervals, which the subject had to identify, the probe electrode was the same as a "target" electrode in the mixture. The pulse train on the target electrode had a higher level than the others in the mixture. Additionally, it could be presented either with a 200-ms onset delay, at a lower rate, or with an asynchrony produced by delaying each pulse by about 5 ms re those on the nontarget electrodes. Neither the rate difference nor the asynchrony aided performance over and above the level difference alone, but the onset delay produced a modest improvement. Experiment 1b showed that two subjects could perform the task using the onset delay alone, with no level difference. Experiment 2 used a method similar to that of experiment 1, but investigated the onset cue using NH listeners. In one condition, the mixture consisted of harmonics 5 to 40 of a 100-Hz fundamental, with the onset of either harmonics 13 to 17 or 26 to 30 delayed re the rest. Performance was modest in this condition, but could be improved markedly by using stimuli containing a spectral gap between the target and nontarget harmonics. The results suggest that (a) CI users are unlikely to use temporal pitch differences between adjacent channels to separate concurrent sounds, and that (b) they can use onset differences between channels, but the usefulness of this cue will be compromised by the spread of excitation along the nerve-fiber array. This deleterious effect of spread-of-excitation can also impair the use of onset cues by NH listeners.
Affiliation(s)
- Robert P Carlyon
- MRC Cognition & Brain Sciences Unit, 15 Chaucer Rd, Cambridge, CB2 7EF, England.
32
Affiliation(s)
- Colette M McKay
- School of Life and Health Sciences, Aston University, Birmingham B4 7ET, United Kingdom
33
Qin MK, Oxenham AJ. Effects of introducing unprocessed low-frequency information on the reception of envelope-vocoder processed speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2006; 119:2417-26. [PMID: 16642854 DOI: 10.1121/1.2178719] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2023]
Abstract
This study investigated the benefits of adding unprocessed low-frequency information to acoustic simulations of cochlear-implant processing in normal-hearing listeners. Implant processing was simulated using an eight-channel noise-excited envelope vocoder, and low-frequency information was added by replacing the lower frequency channels of the processor with a low-pass-filtered version of the original stimulus. Experiment 1 measured sentence-level speech reception as a function of target-to-masker ratio, with either steady-state speech-shaped noise or single-talker maskers. Experiment 2 measured listeners' ability to identify two vowels presented simultaneously, as a function of the F0 difference between the two vowels. In both experiments low-frequency information was added below either 300 or 600 Hz. The introduction of the additional low-frequency information led to substantial and significant improvements in performance in both experiments, with a greater improvement observed for the higher (600 Hz) than for the lower (300 Hz) cutoff frequency. However, performance never equaled performance in the unprocessed conditions. The results confirm other recent demonstrations that added low-frequency information can provide significant benefits in intelligibility, which may at least in part be attributed to improvements in F0 representation. The findings provide further support for efforts to make use of residual acoustic hearing in cochlear-implant users.
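The processing described in this abstract (an eight-channel noise-excited envelope vocoder whose lowest channels are replaced by a low-pass-filtered version of the original signal) can be sketched roughly as follows. This is a minimal FFT-based sketch under stated assumptions, not the authors' implementation: the band edges, the crude rectified envelope (no smoothing filter), and the function name are all illustrative.

```python
import numpy as np

def noise_vocoder(x, fs, edges, lp_replace_hz=None, seed=0):
    """Minimal noise-excited envelope vocoder (illustrative only).

    edges: band-edge frequencies in Hz (N+1 edges -> N channels).
    lp_replace_hz: if set, channels lying entirely below this cutoff
    pass the original (unprocessed) signal, mimicking the added
    low-frequency information in the study.
    """
    rng = np.random.default_rng(seed)
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(np.where(band_mask, X, 0), len(x))
        if lp_replace_hz is not None and hi <= lp_replace_hz:
            out += band  # keep original fine structure below the cutoff
            continue
        env = np.abs(band)  # crude envelope: rectification, no smoothing
        carrier = np.fft.irfft(
            np.where(band_mask, np.fft.rfft(rng.standard_normal(len(x))), 0),
            len(x))
        out += env * carrier
    return out

# Example: vocode a 500-Hz tone at 16 kHz, with and without
# unprocessed information below 600 Hz (edge values are assumptions).
fs = 16000
t = np.arange(int(0.05 * fs)) / fs
x = np.sin(2 * np.pi * 500 * t)
y = noise_vocoder(x, fs, [100, 300, 600, 1200, 2400, 4800, 8000])
y_lowpass = noise_vocoder(x, fs, [100, 300, 600, 1200, 2400, 4800, 8000],
                          lp_replace_hz=600)
```

A real simulation would use analysis filter banks and a smoothed envelope detector; the sketch only shows where the "replace low channels with the unprocessed signal" step fits in the signal chain.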
Collapse
Affiliation(s)
- Michael K Qin
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
| | | |
Collapse
|
34
|
Qin MK, Oxenham AJ. Effects of envelope-vocoder processing on F0 discrimination and concurrent-vowel identification. Ear Hear 2006; 26:451-60. [PMID: 16230895 DOI: 10.1097/01.aud.0000179689.79868.06] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]
Abstract
OBJECTIVE The aim of this study was to examine the effects of envelope-vocoder sound processing on listeners' ability to discriminate changes in fundamental frequency (F0) in anechoic and reverberant conditions and on their ability to identify concurrent vowels based on differences in F0. DESIGN In the first experiment, F0 difference limens (F0DLs) were measured as a function of number of envelope-vocoder frequency channels (1, 4, 8, 24, and 40 channels, and unprocessed) in four normal-hearing listeners, with degree of simulated reverberation (no, mild, and severe reverberation) as a parameter. In the second experiment, vowel identification was measured as a function of the F0 difference between two simultaneous vowels in six normal-hearing listeners, with the number of vocoder channels (8 and 24 channels, and unprocessed) as a parameter. RESULTS Reverberation was detrimental to F0 discrimination in conditions with smaller numbers of vocoder channels. Despite the reasonable F0DLs (<1 semitone) with 24- and 8-channel vocoder processing, listeners were unable to benefit from F0 differences between the competing vowels in the concurrent-vowel paradigm. CONCLUSIONS The overall detrimental effects of vocoder processing are probably due to the poor spectral representation of the lower-order harmonics. The F0 information carried in the temporal envelope is weak, susceptible to reverberation, and may not suffice for source segregation. To the extent that vocoder processing simulates cochlear implant processing, users of current implant processing schemes are unlikely to benefit from F0 differences between competing talkers when listening to speech in complex environments. The results provide further incentive for finding a way to make the information from low-order, resolved harmonics available to cochlear implant users.
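The abstract reports F0 difference limens in semitones. The conversion between a frequency ratio and a semitone difference is standard (12-tone equal-temperament log measure); a one-line helper makes the "<1 semitone" benchmark concrete. The example values below are illustrative, not data from the study.

```python
import math

def semitones(f_ref, f_test):
    """Frequency difference expressed in semitones: 12 * log2(ratio)."""
    return 12 * math.log2(f_test / f_ref)

# A just-detectable 1% F0 change at 100 Hz corresponds to about
# 0.17 semitones, comfortably below the 1-semitone benchmark.
diff = semitones(100.0, 101.0)
```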
Collapse
Affiliation(s)
- Michael K Qin
- Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
| | | |
Collapse
|
35
|
Laneau J, Moonen M, Wouters J. Factors affecting the use of noise-band vocoders as acoustic models for pitch perception in cochlear implants. J Acoust Soc Am 2006; 119:491-506. [PMID: 16454303 DOI: 10.1121/1.2133391] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/06/2023]
Abstract
Although noise-band vocoders have been shown in a number of experiments to provide acoustic models for speech perception in cochlear implants (CI), the present study assesses in four experiments whether and under what limitations noise-band vocoders can be used as an acoustic model for pitch perception in CI. The first two experiments examine the effect of spectral smearing on simulated electrode discrimination and fundamental frequency (F0) discrimination. The third experiment assesses the effect of spectral mismatch in an F0-discrimination task with two different vocoders. The fourth experiment investigates the effect of amplitude compression on modulation rate discrimination. For each experiment, the results obtained from normal-hearing subjects presented with vocoded stimuli are compared to results obtained directly from CI recipients. The results show that place pitch sensitivity drops with increased spectral smearing and that place pitch cues for multi-channel stimuli can adequately be mimicked when the discriminability of adjacent channels is adjusted by varying the spectral slopes to match that of CI subjects. The results also indicate that temporal pitch sensitivity is limited for noise-band carriers with low center frequencies and that the absence of a compression function in the vocoder might alter the saliency of the temporal pitch cues.
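"Varying the spectral slopes" to control smearing, as described in this abstract, amounts to giving each vocoder band filter a flat passband with an adjustable dB-per-octave roll-off: shallower slopes leak more energy into adjacent channels. The sketch below is a hedged illustration of that idea, not the authors' filter design; the function name, slope values, and piecewise-log gain model are assumptions.

```python
import numpy as np

def band_gain(freqs, lo, hi, slope_db_oct):
    """Magnitude response of one band filter: unity gain in [lo, hi],
    rolling off at slope_db_oct dB/octave outside the band.
    Shallower slopes -> more spectral smearing between channels."""
    g = np.zeros_like(freqs, dtype=float)  # gain in dB
    below = freqs < lo
    above = freqs > hi
    g[below] = -slope_db_oct * np.log2(lo / np.maximum(freqs[below], 1e-6))
    g[above] = -slope_db_oct * np.log2(freqs[above] / hi)
    return 10 ** (g / 20)

# One octave outside a 500-1000 Hz band: a steep filter attenuates
# strongly, a shallow (smeared) one only mildly.
f = np.array([250.0, 500.0, 750.0, 1000.0, 2000.0])
sharp = band_gain(f, 500.0, 1000.0, 24.0)   # well-separated channels
smeared = band_gain(f, 500.0, 1000.0, 6.0)  # heavy channel overlap
```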
Collapse
Affiliation(s)
- Johan Laneau
- Laboratory for Experimental ORL, K.U.Leuven, Kapucijnenvoer 33, B 3000 Leuven, Belgium.
| | | | | |
Collapse
|