1. Choi HJ, Kyong JS, Won JH, Shim HJ. Effect of spectral degradation on speech intelligibility and cortical representation. Front Neurosci 2024;18:1368641. PMID: 38646607; PMCID: PMC11027739; DOI: 10.3389/fnins.2024.1368641.
Abstract
Noise-vocoded speech has long been used to investigate how acoustic cues affect speech understanding. Studies indicate that reducing the number of spectral channel bands diminishes speech intelligibility. Although previous studies have examined the channel-band effect using earlier event-related potential (ERP) components, such as P1, N1, and P2, a clear consensus remains elusive. Given our hypothesis that spectral degradation affects higher-order processing of speech understanding beyond mere perception, we aimed to objectively measure differences in higher-order abilities to discriminate or interpret meaning. Using an oddball paradigm with speech stimuli, we examined how neural signals reflecting the evaluation of speech stimuli vary with the number of channel bands, measuring the N2 and P3b components. In 20 young participants with normal hearing, we measured speech intelligibility and N2 and P3b responses using a one-syllable task paradigm with animal and non-animal stimuli across four vocoder conditions with 4, 8, 16, or 32 channel bands. Behavioral performance on word repetition was clearly affected by the number of channel bands, and all pairwise comparisons were significantly different (p < 0.001). We also observed significant effects of the number of channels on the peak amplitude [F(2.006, 38.117) = 9.077, p < 0.001] and peak latency [F(3, 57) = 26.642, p < 0.001] of the N2 component. Similarly, the P3b component showed significant main effects of the number of channel bands on peak amplitude [F(2.231, 42.391) = 13.045, p < 0.001] and peak latency [F(3, 57) = 2.968, p = 0.039]. In summary, our findings provide compelling evidence that the number of spectral channel bands profoundly influences cortical speech processing, as reflected in the N2 and P3b components, which index higher-order cognitive processes. We conclude that spectrally degraded one-syllable speech primarily affects cortical responses during semantic integration.
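The N2 and P3b measures above come down to locating a component's extremum within a latency window of the averaged ERP. A minimal sketch of that peak-picking step; the window bounds, polarity convention, and sampling rate below are illustrative assumptions, not the study's actual analysis settings:

```python
import numpy as np

def component_peak(erp, fs, t0, t1, polarity=-1):
    """Find a component's peak amplitude and latency in an averaged ERP.

    erp      : 1-D averaged waveform (microvolts), time-locked to stimulus onset
    fs       : sampling rate in Hz
    t0, t1   : latency window in seconds (e.g., a hypothetical N2 window)
    polarity : -1 for negative-going components (N2), +1 for positive (P3b)
    """
    i0, i1 = int(t0 * fs), int(t1 * fs)
    idx = np.argmax(polarity * erp[i0:i1])     # extremum within the window
    return erp[i0 + idx], (i0 + idx) / fs      # (amplitude, latency in s)
```

The same routine serves both components by flipping `polarity`, which is why amplitude and latency effects can be reported with one analysis pipeline.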
Affiliation(s)
- Hyo Jung Choi: Department of Otorhinolaryngology-Head and Neck Surgery, Nowon Eulji Medical Center, Eulji University School of Medicine, Seoul, Republic of Korea; Eulji Tinnitus and Hearing Research Institute, Nowon Eulji Medical Center, Seoul, Republic of Korea
- Jeong-Sug Kyong: Sensory-Organ Research Institute, Medical Research Center, Seoul National University School of Medicine, Seoul, Republic of Korea; Department of Radiology, Konkuk University Medical Center, Seoul, Republic of Korea
- Jong Ho Won: Hyman, Phelps and McNamara, P.C., Washington, DC, United States
- Hyun Joon Shim: Department of Otorhinolaryngology-Head and Neck Surgery, Nowon Eulji Medical Center, Eulji University School of Medicine, Seoul, Republic of Korea; Eulji Tinnitus and Hearing Research Institute, Nowon Eulji Medical Center, Seoul, Republic of Korea
2. Cychosz M, Winn MB, Goupell MJ. How to vocode: Using channel vocoders for cochlear-implant research. J Acoust Soc Am 2024;155:2407-2437. PMID: 38568143; PMCID: PMC10994674; DOI: 10.1121/10.0025274.
Abstract
The channel vocoder has become a useful tool to understand the impact of specific forms of auditory degradation, particularly the spectral and temporal degradation that reflect cochlear-implant processing. Vocoders have many parameters that allow researchers to answer questions about cochlear-implant processing in ways that overcome some logistical complications of controlling for factors in individual cochlear implant users. However, there is such a large variety in the implementation of vocoders that the term "vocoder" is not specific enough to describe the signal processing used in these experiments. Misunderstanding vocoder parameters can result in experimental confounds or unexpected stimulus distortions. This paper highlights the signal processing parameters that should be specified when describing vocoder construction. The paper also provides guidance on how to determine vocoder parameters within perception experiments, given the experimenter's goals and research questions, to avoid common signal processing mistakes. Throughout, we will assume that experimenters are interested in vocoders with the specific goal of better understanding cochlear implants.
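The parameter choices this tutorial catalogs can be made concrete with a toy implementation. Below is a minimal noise-excited channel vocoder sketch; the log-spaced band edges, brick-wall FFT filtering, moving-average envelope smoothing, and the absence of per-band level equalization are all simplifying assumptions rather than any recommended design:

```python
import numpy as np

def noise_vocode(signal, fs, n_channels=8, fmin=200.0, fmax=7000.0):
    """Toy noise-excited channel vocoder.

    Splits the input into log-spaced analysis bands, extracts each band's
    temporal envelope, and multiplies it onto a band-limited noise carrier.
    Every stage here is a simplification of the parameters a real study
    should report (filter slopes, envelope cutoff, carrier spectrum, etc.).
    """
    edges = np.logspace(np.log10(fmin), np.log10(fmax), n_channels + 1)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    out = np.zeros(len(signal))
    rng = np.random.default_rng(0)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_mask = (freqs >= lo) & (freqs < hi)
        # Brick-wall FFT band-pass: keep only this band's components
        band = np.fft.irfft(np.where(band_mask, spectrum, 0), n=len(signal))
        # Envelope via rectification + moving-average smoothing (~50-Hz cutoff)
        win = max(1, int(fs / 50))
        env = np.convolve(np.abs(band), np.ones(win) / win, mode="same")
        # Band-limited noise carrier, filtered with the same mask
        nspec = np.fft.rfft(rng.standard_normal(len(signal)))
        carrier = np.fft.irfft(np.where(band_mask, nspec, 0), n=len(signal))
        out += env * carrier
    return out
```

Swapping the noise carrier for a sinusoid at each band's center frequency would give the sine-vocoder variant discussed throughout these entries.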
Affiliation(s)
- Margaret Cychosz: Department of Linguistics, University of California, Los Angeles, Los Angeles, California 90095, USA
- Matthew B Winn: Department of Speech-Language-Hearing Sciences, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Matthew J Goupell: Department of Hearing and Speech Sciences, University of Maryland, College Park, College Park, Maryland 20742, USA
3. Levin M, Zaltz Y. Voice Discrimination in Quiet and in Background Noise by Simulated and Real Cochlear Implant Users. J Speech Lang Hear Res 2023;66:5169-5186. PMID: 37992412; DOI: 10.1044/2023_jslhr-23-00019.
Abstract
PURPOSE Cochlear implant (CI) users demonstrate poor voice discrimination (VD) in quiet conditions based on the speaker's fundamental frequency (fo) and formant frequencies (i.e., vocal-tract length [VTL]). Our purpose was to examine the effect of background noise at levels that allow good speech recognition thresholds (SRTs) on VD via acoustic CI simulations and CI hearing. METHOD Forty-eight normal-hearing (NH) listeners who listened via noise-excited (n = 20) or sinewave (n = 28) vocoders and 10 prelingually deaf CI users (i.e., whose hearing loss began before language acquisition) participated in the study. First, the signal-to-noise ratio (SNR) that yields 70.7% correct SRT was assessed using an adaptive sentence-in-noise test. Next, the CI simulation listeners performed 12 adaptive VDs: six in quiet conditions, two with each cue (fo, VTL, fo + VTL), and six amid speech-shaped noise. The CI participants performed six VDs: one with each cue, in quiet and amid noise. SNR at VD testing was 5 dB higher than the individual's SRT in noise (SRTn +5 dB). RESULTS Results showed the following: (a) Better VD was achieved via the noise-excited than the sinewave vocoder, with the noise-excited vocoder better mimicking CI VD; (b) background noise had a limited negative effect on VD, only for the CI simulation listeners; and (c) there was a significant association between SNR at testing and VTL VD only for the CI simulation listeners. CONCLUSIONS For NH listeners who listen to CI simulations, noise that allows good SRT can nevertheless impede VD, probably because VD depends more on bottom-up sensory processing. Conversely, for prelingually deaf CI users, noise that allows good SRT hardly affects VD, suggesting that they rely strongly on bottom-up processing for both VD and speech recognition.
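The "70.7% correct" SRT criterion above corresponds to a 2-down/1-up adaptive rule (Levitt, 1971): two consecutive correct responses make the task harder, one error makes it easier. A sketch of that tracking logic; the start level, step size, and reversal-based stopping rule are illustrative, not this study's actual test parameters:

```python
import numpy as np

def two_down_one_up(respond, start, step, n_reversals=8):
    """2-down/1-up adaptive track converging on the 70.7%-correct point.

    respond(level) is a callable returning True for a correct trial.
    Threshold is estimated as the mean level at the later reversals.
    """
    level, streak, direction, reversals = start, 0, 0, []
    while len(reversals) < n_reversals:
        if respond(level):
            streak += 1
            if streak == 2:                 # two correct -> harder
                streak = 0
                if direction == +1:         # direction change = reversal
                    reversals.append(level)
                direction = -1
                level -= step
        else:
            streak = 0                      # one wrong -> easier
            if direction == -1:
                reversals.append(level)
            direction = +1
            level += step
    return np.mean(reversals[len(reversals) // 2:])
```

Running the same rule on a VD cue (fo or VTL difference) instead of SNR yields the adaptive voice-discrimination thresholds the study reports.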
Affiliation(s)
- Michal Levin: Department of Communication Disorders, The Stanley Steyer School of Health Professions, Faculty of Medicine, Tel Aviv University, Israel
- Yael Zaltz: Department of Communication Disorders, The Stanley Steyer School of Health Professions, Faculty of Medicine, Tel Aviv University, Israel; Sagol School of Neuroscience, Tel Aviv University, Israel
4. Xu C, Cheng FY, Medina S, Eng E, Gifford R, Smith S. Objective discrimination of bimodal speech using frequency following responses. Hear Res 2023;437:108853. PMID: 37441879; DOI: 10.1016/j.heares.2023.108853.
Abstract
Bimodal hearing, in which a contralateral hearing aid is combined with a cochlear implant (CI), provides greater speech recognition benefits than using a CI alone. Factors predicting individual bimodal patient success are not fully understood. Previous studies have shown that bimodal benefits may be driven by a patient's ability to extract fundamental frequency (f0) and/or temporal fine structure cues (e.g., F1). Both of these features may be represented in frequency following responses (FFR) to bimodal speech. Thus, the goals of this study were to: 1) parametrically examine neural encoding of f0 and F1 in simulated bimodal speech conditions; 2) examine objective discrimination of FFRs to bimodal speech conditions using machine learning; 3) explore whether FFRs are predictive of perceptual bimodal benefit. Three vowels (/ε/, /i/, and /ʊ/) with identical f0 were manipulated by a vocoder (right ear) and low-pass filters (left ear) to create five bimodal simulations for evoking FFRs: Vocoder-only, Vocoder +125 Hz, Vocoder +250 Hz, Vocoder +500 Hz, and Vocoder +750 Hz. Perceptual performance on the BKB-SIN test was also measured using the same five configurations. Results suggested that neural representation of f0 and F1 FFR components were enhanced with increasing acoustic bandwidth in the simulated "non-implanted" ear. As spectral differences between vowels emerged in the FFRs with increased acoustic bandwidth, FFRs were more accurately classified and discriminated using a machine learning algorithm. Enhancement of f0 and F1 neural encoding with increasing bandwidth were collectively predictive of perceptual bimodal benefit on a speech-in-noise task. Given these results, FFR may be a useful tool to objectively assess individual variability in bimodal hearing.
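The abstract does not specify which machine-learning algorithm was used, but the underlying discrimination logic can be illustrated with a nearest-template classifier: assign a test FFR spectrum to whichever vowel template it correlates with most strongly. A hedged sketch, with hypothetical vowel labels and toy spectra:

```python
import numpy as np

def classify_ffr(test_spec, templates):
    """Assign a test FFR magnitude spectrum to the vowel whose template
    spectrum it matches best by Pearson correlation. This stands in for
    the (unspecified) machine-learning classifier in the study above."""
    best_label, best_r = None, -np.inf
    for label, tmpl in templates.items():
        r = np.corrcoef(test_spec, tmpl)[0, 1]
        if r > best_r:
            best_label, best_r = label, r
    return best_label
```

As the abstract notes, classification should improve as acoustic bandwidth grows, because wider bandwidth makes the vowel templates more spectrally distinct.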
Affiliation(s)
- Can Xu, Fan-Yin Cheng, Sarah Medina, Erica Eng, Spencer Smith: Department of Speech, Language, and Hearing Sciences, University of Texas at Austin, 2504A Whitis Ave. (A1100), Austin, TX 78712-0114, USA
- René Gifford: Department of Speech, Language, and Hearing Sciences, Vanderbilt University Medical Center, Nashville, TN, USA
5. Ananthakrishnan S, Luo X. Effects of Temporal Envelope Cutoff Frequency, Number of Channels, and Carrier Type on Brainstem Neural Representation of Pitch in Vocoded Speech. J Speech Lang Hear Res 2022;65:3146-3164. PMID: 35944032; DOI: 10.1044/2022_jslhr-21-00576.
Abstract
PURPOSE The objective of this study was to determine if and how the subcortical neural representation of pitch cues in listeners with normal hearing is affected by systematic manipulation of vocoder parameters. METHOD This study assessed the effects of temporal envelope cutoff frequency (50 and 500 Hz), number of channels (1-32), and carrier type (sine-wave and noise-band) on brainstem neural representation of fundamental frequency (f0) in frequency-following responses (FFRs) to vocoded vowels in 15 young adult listeners with normal hearing. RESULTS Results showed that FFR f0 strength (quantified as absolute f0 magnitude divided by noise floor [NF] magnitude) significantly improved with 500-Hz vs. 50-Hz temporal envelopes for all channel numbers and both carriers except the 1-channel noise-band vocoder. FFR f0 strength with 500-Hz temporal envelopes significantly improved when the channel number increased from 1 to 2, but it either declined (sine-wave vocoders) or saturated (noise-band vocoders) when the channel number increased from 4 to 32. FFR f0 strength with 50-Hz temporal envelopes was similarly small for both carriers with all channel numbers, except for a significant improvement with the 16-channel sine-wave vocoder. With 500-Hz temporal envelopes, FFR f0 strength was significantly greater for sine-wave vocoders than for noise-band vocoders with channel numbers 1-8; no significant differences were seen with 16 and 32 channels. With 50-Hz temporal envelopes, the carrier effect was only observed with 16 channels. In contrast, there was no significant carrier effect for the absolute f0 magnitude. Compared to sine-wave vocoders, noise-band vocoders had a higher NF and thus lower relative FFR f0 strength. CONCLUSIONS It is important to normalize the f0 magnitude relative to the NF when analyzing FFRs to vocoded speech. The physiological findings reported here may result from the availability of f0-related temporal periodicity and spectral sidelobes in vocoded signals and should be considered when selecting vocoder parameters and interpreting results in future physiological studies. In general, the dependence of brainstem neural phase-locking strength to f0 on vocoder parameters may confound the comparison of pitch-related behavioral results across different vocoder designs.
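The f0-strength metric described above (f0 magnitude divided by noise-floor magnitude) is straightforward to compute from an FFT of the response. A sketch; where the noise floor is taken from (here, a neighboring frequency band) varies across studies and is an illustrative assumption:

```python
import numpy as np

def ffr_f0_strength(response, fs, f0, nf_band):
    """FFR f0 strength: spectral magnitude at f0 divided by the mean
    magnitude in a noise-floor band. nf_band is a (low, high) tuple in Hz;
    real studies may instead estimate the floor from pre-stimulus activity."""
    mags = np.abs(np.fft.rfft(response))
    freqs = np.fft.rfftfreq(len(response), d=1.0 / fs)
    f0_mag = mags[np.argmin(np.abs(freqs - f0))]
    noise_floor = mags[(freqs >= nf_band[0]) & (freqs <= nf_band[1])].mean()
    return f0_mag / noise_floor
```

Because a noise-band carrier raises the denominator, this ratio directly reflects the paper's point that noise-band vocoders yield lower relative f0 strength even when absolute f0 magnitude is unchanged.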
Affiliation(s)
- Xin Luo: Program of Speech and Hearing Science, College of Health Solutions, Arizona State University, Tempe
6. Rao R, Shen H. Onchidium reevesii may be able to distinguish low-frequency sound to discriminate the state of tides. Molluscan Research 2022. DOI: 10.1080/13235818.2022.2065439.
Affiliation(s)
- Rongcheng Rao, Heding Shen: National Experimental Teaching Demonstration Center, Shanghai Key Laboratory of Systematic Classification and Evolution of Marine Animals, Shanghai Ocean University, Shanghai, People's Republic of China; Shanghai Collaborative Innovation Center for Cultivating Elite Breeds and Green-culture of Aquaculture Animals, Shanghai, People's Republic of China
7. Kim S, Chou HH, Luo X. Mandarin tone recognition training with cochlear implant simulation: Amplitude envelope enhancement and cue weighting. J Acoust Soc Am 2021;150:1218. PMID: 34470277; DOI: 10.1121/10.0005878.
Abstract
With limited fundamental frequency (F0) cues, cochlear implant (CI) users recognize Mandarin tones using the amplitude envelope. This study investigated whether tone recognition training with amplitude envelope enhancement may improve tone recognition and cue weighting with CIs. Three groups of CI-simulation listeners received training using vowels with the amplitude envelope modified to resemble the F0 contour (enhanced-amplitude-envelope training), training using natural vowels (natural-amplitude-envelope training), or exposure to natural vowels without training, respectively. Tone recognition with natural and enhanced amplitude envelope cues and cue weighting of amplitude envelope and F0 contour were measured in pre-, post-, and retention-tests. With similar pre-test performance, both training groups had better tone recognition than the no-training group after training. Only enhanced-amplitude-envelope training increased the benefits of amplitude envelope enhancement in the post- and retention-tests relative to the pre-test. Neither training paradigm increased the cue weighting of amplitude envelope and F0 contour more than stimulus exposure alone. Listeners attending more to amplitude envelope in the pre-test tended to have better tone recognition with enhanced amplitude envelope cues before training and to improve more in tone recognition after enhanced-amplitude-envelope training. The results suggest that auditory training and speech enhancement may bring maximum benefits to CI users when combined.
Affiliation(s)
- Seeon Kim, Hsiao-Hsiuan Chou, Xin Luo: Program of Speech and Hearing Science, College of Health Solutions, Arizona State University, Tempe, Arizona 85287, USA
8. Griz SMS, Menezes DC, Advíncula KP, Lima MADL, Menezes PDL. Forward masking with frequency-following response analyses. Revista CEFAC 2021. DOI: 10.1590/1982-0216/20212321220.
9.
Abstract
OBJECTIVES There is increasing interest in using the frequency following response (FFR) to describe the effects of varying different aspects of hearing aid signal processing on brainstem neural representation of speech. To this end, recent studies have examined the effects of filtering on brainstem neural representation of the speech fundamental frequency (f0) in listeners with normal hearing sensitivity by measuring FFRs to low- and high-pass filtered signals. However, the stimuli used in these studies do not reflect the entire range of typical cutoff frequencies used in frequency-specific gain adjustments during hearing aid fitting. Further, there has been limited discussion on the effect of filtering on brainstem neural representation of formant-related harmonics. Here, the effects of filtering on brainstem neural representation of speech fundamental frequency (f0) and harmonics related to first formant frequency (F1) were assessed by recording envelope and spectral FFRs to a vowel low-, high-, and band-pass filtered at cutoff frequencies ranging from 0.125 to 8 kHz. DESIGN FFRs were measured to a synthetically generated vowel stimulus /u/ presented in full bandwidth and low-pass (experiment 1), high-pass (experiment 2), and band-pass (experiment 3) filtered conditions. In experiment 1, FFRs were measured to the vowel /u/ presented in a full bandwidth condition as well as 11 low-pass filtered conditions (low-pass cutoff frequencies: 0.125, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, and 8 kHz) in 19 adult listeners with normal hearing sensitivity. In experiment 2, FFRs were measured to the same vowel presented in a full bandwidth condition as well as 10 high-pass filtered conditions (high-pass cutoff frequencies: 0.125, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, and 6 kHz) in 7 adult listeners with normal hearing sensitivity. In experiment 3, in addition to the full bandwidth condition, FFRs were measured to the vowel /u/ low-pass filtered at 2 kHz and band-pass filtered between 2-4 kHz and 4-6 kHz in 10 adult listeners with normal hearing sensitivity. A fast Fourier transform analysis was conducted to measure the strength of f0 and the F1-related harmonic relative to the noise floor in the brainstem neural responses obtained in the full bandwidth and filtered stimulus conditions. RESULTS Brainstem neural representation of f0 was reduced when the low-pass filter cutoff frequency was between 0.25 and 0.5 kHz; no differences in f0 strength were noted between conditions when the low-pass filter cutoff was at or greater than 0.75 kHz. While envelope FFR f0 strength was reduced when the stimulus was high-pass filtered at 6 kHz, there was no effect of high-pass filtering on brainstem neural representation of f0 when the high-pass filter cutoff frequency ranged from 0.125 to 4 kHz. There was a weakly significant global effect of band-pass filtering on brainstem neural phase-locking to f0. A trends analysis indicated that mean f0 magnitude in the brainstem neural response was greater when the stimulus was band-pass filtered between 2 and 4 kHz than when it was band-pass filtered between 4 and 6 kHz, low-pass filtered at 2 kHz, or presented in the full bandwidth condition. Last, neural phase-locking to f0 was reduced or absent in envelope FFRs measured to filtered stimuli that lacked spectral energy above 0.125 kHz or below 6 kHz. Similarly, little to no energy was seen at F1 in spectral FFRs obtained to low-, high-, or band-pass filtered stimuli that did not contain energy in the F1 region. For stimulus conditions that contained energy at F1, the strength of the peak at F1 in the spectral FFR varied little with low-, high-, or band-pass filtering. CONCLUSIONS Energy at f0 in envelope FFRs may arise due to neural phase-locking to low-, mid-, or high-frequency stimulus components, provided the stimulus envelope is modulated by at least two interacting harmonics. Stronger neural responses at f0 are measured when filtering results in stimulus bandwidths that preserve stimulus energy at F1 and F2. In addition, results suggest that unresolved harmonics may favorably influence f0 strength in the neural response. Lastly, brainstem neural representation of the F1-related harmonic measured in spectral FFRs obtained to filtered stimuli is related to the presence or absence of stimulus energy at F1. These findings add to the existing literature exploring the viability of the FFR as an objective technique to evaluate hearing aid fitting, where stimulus bandwidth is altered by design due to frequency-specific gain applied by amplification algorithms.
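The filtered stimulus conditions above can be sketched with a simple zero-phase FFT-masking filter. This brick-wall version is only illustrative: actual studies specify filter type, order, and slope, which this sketch ignores:

```python
import numpy as np

def bandpass_fft(stimulus, fs, lo, hi):
    """Zero-phase band-pass filtering by FFT masking, mimicking the
    filtered conditions above (e.g., lo=2000, hi=4000 for the 2-4 kHz
    band-pass condition; setting lo=0 gives a low-pass condition)."""
    spec = np.fft.rfft(stimulus)
    freqs = np.fft.rfftfreq(len(stimulus), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(stimulus))
```

Measuring f0 and F1 strength in the FFR to each filtered version, relative to the noise floor, then yields the comparisons reported in the results.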
10. Anderson S, Roque L, Gaskins CR, Gordon-Salant S, Goupell MJ. Age-Related Compensation Mechanism Revealed in the Cortical Representation of Degraded Speech. J Assoc Res Otolaryngol 2020;21:373-391. PMID: 32643075; DOI: 10.1007/s10162-020-00753-4.
Abstract
Older adults understand speech with comparative ease in quiet, but signal degradation can hinder speech understanding much more than it does in younger adults. This difficulty may result, in part, from temporal processing deficits related to the aging process and/or high-frequency hearing loss that can occur in listeners who have normal- or near-normal-hearing thresholds in the speech frequency range. Temporal processing deficits may manifest as degraded neural representation in peripheral and brainstem/midbrain structures that lead to compensation, or changes in response strength in auditory cortex. Little is understood about the process by which the neural representation of signals is improved or restored by age-related cortical compensation mechanisms. Therefore, we used vocoding to simulate spectral degradation to compare the behavioral and neural representation of words that contrast on a temporal dimension. Specifically, we used the closure duration of the silent interval between the vowel and the final affricate /t∫/ or fricative /ʃ/ of the words DITCH and DISH, respectively. We obtained perceptual identification functions and electrophysiological neural measures (frequency-following responses (FFR) and cortical auditory-evoked potentials (CAEPs)) to unprocessed and vocoded versions of these words in young normal-hearing (YNH), older normal- or near-normal-hearing (ONH), and older hearing-impaired (OHI) listeners. We found that vocoding significantly reduced the slope of the perceptual identification function in only the OHI listeners. In contrast to the limited effects of vocoding on perceptual performance, vocoding had robust effects on the FFRs across age groups, such that stimulus-to-response correlations and envelope magnitudes were significantly lower for vocoded vs. unprocessed conditions. Increases in the P1 peak amplitude for vocoded stimuli were found for both ONH and OHI listeners, but not for the YNH listeners. These results suggest that while vocoding substantially degrades early neural representation of speech stimuli in the midbrain, there may be cortical compensation in older listeners that is not seen in younger listeners.
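The stimulus-to-response correlation used above can be sketched as the maximum Pearson r between stimulus and FFR across a range of candidate neural lags. The lag range below is an illustrative assumption; published analyses typically search a window reflecting plausible neural conduction delays:

```python
import numpy as np

def stim_response_corr(stim, resp, fs, max_lag_ms=10.0):
    """Maximum Pearson correlation between a stimulus and an FFR over
    candidate lags from 0 to max_lag_ms. Degraded (e.g., vocoded)
    conditions should yield lower values than unprocessed ones."""
    max_lag = int(fs * max_lag_ms / 1000.0)
    n = min(len(stim), len(resp))
    best = -1.0
    for lag in range(max_lag + 1):
        a, b = stim[:n - lag], resp[lag:n]      # align response to lagged stimulus
        best = max(best, np.corrcoef(a, b)[0, 1])
    return best
```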
Affiliation(s)
- Samira Anderson, Lindsey Roque, Casey R Gaskins, Sandra Gordon-Salant, Matthew J Goupell: Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, 20742, USA
11. Menezes DC, Griz SMS, Araújo AKLD, Venâncio LGA, Advincula KP, Menezes PDL. Assessment protocols for forward masking in Frequency-Following Response. Revista CEFAC 2020. DOI: 10.1590/1982-0216/202022611219.
Abstract
Purpose: to investigate forward masking by comparing latency values of positive and negative peaks in frequency-following response (FFR) recordings in normal-hearing young adults. Methods: from a database, 20 FFR recordings were selected, 10 from men and 10 from women, aged 18 to 25 years, with normal hearing. They were qualitatively analyzed by two experienced researchers according to two different protocols of recording identification: (i) predominance of positive peaks (PV, A, PW, PX, PY, PZ, and O waves); and (ii) predominance of negative peaks (V, A, C, D, E, F, and O waves). The Shapiro-Wilk normality test, the Wilcoxon test, and the Student's t-test were conducted, adopting a significance level of p < 0.05. Results: the comparative analysis of peak latency values did not reveal any significant difference between the studied protocols. However, the standard deviation of absolute latency values was higher than that of the negative peaks, suggesting a pattern inverted from what was expected. Conclusion: forward masking was identified with both proposals, and the protocol of predominant positive peaks was less variable.
12. Human Frequency Following Responses to Vocoded Speech: Amplitude Modulation Versus Amplitude Plus Frequency Modulation. Ear Hear 2019;41:300-311. PMID: 31246660; DOI: 10.1097/aud.0000000000000756.
Abstract
OBJECTIVES The most commonly employed speech processing strategies in cochlear implants (CIs) only extract and encode amplitude modulation (AM) in a limited number of frequency channels. A novel speech processing strategy has been proposed that encodes both frequency modulation (FM) and AM to improve CI performance; behavioral tests showed better speech, speaker, and tone recognition with this novel strategy than with the AM-alone strategy. Here, we used scalp-recorded human frequency following responses (FFRs) to examine the differences in the neural representation of vocoded speech sounds with AM alone versus AM + FM as the spectral and temporal cues were varied. Specifically, we were interested in determining whether the addition of FM to AM improved the neural representation of envelope periodicity (FFRENV) and temporal fine structure (FFRTFS), as reflected in the temporal pattern of the phase-locked neural activity generating the FFR. DESIGN FFRs were recorded from 13 normal-hearing adult listeners in response to the original unprocessed stimulus (a synthetic diphthong /au/ with a 110-Hz fundamental frequency, or F0, and a 250-msec duration) and to 2-, 4-, 8-, and 16-channel sine-vocoded versions of /au/ with AM alone and AM + FM. Temporal waveforms, autocorrelation analyses, fast Fourier transforms, and stimulus-response spectral correlations were used to analyze both the strength and fidelity of the neural representation of envelope periodicity (F0) and TFS (formant structure). RESULTS The periodicity strength in the FFRENV decreased more for the AM stimuli than for the relatively resilient AM + FM stimuli as the number of channels was increased. Regardless of the number of channels, a clear spectral peak of FFRENV was consistently observed at the stimulus F0 for all the AM + FM stimuli but not for the AM stimuli. Neural representation, as revealed by the spectral correlation of FFRTFS, was better for the AM + FM stimuli than for the AM stimuli, as was the neural representation of the time-varying formant-related harmonics. CONCLUSIONS These results are consistent with previously reported behavioral results and suggest that the AM + FM processing strategy elicited brainstem neural activity that better preserved periodicity, temporal fine structure, and time-varying spectral information than the AM processing strategy. The relatively more robust neural representation of AM + FM stimuli observed here likely contributes to the superior performance on speech, speaker, and tone recognition with the AM + FM processing strategy. Taken together, these results suggest that neural information preserved in the FFR may be used to evaluate signal processing strategies considered for CIs.
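The AM-versus-AM + FM contrast above comes down to how each channel's carrier is built. A sketch of a single sine-vocoder channel; the parameter choices are illustrative and do not reproduce the actual strategy's specification:

```python
import numpy as np

def sine_carrier(env, fs, fc, fm=None):
    """One channel of a sine vocoder. With AM alone the carrier is a fixed
    sinusoid at the channel center frequency fc, scaled by the envelope.
    With AM + FM the carrier's instantaneous frequency also tracks a slow
    FM signal fm(t) around fc, restoring some fine-structure variation."""
    if fm is None:
        inst_freq = np.full(len(env), fc)          # AM alone
    else:
        inst_freq = fc + fm                        # AM + FM
    phase = 2 * np.pi * np.cumsum(inst_freq) / fs  # integrate frequency -> phase
    return env * np.sin(phase)
```

With `fm=None` the carrier spectrum is a single line at fc; passing a slowly varying `fm` spreads energy around fc in a way that follows the original fine structure, which is the cue the FFR results suggest the brainstem preserves.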
13. Perception of noise-vocoded tone complexes: A time domain analysis based on an auditory filterbank model. Hear Res 2018;367:1-16. PMID: 30005269; DOI: 10.1016/j.heares.2018.07.003.
Abstract
When a wideband harmonic tone complex (wHTC) is passed through a noise vocoder, the resulting sounds can have spectra with large peak-to-valley ratios but little or no periodicity strength in the autocorrelation functions. We measured judgments of pitch strength by normal-hearing listeners for noise-vocoded wideband harmonic tone complexes (NV-wHTCs) relative to standard and anchor stimuli. The standard was a 1-channel NV-wHTC, and the anchor was either the unprocessed wHTC or an infinitely iterated rippled noise (IIRN). Although there is variability among individuals, the magnitude judgment functions obtained with the IIRN anchor suggest different listening strategies. To gain insight into possible listening strategies, test stimuli were analyzed at the output of an auditory filterbank model based on gammatone filters. The weak periodicity strengths of NV-wHTCs observed in the stimulus autocorrelation functions are augmented at the output of the gammatone filterbank model. Six analytical models of pitch strength were evaluated based on summary correlograms obtained from the gammatone filterbank. The results of the filterbank analysis suggest that, contrary to the weak or absent periodicity strengths in the stimulus domain, temporal cues contribute to the pitch strength perception of noise-vocoded harmonic stimuli, such that listeners' judgments of pitch strength reflect a nonlinear, weighted average of the temporal information between the fine structure and the envelope.
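The stimulus-domain periodicity strength referred to above is commonly quantified as the peak of the normalized autocorrelation within a plausible pitch-lag range. A sketch of that metric; the lag bounds are illustrative assumptions, and the paper's filterbank analysis applies the same idea per gammatone channel before summing into a summary correlogram:

```python
import numpy as np

def periodicity_strength(x, fs, fmin=50.0, fmax=500.0):
    """Peak of the normalized autocorrelation within lags corresponding
    to pitches between fmin and fmax. Values near 1 indicate strong
    periodicity; noise-vocoded complexes can score near 0 here even
    when their spectra retain large peak-to-valley ratios."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # non-negative lags
    ac = ac / ac[0]                                    # zero lag normalized to 1
    lo, hi = int(fs / fmax), int(fs / fmin)
    return ac[lo:hi + 1].max()
```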