1. Yun D, Lentz J, Shen Y. The Noise Reduction Algorithm May Not Compensate for the Degradation in Output Signal-to-Noise Ratio Caused by Wide Dynamic Range Compression. Am J Audiol 2024:1-17. PMID: 38875482. DOI: 10.1044/2024_aja-24-00011.
Abstract
PURPOSE Most modern hearing aids (HAs) employ wide dynamic range compression (WDRC) and noise reduction (NR) algorithms. The nonlinear effects of WDRC and NR are known to change the output signal-to-noise ratio (SNR) of an HA, but the relative contributions of the two algorithms are not fully understood. The current study investigated (a) whether WDRC or NR dominates the nonlinear effects measured at the output of a digital HA and (b) whether the electroacoustic effectiveness of NR depends on WDRC parameters while input SNR and background noise are systematically varied. METHOD Test stimuli were Connected Speech Test sentences in multitalker babble noise (2- or 20-talker), presented at input SNRs ranging from -10 to +10 dB. The HA was programmed with multiband WDRC set according to National Acoustic Laboratories nonlinear fitting formula, version 2 (NAL-NL2), prescriptive fits for four standard audiograms and two compression speeds. The NR algorithm of the HA was switched on or off in separate conditions. Nonlinear electroacoustic effects of the WDRC and NR algorithms were assessed by measuring the output SNR of the HA with a phase-inversion technique. To investigate whether factors beyond the output SNR may be important, the Hearing Aid Speech Intelligibility Index and the Hearing Aid Speech Quality Index were applied to the recordings to generate inferences about aided speech intelligibility and perceived speech quality. RESULTS WDRC dominated the net nonlinear effect at low input SNRs, and the net nonlinear effect of WDRC and NR was reduced at high input SNRs. The effectiveness of NR depended on the compression parameters and was partially explained by the trends of the two index scores, indicating that these indices may capture factors that the output SNR metric cannot. CONCLUSIONS Results suggest that the individual signal-processing stages in an HA should not be considered independent: electroacoustic evaluation of WDRC and NR algorithms in isolation is not sufficient to capture the combined nonlinear effect of the two algorithms. SUPPLEMENTAL MATERIAL https://doi.org/10.23641/asha.25962541
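The phase-inversion output-SNR measurement named in the Method can be sketched in a few lines. This is a minimal illustration of the idea only (two presentations of the same mixture with the noise polarity flipped); the function and the synthetic check below are our own, not the study's code:

```python
import numpy as np

def phase_inversion_snr(rec_a, rec_b):
    """Estimate output SNR from two recordings of the same speech-in-noise
    mixture in which the noise polarity was inverted between presentations
    (a Hagerman-and-Olofsson-style phase-inversion estimate).

    rec_a ~ processed(speech + noise), rec_b ~ processed(speech - noise).
    Adding cancels the noise term; subtracting cancels the speech term.
    """
    speech_est = 0.5 * (rec_a + rec_b)
    noise_est = 0.5 * (rec_a - rec_b)
    p_speech = np.mean(speech_est ** 2)
    p_noise = np.mean(noise_est ** 2)
    return 10.0 * np.log10(p_speech / p_noise)

# Sanity check without any device in the loop: a tone set 6 dB above
# white noise should come back as a 6 dB output SNR.
rng = np.random.default_rng(0)
n = 48000
speech = np.sin(2 * np.pi * 440.0 * np.arange(n) / 16000.0)
noise = rng.standard_normal(n)
noise *= np.sqrt(np.mean(speech ** 2) / np.mean(noise ** 2)) * 10 ** (-6.0 / 20.0)
snr_db = phase_inversion_snr(speech + noise, speech - noise)
```

With a real hearing aid, `rec_a` and `rec_b` would be the aid's recorded outputs for the two phase-inverted presentations; nonlinear processing then shifts the estimated output SNR away from the input SNR.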
Affiliation(s)
- Donghyeon Yun
- Department of Speech, Language and Hearing Sciences, Indiana University Bloomington
- Jennifer Lentz
- Department of Speech, Language and Hearing Sciences, Indiana University Bloomington
- Yi Shen
- Department of Speech and Hearing Sciences, University of Washington, Seattle

2. Shi K, Quass GL, Rogalla MM, Ford AN, Czarny JE, Apostolides PF. Population coding of time-varying sounds in the nonlemniscal inferior colliculus. J Neurophysiol 2024; 131:842-864. PMID: 38505907. DOI: 10.1152/jn.00013.2024.
Abstract
The inferior colliculus (IC) of the midbrain is important for complex sound processing, such as discriminating conspecific vocalizations and human speech. The IC's nonlemniscal, dorsal "shell" region is likely important for this process, as neurons in these layers project to higher-order thalamic nuclei that subsequently funnel acoustic signals to the amygdala and nonprimary auditory cortices, forebrain circuits important for vocalization coding in a variety of mammals, including humans. However, the extent to which shell IC neurons transmit acoustic features necessary to discern vocalizations is less clear, owing to the technical difficulty of recording from neurons in the IC's superficial layers via traditional approaches. Here, we use two-photon Ca2+ imaging in mice of either sex to test how shell IC neuron populations encode the rate and depth of amplitude modulation, important sound cues for speech perception. Most shell IC neurons were broadly tuned, with a low neurometric discrimination of amplitude modulation rate; only a subset was highly selective to specific modulation rates. Nevertheless, a neural network classifier trained on fluorescence data from shell IC neuron populations accurately classified amplitude modulation rate, and decoding accuracy was only marginally reduced when highly tuned neurons were omitted from the training data. Rather, classifier accuracy increased monotonically with the modulation depth of the training data, such that classifiers trained on full-depth modulated sounds had median decoding errors of ∼0.2 octaves. Thus, shell IC neurons may transmit time-varying signals via a population code, with perhaps limited reliance on the discriminative capacity of any individual neuron. NEW & NOTEWORTHY The IC's shell layers originate a "nonlemniscal" pathway important for perceiving vocalization sounds.
However, prior studies suggest that individual shell IC neurons are broadly tuned and have high response thresholds, implying a limited reliability of efferent signals. Using Ca2+ imaging, we show that amplitude modulation is accurately represented in the population activity of shell IC neurons. Thus, downstream targets can read out sounds' temporal envelopes from distributed rate codes transmitted by populations of broadly tuned neurons.
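The population-decoding result can be illustrated with a toy simulation. Everything here is an assumed stand-in, not the study's analysis: the neuron count, Gaussian tuning in log2(rate), the noise level, and a nearest-centroid readout in place of the paper's neural network classifier. The point it demonstrates is the abstract's claim that broadly tuned single neurons can still support accurate population decoding:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: 200 broadly tuned neurons, five octave-spaced
# AM rates. Tuning width is much larger than the rate spacing, mimicking
# "broadly tuned" single neurons with low neurometric discrimination.
n_neurons = 200
rates = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
pref = rng.uniform(1.0, 5.0, n_neurons)        # preferred log2(rate)
width = 2.5                                    # tuning width, in octaves

def population_response(rate, trials=20):
    """Noisy population response vectors for one AM rate (trials x neurons)."""
    mean = np.exp(-0.5 * ((np.log2(rate) - pref) / width) ** 2)
    return mean + 0.2 * rng.standard_normal((trials, n_neurons))

# "Train" a minimal decoder: one centroid per rate, nearest-centroid readout.
centroids = np.array([population_response(r).mean(axis=0) for r in rates])

def decode(x):
    return rates[np.argmin(((centroids - x) ** 2).sum(axis=1))]

# Decode fresh single trials; accuracy is high despite broad tuning,
# because small per-neuron differences accumulate across the population.
correct = [decode(population_response(r, trials=1)[0]) == r
           for r in rates for _ in range(10)]
accuracy = float(np.mean(correct))
```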
Affiliation(s)
- Kaiwen Shi
- Department of Otolaryngology-Head & Neck Surgery, Kresge Hearing Research Institute, University of Michigan Medical School, Ann Arbor, Michigan, United States
- Gunnar L Quass
- Department of Otolaryngology-Head & Neck Surgery, Kresge Hearing Research Institute, University of Michigan Medical School, Ann Arbor, Michigan, United States
- Meike M Rogalla
- Department of Otolaryngology-Head & Neck Surgery, Kresge Hearing Research Institute, University of Michigan Medical School, Ann Arbor, Michigan, United States
- Alexander N Ford
- Department of Otolaryngology-Head & Neck Surgery, Kresge Hearing Research Institute, University of Michigan Medical School, Ann Arbor, Michigan, United States
- Jordyn E Czarny
- Department of Otolaryngology-Head & Neck Surgery, Kresge Hearing Research Institute, University of Michigan Medical School, Ann Arbor, Michigan, United States
- Pierre F Apostolides
- Department of Otolaryngology-Head & Neck Surgery, Kresge Hearing Research Institute, University of Michigan Medical School, Ann Arbor, Michigan, United States
- Department of Molecular & Integrative Physiology, University of Michigan Medical School, Ann Arbor, Michigan, United States

3. Shi K, Quass GL, Rogalla MM, Ford AN, Czarny JE, Apostolides PF. Population coding of time-varying sounds in the non-lemniscal Inferior Colliculus. bioRxiv [Preprint] 2023:2023.08.14.553263. PMID: 37645904. PMCID: PMC10461978. DOI: 10.1101/2023.08.14.553263.
Abstract
The inferior colliculus (IC) of the midbrain is important for complex sound processing, such as discriminating conspecific vocalizations and human speech. The IC's non-lemniscal, dorsal "shell" region is likely important for this process, as neurons in these layers project to higher-order thalamic nuclei that subsequently funnel acoustic signals to the amygdala and non-primary auditory cortices, forebrain circuits important for vocalization coding in a variety of mammals, including humans. However, the extent to which shell IC neurons transmit acoustic features necessary to discern vocalizations is less clear, owing to the technical difficulty of recording from neurons in the IC's superficial layers via traditional approaches. Here, we use 2-photon Ca2+ imaging in mice of either sex to test how shell IC neuron populations encode the rate and depth of amplitude modulation, important sound cues for speech perception. Most shell IC neurons were broadly tuned, with a low neurometric discrimination of amplitude modulation rate; only a subset was highly selective to specific modulation rates. Nevertheless, a neural network classifier trained on fluorescence data from shell IC neuron populations accurately classified amplitude modulation rate, and decoding accuracy was only marginally reduced when highly tuned neurons were omitted from the training data. Rather, classifier accuracy increased monotonically with the modulation depth of the training data, such that classifiers trained on full-depth modulated sounds had median decoding errors of ~0.2 octaves. Thus, shell IC neurons may transmit time-varying signals via a population code, with perhaps limited reliance on the discriminative capacity of any individual neuron.
Affiliation(s)
- Kaiwen Shi
- Kresge Hearing Research Institute, Department of Otolaryngology-Head & Neck Surgery, University of Michigan Medical School, Ann Arbor, MI, 48109
- Gunnar L. Quass
- Kresge Hearing Research Institute, Department of Otolaryngology-Head & Neck Surgery, University of Michigan Medical School, Ann Arbor, MI, 48109
- Meike M. Rogalla
- Kresge Hearing Research Institute, Department of Otolaryngology-Head & Neck Surgery, University of Michigan Medical School, Ann Arbor, MI, 48109
- Alexander N. Ford
- Kresge Hearing Research Institute, Department of Otolaryngology-Head & Neck Surgery, University of Michigan Medical School, Ann Arbor, MI, 48109
- Jordyn E. Czarny
- Kresge Hearing Research Institute, Department of Otolaryngology-Head & Neck Surgery, University of Michigan Medical School, Ann Arbor, MI, 48109
- Pierre F. Apostolides
- Kresge Hearing Research Institute, Department of Otolaryngology-Head & Neck Surgery, University of Michigan Medical School, Ann Arbor, MI, 48109
- Department of Molecular & Integrative Physiology, University of Michigan Medical School, Ann Arbor, MI, 48109

4. Hamza Y, Farhadi A, Schwarz DM, McDonough JM, Carney LH. Representations of fricatives in subcortical model responses: Comparisons with human consonant perception. J Acoust Soc Am 2023; 154:602-618. PMID: 37535429. PMCID: PMC10550336. DOI: 10.1121/10.0020536.
Abstract
Fricatives are obstruent sound contrasts made by airflow constrictions in the vocal tract that produce turbulence across the constriction or at a site downstream from it. Fricatives exhibit significant intra- and intersubject variability as well as contextual variability, yet they are perceived with high accuracy. The current study investigated modeled neural responses to fricatives in the auditory nerve (AN) and inferior colliculus (IC), with the hypothesis that response profiles across populations of neurons provide robust correlates of consonant perception. Stimuli were 270 intervocalic fricatives (10 speakers × 9 fricatives × 3 utterances). Computational model response profiles had characteristic frequencies log-spaced from 125 Hz to 8 or 20 kHz to explore the impact of high-frequency responses. Confusion matrices were generated by k-nearest-neighbor subspace classifiers using the profiles of average rates across characteristic frequencies as feature vectors, and were compared with published behavioral data. The modeled AN and IC neural responses provided better predictions of behavioral accuracy than the stimulus spectra, and the IC showed better accuracy than the AN. Behavioral fricative accuracy was explained by the modeled neural response profiles, whereas confusions were only partially explained. Extended high frequencies improved accuracy based on the model IC, corroborating the importance of extended high frequencies in speech perception.
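The classifier analysis can be sketched with synthetic data. The class structure, noise level, and feature dimensions below are invented stand-ins; only the general recipe follows the abstract (average-rate profiles across characteristic frequencies as feature vectors, k-nearest-neighbor voting, a confusion matrix):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stand-in for the paper's features: average-rate profiles
# across 40 log-spaced characteristic frequencies for three "fricative"
# classes, each with energy concentrated at a different place.
n_cf, n_classes, per_class = 40, 3, 30
cf_axis = np.linspace(0.0, 1.0, n_cf)          # normalized CF axis
peaks = [0.2, 0.5, 0.8]

def rate_profile(peak):
    """One noisy average-rate profile with energy centered at `peak`."""
    return np.exp(-((cf_axis - peak) ** 2) / 0.02) + 0.3 * rng.standard_normal(n_cf)

X_train = np.array([rate_profile(peaks[c]) for c in range(n_classes)
                    for _ in range(per_class)])
y_train = np.repeat(np.arange(n_classes), per_class)

def knn_predict(x, k=5):
    """Majority vote among the k nearest training profiles."""
    nearest = np.argsort(((X_train - x) ** 2).sum(axis=1))[:k]
    return int(np.bincount(y_train[nearest], minlength=n_classes).argmax())

# Confusion matrix on fresh test profiles (rows = true, cols = predicted);
# off-diagonal mass plays the role of predicted consonant confusions.
confusion = np.zeros((n_classes, n_classes), dtype=int)
for c in range(n_classes):
    for _ in range(10):
        confusion[c, knn_predict(rate_profile(peaks[c]))] += 1
accuracy = np.trace(confusion) / confusion.sum()
```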
Affiliation(s)
- Yasmeen Hamza
- Department of Biomedical Engineering, University of Rochester, Rochester, New York 14627, USA
- Afagh Farhadi
- Department of Electrical and Computer Engineering, University of Rochester, Rochester, New York 14627, USA
- Douglas M Schwarz
- Departments of Neuroscience and Biomedical Engineering, University of Rochester, Rochester, New York 14627, USA
- Joyce M McDonough
- Department of Linguistics, University of Rochester, Rochester, New York 14627, USA
- Laurel H Carney
- Departments of Biomedical Engineering, Neuroscience, and Electrical and Computer Engineering, University of Rochester, Rochester, New York 14627, USA

5. Ellis GM, Souza P. Updating the Spectral Correlation Index: Integrating Audibility and Band Importance Using Speech Intelligibility Index Weights. J Speech Lang Hear Res 2022; 65:2720-2726. PMID: 35767317. PMCID: PMC9584137. DOI: 10.1044/2022_jslhr-21-00448.
Abstract
The original Spectral Correlation Index (SCIo) is a measure of amplitude envelope distortion that has been used in several studies to predict behavioral results. Because the original SCIo accounted neither for the differential contribution of particular frequency bands to speech intelligibility (i.e., band importance) nor for audibility, a new "individual" version (the SCIi) is proposed and evaluated. Sentence intelligibility data are used to compare the predictive power and goodness of fit of statistical models using the two versions of the SCI. The SCIi provides significantly better fits to the behavioral data than the SCIo. This result demonstrates the importance of accounting for signal audibility when analyzing and modeling data collected from individuals with hearing impairment. With this update, the SCIi is a useful measure for predicting speech intelligibility on the basis of amplitude envelope distortions.
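The proposed weighting of per-band envelope agreement by band importance and audibility can be illustrated with a simplified score. This is a hypothetical reduction of the idea, not the published SCIi formula; the band-importance values and band count below are made up for the example:

```python
import numpy as np

def sci_weighted(env_ref, env_proc, band_importance, audible):
    """Toy SCIi-style score: per-band correlation between reference and
    processed amplitude envelopes, weighted by SII-style band importance,
    with inaudible bands excluded. env_* are (bands, time) arrays;
    band_importance sums to 1; audible is a boolean mask per band.
    """
    corrs = np.array([np.corrcoef(r, p)[0, 1]
                      for r, p in zip(env_ref, env_proc)])
    w = band_importance * audible
    return float((w * corrs).sum() / w.sum()) if w.sum() > 0 else 0.0

# Toy check: three bands; band 2 carries an uncorrelated distortion.
# If band 2 is also inaudible to the listener, excluding it raises the score,
# which is the individualized-audibility behavior described above.
rng = np.random.default_rng(5)
ref = rng.random((3, 200))
proc = ref.copy()
proc[2] = rng.random(200)                    # distorted envelope in band 2
bi = np.array([0.3, 0.3, 0.4])               # illustrative importances
score_all = sci_weighted(ref, proc, bi, np.array([True, True, True]))
score_audible = sci_weighted(ref, proc, bi, np.array([True, True, False]))
```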
Affiliation(s)
- Gregory M. Ellis
- Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL
- Pamela Souza
- Roxelyn and Richard Pepper Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL
- Knowles Hearing Center, Northwestern University, Evanston, IL

6. Souza PE, Ellis G, Marks K, Wright R, Gallun F. Does the Speech Cue Profile Affect Response to Amplitude Envelope Distortion? J Speech Lang Hear Res 2021; 64:2053-2069. PMID: 34019777. PMCID: PMC8740712. DOI: 10.1044/2021_jslhr-20-00481.
Abstract
Purpose A broad area of interest to our group is understanding the consequences of the "cue profile" (a measure of how well a listener can utilize audible temporal and/or spectral cues) for listening scenarios in which a subset of cues is distorted. The study goal was to determine whether listeners whose cue profile indicated that they primarily used temporal cues for recognition would respond differently to speech-envelope distortion than listeners who utilized both spectral and temporal cues. Method Twenty-five adults with sensorineural hearing loss participated in the study. Each listener's cue profile was measured by analyzing identification patterns for a set of synthetic syllables in which envelope rise time and formant transitions were varied; a linear discriminant analysis quantified the relative contributions of spectral and temporal cues to those patterns. Low-context sentences in noise were processed with time compression, wide dynamic range compression, or a combination of the two to create a range of speech-envelope distortions, and an acoustic metric, a modified version of the Spectral Correlation Index, quantified the envelope distortion. Results A binomial generalized linear mixed-effects model indicated that envelope distortion, the cue profile, their interaction, and the pure-tone average were significant predictors of sentence recognition. Conclusions Listeners with good perception of spectro-temporal contrasts were more resilient to the detrimental effects of envelope compression than listeners who relied more heavily on temporal cues. The cue profile may provide information about individual listeners that can guide the choice of hearing aid parameters, especially those that affect the speech envelope.

7. Yellamsetty A, Ozmeral EJ, Budinsky RA, Eddins DA. A Comparison of Environment Classification Among Premium Hearing Instruments. Trends Hear 2021; 25:2331216520980968. PMID: 33749410. PMCID: PMC7989119. DOI: 10.1177/2331216520980968.
Abstract
Hearing aids classify acoustic environments into multiple generic classes to guide signal processing, and information about that classification is made available to the clinician for fitting, counseling, and troubleshooting. The goal of this study was to better inform scientists and clinicians about the nature of this information by comparing the classification schemes of five premium hearing instruments across a wide range of acoustic scenes, including scenes that varied in signal-to-noise ratio and overall level (dB SPL). Twenty-eight acoustic scenes representing various prototypical environments were presented to the five devices mounted on an acoustic manikin. Classification measures were recorded from the brand-specific fitting software and then recategorized to generic labels, (a) Speech in Quiet, (b) Speech in Noise, (c) Noise, and (d) Music, to conceal the device manufacturer. Twelve normal-hearing listeners also classified each scene. The results revealed a variety of similarities and differences among the five devices and the human subjects: some devices were highly dependent on overall input level, whereas others were influenced markedly by signal-to-noise ratio, and differences between human and hearing aid classification were evident for several speech and music scenes. Environmental classification is the heart of the signal-processing strategy for any given device, providing key input to subsequent decision making. Comprehensive assessment of environmental classification is therefore essential when considering the cost of signal-processing errors, the potential impact on typical wearers, and the information available to clinicians. The magnitude of the differences among devices is striking.
Affiliation(s)
- Anusha Yellamsetty
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida, United States
- Erol J. Ozmeral
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida, United States
- Robert A. Budinsky
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida, United States
- David A. Eddins
- Department of Communication Sciences and Disorders, University of South Florida, Tampa, Florida, United States

8. Lelo de Larrea-Mancera ES, Stavropoulos T, Hoover EC, Eddins DA, Gallun FJ, Seitz AR. Portable Automated Rapid Testing (PART) for auditory assessment: Validation in a young adult normal-hearing population. J Acoust Soc Am 2020; 148:1831. PMID: 33138479. PMCID: PMC7541091. DOI: 10.1121/10.0002108.
Abstract
This study aims to determine the degree to which Portable Automated Rapid Testing (PART), a freely available program running on a tablet computer, can reproduce standard laboratory results. Undergraduate students were assigned to one of three within-subject conditions that examined the repeatability of performance on a battery of psychoacoustical tests of temporal fine structure processing, spectro-temporal amplitude modulation, and targets in competition. The repeatability condition examined test/retest with the same system, the headphones condition examined the effects of varying headphones (passive and active noise-attenuating), and the noise condition examined repeatability in the presence of recorded cafeteria noise. In general, performance on the test battery showed high repeatability, even across manipulated conditions, and was similar to that reported in the literature. These data demonstrate that suprathreshold psychoacoustical tests can be run on consumer-grade hardware in less controlled settings. The dataset also provides a distribution of thresholds that can serve as a normative baseline against which auditory dysfunction can be identified in future work.
Affiliation(s)
- Trevor Stavropoulos
- Brain Game Center, University of California Riverside, 1201 University Avenue, Riverside, California 92521, USA
- Eric C Hoover
- University of Maryland, College Park, Maryland 20742, USA
- Aaron R Seitz
- Psychology Department, University of California, Riverside, 900 University Avenue, Riverside, California 92521, USA

9. May T, Kowalewski B, Dau T. Signal-to-Noise-Ratio-Aware Dynamic Range Compression in Hearing Aids. Trends Hear 2018; 22:2331216518790903. PMID: 30117366. PMCID: PMC6100123. DOI: 10.1177/2331216518790903.
Abstract
Fast-acting dynamic range compression is a level-dependent amplification scheme that aims to restore audibility for hearing-impaired listeners. However, when applied to noisy speech at positive signal-to-noise ratios (SNRs), the gain function typically changes rapidly over time, as it is driven by the short-term fluctuations of the speech signal. This leads to amplification of the noise components in the speech gaps, which reduces the output SNR and distorts the acoustic properties of the background noise. An adaptive compression scheme is proposed here that utilizes information about the SNR in different frequency channels to adaptively change the characteristics of the compressor. Specifically, fast-acting compression is applied to speech-dominated time-frequency (T-F) units where the SNR is high, while slow-acting compression is used to effectively linearize the processing for noise-dominated T-F units where the SNR is low. A systematic evaluation of this SNR-aware compression scheme showed that the effective compression of speech components embedded in noise was similar to that of a conventional fast-acting system, whereas natural fluctuations in the background noise were preserved in a similar way as with a slow-acting compressor.
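The SNR-dependent switching between fast- and slow-acting compression can be sketched for a single channel. The compression ratio, threshold, frame rate, and time constants below are illustrative choices, not the paper's settings, and the paper makes the decision per time-frequency unit rather than per broadband frame:

```python
import numpy as np

def snr_aware_gain(levels_db, snr_db, frame_rate=100.0, cr=3.0, ct_db=50.0,
                   fast=(0.005, 0.060), slow=(0.5, 2.0)):
    """One-channel sketch of SNR-aware WDRC.

    levels_db: short-term input level per frame (dB), at `frame_rate` Hz.
    snr_db: estimated SNR per frame. Frames above 0 dB SNR use the fast
    (attack, release) time-constant pair; frames below use the slow pair,
    which effectively linearizes the processing in noise-dominated frames.
    """
    # Static compression rule: unity gain below threshold, slope 1/cr above.
    static = np.where(levels_db > ct_db,
                      (ct_db - levels_db) * (1.0 - 1.0 / cr), 0.0)
    gain = np.empty_like(static)
    g = static[0]
    for i, target in enumerate(static):
        attack, release = fast if snr_db[i] > 0.0 else slow
        tau = attack if target < g else release   # gain must drop -> attack
        alpha = np.exp(-1.0 / (tau * frame_rate))
        g = alpha * g + (1.0 - alpha) * target    # one-pole gain smoothing
        gain[i] = g
    return gain

# A 40 -> 70 dB level step: with high estimated SNR the gain reacts within
# a few frames; with low estimated SNR it barely moves (quasi-linear).
levels = np.concatenate([np.full(50, 40.0), np.full(50, 70.0)])
gain_fast = snr_aware_gain(levels, np.full(100, 10.0))
gain_slow = snr_aware_gain(levels, np.full(100, -10.0))
```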
Affiliation(s)
- Tobias May
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Lyngby, Denmark
- Borys Kowalewski
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Lyngby, Denmark
- Torsten Dau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, Lyngby, Denmark

10. Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization. Ear Hear 2018; 37:e377-e390. PMID: 27438871. DOI: 10.1097/aud.0000000000000328.
Abstract
OBJECTIVES This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. DESIGN Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. RESULTS Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. 
Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated with either the cochlear implant subjects' categorization of voice onset time or their word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. CONCLUSIONS When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, complementing traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart nonlinguistic measures of discrimination but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (voice onset time) stimulus dimensions across languages highlights the potential to use this testing approach even when English is not the listener's native language.
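The logistic-regression quantification of cue sensitivity described in the Design can be sketched as follows. The cue continuum, trial counts, and the plain gradient-ascent fitting routine are illustrative, with the fitted slope serving as the sensitivity index: a listener who categorizes sharply along the cue yields a steep slope, a listener who barely uses the cue yields a shallow one:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_logistic(cue, resp, lr=0.5, steps=2000):
    """Fit P(resp = 1) = sigmoid(b0 + b1 * cue) by gradient ascent on the
    log-likelihood; b1 indexes perceptual sensitivity to the manipulated
    cue (e.g., voice onset time or a formant transition)."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * cue)))
        b0 += lr * np.mean(resp - p)            # gradient w.r.t. intercept
        b1 += lr * np.mean((resp - p) * cue)    # gradient w.r.t. slope
    return b0, b1

# Simulated categorization: a 9-step cue continuum in [-1, 1], 40 trials
# per step; a "sensitive" listener has a steep underlying slope.
cue = np.tile(np.linspace(-1.0, 1.0, 9), 40)

def simulate(slope):
    return (rng.random(cue.size) < 1.0 / (1.0 + np.exp(-slope * cue))).astype(float)

_, b1_sharp = fit_logistic(cue, simulate(6.0))
_, b1_shallow = fit_logistic(cue, simulate(1.0))
```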

11. Humes LE, Kidd GR, Fogerty D. Exploring Use of the Coordinate Response Measure in a Multitalker Babble Paradigm. J Speech Lang Hear Res 2017; 60:741-754. PMID: 28249093. PMCID: PMC5544196. DOI: 10.1044/2016_jslhr-h-16-0042.
Abstract
PURPOSE Three experiments examined the use of competing coordinate response measure (CRM) sentences as a multitalker babble. METHOD In Experiment I, young adults with normal hearing listened to a CRM target sentence in the presence of 2, 4, or 6 competing CRM sentences with synchronous or asynchronous onsets. In Experiment II, the condition with 6 competing sentences was explored further. Three stimulus conditions (6 talkers saying same sentence, 1 talker producing 6 different sentences, and 6 talkers each saying a different sentence) were evaluated with different methods of presentation. Experiment III examined the performance of older adults with hearing impairment in a subset of conditions from Experiment II. RESULTS In Experiment I, performance declined with increasing numbers of talkers and improved with asynchronous sentence onsets. Experiment II identified conditions under which an increase in the number of talkers led to better performance. In Experiment III, the relative effects of the number of talkers, messages, and onset asynchrony were the same for young and older listeners. CONCLUSIONS Multitalker babble composed of CRM sentences has masking properties similar to other types of multitalker babble. However, when the number of different talkers and messages are varied independently, performance is best with more talkers and fewer messages.
Affiliation(s)
- Larry E. Humes
- Department of Speech and Hearing Sciences, Indiana University Bloomington
- Gary R. Kidd
- Department of Speech and Hearing Sciences, Indiana University Bloomington
- Daniel Fogerty
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia

12. Zaar J, Dau T. Predicting consonant recognition and confusions in normal-hearing listeners. J Acoust Soc Am 2017; 141:1051. PMID: 28253684. DOI: 10.1121/1.4976054.
Abstract
The perception of consonants in background noise has been investigated in various studies and was shown to critically depend on fine details in the stimuli. In this study, a microscopic speech perception model is proposed that represents an extension of the auditory signal processing model by Dau, Kollmeier, and Kohlrausch [(1997). J. Acoust. Soc. Am. 102, 2892-2905]. The model was evaluated based on the extensive consonant perception data set provided by Zaar and Dau [(2015). J. Acoust. Soc. Am. 138, 1253-1267], which was obtained with normal-hearing listeners using 15 consonant-vowel combinations mixed with white noise. Accurate predictions of the consonant recognition scores were obtained across a large range of signal-to-noise ratios. Furthermore, the model yielded convincing predictions of the consonant confusion scores, such that the predicted errors were clustered in perceptually plausible confusion groups. The large predictive power of the proposed model suggests that adaptive processes in the auditory preprocessing in combination with a cross-correlation based template-matching back end can account for some of the processes underlying consonant perception in normal-hearing listeners. The proposed model may provide a valuable framework, e.g., for investigating the effects of hearing impairment and hearing-aid signal processing on phoneme recognition.
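The cross-correlation-based template-matching back end can be illustrated in miniature. The random "templates" below are stand-ins for clean internal representations of the 15 consonant-vowel tokens; the model's actual auditory preprocessing is not reproduced here:

```python
import numpy as np

def template_match(internal_rep, templates):
    """Pick the stored template (one pattern per consonant-vowel token)
    whose correlation with the noisy internal representation is highest;
    the winning index is the predicted response."""
    scores = [np.corrcoef(internal_rep, t)[0, 1] for t in templates]
    return int(np.argmax(scores))

# Toy check: 15 stored CV "templates" (random stand-ins for clean internal
# representations); a noisy observation of token 7 should still match 7,
# while near-ties between similar templates would produce confusions.
rng = np.random.default_rng(3)
templates = rng.standard_normal((15, 300))
noisy_rep = templates[7] + 0.8 * rng.standard_normal(300)
```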
Affiliation(s)
- Johannes Zaar
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark
- Torsten Dau
- Hearing Systems Group, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kongens Lyngby, Denmark

13
Fogerty D, Xu J, Gibbs BE. Modulation masking and glimpsing of natural and vocoded speech during single-talker modulated noise: Effect of the modulation spectrum. J Acoust Soc Am 2016;140:1800. [PMID: 27914381] [PMCID: PMC5848862] [DOI: 10.1121/1.4962494]
Abstract
Compared to notionally steady-state noise, modulated maskers provide a perceptual benefit for speech recognition, in part due to preserved speech information during the amplitude dips of the masker. However, overlap in the modulation spectrum between the target speech and the competing modulated masker may potentially result in modulation masking, and thereby offset the release from energetic masking. The current study investigated masking release provided by single-talker modulated noise. The overlap in the modulation spectra of the target speech and the modulated noise masker was varied through time compression or expansion of the competing masker. Younger normal hearing adults listened to sentences that were unprocessed or noise vocoded to primarily limit speech recognition to the preserved temporal envelope cues. For unprocessed speech, results demonstrated improved performance with masker modulation spectrum shifted up or down compared to the target modulation spectrum, except for the most extreme time expansion. For vocoded speech, significant masking release was observed with the slowest masker rate. Perceptual results combined with acoustic analyses of the preserved glimpses of the target speech suggest contributions of modulation masking and cognitive-linguistic processing as factors contributing to performance.
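The modulation-spectrum manipulation this abstract describes can be sketched numerically. The following is an illustrative stand-in, not the study's analysis code: the envelope extraction, the synthetic masker, and the 4-Hz rate are assumptions chosen for demonstration.

```python
import numpy as np
from scipy.signal import hilbert

def peak_mod_freq(x, fs, fmax=50.0):
    """Dominant modulation frequency: FFT of the DC-removed Hilbert
    envelope, with the peak picked below fmax."""
    env = np.abs(hilbert(x))
    env -= env.mean()                     # drop DC so it does not dominate
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(env.size, 1.0 / fs)
    keep = (freqs > 0.5) & (freqs < fmax)
    return float(freqs[keep][np.argmax(spec[keep])])

fs = 16000
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
# Noise fully amplitude-modulated at 4 Hz: a crude single-talker-like masker
masker = (1.0 + np.sin(2 * np.pi * 4.0 * t)) * rng.standard_normal(t.size)

# Time compression by a factor of 2 (keeping every other sample, played
# back at the original rate) doubles the modulation rate, shifting the
# masker's modulation spectrum upward relative to the target's.
masker_fast = masker[::2]
```

Here `peak_mod_freq(masker, fs)` should land near 4 Hz and `peak_mod_freq(masker_fast, fs)` near 8 Hz, which is the kind of spectral shift the study used to vary modulation overlap.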
Affiliation(s)
- Daniel Fogerty
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, South Carolina 29208, USA
- Jiaqian Xu
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, South Carolina 29208, USA
- Bobby E Gibbs
- Department of Communication Sciences and Disorders, University of South Carolina, Columbia, South Carolina 29208, USA

15
Sources of Variability in Consonant Perception and Implications for Speech Perception Modeling. Adv Exp Med Biol 2016. [PMID: 27080685] [DOI: 10.1007/978-3-319-25474-6_46]
Abstract
The present study investigated the influence of various sources of response variability in consonant perception. A distinction was made between source-induced variability and receiver-related variability. The former refers to perceptual differences induced by differences in the speech tokens and/or the masking noise tokens; the latter describes perceptual differences caused by within- and across-listener uncertainty. Consonant-vowel combinations (CVs) were presented to normal-hearing listeners in white noise at six different signal-to-noise ratios. The obtained responses were analyzed with respect to the considered sources of variability using a measure of the perceptual distance between responses. The largest effect was found across different CVs. For stimuli of the same phonetic identity, the speech-induced variability across and within talkers and the across-listener variability were substantial and of similar magnitude. Even time-shifts in the waveforms of white masking noise produced a significant effect, which was well above the within-listener variability (the smallest effect). Two auditory-inspired models in combination with a template-matching back end were considered to predict the perceptual data. In particular, an energy-based and a modulation-based approach were compared. The suitability of the two models was evaluated with respect to the source-induced perceptual distance and in terms of consonant recognition rates and consonant confusions. Both models captured the source-induced perceptual distance remarkably well. However, the modulation-based approach showed a better agreement with the data in terms of consonant recognition and confusions. The results indicate that low-frequency modulations up to 16 Hz play a crucial role in consonant perception.

16
Brennan M, McCreery R, Kopun J, Lewis D, Alexander J, Stelmachowicz P. Masking Release in Children and Adults With Hearing Loss When Using Amplification. J Speech Lang Hear Res 2016;59:110-21. [PMID: 26540194] [PMCID: PMC4867924] [DOI: 10.1044/2015_jslhr-h-14-0105] [Received: 04/14/2014] [Accepted: 10/23/2015]
Abstract
PURPOSE This study compared masking release for adults and children with normal hearing and hearing loss. For the participants with hearing loss, masking release using simulated hearing aid amplification with 2 different compression speeds (slow, fast) was compared. METHOD Sentence recognition in unmodulated noise was compared with recognition in modulated noise (masking release). Recognition was measured for participants with hearing loss using individualized amplification via the hearing-aid simulator. RESULTS Adults with hearing loss showed greater masking release than the children with hearing loss. Average masking release was small (1 dB) and did not depend on hearing status. Masking release was comparable for slow and fast compression. CONCLUSIONS The use of amplification in this study contrasts with previous studies that did not use amplification. The results suggest that when differences in audibility are reduced, participants with hearing loss may be able to take advantage of dips in the noise levels, similar to participants with normal hearing. Although children required a more favorable signal-to-noise ratio than adults for both unmodulated and modulated noise, masking release was not statistically different. However, the ability to detect a difference may have been limited by the small amount of masking release observed.

17
Abstract
PURPOSE Authors of previous work using laboratory-based paradigms documented that wide dynamic range compression (WDRC) may improve gap detection compared to linear amplification. The purpose of this study was to measure temporal resolution using WDRC fit with compression ratios set for each listener’s hearing loss. METHOD Nineteen adults with mild-to-moderate hearing loss fitted with WDRC or linear amplification set to a prescriptive fitting method participated in this study. Subjects detected amplitude modulations and gaps. Two types of noise carrier were used: narrowband (1995–2005 Hz) and broadband (100–8000 Hz). RESULTS Small differences between WDRC and linear amplification were observed in the measures of temporal resolution. Modulation detection thresholds worsened by a mean of 0.7 dB with WDRC compared to linear amplification. This reduction was observed for both carrier types. Gap detection thresholds did not differ between the 2 amplification conditions. CONCLUSIONS WDRC set using a prescriptive fitting method with individualized compression ratios had a small but statistically significant effect on measures of modulation thresholds. Differences were not observed between the two amplification conditions for the measures of gap detection. These findings contrast with previous work using fixed compression ratios, suggesting that the effect of the fitting method on the compression ratio should be considered when attempting to generalize the effect of WDRC on temporal resolution to the clinical setting.

18
Narne VK, Barman A, Deepthi M. Effect of companding on speech recognition in quiet and noise for listeners with ANSD. Int J Audiol 2013;53:94-100. [PMID: 24237041] [DOI: 10.3109/14992027.2013.849008]

19
Payton KL, Shrestha M. Comparison of a short-time speech-based intelligibility metric to the speech transmission index and intelligibility data. J Acoust Soc Am 2013;134:3818-3827. [PMID: 24180791] [PMCID: PMC3829886] [DOI: 10.1121/1.4821216] [Received: 08/31/2012] [Revised: 08/04/2013] [Accepted: 08/30/2013]
Abstract
Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679-3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word.
Affiliation(s)
- Karen L Payton
- ECE Department, University of Massachusetts Dartmouth, 285 Old Westport Road, North Dartmouth, Massachusetts 02747

20
Fogerty D. Acoustic predictors of intelligibility for segmentally interrupted speech: temporal envelope, voicing, and duration. J Speech Lang Hear Res 2013;56:1402-8. [PMID: 23838986] [PMCID: PMC4064467] [DOI: 10.1044/1092-4388(2013/12-0203)]
Abstract
PURPOSE Temporal interruption limits the perception of speech to isolated temporal glimpses. An analysis was conducted to determine the acoustic parameter that best predicts speech recognition from temporal fragments that preserve different types of speech information, namely consonants and vowels. METHOD Young listeners with normal hearing previously completed word and sentence recognition tasks that required them to repeat word and sentence material that was temporally interrupted. Interruptions were designed to replace various portions of consonants or vowels with low-level noise. Acoustic analysis of preserved consonant and vowel segments was conducted to investigate the role of the preserved temporal envelope, voicing, and speech duration in predicting performance. RESULTS Results demonstrate that the temporal envelope, predominantly from vowels, is most important for sentence recognition and largely predicts results across consonant and vowel conditions. In contrast, for isolated words the proportion of speech preserved was the best predictor of performance regardless of whether glimpses were from consonants or vowels. CONCLUSION These findings suggest consideration of the vowel temporal envelope in speech transmission and amplification technologies for improving the intelligibility of temporally interrupted sentences.

21
Sabin AT, Souza PE. Initial development of a temporal-envelope-preserving nonlinear hearing aid prescription using a genetic algorithm. Trends Amplif 2013;17:94-107. [PMID: 24028890] [DOI: 10.1177/1084713813495981]
Abstract
Most hearing aid prescriptions focus on the optimization of a metric derived from the long-term average spectrum of speech, and do not consider how the prescribed values might distort the temporal envelope shape. A growing body of evidence suggests that such distortions can lead to systematic errors in speech perception, and therefore hearing aid prescriptions might benefit by including preservation of the temporal envelope shape in their rationale. To begin to explore this possibility, we designed a genetic algorithm (GA) to find the multiband compression settings that preserve the shape of the original temporal envelope while placing that envelope in the listener's audiometric dynamic range. The resulting prescription had a low compression threshold, short attack and release times, and a combination of compression ratio and gain that placed the output signal within the listener's audiometric dynamic range. Initial behavioral tests of individuals with impaired hearing revealed no difference in speech-in-noise perception between the GA and the NAL-NL2 prescription. However, gap detection performance was superior with the GA in comparison to NAL-NL2. Overall, this work is a proof of concept that consideration of temporal envelope distortions can be incorporated into hearing aid prescriptions.
Affiliation(s)
- Andrew T Sabin
- Department of Communication Sciences and Disorders, Northwestern University, Evanston, IL, USA

22
Sabin AT, Gallun FJ, Souza PE. Acoustical correlates of performance on a dynamic range compression discrimination task. J Acoust Soc Am 2013;134:2136-47. [PMID: 23967944] [PMCID: PMC3765331] [DOI: 10.1121/1.4816410]
Abstract
Dynamic range compression is widely used to reduce the difference between the most and least intense portions of a signal. Such compression distorts the shape of the amplitude envelope of a signal, but it is unclear to what extent such distortions are actually perceivable by listeners. Here, the ability to distinguish between compressed and uncompressed versions of a noise vocoded sentence was initially measured in listeners with normal hearing while varying the threshold, ratio, attack, and release parameters. This narrow condition was selected in order to characterize perception under the most favorable listening conditions. The average behavioral sensitivity to compression was highly correlated to several acoustical indices of modulation depth. In particular, performance was highly correlated to the Euclidean distance between the modulation spectra of the uncompressed and compressed signals. Suggesting that this relationship is not restricted to the initial test conditions, the correlation remained largely unchanged both (1) when listeners with normal hearing were tested using a time-compressed version of the original signal, and (2) when listeners with impaired hearing were tested using the original signal. If this relationship generalizes to more ecologically valid conditions, it will provide a straightforward method for predicting the detectability of compression-induced distortions.
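The acoustical index this abstract highlights, the Euclidean distance between the modulation spectra of compressed and uncompressed signals, can be sketched as follows. The static power-law compressor and the envelope analysis are simplifying assumptions for illustration, not the study's processing chain.

```python
import numpy as np
from scipy.signal import hilbert

def mod_spectrum(x, fs, fmax=32.0):
    """Modulation magnitude spectrum of the mean-normalized envelope,
    restricted to 0 < f <= fmax."""
    env = np.abs(hilbert(x))
    spec = np.abs(np.fft.rfft(env / env.mean()))
    freqs = np.fft.rfftfreq(env.size, 1.0 / fs)
    return spec[(freqs > 0) & (freqs <= fmax)]

def compress(x, ratio):
    """Static power-law compression; attack/release dynamics are omitted."""
    return np.sign(x) * np.abs(x) ** (1.0 / ratio)

def mod_spec_distance(x, y, fs):
    """Euclidean distance between the two signals' modulation spectra."""
    return float(np.linalg.norm(mod_spectrum(x, fs) - mod_spectrum(y, fs)))

fs = 16000
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(1)
# 4-Hz amplitude-modulated noise as a vocoded-sentence stand-in
signal = (1.0 + 0.9 * np.sin(2 * np.pi * 4.0 * t)) * rng.standard_normal(t.size)
```

Stronger compression flattens the envelope more, so `mod_spec_distance(signal, compress(signal, 8.0), fs)` exceeds the distance at a 2:1 ratio, mirroring the relationship between the metric and behavioral sensitivity reported in the abstract.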
Affiliation(s)
- Andrew T Sabin
- Department of Communication Sciences and Disorders, Northwestern University, 2240 Campus Drive, Evanston, Illinois 60201, USA.

23
Hoover EC, Souza PE, Gallun FJ. The consonant-weighted envelope difference index (cEDI): a proposed technique for quantifying envelope distortion. J Speech Lang Hear Res 2012;55:1802-1806. [PMID: 22411284] [PMCID: PMC3538866] [DOI: 10.1044/1092-4388(2012/11-0255)]
Abstract
PURPOSE The benefits of amplitude compression in hearing aids may be limited by distortion resulting from rapid gain adjustment. To evaluate this, it is convenient to quantify distortion by using a metric that is sensitive to the changes in the processed signal that decrease consonant recognition, such as the Envelope Difference Index (EDI; Fortune, Woodruff, & Preves, 1994). However, the EDI relies on the entire duration of the signal, including portions irrelevant to consonant recognition. METHOD This note describes a computationally efficient method of automatically segmenting speech in time according to the syllable structure. Our technique uses the 1st derivative of the envelope as a basis. Peaks located in the derivative were used to generate a weighting function for the computation of a metric of signal distortion. RESULTS The weighting function significantly improved the variance explained in consonant recognition scores over previous methods. However, only 3.2% of the variance was explained in the revised model. CONCLUSION This technique was effective in focusing the analysis of distortion on specific segments of the signal. Use of the technique has implications for speech analysis. The difference in the amplitude envelope of consonants is not a robust model of the effect of hearing aid compression on consonant recognition.

24
Dubno JR, Ahlstrom JB, Wang X, Horwitz AR. Level-dependent changes in perception of speech envelope cues. J Assoc Res Otolaryngol 2012;13:835-52. [PMID: 22872414] [DOI: 10.1007/s10162-012-0343-2] [Received: 01/17/2012] [Accepted: 07/16/2012]
Abstract
Level-dependent changes in temporal envelope fluctuations in speech and related changes in speech recognition may reveal effects of basilar-membrane nonlinearities. As a result of compression in the basilar-membrane response, the "effective" magnitude of envelope fluctuations may be reduced as speech level increases from lower level (more linear) to mid-level (more compressive) regions. With further increases to a more linear region, speech envelope fluctuations may become more pronounced. To assess these effects, recognition of consonants and key words in sentences was measured as a function of speech level for younger adults with normal hearing. Consonant-vowel syllables and sentences were spectrally degraded using "noise vocoder" processing to maximize perceptual effects of changes to the speech envelope. Broadband noise at a fixed signal-to-noise ratio maintained constant audibility as speech level increased. Results revealed significant increases in scores and envelope-dependent feature transmission from 45 to 60 dB SPL and decreasing scores and feature transmission from 60 to 85 dB SPL. This quadratic pattern, with speech recognition maximized at mid levels and poorer at lower and higher levels, is consistent with a role of cochlear nonlinearities in perception of speech envelope cues.
Affiliation(s)
- Judy R Dubno
- Department of Otolaryngology-Head and Neck Surgery, Medical University of South Carolina, 135 Rutledge Avenue, MSC 550, Charleston, SC 29425-5500, USA.

25
Souza P, Hoover E, Gallun F. Application of the envelope difference index to spectrally sparse speech. J Speech Lang Hear Res 2012;55:824-837. [PMID: 22232401] [PMCID: PMC3326439] [DOI: 10.1044/1092-4388(2011/10-0301)]
Abstract
PURPOSE Amplitude compression is a common hearing aid processing strategy that can improve speech audibility and loudness comfort but also has the potential to alter important cues carried by the speech envelope. In previous work, a measure of envelope change, the Envelope Difference Index (EDI; Fortune, Woodruff, & Preves, 1994), was moderately related to recognition of spectrally robust consonants. This follow-up study investigated the relationship between the EDI and recognition of spectrally sparse consonants. METHOD Stimuli were vowel-consonant-vowel tokens processed to reduce spectral cues. Compression parameters were chosen to achieve a range of EDI values. Recognition was measured for 20 listeners with normal hearing. RESULTS Both overall recognition and perception of consonant features were reduced at higher EDI values. Similar effects were noted with noise-vocoded and sine-vocoded processing and regardless of whether periodicity cues were available. CONCLUSION The data provide information about the acceptable limits of envelope distortion under constrained conditions. These limits can be used to consider the impact of envelope distortions in situations where other cues are available to varying extents.
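A minimal sketch of an EDI-style computation may clarify the metric used in this and the preceding entry. The envelope extraction and the exact normalization below follow one common formulation of the EDI (Fortune, Woodruff, & Preves, 1994), not necessarily the implementation used in these studies.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def envelope(x, fs, cutoff=50.0):
    """Hilbert envelope smoothed with a low-pass filter (one common
    extraction choice; the original papers specify their own)."""
    env = np.abs(hilbert(x))
    sos = butter(2, cutoff, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, env)

def edi(x, y, fs):
    """EDI under one common formulation: normalize each envelope to
    unit mean, then take half the mean absolute difference.
    0 = identical envelope shapes; larger values = more distortion."""
    ex = envelope(x, fs)
    ey = envelope(y, fs)
    ex /= ex.mean()
    ey /= ey.mean()
    return float(np.mean(np.abs(ex - ey)) / 2.0)

fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
clean = (1.0 + 0.8 * np.sin(2 * np.pi * 3.0 * t)) * np.sin(2 * np.pi * 500.0 * t)
# Instantaneous power-law compression flattens the envelope peaks
compressed = np.sign(clean) * np.abs(clean) ** 0.5
```

On this toy signal, `edi(clean, clean, fs)` is 0 and `edi(clean, compressed, fs)` is positive, growing as the compression becomes more severe, which is how compression parameters were mapped onto a range of EDI values.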

26
Fogerty D, Humes LE. The role of vowel and consonant fundamental frequency, envelope, and temporal fine structure cues to the intelligibility of words and sentences. J Acoust Soc Am 2012;131:1490-501. [PMID: 22352519] [PMCID: PMC3292616] [DOI: 10.1121/1.3676696] [Received: 04/08/2011] [Revised: 11/16/2011] [Accepted: 12/21/2011]
Abstract
The speech signal contains many acoustic properties that may contribute differently to spoken word recognition. Previous studies have demonstrated that the importance of properties present during consonants or vowels is dependent upon the linguistic context (i.e., words versus sentences). The current study investigated three potentially informative acoustic properties that are present during consonants and vowels for monosyllabic words and sentences. Natural variations in fundamental frequency were either flattened or removed. The speech envelope and temporal fine structure were also investigated by limiting the availability of these cues via noisy signal extraction. Thus, this study investigated the contribution of these acoustic properties, present during either consonants or vowels, to overall word and sentence intelligibility. Results demonstrated that all processing conditions displayed better performance for vowel-only sentences. Greater performance with vowel-only sentences remained, despite removing dynamic cues of the fundamental frequency. Word and sentence comparisons suggest that the speech envelope may be at least partially responsible for additional vowel contributions in sentences. Results suggest that speech information transmitted by the envelope is responsible, in part, for greater vowel contributions in sentences, but is not predictive for isolated words.
Affiliation(s)
- Daniel Fogerty
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405, USA.

27
Jørgensen S, Dau T. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing. J Acoust Soc Am 2011;130:1475-87. [PMID: 21895088] [DOI: 10.1121/1.3621502]
Abstract
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility.
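The core SNR(env) quantity can be sketched in a simplified, single-modulation-band form. The envelope extraction, the 2-8 Hz band, and the flooring value below are illustrative assumptions; the published model uses a full modulation filterbank and an ideal-observer back end that are omitted here.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def env_power(x, fs, band=(2.0, 8.0), env_fs=100):
    """AC envelope power in one modulation band, normalized by the
    squared mean envelope (the normalization used in envelope-power
    models). The envelope is decimated to env_fs before filtering."""
    env = np.abs(hilbert(x))[:: int(fs // env_fs)]
    sos = butter(2, band, btype="band", fs=env_fs, output="sos")
    env_ac = sosfiltfilt(sos, env)
    return np.mean(env_ac ** 2) / np.mean(env) ** 2

def snr_env_db(mix, noise, fs):
    """Single-band SNRenv: envelope power of the noisy mixture in
    excess of the noise-alone envelope power, relative to the noise
    envelope power, floored before converting to dB."""
    p_mix = env_power(mix, fs)
    p_noise = env_power(noise, fs)
    return 10.0 * np.log10(max((p_mix - p_noise) / p_noise, 1e-3))

fs = 16000
t = np.arange(0, 2.0, 1.0 / fs)
rng = np.random.default_rng(0)
# 4-Hz amplitude-modulated tone as a crude stand-in for speech
speech = (1.0 + np.sin(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 1000.0 * t)
noise = rng.standard_normal(t.size)
```

With this sketch, `snr_env_db(speech + 0.2 * noise, 0.2 * noise, fs)` exceeds `snr_env_db(0.2 * speech + noise, noise, fs)`, consistent with the idea that intelligibility tracks the envelope-domain signal-to-noise ratio.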
Affiliation(s)
- Søren Jørgensen
- Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark

28
Fogerty D. Perceptual weighting of the envelope and fine structure across frequency bands for sentence intelligibility: effect of interruption at the syllabic-rate and periodic-rate of speech. J Acoust Soc Am 2011;130:489-500. [PMID: 21786914] [PMCID: PMC3155597] [DOI: 10.1121/1.3592220] [Received: 11/23/2010] [Accepted: 04/25/2011]
Abstract
Listeners often only have fragments of speech available to understand the intended message due to competing background noise. In order to maximize successful speech recognition, listeners must allocate their perceptual resources to the most informative acoustic properties. The speech signal contains temporally-varying acoustics in the envelope and fine structure that are present across the frequency spectrum. Understanding how listeners perceptually weigh these acoustic properties in different frequency regions during interrupted speech is essential for the design of assistive listening devices. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for interrupted sentence materials. Perceptual weights were obtained during interruption at the syllabic rate (i.e., 4 Hz) and the periodic rate (i.e., 128 Hz) of speech. Potential interruption interactions with fundamental frequency information were investigated by shifting the natural pitch contour higher relative to the interruption rate. The availability of each acoustic property was varied independently by adding noise at different levels. Perceptual weights were determined by correlating a listener's performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated similar relative weights across the interruption conditions, with emphasis on the envelope in high-frequencies.
Affiliation(s)
- Daniel Fogerty
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405, USA.

29
Kates JM. Spectro-temporal envelope changes caused by temporal fine structure modification. J Acoust Soc Am 2011;129:3981-3990. [PMID: 21682419] [DOI: 10.1121/1.3583552]
Abstract
The study of speech from which the temporal fine structure (TFS) has been removed has become an important research area. Common procedures for removing TFS include noise and tone vocoders. In the noise vocoder, bands of noise are modulated by the envelope of the speech within each band, and in the tone vocoder the carrier is a sinusoid at the center of each frequency band. Five different procedures for removing TFS are evaluated in this paper: the noise vocoder, a low-noise noise approach in which the noise envelope is replaced by the speech envelope in each frequency band, phase randomization within each band, the tone vocoder, and sinusoidal modeling with random phase. The effects of TFS modification on the speech envelope are evaluated using an index based on the envelope time-frequency modulation. The results show that for all of the TFS techniques implemented in this study, there is a substantial loss in the accuracy of reproduction of the envelope time-frequency modulation. The tone vocoder gives the best accuracy, followed by the procedure that replaces the noise envelope with the speech envelope in each band.
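A minimal noise vocoder along the lines described (envelope-modulated, band-limited noise) might look like the sketch below. The channel edges and filter orders are arbitrary illustrative choices, not those of any cited study.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

def noise_vocode(x, fs, edges=(100.0, 500.0, 1500.0, 4000.0)):
    """Minimal noise vocoder: band-pass the input, take each band's
    Hilbert envelope (discarding TFS), modulate band-limited noise
    with it, and sum the channels."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(3, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                         # envelope only
        carrier = sosfiltfilt(sos, rng.standard_normal(x.size))
        carrier /= np.sqrt(np.mean(carrier ** 2)) + 1e-12   # unit-RMS noise
        out += env * carrier
    return out

fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
am_tone = (1.0 + 0.9 * np.sin(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 1000.0 * t)
vocoded = noise_vocode(am_tone, fs)
```

The output replaces the fine structure with noise while the within-band envelope is largely retained, which is exactly the property whose accuracy the paper's time-frequency modulation index quantifies.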
Affiliation(s)
- James M Kates
- GN ReSound A/S, 3215 Marine Street, Room W161, Boulder, Colorado 80309, USA.
30
Fogerty D. Perceptual weighting of individual and concurrent cues for sentence intelligibility: frequency, envelope, and fine structure. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2011; 129:977-88. [PMID: 21361454 PMCID: PMC3070991 DOI: 10.1121/1.3531954] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/17/2010] [Revised: 11/19/2010] [Accepted: 12/06/2010] [Indexed: 05/16/2023]
Abstract
The speech signal may be divided into frequency bands, each containing temporal properties of the envelope and fine structure. For maximal speech understanding, listeners must allocate their perceptual resources to the most informative acoustic properties. Understanding this perceptual weighting is essential for the design of assistive listening devices that need to preserve these important speech cues. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for sentence materials. Perceptual weights were obtained under two listening contexts: (1) when each acoustic property was presented individually and (2) when multiple acoustic properties were available concurrently. The processing method was designed to vary the availability of each acoustic property independently by adding noise at different levels. Perceptual weights were determined by correlating a listener's performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated that weights were (1) equal when acoustic properties were presented individually and (2) biased toward envelope and mid-frequency information when multiple properties were available. Results suggest a complex interaction between the available acoustic properties and the listening context in determining how best to allocate perceptual resources when listening to speech in noise.
Affiliation(s)
- Daniel Fogerty
- Department of Speech and Hearing Sciences, Indiana University, Bloomington, Indiana 47405, USA.
31
Calandruccio L, Dhar S, Bradlow AR. Speech-on-speech masking with variable access to the linguistic content of the masker speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2010; 128:860-9. [PMID: 20707455 PMCID: PMC2933260 DOI: 10.1121/1.3458857] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 10/21/2009] [Revised: 03/11/2010] [Accepted: 06/09/2010] [Indexed: 05/21/2023]
Abstract
It has been reported that listeners can benefit from a release in masking when the masker speech is spoken in a language that differs from the target speech compared to when the target and masker speech are spoken in the same language [Freyman, R. L. et al. (1999). J. Acoust. Soc. Am. 106, 3578-3588; Van Engen, K., and Bradlow, A. (2007), J. Acoust. Soc. Am. 121, 519-526]. It is unclear whether listeners benefit from this release in masking due to the lack of linguistic interference of the masker speech, from acoustic and phonetic differences between the target and masker languages, or a combination of these differences. In the following series of experiments, listeners' sentence recognition was evaluated using speech and noise maskers that varied in the amount of linguistic content, including native-English, Mandarin-accented English, and Mandarin speech. Results from three experiments indicated that the majority of differences observed between the linguistic maskers could be explained by spectral differences between the masker conditions. However, when the recognition task increased in difficulty, i.e., at a more challenging signal-to-noise ratio, a greater decrease in performance was observed for the maskers with more linguistically relevant information than what could be explained by spectral differences alone.
Affiliation(s)
- Lauren Calandruccio
- Department of Linguistics and Communication Disorders, Queens College of the City University of New York, Flushing, New York 11367, USA.
32
Abstract
OBJECTIVE In previous work, a simplified version of the modulation spectrum, the Spectral Correlation Index, was shown to be related to consonant error patterns. It is unknown what effect clinical amplification strategies have on the modulation spectrum. Accordingly, the goals of this study were to examine the effect of clinical amplification strategies on the consonant modulation spectrum and to determine whether there was a relationship between the modulation spectrum and consonant errors for spectrally robust, amplified speech presented to listeners with hearing loss. DESIGN Participants were 13 adults (mean age 67 years) with mild to moderate sensorineural hearing loss. Each listener was fit monaurally in the test ear with a 16-band, four-channel behind-the-ear hearing aid. One memory of the hearing aid was programmed with compression-limiting amplification and one with fast-acting wide dynamic range compression (WDRC) amplification. Twenty-two consonant-vowel syllables were presented to the listener and recorded at the output of the hearing aid using a probe microphone system. A modulation spectrum was obtained for each amplified and unamplified consonant-vowel syllable. Consonant recognition was also measured for each listener. RESULTS Results show that (1) WDRC increased the heterogeneity of the modulation spectrum across consonants and (2) for spectrally robust speech processed with either compression-limiting or WDRC amplification, two consonants with similar modulation spectra are more likely to be confused with one another than are two consonants with dissimilar modulation spectra. CONCLUSION These data expand and confirm earlier results linking the modulation spectrum to specific consonant errors.
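A rough sketch of a per-band envelope modulation spectrum is given below. The study's Spectral Correlation Index is described as a simplified version of the modulation spectrum, and its exact computation may differ from this sketch; the band edges, the 100 Hz envelope sampling rate, and the function name are illustrative assumptions.

```python
# Hedged sketch of a per-band envelope modulation spectrum. The exact
# computation in the study (and its Spectral Correlation Index) may differ;
# band edges and the 100 Hz envelope rate are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_spectrum(x, fs, band_edges, mod_fs=100):
    specs = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        env = np.abs(hilbert(sosfiltfilt(sos, x)))    # band envelope
        env = env[::fs // mod_fs]                     # downsample the envelope
        spec = np.abs(np.fft.rfft(env - env.mean()))  # modulation magnitudes
        specs.append(spec)
    return np.array(specs)  # shape: (bands, modulation frequencies)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t) * (1 + 0.5 * np.sin(2 * np.pi * 8 * t))
ms = modulation_spectrum(x, fs, [100, 800, 3200])
```

Comparing such spectra between consonants (e.g., by correlation) is one way to quantify the similarity that the study relates to confusion patterns.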
33
Souza P, Rosen S. Effects of envelope bandwidth on the intelligibility of sine- and noise-vocoded speech. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2009; 126:792-805. [PMID: 19640044 PMCID: PMC2730710 DOI: 10.1121/1.3158835] [Citation(s) in RCA: 79] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/08/2023]
Abstract
The choice of processing parameters for vocoded signals may have an important effect on the availability of various auditory features. Experiment 1 varied envelope cutoff frequency (30 and 300 Hz), carrier type (sine and noise), and number of bands (2-5) for vocoded speech presented to normal-hearing listeners. Performance was better with a high cutoff for sine-vocoding, with no effect of cutoff for noise-vocoding. With a low cutoff, performance was better for noise-vocoding than for sine-vocoding. With a high cutoff, performance was better for sine-vocoding. Experiment 2 measured perceptibility of cues to voice pitch variations. A noise carrier combined with a high cutoff allowed intonation to be perceived to some degree but performance was best in high-cutoff sine conditions. A low cutoff led to poorest performance, regardless of carrier. Experiment 3 tested the relative contributions of co-modulation across bands and spectral density to improved performance with a sine carrier and high cutoff. Co-modulation across bands had no effect so it appears that sidebands providing a denser spectrum improved performance. These results indicate that carrier type in combination with envelope cutoff can alter the available cues in vocoded speech, factors which must be considered in interpreting results with vocoded signals.
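The role of the envelope cutoff can be illustrated with a toy envelope carrying a 100 Hz periodicity cue: a 300 Hz low-pass filter passes that modulation, while a 30 Hz filter essentially removes it. The filter order and signal parameters here are assumptions, not the study's processing.

```python
# Toy illustration of the envelope-cutoff manipulation: a 100 Hz
# periodicity modulation survives a 300 Hz low-pass filter but is
# essentially removed by a 30 Hz one. All parameters are illustrative.
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs = 16000
t = np.arange(fs) / fs
env = 1 + 0.8 * np.sin(2 * np.pi * 100 * t)  # envelope with a 100 Hz pitch cue

def lowpass(sig, cutoff):
    sos = butter(4, cutoff, btype='lowpass', fs=fs, output='sos')
    return sosfiltfilt(sos, sig)

depth_300 = lowpass(env, 300).std()  # modulation depth after 300 Hz cutoff
depth_30 = lowpass(env, 30).std()    # modulation depth after 30 Hz cutoff
```

This is consistent with the finding that intonation cues were best preserved in the high-cutoff conditions: a 30 Hz envelope retains syllabic-rate fluctuations but not voice-pitch periodicity.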
Affiliation(s)
- Pamela Souza
- Department of Speech and Hearing Sciences, University of Washington, 1417 NE 42nd Street, Seattle, WA 98105, USA