1. Bispectral Feature Speech Intelligibility Assessment Metric Based on Auditory Model. Comput Speech Lang 2023. [DOI: 10.1016/j.csl.2023.101492]
2. Zaar J, Carney LH. Predicting speech intelligibility in hearing-impaired listeners using a physiologically inspired auditory model. Hear Res 2022; 426:108553. [PMID: 35750575] [PMCID: PMC10560534] [DOI: 10.1016/j.heares.2022.108553]
Abstract
This study presents a major update and full evaluation of a speech intelligibility (SI) prediction model previously introduced by Scheidiger, Carney, Dau, and Zaar [(2018), Acta Acust. United Ac. 104, 914-917]. The model predicts SI in speech-in-noise conditions via comparison of the noisy speech and the noise-alone reference. The two signals are processed through a physiologically inspired nonlinear model of the auditory periphery, for a range of characteristic frequencies (CFs), followed by a modulation analysis in the range of the fundamental frequency of speech. The decision metric of the model is the mean of a series of short-term, across-CF correlations between population responses to noisy speech and noise alone, with a sensitivity-limitation process imposed. The decision metric is assumed to be inversely related to SI and is converted to a percent-correct score using a single data-based fitting function. The model performance was evaluated in conditions of stationary, fluctuating, and speech-like interferers using sentence-based speech-reception thresholds (SRTs) previously obtained in 5 normal-hearing (NH) and 13 hearing-impaired (HI) listeners. For the NH listener group, the model accurately predicted SRTs across the different acoustic conditions (apart from a slight overestimation of the masking release observed for fluctuating maskers), as well as plausible effects in response to changes in presentation level. For HI listeners, the model was adjusted to account for the individual audiograms using standard assumptions concerning the amount of HI attributed to inner-hair-cell (IHC) and outer-hair-cell (OHC) impairment. HI model results accounted remarkably well for elevated individual SRTs and reduced masking release. Furthermore, plausible predictions of worsened SI were obtained when the relative contribution of IHC impairment to HI was increased. 
Overall, the present model provides a useful tool to accurately predict speech-in-noise outcomes in NH and HI listeners, and may yield important insights into auditory processes that are crucial for speech understanding.
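The back-end described above, a series of short-term, across-CF correlations between population responses averaged into a single decision metric, can be sketched as follows. This is a hypothetical simplification: the window length and correlation details are placeholders, and the sensitivity-limitation process of the published model is omitted.

```python
import numpy as np

def decision_metric(resp_speech, resp_noise, win=100):
    """Mean of short-term, across-CF correlations between two
    population responses (rows: characteristic frequencies,
    columns: time bins). Illustrative simplification, not the
    published Zaar & Carney (2022) implementation."""
    n_cf, n_t = resp_speech.shape
    corrs = []
    for start in range(0, n_t - win + 1, win):
        # flatten one short-term window across all CFs and correlate
        a = resp_speech[:, start:start + win].ravel()
        b = resp_noise[:, start:start + win].ravel()
        if a.std() > 0 and b.std() > 0:
            corrs.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(corrs))
```

Per the abstract, this metric is assumed to be inversely related to intelligibility and is mapped to percent correct by a data-based fitting function (not reproduced here).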
Affiliation(s)
- Johannes Zaar: Eriksholm Research Centre, DK-3070 Snekkersten, Denmark; Hearing Systems Section, Department of Health Technology, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Laurel H Carney: Departments of Biomedical Engineering and Neuroscience, University of Rochester, Rochester, NY 14642, USA
3. Haro S, Smalt CJ, Ciccarelli GA, Quatieri TF. Deep Neural Network Model of Hearing-Impaired Speech-in-Noise Perception. Front Neurosci 2020; 14:588448. [PMID: 33384579] [PMCID: PMC7770113] [DOI: 10.3389/fnins.2020.588448]
Abstract
Many individuals struggle to understand speech in listening scenarios that include reverberation and background noise. An individual's ability to understand speech arises from a combination of peripheral auditory function, central auditory function, and general cognitive abilities. The interaction of these factors complicates the prescription of treatment or therapy to improve hearing function. Damage to the auditory periphery can be studied in animals; however, this method alone is not enough to understand the impact of hearing loss on speech perception. Computational auditory models bridge the gap between animal studies and human speech perception. Perturbations to the modeled auditory systems can permit mechanism-based investigations into observed human behavior. In this study, we propose a computational model that accounts for the complex interactions between different hearing damage mechanisms and simulates human speech-in-noise perception. The model performs a digit classification task as a human would, with only acoustic sound pressure as input. Thus, we can use the model's performance as a proxy for human performance. This two-stage model consists of a biophysical cochlear-nerve spike generator followed by a deep neural network (DNN) classifier. We hypothesize that sudden damage to the periphery affects speech perception and that central nervous system adaptation over time may compensate for peripheral hearing damage. Our model achieved human-like performance across signal-to-noise ratios (SNRs) under normal-hearing (NH) cochlear settings, achieving 50% digit recognition accuracy at -20.7 dB SNR. Results were comparable to eight NH participants on the same task who achieved 50% behavioral performance at -22 dB SNR. We also simulated medial olivocochlear reflex (MOCR) and auditory nerve fiber (ANF) loss, which worsened digit-recognition accuracy at lower SNRs compared to higher SNRs. 
Our simulated performance following ANF loss is consistent with the hypothesis that cochlear synaptopathy impacts communication in background noise more so than in quiet. Following the insult of various cochlear degradations, we implemented extreme and conservative adaptation through the DNN. At the lowest SNRs (<0 dB), both adapted models were unable to fully recover NH performance, even with hundreds of thousands of training samples. This implies a limit on performance recovery following peripheral damage in our human-inspired DNN architecture.
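Thresholds such as the 50%-correct SNRs quoted above are read off a psychometric function relating recognition accuracy to SNR. A minimal sketch, assuming simple linear interpolation between measured points rather than the sigmoid fits typically used in practice:

```python
import numpy as np

def srt_50(snrs_db, accuracies):
    """SNR at which accuracy crosses 50 %, by linear interpolation.
    snrs_db must be sorted ascending and accuracies must increase
    with SNR (both in matching order). A simplified stand-in for a
    fitted psychometric function."""
    snrs = np.asarray(snrs_db, float)
    acc = np.asarray(accuracies, float)
    return float(np.interp(0.5, acc, snrs))
```

For example, measured accuracies of 0.2, 0.5, and 0.9 at -30, -20, and -10 dB SNR give a 50 % threshold of -20 dB.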
Affiliation(s)
- Stephanie Haro: Human Health and Performance Systems, Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, United States; Speech and Hearing Biosciences and Technology, Harvard Medical School, Boston, MA, United States
- Christopher J. Smalt: Human Health and Performance Systems, Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, United States
- Gregory A. Ciccarelli: Human Health and Performance Systems, Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, United States
- Thomas F. Quatieri: Human Health and Performance Systems, Massachusetts Institute of Technology Lincoln Laboratory, Lexington, MA, United States; Speech and Hearing Biosciences and Technology, Harvard Medical School, Boston, MA, United States
4. Mahmoodian N, Poudel P, Illanes A, Friebe M. Higher Order Statistical Analysis for Thyroid Texture Classification and Segmentation in 2D Ultrasound Images. Annu Int Conf IEEE Eng Med Biol Soc 2019:5832-5835. [PMID: 31947178] [DOI: 10.1109/embc.2019.8857380]
Abstract
Ultrasound (US) imaging is one of the most cost-effective imaging modalities; it uses sound waves to generate medical images of anatomical structures. However, speckle noise and low contrast in US images make it difficult to classify anatomical structures reliably in clinical scenarios. It is therefore important to devise a method that remains robust and accurate in the presence of speckle noise and is not affected by low image contrast. In this work, a novel approach to thyroid texture characterization was used, based on features extracted with higher order spectral analysis (HOSA). A support vector machine (SVM) was applied to the extracted features to classify the thyroid texture. Since HOSA is well suited to processing non-Gaussian data with nonlinear dynamics, good classification of thyroid texture can be obtained in US images, which likewise exhibit non-Gaussian speckle noise and nonlinear characteristics. The proposed approach achieved a final accuracy of 93.27%, a sensitivity of 0.92, and a specificity of 0.62.
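As a rough illustration of the higher-order statistic that HOSA-based features build on, a direct FFT-based bispectrum estimate can be written as follows. Segmentation and averaging are simplified placeholders (no windowing, overlap, or normalization), and the paper's actual feature set is not reproduced here.

```python
import numpy as np

def bispectrum(x, nfft=64):
    """Direct estimate of the bispectrum
    B(f1, f2) = E[X(f1) X(f2) conj(X(f1 + f2))],
    averaged over non-overlapping segments of length nfft.
    Illustrative only."""
    segs = [x[i:i + nfft] for i in range(0, len(x) - nfft + 1, nfft)]
    B = np.zeros((nfft, nfft), dtype=complex)
    for s in segs:
        X = np.fft.fft(s)
        # index matrix for the summed frequency f1 + f2 (mod nfft)
        idx = (np.arange(nfft)[:, None] + np.arange(nfft)[None, :]) % nfft
        B += np.outer(X, X) * np.conj(X[idx])
    return B / len(segs)
```

Classification features are then typically summary statistics (e.g. magnitudes or entropies) taken over a region of this matrix and fed to the SVM.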
5. Jassim WA, Zilany MS. NSQM: A non-intrusive assessment of speech quality using normalized energies of the neurogram. Comput Speech Lang 2019. [DOI: 10.1016/j.csl.2019.04.005]
6. Hossain ME, Zilany MS, Davies-Venn E. On the feasibility of using a bispectral measure as a nonintrusive predictor of speech intelligibility. Comput Speech Lang 2019. [DOI: 10.1016/j.csl.2019.02.003]
7. Mahmoodian N, Schaufler A, Pashazadeh A, Boese A, Friebe M, Illanes A. Proximal detection of guide wire perforation using feature extraction from bispectral audio signal analysis combined with machine learning. Comput Biol Med 2019; 107:10-17. [DOI: 10.1016/j.compbiomed.2019.02.001]
8. Koning R, Bruce IC, Denys S, Wouters J. Perceptual and Model-Based Evaluation of Ideal Time-Frequency Noise Reduction in Hearing-Impaired Listeners. IEEE Trans Neural Syst Rehabil Eng 2018. [PMID: 29522412] [DOI: 10.1109/tnsre.2018.2794557]
Abstract
State-of-the-art hearing aids (HAs) try to overcome poor speech intelligibility (SI) in noisy listening environments using digital noise reduction (NR) techniques. Applying time-frequency masks to the noisy input is a common NR technique for increasing SI. The binary mask, with its binary weights, and the Wiener filter, with continuous weights, are representatives of hard- and soft-decision approaches to time-frequency masking. In normal-hearing listeners, the ideal Wiener filter (IWF) outperforms the ideal binary mask (IBM) in terms of SI and speech quality, yielding perfect SI even at very low signal-to-noise ratios. In this paper, both approaches were investigated for hearing-impaired (HI) listeners, using perceptual and auditory model-based measures for evaluation. The IWF outperformed the IBM in terms of SI. No overall difference in perceived quality was found between the two NR algorithms. Additionally, the processed signals were evaluated with an auditory nerve model using the neurogram similarity metric (NSIM); the mean NSIM values differed significantly between intelligible and unintelligible sentences. The results suggest that a soft mask is promising for application in HAs.
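The hard/soft contrast between the two ideal masks can be made concrete. A minimal sketch with oracle speech and noise power spectrograms; the 0 dB local criterion and the gain formulas are common textbook choices, not necessarily the exact parameters of this study.

```python
import numpy as np

def ideal_masks(speech_pow, noise_pow, lc_db=0.0):
    """Time-frequency gains for the ideal binary mask (IBM) and
    the ideal Wiener filter (IWF), given oracle speech and noise
    power spectrograms. lc_db is the IBM local criterion in dB."""
    eps = 1e-12
    snr_db = 10 * np.log10((speech_pow + eps) / (noise_pow + eps))
    ibm = (snr_db > lc_db).astype(float)               # hard decision: 0 or 1
    iwf = speech_pow / (speech_pow + noise_pow + eps)  # soft gain in [0, 1]
    return ibm, iwf
```

Either gain matrix is applied pointwise to the noisy spectrogram before resynthesis; the IBM keeps or discards each cell outright, while the IWF attenuates gradually.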
9. Predictions of Speech Chimaera Intelligibility Using Auditory Nerve Mean-Rate and Spike-Timing Neural Cues. J Assoc Res Otolaryngol 2017; 18:687-710. [PMID: 28748487] [DOI: 10.1007/s10162-017-0627-7]
Abstract
Perceptual studies of speech intelligibility have shown that slow variations of the acoustic envelope (ENV) in a small set of frequency bands provide adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other TFS; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
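The ENV/TFS split underlying speech chimaeras is conventionally obtained from the Hilbert analytic signal in each frequency band: the envelope of one source is imposed on the fine structure of another. A one-band, numpy-only sketch (published chimaeras are built per band in a multi-band filterbank, which is omitted here):

```python
import numpy as np

def _analytic(x):
    """Analytic signal via the FFT (numpy-only stand-in for
    scipy.signal.hilbert): zero negative frequencies, double
    positive ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(X * h)

def chimaera(env_src, tfs_src):
    """Single-band chimaera: Hilbert envelope (ENV) of one signal
    carried on the temporal fine structure (TFS) of another."""
    n = min(len(env_src), len(tfs_src))
    a_env = _analytic(np.asarray(env_src, float)[:n])
    a_tfs = _analytic(np.asarray(tfs_src, float)[:n])
    return np.abs(a_env) * np.cos(np.angle(a_tfs))
```

By construction, swapping a signal's envelope with its own leaves the signal essentially unchanged, which is a convenient sanity check for the decomposition.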
10. Moncada-Torres A, van Wieringen A, Bruce IC, Wouters J, Francart T. Predicting phoneme and word recognition in noise using a computational model of the auditory periphery. J Acoust Soc Am 2017; 141:300. [PMID: 28147586] [DOI: 10.1121/1.4973569]
Abstract
Several filterbank-based metrics have been proposed to predict speech intelligibility (SI). However, these metrics incorporate little knowledge of the auditory periphery. Neurogram-based metrics provide an alternative, incorporating knowledge of the physiology of hearing by using a mathematical model of the auditory nerve response. In this work, SI was assessed using different filterbank-based metrics (the speech intelligibility index and the speech-based envelope power spectrum model) and neurogram-based metrics, with the biologically inspired model of the auditory nerve proposed by Zilany, Bruce, Nelson, and Carney [(2009), J. Acoust. Soc. Am. 126(5), 2390-2412] as a front-end and the neurogram similarity metric and the spectro-temporal modulation index as back-ends. The correlations with behavioural scores were then computed. Results showed that neurogram-based metrics representing the speech envelope had higher correlations with the behavioural scores at the word level. At the per-phoneme level, phoneme transitions were found to contribute to higher correlations between behavioural data and objective measures that use speech envelope information at the auditory periphery level. The presented framework could serve as a useful tool for the validation and tuning of speech materials, as well as a benchmark for the development of speech processing algorithms.
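The neurogram similarity metric (NSIM) used as a back-end above is SSIM-like: a luminance term times a structure term, computed between reference and degraded neurograms. A whole-array sketch, assuming the published form with the SSIM contrast term dropped; the constants c1 and c2 are placeholders, and the local-window averaging of the real metric is omitted:

```python
import numpy as np

def nsim_like(ref, deg, c1=0.01, c2=0.03):
    """Single-window similarity between a reference and a degraded
    neurogram: luminance term times structure term. Illustrative
    simplification of NSIM, not the published implementation."""
    ref = np.asarray(ref, float)
    deg = np.asarray(deg, float)
    mu_r, mu_d = ref.mean(), deg.mean()
    sd_r, sd_d = ref.std(), deg.std()
    cov = ((ref - mu_r) * (deg - mu_d)).mean()
    luminance = (2 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)
    structure = (cov + c2) / (sd_r * sd_d + c2)
    return luminance * structure
```

Identical neurograms score 1; degradation lowers the score, which is then correlated with behavioural intelligibility.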
Affiliation(s)
- Arturo Moncada-Torres: Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium
- Astrid van Wieringen: Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium
- Ian C Bruce: Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium
- Jan Wouters: Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium
- Tom Francart: Department of Neurosciences, ExpORL, KU Leuven, Herestraat 49, Bus 721, 3000 Leuven, Belgium