1. Sinha R, Azadpour M. Employing Deep Learning Model to Evaluate Speech Information in Acoustic Simulations of Auditory Implants. Research Square 2023:rs.3.rs-3085032. PMID: 37461629; PMCID: PMC10350124; DOI: 10.21203/rs.3.rs-3085032/v1.
Abstract
Acoustic simulations have played a prominent role in the development of speech processing and sound coding strategies for auditory neural implant devices. Traditionally evaluated using human subjects, acoustic simulations have been used to model the impact of implant signal processing as well as individual anatomy/physiology on speech perception. However, human subject testing is time-consuming, costly, and subject to individual variability. In this study, we propose a novel approach to performing simulations of auditory implants. Rather than using actual human participants, we utilized an advanced deep-learning speech recognition model to simulate the effects of several important signal-processing and psychophysical/physiological factors on speech perception. Several simulation conditions were produced by varying the number of spectral bands, input frequency range, envelope cut-off frequency, envelope dynamic range, and envelope quantization. Our results demonstrate that the deep-learning model exhibits human-like robustness to simulation parameters in quiet and in noise, closely resembling existing human subject results. This approach is not only significantly quicker and less expensive than traditional human studies, but it also eliminates individual human variables such as attention and learning. Our findings pave the way for efficient and accurate evaluation of auditory implant simulations, aiding the future development of auditory neural prosthesis technologies.
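The simulation parameters listed in this abstract correspond to standard stages of a channel vocoder. A minimal single-channel sketch is given below (Python/NumPy/SciPy); the filter orders, band edges, noise carrier, and step-quantization rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative single channel of a noise vocoder covering the parameters named
# above (envelope cut-off, dynamic range, quantization). Filter orders, band
# edges, and the carrier type are assumptions, not the published implementation.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def vocode_channel(x, fs, f_lo, f_hi, env_cutoff=50.0, dyn_range_db=30.0, n_steps=8):
    band_sos = butter(4, [f_lo, f_hi], btype="band", fs=fs, output="sos")
    env_sos = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    band = sosfiltfilt(band_sos, x)                     # analysis band
    env = sosfiltfilt(env_sos, np.abs(band))            # envelope cut-off frequency
    env = np.maximum(env, 1e-12)
    # Limit the envelope to the stated dynamic range, then quantize it into a
    # fixed number of discriminable steps (in dB).
    env_db = 20 * np.log10(env / env.max())
    env_db = np.clip(env_db, -dyn_range_db, 0.0)        # envelope dynamic range
    step = dyn_range_db / (n_steps - 1)
    env_db = np.round(env_db / step) * step             # envelope quantization
    carrier = np.random.default_rng(0).standard_normal(len(x))
    carrier = sosfiltfilt(band_sos, carrier)            # band-limited noise carrier
    return carrier * env.max() * 10 ** (env_db / 20)
```

Summing such channels across analysis bands spanning the chosen input frequency range would yield one full simulation condition.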
Affiliation(s)
- Rahul Sinha
- New York University Grossman School of Medicine
2. Sinha R, Azadpour M. Employing Deep Learning Model to Evaluate Speech Information in Vocoder Simulations of Auditory Implants. bioRxiv 2023:2023.05.23.541843. PMID: 37292787; PMCID: PMC10245887; DOI: 10.1101/2023.05.23.541843.
Abstract
Vocoder simulations have played a crucial role in the development of sound coding and speech processing techniques for auditory implant devices. Vocoders have been extensively used to model the effects of implant signal processing as well as individual anatomy and physiology on the speech perception of implant users. Traditionally, such simulations have been conducted on human subjects, which can be time-consuming and costly. In addition, perception of vocoded speech varies significantly across individual subjects and can be significantly affected by small amounts of familiarization or exposure to vocoded sounds. In this study, we propose a novel method that differs from traditional vocoder studies. Rather than using actual human participants, we use a speech recognition model to examine the influence of vocoder-simulated cochlear implant processing on speech perception. We used OpenAI Whisper, a recently developed, advanced, open-source deep-learning speech recognition model. The Whisper model's performance was evaluated on vocoded words and sentences in both quiet and noisy conditions with respect to several vocoder parameters, such as the number of spectral bands, input frequency range, envelope cut-off frequency, envelope dynamic range, and number of discriminable envelope steps. Our results indicate that the Whisper model exhibited human-like robustness to vocoder simulations, with performance closely mirroring that of human subjects in response to modifications in vocoder parameters. Furthermore, this proposed method has the advantage of being far less expensive and quicker than traditional human studies, while also being free from inter-individual variability in learning abilities, cognitive factors, and attentional states. Our study demonstrates the potential of employing advanced deep learning models of speech recognition in auditory prosthesis research.
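Because the abstract names OpenAI Whisper as the recognizer, a minimal evaluation loop could look like the sketch below (Python, open-source whisper package). The model size, file names, reference sentence, and the simple word-match scoring rule are assumptions made for illustration, not the authors' scoring procedure.

```python
# Minimal sketch: transcribe vocoded recordings with OpenAI Whisper and score
# percent words correct. Model size, file paths, and the scoring rule are
# illustrative assumptions.
import whisper

model = whisper.load_model("base")        # any released Whisper checkpoint

def percent_words_correct(wav_path, reference):
    hyp = model.transcribe(wav_path, language="en")["text"].lower().split()
    ref = reference.lower().split()
    hits = sum(1 for word in ref if word in hyp)   # naive word-match scoring
    return 100.0 * hits / len(ref)

# Hypothetical comparison of the same sentence under two vocoder settings.
for wav in ("sentence_16band.wav", "sentence_4band.wav"):
    print(wav, percent_words_correct(wav, "the boy ran down the street"))
```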
3. Cleary M, DeRoy Milvae K, Nguyen N, Bernstein JGW, Goupell MJ. Effect of experimentally introduced interaural frequency mismatch on sentence recognition in bilateral cochlear-implant listeners. JASA Express Lett 2023;3:044401. PMID: 37096891; PMCID: PMC10080388; DOI: 10.1121/10.0017705.
Abstract
Bilateral cochlear-implant users experience interaural frequency mismatch because of asymmetries in array insertion and frequency-to-electrode assignment. To explore the acute perceptual consequences of such mismatch, sentence recognition in quiet was measured in nine bilateral cochlear-implant listeners as frequency allocations in the poorer ear were shifted by ±1.5, ±3, and ±4.5 mm using experimental programs. Shifts in frequency allocation >3 mm reduced bilateral sentence scores below those for the better ear alone, suggesting that the poorer ear interfered with better-ear perception. This was not a result of fewer active channels; deactivating electrodes without frequency shifting had minimal effect.
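For orientation, the ±1.5, ±3, and ±4.5 mm allocation shifts can be expressed in frequency terms with Greenwood's place-frequency function; the short sketch below does this for an arbitrary cochlear place using the standard human constants (the specific place and the 35-mm cochlea assumption are illustrative, not the mapping used in the experimental programs).

```python
# Greenwood place-frequency function for the human cochlea, F = A*(10**(a*x) - k),
# with the commonly used constants A = 165.4 Hz, a = 0.06 per mm (35-mm cochlea),
# and k = 0.88, where x is distance from the apex in mm. Used only to express the
# experimental shifts in frequency terms; not the authors' clinical mapping.
def greenwood_hz(dist_from_apex_mm):
    return 165.4 * (10 ** (0.06 * dist_from_apex_mm) - 0.88)

place_mm = 20.0                                   # illustrative cochlear place
for shift_mm in (0.0, 1.5, 3.0, 4.5):
    f_hz = greenwood_hz(place_mm + shift_mm)      # basal shift raises frequency
    print(f"{shift_mm:+.1f} mm -> {f_hz:6.0f} Hz")
```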
Affiliation(s)
- Miranda Cleary
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Kristina DeRoy Milvae
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Nicole Nguyen
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
- Joshua G W Bernstein
- National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, Maryland 20889, USA
- Matthew J Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park, Maryland 20742, USA
4. Cleary M, DeRoy Milvae K, Nguyen N, Bernstein JGW, Goupell MJ. Effect of experimentally introduced interaural frequency mismatch on sentence recognition in bilateral cochlear-implant listeners. medRxiv 2023:2023.01.06.23284274. PMID: 36711489; PMCID: PMC9882401; DOI: 10.1101/2023.01.06.23284274.
Abstract
Bilateral cochlear-implant users experience interaural frequency mismatch because of asymmetries in array insertion and frequency-to-electrode assignment. To explore the acute perceptual consequences of such mismatch, sentence recognition in quiet was measured in nine bilateral cochlear-implant listeners as frequency allocations in the poorer ear were shifted by ±1.5, ±3 and ±4.5 mm using experimental programs. Shifts in frequency allocation >3 mm were found to reduce bilateral sentence scores below those for the better ear alone, suggesting that the poorer ear interfered with better-ear perception. This was not a result of fewer active channels; deactivating electrodes without frequency shifting had minimal effect.
Affiliation(s)
- Miranda Cleary
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, USA
- Kristina DeRoy Milvae
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, USA; Department of Communicative Disorders and Sciences, University at Buffalo, Buffalo, NY, USA
- Nicole Nguyen
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, USA
- Joshua G. W. Bernstein
- National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD, USA
- Matthew J. Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, USA
5. Cleary M, Bernstein JGW, Stakhovskaya OA, Noble J, Kolberg E, Jensen KK, Hoa M, Kim HJ, Goupell MJ. The Relationship Between Interaural Insertion-Depth Differences, Scalar Location, and Interaural Time-Difference Processing in Adult Bilateral Cochlear-Implant Listeners. Trends Hear 2022;26:23312165221129165. PMID: 36379607; PMCID: PMC9669699; DOI: 10.1177/23312165221129165.
Abstract
Sensitivity to interaural time differences (ITDs) in acoustic hearing involves comparison of interaurally frequency-matched inputs. Bilateral cochlear-implant arrays are, however, only approximately aligned in angular insertion depth and scalar location across the cochleae. Interaural place-of-stimulation mismatch therefore has the potential to impact binaural perception. ITD left-right discrimination thresholds were examined in 23 postlingually-deafened adult bilateral cochlear-implant listeners, using low-rate constant-amplitude pulse trains presented via direct stimulation to single electrodes in each ear. Angular insertion depth and scalar location measured from computed-tomography (CT) scans were used to quantify interaural mismatch, and their association with binaural performance was assessed. Number-matched electrodes displayed a median interaural insertion-depth mismatch of 18° and generally yielded best or near-best ITD discrimination thresholds. Two listeners whose discrimination thresholds did not show this pattern were confirmed via CT to have atypical array placement. Listeners with more number-matched electrode pairs located in the scala tympani displayed better thresholds than listeners with fewer such pairs. ITD tuning curves as a function of interaural electrode separation were broad; bandwidths at twice the threshold minimum averaged 10.5 electrodes (equivalent to 5.9 mm for a Cochlear-brand pre-curved array). Larger angular insertion-depth differences were associated with wider bandwidths. Wide ITD tuning curve bandwidths appear to be a product of both monopolar stimulation and angular insertion-depth mismatch. Cases of good ITD sensitivity with very wide bandwidths suggest that precise matching of insertion depth is not critical for discrimination thresholds. Further prioritizing scala tympani location at implantation should, however, benefit ITD sensitivity.
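The tuning-curve bandwidth reported above (width at twice the threshold minimum, converted from electrodes to millimetres) can be computed from a listener's thresholds as a function of interaural electrode offset roughly as sketched below; the 5.9 mm per 10.5 electrodes conversion comes from the abstract, while the example thresholds are invented purely to illustrate the computation.

```python
# Sketch: width of an ITD-threshold tuning curve at twice its minimum,
# expressed in electrodes and in mm (5.9 mm per 10.5 electrodes, per abstract).
# The threshold values below are illustrative only.
import numpy as np

offsets = np.arange(-6, 7)                              # interaural electrode offset
thresholds_us = np.array([900, 700, 520, 400, 300, 240, 220,
                          250, 320, 430, 560, 720, 950], float)

dense = np.linspace(offsets[0], offsets[-1], 1001)
interp = np.interp(dense, offsets, thresholds_us)
criterion = 2 * interp.min()                            # twice the threshold minimum
within = dense[interp <= criterion]
bw_electrodes = within.max() - within.min()
bw_mm = bw_electrodes * 5.9 / 10.5                      # electrode-to-mm conversion
print(f"bandwidth: {bw_electrodes:.1f} electrodes ≈ {bw_mm:.1f} mm")
```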
Affiliation(s)
- Miranda Cleary
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, USA
- Joshua G. W. Bernstein
- National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD, USA
- Olga A. Stakhovskaya
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, USA
- Jack Noble
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA; Department of Hearing and Speech Sciences, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Otolaryngology, Vanderbilt University Medical Center, Nashville, TN, USA
- Elizabeth Kolberg
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, USA
- Kenneth K. Jensen
- National Military Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Bethesda, MD, USA
- Michael Hoa
- Department of Otolaryngology-Head and Neck Surgery, Georgetown University Medical Center, Washington, DC, USA
- Hung Jeffrey Kim
- Department of Otolaryngology-Head and Neck Surgery, Georgetown University Medical Center, Washington, DC, USA
- Matthew J. Goupell
- Department of Hearing and Speech Sciences, University of Maryland, College Park, MD, USA
6. Xu K, Willis S, Gopen Q, Fu QJ. Effects of Spectral Resolution and Frequency Mismatch on Speech Understanding and Spatial Release From Masking in Simulated Bilateral Cochlear Implants. Ear Hear 2021;41:1362-1371. PMID: 32132377; DOI: 10.1097/aud.0000000000000865.
Abstract
OBJECTIVES: Due to interaural frequency mismatch, bilateral cochlear-implant (CI) users may be less able to take advantage of the binaural cues that normal-hearing (NH) listeners use for spatial hearing, such as interaural time differences and interaural level differences. As such, bilateral CI users have difficulty segregating competing speech even when the target and competing talkers are spatially separated. The goal of this study was to evaluate the effects of spectral resolution, tonotopic mismatch (the frequency mismatch between the acoustic center frequency assigned to a CI electrode within an implanted ear and the expected spiral ganglion characteristic frequency), and interaural mismatch (differences in the degree of tonotopic mismatch in each ear) on speech understanding and spatial release from masking (SRM) in the presence of competing talkers, in NH subjects listening to bilateral vocoder simulations.

DESIGN: During testing, both target and masker speech were presented in five-word sentences that had the same syntax but were not necessarily meaningful. The sentences were composed of five categories in fixed order (Name, Verb, Number, Color, and Clothes), each of which had 10 items, such that multiple sentences could be generated by randomly selecting a word from each category. Speech reception thresholds (SRTs) for the target sentence presented in competing speech maskers were measured. The target speech was delivered to both ears, and the two speech maskers were delivered to (1) both ears (diotic masker) or (2) different ears (dichotic masker: one delivered to the left ear and the other delivered to the right ear). Stimuli included unprocessed speech and four 16-channel sine-vocoder simulations with different degrees of interaural mismatch (0, 1, and 2 mm). SRM was calculated as the difference between the diotic and dichotic listening conditions.

RESULTS: With unprocessed speech, SRTs were 0.3 and -18.0 dB for the diotic and dichotic maskers, respectively. For the spectrally degraded speech with mild tonotopic mismatch and no interaural mismatch, SRTs were 5.6 and -2.0 dB for the diotic and dichotic maskers, respectively. When the tonotopic mismatch increased in both ears, SRTs worsened to 8.9 and 2.4 dB for the diotic and dichotic maskers, respectively. When the two ears had different tonotopic mismatch (i.e., there was interaural mismatch), the drop in SRTs was much larger for the dichotic than for the diotic masker. The largest SRM was observed with unprocessed speech (18.3 dB). With the CI simulations, SRM was significantly reduced to 7.6 dB even with mild tonotopic mismatch but no interaural mismatch; SRM was further reduced with increasing interaural mismatch.

CONCLUSIONS: The results demonstrate that spectral resolution, tonotopic mismatch, and interaural mismatch have differential effects on speech understanding and SRM in simulations of bilateral CIs. Minimizing interaural mismatch may be critical to optimizing binaural benefits and improving CI performance for competing speech, a typical listening environment. SRM (the difference in SRTs between the diotic and dichotic maskers) may be a useful clinical tool for assessing interaural frequency mismatch in bilateral CI users and for evaluating the benefits of optimization methods that minimize interaural mismatch.
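A simple sanity check on the numbers above: SRM is the diotic-minus-dichotic SRT difference, so the reported SRTs reproduce the reported SRM values directly (only values stated in the abstract are used below).

```python
# SRM = SRT(diotic masker) - SRT(dichotic masker), using the SRTs reported above.
conditions = {
    "unprocessed":                        ( 0.3, -18.0),
    "vocoded, no interaural mismatch":    ( 5.6,  -2.0),
    "vocoded, larger tonotopic mismatch": ( 8.9,   2.4),
}
for name, (srt_diotic, srt_dichotic) in conditions.items():
    print(f"{name}: SRM = {srt_diotic - srt_dichotic:.1f} dB")
# unprocessed -> 18.3 dB and vocoded/no-mismatch -> 7.6 dB, matching the abstract.
```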
Affiliation(s)
- Kevin Xu
- Department of Head and Neck Surgery, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California, USA
7. Individual Variability in Recalibrating to Spectrally Shifted Speech: Implications for Cochlear Implants. Ear Hear 2021;42:1412-1427. PMID: 33795617; DOI: 10.1097/aud.0000000000001043.
Abstract
OBJECTIVES: Cochlear implant (CI) recipients are at a severe disadvantage compared with normal-hearing listeners in distinguishing consonants that differ by place of articulation, because the key relevant spectral differences are degraded by the implant. One component of that degradation is the upward shifting of spectral energy that occurs with a shallow insertion depth of a CI. The present study aimed to systematically measure the effects of spectral shifting on word recognition and phoneme categorization by specifically controlling the amount of shifting and using stimuli whose identification specifically depends on perceiving frequency cues. We hypothesized that listeners would be biased toward perceiving phonemes that contain higher-frequency components because of the upward frequency shift, and that intelligibility would decrease as spectral shifting increased.

DESIGN: Normal-hearing listeners (n = 15) heard sine wave-vocoded speech with simulated upward frequency shifts of 0, 2, 4, and 6 mm of cochlear space to simulate shallow CI insertion depth. Stimuli included monosyllabic words and /b/-/d/ and /∫/-/s/ continua that varied systematically by formant frequency transitions or frication noise spectral peaks, respectively. Recalibration to spectral shifting was operationally defined as shifting perceptual acoustic-phonetic mapping commensurate with the spectral shift; in other words, adjusting frequency expectations for both phonemes upward so that a perceptual distinction is preserved, rather than hearing all upward-shifted phonemes as the higher-frequency member of the pair.

RESULTS: For moderate amounts of spectral shifting, group data suggested a general "halfway" recalibration to spectral shifting, but individual data suggested a notably different conclusion: half of the listeners were able to recalibrate fully, while the other half were unable to categorize shifted speech with any reliability. No participants demonstrated a pattern intermediate to these two extremes. Intelligibility of words decreased with greater amounts of spectral shifting, also showing loose clusters of better- and poorer-performing listeners. Phonetic analysis of word errors revealed that certain cues (place and manner of articulation) were more susceptible to being compromised by a frequency shift, while voicing was robust to spectral shifting.

CONCLUSIONS: Shifting the frequency spectrum of speech has systematic effects that are in line with known properties of speech acoustics, but the ensuing difficulties cannot be predicted based on tonotopic mismatch alone. Difficulties are subject to substantial individual differences in the capacity to adjust acoustic-phonetic mapping. These results help to explain why speech recognition in CI listeners cannot be fully predicted by peripheral factors like electrode placement and spectral resolution; even among listeners with functionally equivalent auditory input, there is an additional factor of simply being able or unable to flexibly adjust acoustic-phonetic mapping. This individual variability could motivate precise treatment approaches guided by an individual's relative reliance on wideband frequency representation (even if it is mismatched) or limited frequency coverage whose tonotopy is preserved.
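The recalibration measure described in the Design section amounts to asking how far a listener's category boundary moves along a continuum relative to the imposed spectral shift. A hedged sketch of that computation follows (logistic fit to categorization responses); the continuum spacing and the response proportions are invented for illustration only.

```python
# Sketch: estimate the category boundary of a /b/-/d/ continuum from the
# proportion of "d" responses at each step, then compare the boundary shift
# against the nominal spectral shift. Data below are illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))           # x0 = category boundary

steps = np.arange(1, 8, dtype=float)                      # 7-step continuum (assumed)
p_d_unshifted = np.array([0.02, 0.05, 0.15, 0.50, 0.85, 0.95, 0.98])
p_d_shifted   = np.array([0.03, 0.08, 0.30, 0.70, 0.92, 0.97, 0.99])

(b0, _), _ = curve_fit(logistic, steps, p_d_unshifted, p0=[4.0, 1.0])
(b1, _), _ = curve_fit(logistic, steps, p_d_shifted,   p0=[4.0, 1.0])
print(f"boundary shift: {b1 - b0:+.2f} continuum steps")
# Full recalibration would move the boundary by an amount commensurate with the
# imposed spectral shift; no boundary shift indicates a failure to adjust
# acoustic-phonetic mapping.
```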