1
Borjigin A, Kokkinakis K, Bharadwaj HM, Stohl JS. Deep learning restores speech intelligibility in multi-talker interference for cochlear implant users. Sci Rep 2024; 14:13241. PMID: 38853168; PMCID: PMC11163011; DOI: 10.1038/s41598-024-63675-8.
Abstract
Cochlear implants (CIs) do not offer the same level of effectiveness in noisy environments as in quiet settings. Current single-microphone noise reduction algorithms in hearing aids and CIs only remove predictable, stationary noise and are ineffective against realistic, non-stationary noise such as multi-talker interference. Recent developments in deep neural network (DNN) algorithms have achieved noteworthy performance in speech enhancement and separation, especially in removing speech noise. However, more work is needed to investigate the potential of DNN algorithms for removing speech noise when tested with listeners fitted with CIs. Here, we implemented two DNN algorithms that are well suited for applications in speech audio processing: (1) a recurrent neural network (RNN) and (2) SepFormer. The algorithms were trained with a customized dataset (~30 h) and then tested with thirteen CI listeners. Both the RNN and SepFormer algorithms significantly improved CI listeners' speech intelligibility in noise without compromising the perceived quality of speech overall. These algorithms not only increased intelligibility in stationary non-speech noise, but also introduced a substantial improvement in non-stationary noise, where conventional signal processing strategies fall short and deliver little benefit. These results show the promise of DNN algorithms as a solution for listening challenges in multi-talker noise interference.
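For readers who want to experiment with SepFormer-style talker separation, a minimal sketch using SpeechBrain's publicly available pretrained WSJ0-2mix model is shown below. This is illustrative only: the study trained its own RNN and SepFormer models on a customized ~30 h dataset, and the checkpoint name, v0.5-style import path, and 8 kHz sample rate are assumptions tied to that public model.

```python
import torchaudio
from speechbrain.pretrained import SepformerSeparation

# Public pretrained SepFormer trained on the WSJ0-2mix two-talker corpus (8 kHz).
model = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-wsj02mix", savedir="pretrained_sepformer"
)
# est_sources: (batch, time, n_sources) tensor of separated waveforms.
est_sources = model.separate_file(path="two_talker_mixture.wav")
for i in range(est_sources.shape[2]):
    torchaudio.save(f"talker_{i}.wav", est_sources[:, :, i].detach().cpu(), 8000)
```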
Affiliation(s)
- Agudemu Borjigin: Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907, USA; Waisman Center, University of Wisconsin-Madison, Madison, WI 53705, USA; North American Research Laboratory, MED-EL Corporation, Durham, NC 27713, USA
- Kostas Kokkinakis: Concha Labs, San Francisco, CA 94114, USA; North American Research Laboratory, MED-EL Corporation, Durham, NC 27713, USA
- Hari M Bharadwaj: Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907, USA; Department of Speech, Language, and Hearing Sciences, Purdue University, West Lafayette, IN 47907, USA; Department of Communication Science and Disorders, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Joshua S Stohl: North American Research Laboratory, MED-EL Corporation, Durham, NC 27713, USA
2
Gaultier C, Goehring T. Recovering speech intelligibility with deep learning and multiple microphones in noisy-reverberant situations for people using cochlear implants. J Acoust Soc Am 2024; 155:3833-3847. PMID: 38884525; DOI: 10.1121/10.0026218.
Abstract
For cochlear implant (CI) listeners, holding a conversation in noisy and reverberant environments is often challenging. Deep-learning algorithms can potentially mitigate these difficulties by enhancing speech in everyday listening environments. This study compared several deep-learning algorithms, with access to one, two (unilateral), or six (bilateral) microphones, that were trained to recover speech signals by jointly removing noise and reverberation. The noisy-reverberant speech and an ideal noise reduction algorithm served as the lower and upper references, respectively. Objective signal metrics were compared with results from two listening tests, including 15 typical-hearing listeners with CI simulations and 12 CI listeners. Large and statistically significant improvements in speech reception thresholds of 7.4 and 10.3 dB were found for the multi-microphone algorithms. For the single-microphone algorithm, there was an improvement of 2.3 dB, but only for the CI listener group. The objective signal metrics correctly predicted the rank order of results for CI listeners, and there was overall agreement for most effects and variances between results for CI simulations and CI listeners. These algorithms hold promise to improve speech intelligibility for CI listeners in environments with noise and reverberation, and they benefit from a boost in performance when using features extracted from multiple microphones.
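As a conceptual reference for why extra microphones help, the sketch below implements classical delay-and-sum beamforming: channels are time-aligned on the target so speech sums coherently while noise and reverberation sum incoherently. The study's algorithms instead feed multi-microphone features to deep neural networks; this baseline and its integer delays are illustrative assumptions.

```python
import numpy as np

def delay_and_sum(signals, delays_samples):
    """signals: (n_mics, n_samples) array; delays_samples: per-mic integer
    delays that time-align the target speech across channels."""
    n_mics, _ = signals.shape
    out = np.zeros(signals.shape[1])
    for m in range(n_mics):
        # np.roll wraps at the edges; acceptable for a short illustrative sketch
        out += np.roll(signals[m], -delays_samples[m])
    return out / n_mics
```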
Affiliation(s)
- Clément Gaultier: Cambridge Hearing Group, Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
- Tobias Goehring: Cambridge Hearing Group, Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, United Kingdom
3
Saba L, Maindarkar M, Johri AM, Mantella L, Laird JR, Khanna NN, Paraskevas KI, Ruzsa Z, Kalra MK, Fernandes JFE, Chaturvedi S, Nicolaides A, Rathore V, Singh N, Isenovic ER, Viswanathan V, Fouda MM, Suri JS. UltraAIGenomics: Artificial Intelligence-Based Cardiovascular Disease Risk Assessment by Fusion of Ultrasound-Based Radiomics and Genomics Features for Preventive, Personalized and Precision Medicine: A Narrative Review. Rev Cardiovasc Med 2024; 25:184. PMID: 39076491; PMCID: PMC11267214; DOI: 10.31083/j.rcm2505184.
Abstract
Cardiovascular disease (CVD) diagnosis and treatment are challenging because symptoms appear late in the disease's progression. Despite clinical risk scores, cardiac event prediction is inadequate, and many at-risk patients are not adequately categorised by conventional risk factors alone. Integrating genomic-based biomarkers (GBBM), specifically those found in plasma and/or serum samples, with novel non-invasive radiomic-based biomarkers (RBBM) such as plaque area and plaque burden can improve the overall specificity of CVD risk assessment. This review proposes two hypotheses: (i) RBBM and GBBM are strongly correlated and can be used to precisely detect the severity of CVD and stroke, and (ii) an artificial intelligence (AI)-based preventive, precision, and personalized (aiP3) CVD/stroke risk model can be built on them. The PRISMA search selected 246 studies for CVD/stroke risk. It showed that, using RBBM and GBBM, deep learning (DL) models could be used for CVD/stroke risk stratification in the aiP3 framework. Furthermore, we present a concise overview of platelet function, complete blood count (CBC), and diagnostic methods. As part of the AI paradigm, we discuss explainability, pruning, bias, and benchmarking against previous studies and their potential impacts. The review proposes the integration of RBBM and GBBM, an innovative solution streamlined in the DL paradigm for predicting CVD/stroke risk in the aiP3 framework. The combination of RBBM and GBBM introduces a powerful CVD/stroke risk assessment paradigm, and the aiP3 model signifies a promising advancement in CVD/stroke risk assessment.
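As a generic illustration of the kind of radiomic-genomic fusion the review discusses, the PyTorch sketch below embeds RBBM and GBBM feature vectors separately and concatenates them before risk prediction (late fusion). Feature counts, layer widths, and the two-class head are illustrative assumptions, not the review's specific aiP3 architecture.

```python
import torch
import torch.nn as nn

class FusionRiskModel(nn.Module):
    def __init__(self, n_radiomic=64, n_genomic=128, n_classes=2):
        super().__init__()
        self.radiomic = nn.Sequential(nn.Linear(n_radiomic, 32), nn.ReLU())
        self.genomic = nn.Sequential(nn.Linear(n_genomic, 32), nn.ReLU())
        self.head = nn.Linear(64, n_classes)  # concatenated embeddings -> risk

    def forward(self, x_radiomic, x_genomic):
        z = torch.cat([self.radiomic(x_radiomic), self.genomic(x_genomic)], dim=1)
        return self.head(z)
```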
Affiliation(s)
- Luca Saba: Department of Radiology, Azienda Ospedaliero Universitaria, 40138 Cagliari, Italy
- Mahesh Maindarkar: School of Bioengineering Sciences and Research, MIT Art, Design and Technology University, 412021 Pune, India; Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA 95661, USA
- Amer M. Johri: Department of Medicine, Division of Cardiology, Queen’s University, Kingston, ON K7L 3N6, Canada
- Laura Mantella: Department of Medicine, Division of Cardiology, University of Toronto, Toronto, ON M5S 1A1, Canada
- John R. Laird: Heart and Vascular Institute, Adventist Health St. Helena, St Helena, CA 94574, USA
- Narendra N. Khanna: Department of Cardiology, Indraprastha APOLLO Hospitals, 110001 New Delhi, India
- Zoltan Ruzsa: Invasive Cardiology Division, University of Szeged, 6720 Szeged, Hungary
- Manudeep K. Kalra: Department of Radiology, Harvard Medical School, Boston, MA 02115, USA
- Seemant Chaturvedi: Department of Neurology & Stroke Program, University of Maryland, Baltimore, MD 20742, USA
- Andrew Nicolaides: Vascular Screening and Diagnostic Centre and University of Nicosia Medical School, 2368 Agios Dometios, Cyprus
- Vijay Rathore: Nephrology Department, Kaiser Permanente, Sacramento, CA 95823, USA
- Narpinder Singh: Department of Food Science and Technology, Graphic Era Deemed to be University, Dehradun, 248002 Uttarakhand, India
- Esma R. Isenovic: Department of Radiobiology and Molecular Genetics, National Institute of The Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
- Mostafa M. Fouda: Department of Electrical and Computer Engineering, Idaho State University, Pocatello, ID 83209, USA
- Jasjit S. Suri: Stroke Monitoring and Diagnostic Division, AtheroPoint™, Roseville, CA 95661, USA; Department of Computer Engineering, Graphic Era Deemed to be University, Dehradun, 248002 Uttarakhand, India
4
Fletcher MD, Perry SW, Thoidis I, Verschuur CA, Goehring T. Improved tactile speech robustness to background noise with a dual-path recurrent neural network noise-reduction method. Sci Rep 2024; 14:7357. PMID: 38548750; PMCID: PMC10978864; DOI: 10.1038/s41598-024-57312-7.
Abstract
Many people with hearing loss struggle to understand speech in noisy environments, making noise robustness critical for hearing-assistive devices. Recently developed haptic hearing aids, which convert audio to vibration, can improve speech-in-noise performance for cochlear implant (CI) users and assist those unable to access hearing-assistive devices. They are typically body-worn rather than head-mounted, allowing additional space for batteries and microprocessors, and so can deploy more sophisticated noise-reduction techniques. The current study assessed whether a real-time-feasible dual-path recurrent neural network (DPRNN) can improve tactile speech-in-noise performance. Audio was converted to vibration on the wrist using a vocoder method, either with or without noise reduction. Performance was tested for speech in multi-talker noise (recorded at a party) at a 2.5-dB signal-to-noise ratio. An objective assessment showed that the DPRNN improved the scale-invariant signal-to-distortion ratio by 8.6 dB and substantially outperformed traditional noise reduction (log-MMSE). A behavioural assessment in 16 participants showed that the DPRNN improved tactile-only sentence identification in noise by 8.2%. This suggests that advanced techniques like the DPRNN could substantially improve outcomes with haptic hearing aids. Low-cost haptic devices could soon be an important supplement to hearing-assistive devices such as CIs or offer an alternative for people who cannot access CI technology.
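The objective metric reported above, the scale-invariant signal-to-distortion ratio (SI-SDR), has a standard closed form (Le Roux et al., 2019) and can be computed as in the sketch below for equal-length 1-D signals.

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between an enhanced signal and the clean
    reference; both are zero-meaned and the estimate is projected onto the
    reference to factor out overall gain."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    s_target = (estimate @ reference) / (reference @ reference) * reference
    e_noise = estimate - s_target
    return 10 * np.log10((s_target @ s_target) / (e_noise @ e_noise))
```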
Affiliation(s)
- Mark D Fletcher: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK; Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Samuel W Perry: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK; Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Iordanis Thoidis: School of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
- Carl A Verschuur: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Tobias Goehring: MRC Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
5
Shahidi LK, Collins LM, Mainsah BO. Objective intelligibility measurement of reverberant vocoded speech for normal-hearing listeners: Towards facilitating the development of speech enhancement algorithms for cochlear implants. J Acoust Soc Am 2024; 155:2151-2168. PMID: 38501923; PMCID: PMC10959555; DOI: 10.1121/10.0025285.
Abstract
Cochlear implant (CI) recipients often struggle to understand speech in reverberant environments. Speech enhancement algorithms could restore speech perception for CI listeners by removing reverberant artifacts from the CI stimulation pattern. Listening studies, either with cochlear implant recipients or with normal-hearing (NH) listeners using a CI acoustic model, provide a benchmark for speech intelligibility improvements conferred by the enhancement algorithm, but are costly and time-consuming. To reduce the associated costs during algorithm development, speech intelligibility could be estimated offline using objective intelligibility measures. Previous evaluations of objective measures that considered CIs primarily assessed the combined impact of noise and reverberation and employed highly accurate enhancement algorithms. To facilitate the development of enhancement algorithms, we evaluate twelve objective measures in reverberant-only conditions characterized by a gradual reduction of reverberant artifacts, simulating the performance of an enhancement algorithm during development. Measures are validated against the performance of NH listeners using a CI acoustic model. To enhance compatibility with reverberant CI-processed signals, measure performance was assessed after modifying the reference signal and spectral filterbank. Measures leveraging the speech-to-reverberant ratio, the cepstral distance, and, after modifying the reference or filterbank, the envelope correlation are strong predictors of intelligibility for reverberant CI-processed speech.
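A minimal single-band version of an envelope-correlation measure of the kind evaluated in the study is sketched below: the slow amplitude envelopes of the clean reference and the degraded signal are extracted and correlated. The Hilbert envelope and the 50 Hz cutoff are illustrative choices, not the paper's exact measure or filterbank.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def envelope_correlation(clean, degraded, fs, cutoff_hz=50.0):
    b, a = butter(2, cutoff_hz / (fs / 2))       # low-pass for the slow envelope
    env_clean = filtfilt(b, a, np.abs(hilbert(clean)))
    env_degraded = filtfilt(b, a, np.abs(hilbert(degraded)))
    return np.corrcoef(env_clean, env_degraded)[0, 1]
```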
Affiliation(s)
- Lidea K Shahidi: Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27701, USA
- Leslie M Collins: Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27701, USA
- Boyla O Mainsah: Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27701, USA
6
Fletcher MD, Akis E, Verschuur CA, Perry SW. Improved tactile speech perception using audio-to-tactile sensory substitution with formant frequency focusing. Sci Rep 2024; 14:4889. PMID: 38418558; PMCID: PMC10901863; DOI: 10.1038/s41598-024-55429-3.
Abstract
Haptic hearing aids, which provide speech information through tactile stimulation, could substantially improve outcomes both for cochlear implant users and for those unable to access cochlear implants. Recent advances in wide-band haptic actuator technology have made new audio-to-tactile conversion strategies viable for wearable devices. One such strategy filters the audio into eight frequency bands, which are evenly distributed across the speech frequency range. The amplitude envelopes from the eight bands modulate the amplitudes of eight low-frequency tones, which are delivered through vibration to a single site on the wrist. This tactile vocoder strategy effectively transfers some phonemic information, but vowels and obstruent consonants are poorly portrayed. In 20 participants with normal touch perception, we tested (1) whether focusing the audio filters of the tactile vocoder more densely around the first and second formant frequencies improved tactile vowel discrimination, and (2) whether focusing filters at mid-to-high frequencies improved obstruent consonant discrimination. The obstruent-focused approach was found to be ineffective. However, the formant-focused approach improved vowel discrimination by 8%, without changing overall consonant discrimination. The formant-focused tactile vocoder strategy, which can readily be implemented in real time on a compact device, could substantially improve speech perception for haptic hearing aid users.
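The band-envelope-to-tone mapping described above can be sketched in a few lines: each audio band's envelope amplitude-modulates one low-frequency vibro-tactile tone, and the tones are summed for a single stimulation site. The band edges and tone frequencies below are illustrative; the formant-focused variant would cluster the audio bands around F1 and F2 rather than spacing them evenly.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt

def tactile_vocoder(audio, fs, band_edges, tone_freqs):
    t = np.arange(len(audio)) / fs
    out = np.zeros(len(audio))
    for (lo, hi), f_tone in zip(band_edges, tone_freqs):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfilt(sos, audio)))   # band amplitude envelope
        out += env * np.sin(2 * np.pi * f_tone * t)  # modulate one tactile tone
    return out / len(tone_freqs)

fs = 16000
audio = np.random.randn(fs)  # stand-in for one second of speech
edges = [(100 * 2 ** (0.75 * i), 100 * 2 ** (0.75 * (i + 1))) for i in range(8)]
vibration = tactile_vocoder(audio, fs, edges, np.linspace(50, 230, 8))
```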
Affiliation(s)
- Mark D Fletcher: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK; Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Esma Akis: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK; Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Carl A Verschuur: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Samuel W Perry: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK; Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
7
MacIntyre AD, Carlyon RP, Goehring T. Neural Decoding of the Speech Envelope: Effects of Intelligibility and Spectral Degradation. Trends Hear 2024; 28:23312165241266316. PMID: 39183533; PMCID: PMC11345737; DOI: 10.1177/23312165241266316.
Abstract
During continuous speech perception, endogenous neural activity becomes time-locked to acoustic stimulus features, such as the speech amplitude envelope. This speech-brain coupling can be decoded using non-invasive brain imaging techniques, including electroencephalography (EEG). Neural decoding may provide clinical use as an objective measure of stimulus encoding by the brain, for example during cochlear implant listening, wherein the speech signal is severely spectrally degraded. Yet interplay between acoustic and linguistic factors may lead to top-down modulation of perception, thereby complicating audiological applications. To address this ambiguity, we assessed neural decoding of the speech envelope under spectral degradation with EEG in acoustically hearing listeners (n = 38; 18-35 years old) using vocoded speech. We dissociated sensory encoding from higher-order processing by employing intelligible (English) and non-intelligible (Dutch) stimuli, with auditory attention sustained using a repeated-phrase detection task. Subject-specific and group decoders were trained to reconstruct the speech envelope from held-out EEG data, with decoder significance determined via random permutation testing. Whereas speech envelope reconstruction did not vary by spectral resolution, intelligible speech was associated with better decoding accuracy in general. Results were similar across subject-specific and group analyses, with less consistent effects of spectral degradation in group decoding. Permutation tests revealed possible differences in decoder statistical significance by experimental condition. In general, while robust neural decoding was observed at the individual and group level, variability within participants would most likely prevent the clinical use of such a measure to differentiate levels of spectral degradation and intelligibility on an individual basis.
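Backward-model envelope reconstruction of this kind is commonly implemented as regularized linear regression from time-lagged EEG to the stimulus envelope, with decoding accuracy taken as the correlation between reconstructed and actual envelopes. The sketch below uses synthetic arrays and an assumed lag range and ridge penalty; real EEG and envelope data would replace the random inputs.

```python
import numpy as np
from sklearn.linear_model import Ridge

def lag_matrix(eeg, max_lag):
    """eeg: (n_samples, n_channels) -> (n_samples, n_channels * max_lag)."""
    n, c = eeg.shape
    X = np.zeros((n, c * max_lag))
    for lag in range(max_lag):
        X[lag:, lag * c:(lag + 1) * c] = eeg[:n - lag]
    return X

rng = np.random.default_rng(0)  # synthetic stand-in data
eeg_train, env_train = rng.standard_normal((5000, 32)), rng.standard_normal(5000)
eeg_test, env_test = rng.standard_normal((1000, 32)), rng.standard_normal(1000)

decoder = Ridge(alpha=1.0).fit(lag_matrix(eeg_train, 32), env_train)
accuracy = np.corrcoef(decoder.predict(lag_matrix(eeg_test, 32)), env_test)[0, 1]
```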
Affiliation(s)
- Robert P. Carlyon: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
- Tobias Goehring: MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, UK
8
Henry F, Parsi A, Glavin M, Jones E. Experimental Investigation of Acoustic Features to Optimize Intelligibility in Cochlear Implants. Sensors (Basel) 2023; 23:7553. PMID: 37688009; PMCID: PMC10490615; DOI: 10.3390/s23177553.
Abstract
Although cochlear implants work well for people with hearing impairment in quiet conditions, it is well known that they are not as effective in noisy environments. Noise reduction algorithms based on machine learning, allied with appropriate speech features, can be used to address this problem. The purpose of this study is to investigate the importance of acoustic features in such algorithms. Acoustic features are extracted from speech and noise mixtures and used in conjunction with the ideal binary mask to train a deep neural network to estimate masks for speech synthesis to produce enhanced speech. The intelligibility of this speech is objectively measured using metrics such as Short-Time Objective Intelligibility (STOI), Hit Rate minus False Alarm Rate (HIT-FA), and the Normalized Covariance Measure (NCM) for both simulated normal-hearing and hearing-impaired scenarios. A wide range of existing features is experimentally evaluated, including features that have not traditionally been applied in this application. The results demonstrate that frequency-domain features perform best. In particular, Gammatone features performed best for normal hearing over a range of signal-to-noise ratios and noise types (STOI = 0.7826), and Mel spectrogram features exhibited the best overall performance for hearing impairment (NCM = 0.7314). The correlation between STOI and NCM is stronger than that between HIT-FA and NCM, suggesting that STOI is a better predictor of intelligibility for hearing-impaired listeners. The results of this study may be useful in the design of adaptive intelligibility enhancement systems for cochlear implants based on both the noise level and the nature of the noise (stationary or non-stationary).
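The ideal binary mask used as the training target above assigns 1 to each time-frequency unit whose local signal-to-noise ratio exceeds a criterion and 0 otherwise. A minimal sketch follows; the STFT settings and 0 dB criterion are typical choices rather than the study's exact parameters.

```python
import numpy as np
from scipy.signal import stft

def ideal_binary_mask(speech, noise, fs, criterion_db=0.0):
    _, _, S = stft(speech, fs, nperseg=512)
    _, _, N = stft(noise, fs, nperseg=512)
    local_snr_db = 20 * np.log10((np.abs(S) + 1e-12) / (np.abs(N) + 1e-12))
    return (local_snr_db > criterion_db).astype(float)
```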
Affiliation(s)
- Fergal Henry: Department of Computing and Electronic Engineering, Atlantic Technological University Sligo, Ash Lane, F91 YW50 Sligo, Ireland
- Ashkan Parsi: Electrical and Electronic Engineering, University of Galway, University Road, H91 TK33 Galway, Ireland
- Martin Glavin: Electrical and Electronic Engineering, University of Galway, University Road, H91 TK33 Galway, Ireland
- Edward Jones: Electrical and Electronic Engineering, University of Galway, University Road, H91 TK33 Galway, Ireland
9
Stavropoulos A, Lakshminarasimhan KJ, Angelaki DE. Belief embodiment through eye movements facilitates memory-guided navigation. bioRxiv 2023:2023.08.21.554107. Preprint. PMID: 37662309; PMCID: PMC10473632; DOI: 10.1101/2023.08.21.554107.
Abstract
Neural network models optimized for task performance often excel at predicting neural activity but do not explain other properties, such as the distributed representation across functionally distinct areas. Distributed representations may arise from animals' strategies for resource utilization; however, fixation-based paradigms deprive animals of a vital resource: eye movements. During a naturalistic task in which humans use a joystick to steer and catch flashing fireflies in a virtual environment lacking position cues, subjects physically track the latent task variable with their gaze. We show this strategy also holds during an inertial version of the task in the absence of optic flow, and demonstrate that these task-relevant eye movements reflect an embodiment of the subjects' dynamically evolving internal beliefs about the goal. A neural network model with tuned recurrent connectivity between oculomotor and evidence-integrating frontoparietal circuits accounted for this behavioral strategy. Critically, this model better explained neural data from monkeys' posterior parietal cortex than task-optimized models unconstrained by such an oculomotor-based cognitive strategy. These results highlight the importance of unconstrained movement in working memory computations and establish a functional significance of oculomotor signals for evidence integration and navigation computations via embodied cognition.
Affiliation(s)
- Dora E. Angelaki: Center for Neural Science, New York University, New York, NY, USA; Tandon School of Engineering, New York University, New York, NY, USA
10
Fletcher MD, Verschuur CA, Perry SW. Improving speech perception for hearing-impaired listeners using audio-to-tactile sensory substitution with multiple frequency channels. Sci Rep 2023; 13:13336. PMID: 37587166; PMCID: PMC10432540; DOI: 10.1038/s41598-023-40509-7.
Abstract
Cochlear implants (CIs) have revolutionised the treatment of hearing loss, but large populations globally cannot access them, either because of disorders that prevent implantation or because they are expensive and require specialist surgery. Recent technology developments mean that haptic aids, which transmit speech through vibration, could offer a viable low-cost, non-invasive alternative. One important development is that compact haptic actuators can now deliver intense stimulation across multiple frequencies. We explored whether these multiple frequency channels can transfer spectral information to improve tactile phoneme discrimination. To convert audio to vibration, the speech amplitude envelope was extracted from one or more audio frequency bands and used to amplitude modulate one or more vibro-tactile tones delivered to a single site on the wrist. In 26 participants with normal touch sensitivity, tactile-only phoneme discrimination was assessed with one, four, or eight frequency bands. Compared to one frequency band, performance improved by 5.9% with four frequency bands and by 8.4% with eight. The multi-band signal-processing approach can be implemented in real time on a compact device, and the vibro-tactile tones can be reproduced by the latest compact, low-powered actuators. This approach could therefore readily be implemented in a low-cost haptic hearing aid to deliver real-world benefits.
Affiliation(s)
- Mark D Fletcher: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK; Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Carl A Verschuur: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK
- Samuel W Perry: University of Southampton Auditory Implant Service, University of Southampton, University Road, Southampton, SO17 1BJ, UK; Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton, SO17 1BJ, UK
11
Healy EW, Johnson EM, Pandey A, Wang D. Progress made in the efficacy and viability of deep-learning-based noise reduction. J Acoust Soc Am 2023; 153:2751. PMID: 37133814; PMCID: PMC10159658; DOI: 10.1121/10.0019341.
Abstract
Recent years have brought considerable advances in our ability to increase intelligibility through deep-learning-based noise reduction, especially for hearing-impaired (HI) listeners. In this study, intelligibility improvements resulting from a current algorithm are assessed. These benefits are compared to those resulting from the initial demonstration of deep-learning-based noise reduction for HI listeners ten years ago in Healy, Yoho, Wang, and Wang [(2013). J. Acoust. Soc. Am. 134, 3029-3038]. The stimuli and procedures were broadly similar across studies. However, whereas the initial study involved highly matched training and test conditions, as well as non-causal operation, preventing real-world use, the current attentive recurrent network employed different noise types, talkers, and speech corpora for training versus test, as required for generalization, and it was fully causal, as required for real-time operation. Significant intelligibility benefit was observed in every condition, averaging 51 percentage points across conditions for HI listeners. Further, benefit was comparable to that obtained in the initial demonstration, despite the considerable additional demands placed on the current algorithm. The retention of large benefit despite the systematic removal of various constraints, as required for real-world operation, reflects the substantial advances made in deep-learning-based noise reduction.
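The causality constraint discussed above has a simple implementation in convolutional layers: padding only on the left makes each output frame depend on present and past input frames, never future ones. A minimal PyTorch sketch:

```python
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, ch_in, ch_out, kernel_size):
        super().__init__()
        self.left_pad = kernel_size - 1
        self.conv = nn.Conv1d(ch_in, ch_out, kernel_size)

    def forward(self, x):  # x: (batch, channels, frames)
        return self.conv(F.pad(x, (self.left_pad, 0)))  # no future context
```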
Affiliation(s)
- Eric W Healy: Department of Speech and Hearing Science, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA
- Eric M Johnson: Department of Speech and Hearing Science, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA
- Ashutosh Pandey: Department of Computer Science and Engineering, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA
- DeLiang Wang: Department of Computer Science and Engineering, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA
12
Scheinker A, Cropp F, Filippetto D. Adaptive autoencoder latent space tuning for more robust machine learning beyond the training set for six-dimensional phase space diagnostics of a time-varying ultrafast electron-diffraction compact accelerator. Phys Rev E 2023; 107:045302. PMID: 37198850; DOI: 10.1103/physreve.107.045302.
Abstract
We present a general adaptive latent space tuning approach for improving the robustness of machine learning tools with respect to time variation and distribution shift. We demonstrate our approach by developing an encoder-decoder convolutional neural network-based virtual 6D phase space diagnostic, with uncertainty quantification, of charged particle beams in the HiRES ultrafast electron diffraction (UED) compact particle accelerator. Our method utilizes model-independent adaptive feedback to tune a low-dimensional 2D latent space representation of ~1-million-dimensional objects, namely the 15 unique 2D projections (x, y), ..., (z, pz) of the 6D phase space (x, y, z, px, py, pz) of the charged particle beams. We demonstrate our method with numerical studies of short electron bunches utilizing experimentally measured UED input beam distributions.
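A toy version of the model-independent adaptive feedback used to tune a 2D latent space is sketched below: bounded extremum-seeking dynamics dither each latent variable and, on average, drift the pair toward the minimum of a measured cost. The gains, frequencies, and quadratic cost are illustrative assumptions, and a residual dither on the order of the perturbation amplitude remains around the minimizer.

```python
import numpy as np

def extremum_seeking(cost, z0, steps=2000, dt=0.01, k=1.0, a=0.2, w=(10.0, 13.0)):
    z = np.array(z0, dtype=float)
    for n in range(steps):
        t = n * dt
        c = cost(z)  # only a scalar measurement is needed (model-independent)
        # the cost modulates each oscillator's phase, yielding gradient descent
        # on average
        z += dt * np.array([a * w[i] * np.cos(w[i] * t + k * c) for i in range(2)])
    return z

z_star = extremum_seeking(lambda z: (z[0] - 1.0) ** 2 + (z[1] + 0.5) ** 2, [0.0, 0.0])
```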
Affiliation(s)
- Alexander Scheinker: Applied Electrodynamics Group, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Frederick Cropp: Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, California 94720, USA; Department of Physics and Astronomy, University of California Los Angeles, Los Angeles, California 90095, USA
- Daniele Filippetto: Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, California 94720, USA
13
Chu K, Collins L, Mainsah B. Suppressing reverberation in cochlear implant stimulus patterns using time-frequency masks based on phoneme groups. Proc Mtgs Acoust 2022; 50:050002. PMID: 38031629; PMCID: PMC10686264; DOI: 10.1121/2.0001698.
Abstract
Cochlear implant (CI) users experience considerable difficulty in understanding speech in reverberant listening environments. This issue is commonly addressed with time-frequency masking, where a time-frequency decomposed reverberant signal is multiplied by a matrix of gain values to suppress reverberation. However, mask estimation is challenging in reverberant environments due to the large spectro-temporal variations in the speech signal. To overcome this variability, we previously developed a phoneme-based algorithm that selects a different mask estimation model based on the underlying phoneme. In the ideal case where knowledge of the phoneme was assumed, the phoneme-based approach provided larger benefits than a phoneme-independent approach when tested in normal-hearing listeners using an acoustic model of CI processing. The current work investigates the phoneme-based mask estimation algorithm in the real-time feasible case where the prediction from a phoneme classifier is used to select the phoneme-specific mask. To further ensure real-time feasibility, both the phoneme classifier and mask estimation algorithm use causal features extracted from within the CI processing framework. We conducted experiments in normal-hearing listeners using an acoustic model of CI processing, and the results showed that the phoneme-specific algorithm benefitted the majority of subjects.
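The masking operation described above, multiplying a time-frequency decomposition by a matrix of gains and resynthesizing, is sketched below with a placeholder mask where the estimated phoneme-specific gains would go. The STFT settings are illustrative.

```python
import numpy as np
from scipy.signal import istft, stft

def apply_tf_mask(reverberant, fs, mask=None):
    _, _, X = stft(reverberant, fs, nperseg=256)
    if mask is None:
        mask = np.ones(X.shape)  # placeholder: estimated gains in [0, 1] go here
    _, enhanced = istft(mask * X, fs, nperseg=256)
    return enhanced
```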
Affiliation(s)
- Kevin Chu: Department of Electrical and Computer Engineering, Duke University, Durham, NC 27705
- Leslie Collins: Department of Electrical and Computer Engineering, Duke University, Durham, NC 27705
- Boyla Mainsah: Department of Electrical and Computer Engineering, Duke University, Durham, NC 27705
14
Toward Personalized Diagnosis and Therapy for Hearing Loss: Insights From Cochlear Implants. Otol Neurotol 2022; 43:e903-e909. PMID: 35970169; DOI: 10.1097/mao.0000000000003624.
Abstract
Sensorineural hearing loss (SNHL) is the most common sensory deficit, disabling nearly half a billion people worldwide. The cochlear implant (CI) has transformed the treatment of patients with SNHL, having restored hearing to more than 800,000 people. The success of CIs has inspired multidisciplinary efforts to address the unmet need for personalized, cellular-level diagnosis, and treatment of patients with SNHL. Current limitations include an inability to safely and accurately image at high resolution and biopsy the inner ear, precluding the use of key structural and molecular information during diagnostic and treatment decisions. Furthermore, there remains a lack of pharmacological therapies for hearing loss, which can partially be attributed to challenges associated with new drug development. We highlight advances in diagnostic and therapeutic strategies for SNHL that will help accelerate the push toward precision medicine. In addition, we discuss technological improvements for the CI that will further enhance its functionality for future patients. This report highlights work that was originally presented by Dr. Stankovic as part of the Dr. John Niparko Memorial Lecture during the 2021 American Cochlear Implant Alliance annual meeting.
15
Brungart DS, Sherlock LP, Kuchinsky SE, Perry TT, Bieber RE, Grant KW, Bernstein JGW. Assessment methods for determining small changes in hearing performance over time. J Acoust Soc Am 2022; 151:3866. PMID: 35778214; DOI: 10.1121/10.0011509.
Abstract
Although the behavioral pure-tone threshold audiogram is considered the gold standard for quantifying hearing loss, assessment of speech understanding, especially in noise, is more relevant to quality of life but is only partly related to the audiogram. Metrics of speech understanding in noise are therefore an attractive target for assessing hearing over time. However, speech-in-noise assessments have more potential sources of variability than pure-tone threshold measures, making it a challenge to obtain results reliable enough to detect small changes in performance. This review examines the benefits and limitations of speech-understanding metrics and their application to longitudinal hearing assessment, and identifies potential sources of variability, including learning effects, differences in item difficulty, and between- and within-individual variations in effort and motivation. We conclude by recommending the integration of non-speech auditory tests, which provide information about aspects of auditory health that have reduced variability and fewer central influences than speech tests, in parallel with the traditional audiogram and speech-based assessments.
Affiliation(s)
- Douglas S Brungart: Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
- LaGuinn P Sherlock: Hearing Conservation and Readiness Branch, U.S. Army Public Health Center, E1570 8977 Sibert Road, Aberdeen Proving Ground, Maryland 21010, USA
- Stefanie E Kuchinsky: Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
- Trevor T Perry: Hearing Conservation and Readiness Branch, U.S. Army Public Health Center, E1570 8977 Sibert Road, Aberdeen Proving Ground, Maryland 21010, USA
- Rebecca E Bieber: Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
- Ken W Grant: Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
- Joshua G W Bernstein: Audiology and Speech Pathology Center, Walter Reed National Military Medical Center, Building 19, Floor 5, 4954 North Palmer Road, Bethesda, Maryland 20889, USA
16
Goehring T, Monaghan J. Helping People Hear Better with "Smart" Hearing Devices. Front Young Minds 2022; 10:703643. PMID: 35855497; PMCID: PMC7613069.
Abstract
Millions of people around the world have difficulty hearing. Hearing aids and cochlear implants help people hear better, especially in quiet places. Unfortunately, these devices do not always help in noisy situations like busy classrooms or restaurants. This means that a person with hearing loss may struggle to follow a conversation with friends or family and may avoid going out. We used methods from the field of artificial intelligence to develop "smart" hearing aids and cochlear implants that can get rid of background noise. We play many different sounds into a computer program, which learns to pick out the speech sounds and filter out unwanted background noises. Once the computer program has been trained, it is then tested on new examples of noisy speech and can be incorporated into hearing aids or cochlear implants. These "smart" approaches can help people with hearing loss understand speech better in noisy situations.
Affiliation(s)
- Tobias Goehring: Cambridge Hearing Group, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
17
Moore BCJ. Listening to Music Through Hearing Aids: Potential Lessons for Cochlear Implants. Trends Hear 2022; 26:23312165211072969. PMID: 35179052; PMCID: PMC8859663; DOI: 10.1177/23312165211072969.
Abstract
Some of the problems experienced by users of hearing aids (HAs) when listening to music are relevant to cochlear implants (CIs). One problem is related to the high peak levels (up to 120 dB SPL) that occur in live music. Some HAs and CIs overload at such levels, because of the limited dynamic range of the microphones and analogue-to-digital converters (ADCs), leading to perceived distortion. Potential solutions are to use 24-bit ADCs or to include an adjustable gain between the microphones and the ADCs. A related problem is how to squeeze the wide dynamic range of music into the limited dynamic range of the user, which can be only 6-20 dB for CI users. In HAs, this is usually done via multi-channel amplitude compression (automatic gain control, AGC). In CIs, a single-channel front-end AGC is applied to the broadband input signal or a control signal derived from a running average of the broadband signal level is used to control the mapping of the channel envelope magnitude to an electrical signal. This introduces several problems: (1) an intense narrowband signal (e.g. a strong bass sound) reduces the level for all frequency components, making some parts of the music harder to hear; (2) the AGC introduces cross-modulation effects that can make a steady sound (e.g. sustained strings or a sung note) appear to fluctuate in level. Potential solutions are to use several frequency channels to create slowly varying gain-control signals and to use slow-acting (or dual time-constant) AGC rather than fast-acting AGC.
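A minimal single-channel AGC of the kind discussed above is sketched below: a level estimate with separate attack and release time constants drives a compressive gain that squeezes the input dynamic range. The time constants, reference level, and compression ratio are illustrative; applying one such broadband gain to all frequency components is exactly what produces the cross-modulation effects described in the abstract.

```python
import numpy as np

def simple_agc(x, fs, attack_ms=5.0, release_ms=100.0, ratio=3.0, ref=0.1):
    x = np.asarray(x, dtype=float)
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))
    level, y = 1e-6, np.zeros_like(x)
    for n, mag in enumerate(np.abs(x)):
        a = a_att if mag > level else a_rel          # fast attack, slow release
        level = a * level + (1.0 - a) * mag
        gain = (level / ref) ** (1.0 / ratio - 1.0)  # compressive static curve
        y[n] = x[n] * gain
    return y
```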
Affiliation(s)
- Brian C J Moore: Cambridge Hearing Group, Department of Psychology, University of Cambridge, Cambridge, England
18
Tseng RY, Wang TW, Fu SW, Lee CY, Tsao Y. A Study of Joint Effect on Denoising Techniques and Visual Cues to Improve Speech Intelligibility in Cochlear Implant Simulation. IEEE Trans Cogn Dev Syst 2021. DOI: 10.1109/tcds.2020.3017042.
19
Pinheiro MMC, Mancini PC, Soares AD, Ribas Â, Lima DP, Cavadas M, Banhara MR, Carvalho SADS, Buzo BC. Comparison of Speech Recognition in Cochlear Implant Users with Different Speech Processors. J Am Acad Audiol 2021; 32:469-476. PMID: 34847587; DOI: 10.1055/s-0041-1735252.
Abstract
BACKGROUND: Speech recognition in noisy environments is a challenge for both cochlear implant (CI) users and device manufacturers. CI manufacturers have been investing in technological innovations for processors and researching strategies to improve signal processing and signal design for better aesthetic acceptance and everyday use.
PURPOSE: This study aimed to compare speech recognition in CI users using off-the-ear (OTE) and behind-the-ear (BTE) processors.
DESIGN: A cross-sectional study was conducted with 51 CI recipients, all users of the BTE Nucleus 5 (CP810) sound processor. Speech perception performance was compared in quiet and noisy conditions using the BTE Nucleus 5 (N5) sound processor and the OTE Kanso sound processor. Each participant was tested with the Brazilian-Portuguese version of the Hearing in Noise Test using each sound processor in a randomized order. Three test conditions were analyzed with both sound processors: (i) speech level fixed at 65 dB sound pressure level in quiet, (ii) speech and noise at fixed levels, and (iii) adaptive speech levels with a fixed noise level. To determine the relative performance of the OTE with respect to the BTE, paired comparison analyses were performed.
RESULTS: The paired t-tests showed no significant difference between the N5 and Kanso in quiet conditions. In all noise conditions, the performance of the OTE (Kanso) sound processor was superior to that of the BTE (N5), regardless of the order in which they were used. With speech and noise at fixed levels, a significant mean 8.1 percentage-point difference was seen between the Kanso (78.10%) and N5 (70.7%) sentence scores.
CONCLUSION: CI users had a lower signal-to-noise ratio and a higher percentage of sentence recognition with the OTE processor than with the BTE processor.
Affiliation(s)
- Patricia Cotta Mancini: Department of Speech-Language Pathology and Audiology, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Alexandra Dezani Soares: Centro do Deficiente Auditivo - Hospital São Paulo, Universidade Federal de São Paulo, São Paulo, Brazil
- Ângela Ribas: Centro de Implante Coclear do Hospital Pequeno Príncipe, Curitiba, Paraná, Brazil
- Danielle Penna Lima: Centro de Implantes Cocleares do Hospital do Coração de Natal, Natal, Brazil
- Marcia Cavadas: Department of Speech-Language Pathology and Audiology, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil; Equipe Sonora, Rio de Janeiro, Rio de Janeiro, Brazil
- Marcos Roberto Banhara: Centro Especializado de Reabilitação IV do Hospital Santo Antônio/Obras Sociais Irmã Dulce, Salvador, Bahia, Brazil
20
Kang Y, Zheng N, Meng Q. Deep Learning-Based Speech Enhancement With a Loss Trading Off the Speech Distortion and the Noise Residue for Cochlear Implants. Front Med (Lausanne) 2021; 8:740123. PMID: 34820392; PMCID: PMC8606413; DOI: 10.3389/fmed.2021.740123.
Abstract
The cochlea plays a key role in the transformation of acoustic vibration into the neural stimulation upon which the brain's perception of sound depends. A cochlear implant (CI) is an auditory prosthesis that replaces damaged cochlear hair cells to achieve this acoustic-to-neural conversion. However, the CI is a very coarse bionic imitation of the normal cochlea. The highly resolved time-frequency-intensity information transmitted by the normal cochlea, which is vital to high-quality auditory perception such as speech perception in challenging environments, cannot be guaranteed by CIs. Although CI recipients with state-of-the-art commercial devices achieve good speech perception in quiet backgrounds, they usually suffer from poor speech perception in noisy environments. Therefore, noise suppression or speech enhancement (SE) is one of the most important technologies for CIs. In this study, we review recent progress in deep learning (DL), mostly neural network (NN)-based, SE front ends for CIs, and discuss how the hearing properties of CI recipients could be utilized to optimize DL-based SE. In particular, different loss functions are introduced to supervise the NN training, and a set of objective and subjective experiments is presented. Results verify that CI recipients are more sensitive to residual noise than to SE-induced speech distortion, which has been common knowledge in CI research. Furthermore, speech reception threshold (SRT) in noise tests demonstrate that the intelligibility of the denoised speech can be significantly improved when the NN is trained with a loss function biased toward more noise suppression rather than with equal attention to noise residue and speech distortion.
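The loss trade-off described above can be sketched as a weighted sum of a noise-residue term (noise passed by the estimated mask) and a speech-distortion term (clean speech removed by it); weighting alpha toward the residue term biases training toward stronger noise suppression. This is a schematic formulation under assumed magnitude-spectrogram inputs, not necessarily the paper's exact loss.

```python
import torch

def tradeoff_loss(mask, speech_mag, noise_mag, alpha=0.7):
    noise_residue = torch.mean((mask * noise_mag) ** 2)               # noise let through
    speech_distortion = torch.mean(((1.0 - mask) * speech_mag) ** 2)  # speech removed
    return alpha * noise_residue + (1.0 - alpha) * speech_distortion
```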
Affiliation(s)
- Yuyong Kang: Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
- Nengheng Zheng: Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China; Pengcheng Laboratory, Shenzhen, China
- Qinglin Meng: Acoustics Laboratory, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China
21
Healy EW, Taherian H, Johnson EM, Wang D. A causal and talker-independent speaker separation/dereverberation deep learning algorithm: Cost associated with conversion to real-time capable operation. J Acoust Soc Am 2021; 150:3976. PMID: 34852625; PMCID: PMC8612765; DOI: 10.1121/10.0007134.
Abstract
The fundamental requirement for real-time operation of a speech-processing algorithm is causality: that it operate without utilizing future time frames. In the present study, the performance of a fully causal deep computational auditory scene analysis algorithm was assessed. Target sentences were isolated from complex interference consisting of an interfering talker and concurrent room reverberation. The talker- and corpus/channel-independent model used Dense-UNet and temporal convolutional networks and estimated both the magnitude and phase of the target speech. Mean algorithm benefit was significant in every condition; mean benefit for hearing-impaired (HI) listeners across all conditions was 46.4 percentage points. The cost of converting the algorithm to causal processing was also assessed by comparison to a prior non-causal version. Intelligibility decrements for HI and normal-hearing listeners from non-causal to causal processing were present in most but not all conditions, and these decrements were statistically significant in half of the conditions tested, namely those representing the greater levels of complex interference. Although a cost associated with causal processing was present in most conditions, it may be considered modest relative to the overall level of benefit.
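When a network estimates both magnitude and phase, a common training target is the complex ratio mask: the element-wise complex ratio of clean to mixture short-time spectra, applied by complex multiplication at resynthesis. The numpy sketch below illustrates the idea with assumed STFT settings; the paper's model estimates such quantities with Dense-UNet and temporal convolutional networks rather than computing them from the clean signal.

```python
import numpy as np
from scipy.signal import stft

def complex_ratio_mask(clean, mixture, fs, eps=1e-12):
    _, _, S = stft(clean, fs, nperseg=320)
    _, _, Y = stft(mixture, fs, nperseg=320)
    return S / (Y + eps)  # complex-valued: encodes both magnitude and phase
```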
Affiliation(s)
- Eric W Healy: Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- Hassan Taherian: Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
- Eric M Johnson: Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
- DeLiang Wang: Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
22
Wasmann JWA, Lanting CP, Huinck WJ, Mylanus EAM, van der Laak JWM, Govaerts PJ, Swanepoel DW, Moore DR, Barbour DL. Computational Audiology: New Approaches to Advance Hearing Health Care in the Digital Age. Ear Hear 2021; 42:1499-1507. PMID: 33675587; PMCID: PMC8417156; DOI: 10.1097/aud.0000000000001041.
Abstract
The global digital transformation enables computational audiology for advanced clinical applications that can reduce the global burden of hearing loss. In this article, we describe emerging hearing-related artificial intelligence applications and argue for their potential to improve access, precision, and efficiency of hearing health care services. Also, we raise awareness of risks that must be addressed to enable a safe digital transformation in audiology. We envision a future where computational audiology is implemented via interoperable systems using shared data and where health care providers adopt expanded roles within a network of distributed expertise. This effort should take place in a health care system where privacy, responsibility of each stakeholder, and patients' safety and autonomy are all guarded by design.
Affiliation(s)
- Jan-Willem A Wasmann: Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, the Netherlands
- Cris P Lanting: Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, the Netherlands
- Wendy J Huinck: Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, the Netherlands
- Emmanuel A M Mylanus: Department of Otorhinolaryngology, Donders Institute for Brain, Cognition and Behaviour, Radboud University Medical Center Nijmegen, the Netherlands
- Jeroen W M van der Laak: Department of Pathology, Radboud University Medical Center Nijmegen, the Netherlands; Center for Medical Image Science and Visualization, Linköping University, Sweden
- De Wet Swanepoel: Department of Speech-Language Pathology and Audiology, University of Pretoria, South Africa
- David R Moore: Communication Sciences Research Center, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA; Department of Otolaryngology, University of Cincinnati, Cincinnati, Ohio, USA; Manchester Centre for Audiology and Deafness, University of Manchester, Manchester, United Kingdom
- Dennis L Barbour: Department of Biomedical Engineering, Washington University in St. Louis, St. Louis, Missouri, USA
23
Li LPH, Han JY, Zheng WZ, Huang RJ, Lai YH. Improved Environment-Aware-Based Noise Reduction System for Cochlear Implant Users Based on a Knowledge Transfer Approach: Development and Usability Study. J Med Internet Res 2021; 23:e25460. PMID: 34709193; PMCID: PMC8587190; DOI: 10.2196/25460.
Abstract
BACKGROUND: Cochlear implant technology is a well-known approach to help deaf individuals hear speech again and can improve speech intelligibility in quiet conditions; however, it still has room for improvement in noisy conditions. More recently, it has been shown that deep learning-based noise reduction, such as the noise classification and deep denoising autoencoder (NC+DDAE) model, can benefit the intelligibility performance of patients with cochlear implants compared to classical noise reduction algorithms.
OBJECTIVE: Following the successful implementation of the NC+DDAE model in our previous study, this study aimed to propose an advanced noise reduction system using knowledge transfer technology, called NC+DDAE_T; examine the proposed NC+DDAE_T noise reduction system using objective evaluations and subjective listening tests; and investigate which layer substitution in the knowledge transfer technology of the NC+DDAE_T noise reduction system provides the best outcome.
METHODS: Knowledge transfer technology was adopted to reduce the number of parameters of the NC+DDAE_T compared with the NC+DDAE. We investigated which layer should be substituted using short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores, as well as t-distributed stochastic neighbor embedding to visualize the features in each model layer. Moreover, we enrolled 10 cochlear implant users in listening tests to evaluate the benefits of the newly developed NC+DDAE_T.
RESULTS: The experimental results showed that substituting the middle layer (ie, the second layer in this study) of the noise-independent DDAE (NI-DDAE) model achieved the best performance gain in STOI and PESQ scores. Therefore, the parameters of layer 3 in the NI-DDAE were chosen to be replaced, thereby establishing the NC+DDAE_T. Both objective and listening test results showed that the proposed NC+DDAE_T noise reduction system achieved performance similar to that of the previous NC+DDAE in several noisy test conditions, while requiring only a quarter of the number of parameters.
CONCLUSIONS: This study demonstrated that knowledge transfer technology can help reduce the number of parameters in an NC+DDAE while maintaining similar performance. This suggests that the proposed NC+DDAE_T model may reduce the implementation costs of this noise reduction system and provide more benefits for cochlear implant users.
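A minimal deep denoising autoencoder (DDAE) of the kind underlying NC+DDAE is sketched below: noisy spectral frames in, enhanced frames out, with the middle hidden layers marking where the knowledge-transfer substitution would occur. The layer widths and input dimension are illustrative assumptions.

```python
import torch.nn as nn

class DDAE(nn.Module):
    def __init__(self, n_freq=257, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_freq, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),  # middle layers: candidates for
            nn.Linear(hidden, hidden), nn.ReLU(),  # knowledge-transfer substitution
            nn.Linear(hidden, n_freq),
        )

    def forward(self, noisy_frames):
        return self.net(noisy_frames)
```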
Collapse
Affiliation(s)
- Lieber Po-Hung Li
- Department of Otolaryngology, Cheng Hsin General Hospital, Taipei, Taiwan
- Faculty of Medicine, Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei, Taiwan
- Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan
- Department of Speech Language Pathology and Audiology, College of Health Technology, National Taipei University of Nursing and Health Sciences, Taipei, Taiwan
| | - Ji-Yan Han
- Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Wei-Zhong Zheng
- Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Ren-Jie Huang
- Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan
| | - Ying-Hui Lai
- Department of Biomedical Engineering, National Yang Ming Chiao Tung University, Taipei, Taiwan
| |
Collapse
|
24
|
Harnessing the power of artificial intelligence to transform hearing healthcare and research. NAT MACH INTELL 2021. [DOI: 10.1038/s42256-021-00394-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
25
|
Healy EW, Johnson EM, Delfarah M, Krishnagiri DS, Sevich VA, Taherian H, Wang D. Deep learning based speaker separation and dereverberation can generalize across different languages to improve intelligibility. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 150:2526. [PMID: 34717521 PMCID: PMC8637753 DOI: 10.1121/10.0006565] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Revised: 09/16/2021] [Accepted: 09/16/2021] [Indexed: 05/20/2023]
Abstract
The practical efficacy of deep learning based speaker separation and/or dereverberation hinges on its ability to generalize to conditions not employed during neural network training. The current study was designed to assess the ability to generalize across extremely different training versus test environments. Training and testing were performed using different languages having no known common ancestry and correspondingly large linguistic differences: English for training and Mandarin for testing. Additional generalizations included untrained speech corpus/recording channel, target-to-interferer energy ratios, reverberation room impulse responses, and test talkers. A deep computational auditory scene analysis algorithm, employing complex time-frequency masking to estimate both magnitude and phase, was used to segregate two concurrent talkers and simultaneously remove large amounts of room reverberation to increase the intelligibility of a target talker. Significant intelligibility improvements were observed for the normal-hearing listeners in every condition. Benefit averaged 43.5 percentage points across conditions and was comparable to that obtained when training and testing were both performed in English. Benefit is projected to be considerably larger for individuals with hearing impairment. It is concluded that a properly designed and trained deep speaker separation/dereverberation network can be capable of generalization across vastly different acoustic environments that include different languages.
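The complex time-frequency masking step can be pictured with a short numpy/scipy sketch; the complex mask below is a random placeholder standing in for the trained network's per-bin estimate of real and imaginary components.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
mixture = np.random.randn(fs)                    # stand-in for a noisy-reverberant mix

f, t, Z = stft(mixture, fs=fs, nperseg=512)      # complex spectrogram
# A trained network would predict this complex mask; random values are placeholders.
mask = (np.random.uniform(-1, 1, Z.shape)
        + 1j * np.random.uniform(-1, 1, Z.shape))

Z_hat = mask * Z                                 # complex multiplication re-estimates
                                                 # magnitude AND phase jointly
_, target_hat = istft(Z_hat, fs=fs, nperseg=512) # time-domain target estimate
```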
Collapse
Affiliation(s)
- Eric W Healy
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
| | - Eric M Johnson
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
| | - Masood Delfarah
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
| | - Divya S Krishnagiri
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
| | - Victoria A Sevich
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
| | - Hassan Taherian
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
| | - DeLiang Wang
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
| |
Collapse
|
26
|
Carlyon RP, Goehring T. Cochlear Implant Research and Development in the Twenty-first Century: A Critical Update. J Assoc Res Otolaryngol 2021; 22:481-508. [PMID: 34432222 PMCID: PMC8476711 DOI: 10.1007/s10162-021-00811-5] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Accepted: 08/02/2021] [Indexed: 12/22/2022] Open
Abstract
Cochlear implants (CIs) are the world's most successful sensory prosthesis and have been the subject of intense research and development in recent decades. We critically review the progress in CI research, and its success in improving patient outcomes, from the turn of the century to the present day. The review focuses on the processing, stimulation, and audiological methods that have been used to try to improve speech perception by human CI listeners, and on fundamental new insights in the response of the auditory system to electrical stimulation. The introduction of directional microphones and of new noise reduction and pre-processing algorithms has produced robust and sometimes substantial improvements. Novel speech-processing algorithms, the use of current-focusing methods, and individualised (patient-by-patient) deactivation of subsets of electrodes have produced more modest improvements. We argue that incremental advances have been made and will continue to be made, that collectively these may substantially improve patient outcomes, but that the modest size of each individual advance will require greater attention to experimental design and power. We also briefly discuss the potential and limitations of promising technologies that are currently being developed in animal models, and suggest strategies for researchers to collectively maximise the potential of CIs to improve hearing in a wide range of listening situations.
Collapse
Affiliation(s)
- Robert P Carlyon
- Cambridge Hearing Group, MRC Cognition & Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK.
| | - Tobias Goehring
- Cambridge Hearing Group, MRC Cognition & Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
| |
Collapse
|
27
|
Fletcher MD, Verschuur CA. Electro-Haptic Stimulation: A New Approach for Improving Cochlear-Implant Listening. Front Neurosci 2021; 15:581414. [PMID: 34177440 PMCID: PMC8219940 DOI: 10.3389/fnins.2021.581414] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Accepted: 04/29/2021] [Indexed: 12/12/2022] Open
Abstract
Cochlear implants (CIs) have been remarkably successful at restoring speech perception for severely to profoundly deaf individuals. Despite their success, several limitations remain, particularly in CI users' ability to understand speech in noisy environments, locate sound sources, and enjoy music. A new multimodal approach has been proposed that uses haptic stimulation to provide sound information that is poorly transmitted by the implant. This augmentation of the electrical CI signal with haptic stimulation (electro-haptic stimulation; EHS) has been shown to improve speech-in-noise performance and sound localization in CI users. There is also evidence that it could enhance music perception. We review the evidence of EHS enhancement of CI listening and discuss key areas where further research is required. These include understanding the neural basis of EHS enhancement, understanding the effectiveness of EHS across different clinical populations, and the optimization of signal-processing strategies. We also discuss the significant potential for a new generation of haptic neuroprosthetic devices to aid those who cannot access hearing-assistive technology, because of either biomedical or healthcare-access issues. While significant further research and development is required, we conclude that EHS represents a promising new approach that could, in the near future, offer a non-invasive, inexpensive means of substantially improving clinical outcomes for hearing-impaired individuals.
Collapse
Affiliation(s)
- Mark D. Fletcher
- Faculty of Engineering and Physical Sciences, University of Southampton Auditory Implant Service, University of Southampton, Southampton, United Kingdom
- Faculty of Engineering and Physical Sciences, Institute of Sound and Vibration Research, University of Southampton, Southampton, United Kingdom
| | - Carl A. Verschuur
- Faculty of Engineering and Physical Sciences, University of Southampton Auditory Implant Service, University of Southampton, Southampton, United Kingdom
| |
Collapse
|
28
|
Chu K, Collins L, Mainsah B. A causal deep learning framework for classifying phonemes in cochlear implants. PROCEEDINGS OF THE ... IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. ICASSP (CONFERENCE) 2021; 2021:6498-6502. [PMID: 34512195 PMCID: PMC8425961 DOI: 10.1109/icassp39728.2021.9413986] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Speech intelligibility in cochlear implant (CI) users degrades considerably in listening environments with reverberation and noise. Previous research in automatic speech recognition (ASR) has shown that phoneme-based speech enhancement algorithms improve ASR system performance in reverberant environments as compared to a global model. However, phoneme-specific speech processing has not yet been implemented in CIs. In this paper, we propose a causal deep learning framework for classifying phonemes using features extracted at the time-frequency resolution of a CI processor. We trained and tested long short-term memory networks to classify phonemes and manner of articulation in anechoic and reverberant conditions. The results showed that CI-inspired features provide slightly higher levels of performance than traditional ASR features. To the best of our knowledge, this study is the first to provide a classification framework with the potential to categorize phonetic units in real time in a CI.
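A minimal sketch of such a causal classifier follows, using a unidirectional LSTM so each frame's prediction depends only on past and current input; the feature dimension and class count are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class CausalPhonemeClassifier(nn.Module):
    def __init__(self, n_features=22, n_classes=5):   # e.g. 22 CI channel envelopes,
        super().__init__()                            # 5 manner classes (assumed)
        self.lstm = nn.LSTM(n_features, 64, batch_first=True)  # unidirectional = causal
        self.out = nn.Linear(64, n_classes)

    def forward(self, x):            # x: (batch, frames, features)
        h, _ = self.lstm(x)          # each frame sees only past and current input
        return self.out(h)           # per-frame class logits

logits = CausalPhonemeClassifier()(torch.randn(1, 100, 22))  # 100 frames in, 100 out
```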
Collapse
Affiliation(s)
- Kevin Chu
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
| | - Leslie Collins
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
| | - Boyla Mainsah
- Department of Electrical and Computer Engineering, Duke University, Durham, NC, USA
| |
Collapse
|
29
|
Healy EW, Tan K, Johnson EM, Wang D. An effectively causal deep learning algorithm to increase intelligibility in untrained noises for hearing-impaired listeners. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2021; 149:3943. [PMID: 34241481 PMCID: PMC8186949 DOI: 10.1121/10.0005089] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/20/2020] [Revised: 05/09/2021] [Accepted: 05/10/2021] [Indexed: 05/20/2023]
Abstract
Real-time operation is critical for noise reduction in hearing technology. The essential requirement of real-time operation is causality: that an algorithm does not use future time-frame information and, instead, completes its operation by the end of the current time frame. This requirement is currently extended through the concept of "effectively causal," in which future time-frame information within the brief delay tolerance of the human speech-perception mechanism is used. Effectively causal deep learning was used to separate speech from background noise and improve intelligibility for hearing-impaired listeners. A single-microphone, gated convolutional recurrent network was used to perform complex spectral mapping. By estimating both the real and imaginary parts of the noise-free speech, both the magnitude and phase of the estimated noise-free speech were obtained. The deep neural network was trained using a large set of noises and tested using complex noises not employed during training. Significant algorithm benefit was observed in every condition, which was largest for those with the greatest hearing loss. Allowable delays across different communication settings are reviewed and assessed. The current work demonstrates that effectively causal deep learning can significantly improve intelligibility for one of the largest populations of need in challenging conditions involving untrained background noises.
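The "effectively causal" buffering idea can be illustrated with a small generator that releases a frame only once a fixed number of future frames, within the delay tolerance, has arrived; the frame size and lookahead below are assumptions for illustration.

```python
import numpy as np

def effectively_causal_frames(frames, lookahead=2):
    """Yield (current_frame, context) once `lookahead` future frames exist."""
    buffer = []
    for frame in frames:
        buffer.append(frame)
        if len(buffer) > lookahead:            # enough future context buffered
            # current frame plus its limited future context; total output
            # delay is exactly `lookahead` frames
            yield buffer[-lookahead - 1], np.stack(buffer)
            buffer.pop(0)

frames = np.random.randn(50, 160)              # 50 frames of 10 ms audio at 16 kHz
for current, context in effectively_causal_frames(frames):
    pass                                       # a DNN would enhance `current` here
```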
Collapse
Affiliation(s)
- Eric W Healy
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
| | - Ke Tan
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
| | - Eric M Johnson
- Department of Speech and Hearing Science, The Ohio State University, Columbus, Ohio 43210, USA
| | - DeLiang Wang
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
| |
Collapse
|
30
|
Fletcher MD, Zgheib J, Perry SW. Sensitivity to Haptic Sound-Localization Cues at Different Body Locations. SENSORS (BASEL, SWITZERLAND) 2021; 21:3770. [PMID: 34071729 PMCID: PMC8198414 DOI: 10.3390/s21113770] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/24/2021] [Revised: 05/21/2021] [Accepted: 05/24/2021] [Indexed: 01/09/2023]
Abstract
Cochlear implants (CIs) recover hearing in severely to profoundly hearing-impaired people by electrically stimulating the cochlea. While they are extremely effective, spatial hearing is typically severely limited. Recent studies have shown that haptic stimulation can supplement the electrical CI signal (electro-haptic stimulation) and substantially improve sound localization. In haptic sound-localization studies, the signal is extracted from the audio received by behind-the-ear devices and delivered to each wrist. Localization is achieved using tactile intensity differences (TIDs) across the wrists, which match sound intensity differences across the ears (a key sound localization cue). The current study established sensitivity to across-limb TIDs at three candidate locations for a wearable haptic device, namely the lower triceps and the palmar and dorsal wrist. At all locations, TID sensitivity was similar to the sensitivity to across-ear intensity differences for normal-hearing listeners. This suggests that greater haptic sound-localization accuracy than previously shown can be achieved. The dynamic range was also measured and far exceeded that available through electrical CI stimulation for all of the locations, suggesting that haptic stimulation could provide additional sound-intensity information. These results indicate that an effective haptic aid could be deployed for any of the candidate locations, and could offer a low-cost, non-invasive means of improving outcomes for hearing-impaired listeners.
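A hedged sketch of the audio-to-haptic mapping described here: band-limited RMS level from each behind-the-ear signal drives one wrist's actuator, so the across-limb TID mirrors the across-ear intensity difference. The 50-500 Hz band and the dB mapping are assumptions, not the study's parameters.

```python
import numpy as np
from scipy.signal import butter, sosfilt

fs = 16000
left_mic = np.random.randn(fs)                 # stand-ins for the two BTE signals
right_mic = 0.5 * np.random.randn(fs)

sos = butter(4, [50, 500], btype="bandpass", fs=fs, output="sos")

def vibration_level(x):
    rms = np.sqrt(np.mean(sosfilt(sos, x) ** 2))   # band-limited signal energy
    return 20 * np.log10(rms + 1e-12)              # dB drive level for the actuator

tid_db = vibration_level(left_mic) - vibration_level(right_mic)
print(f"across-limb TID: {tid_db:.1f} dB")         # cue that supports localization
```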
Collapse
Affiliation(s)
- Mark D. Fletcher
- Faculty of Engineering and Physical Sciences, Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton SO17 1BJ, UK
- University of Southampton Auditory Implant Service, Faculty of Engineering and Physical Sciences, University of Southampton, University Road, Southampton SO17 1BJ, UK;
| | - Jana Zgheib
- University of Southampton Auditory Implant Service, Faculty of Engineering and Physical Sciences, University of Southampton, University Road, Southampton SO17 1BJ, UK;
| | - Samuel W. Perry
- Faculty of Engineering and Physical Sciences, Institute of Sound and Vibration Research, University of Southampton, University Road, Southampton SO17 1BJ, UK
- University of Southampton Auditory Implant Service, Faculty of Engineering and Physical Sciences, University of Southampton, University Road, Southampton SO17 1BJ, UK;
| |
Collapse
|
31
|
The effect of increased channel interaction on speech perception with cochlear implants. Sci Rep 2021; 11:10383. [PMID: 34001987 PMCID: PMC8128897 DOI: 10.1038/s41598-021-89932-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2021] [Accepted: 04/29/2021] [Indexed: 11/30/2022] Open
Abstract
Cochlear implants (CIs) are neuroprostheses that partially restore hearing for people with severe-to-profound hearing loss. While CIs can provide good speech perception in quiet listening situations for many, they fail to do so in environments with interfering sounds for most listeners. Previous research suggests that this is due to detrimental interaction effects between CI electrode channels, limiting their function to convey frequency-specific information, but evidence is still scarce. In this study, an experimental manipulation called spectral blurring was used to increase channel interaction in CI listeners using Advanced Bionics devices with HiFocus 1J and MS electrode arrays to directly investigate its causal effect on speech perception. Instead of using a single electrode per channel as in standard CI processing, spectral blurring used up to 6 electrodes per channel simultaneously to increase the overlap between adjacent frequency channels as would occur in cases with severe channel interaction. Results demonstrated that this manipulation significantly degraded CI speech perception in quiet by 15% and speech reception thresholds in babble noise by 5 dB when all channels were blurred by a factor of 6. Importantly, when channel interaction was increased just on a subset of electrodes, speech scores were mostly unaffected and were only significantly degraded when the 5 most apical channels were blurred. These apical channels convey information up to 1 kHz at the apical end of the electrode array and are typically located at angular insertion depths of about 250° up to 500°. These results confirm and extend earlier findings indicating that CI speech perception may not benefit from deactivating individual channels along the array and that efforts should instead be directed towards reducing channel interaction per se and in particular for the most-apical electrodes. To this end, causal methods such as spectral blurring could be used in future research to control channel interaction effects within listeners for evaluating compensation strategies.
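One way to picture the manipulation is a sketch that smears each channel's envelope across several adjacent electrodes; the uniform weights and the 16-electrode array are assumptions rather than the study's exact stimulation parameters.

```python
import numpy as np

n_electrodes, n_frames = 16, 100
envelopes = np.abs(np.random.randn(n_electrodes, n_frames))  # per-channel envelopes

def blur(envelopes, width):
    """Spread each channel's output across `width` neighboring electrodes,
    mimicking severe channel interaction (current spread)."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, envelopes)

blurred_6 = blur(envelopes, width=6)   # all channels blurred by a factor of 6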
Collapse
|
32
|
Archer-Boyd AW, Goehring T, Carlyon RP. The Effect of Free-Field Presentation and Processing Strategy on a Measure of Spectro-Temporal Processing by Cochlear-Implant Listeners. Trends Hear 2021; 24:2331216520964281. [PMID: 33305696 PMCID: PMC7734493 DOI: 10.1177/2331216520964281] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
The STRIPES (Spectro-Temporal Ripple for Investigating Processor EffectivenesS) test is a psychophysical test of spectro-temporal resolution developed for cochlear-implant (CI) listeners. Previously, the test has been strictly controlled to minimize the introduction of extraneous, non-spectro-temporal cues. Here, the effect of relaxing many of those controls was investigated to ascertain the generalizability of the STRIPES test. Preemphasis compensation was removed from the STRIPES stimuli, the test was presented over a loudspeaker at a level similar to conversational speech and above the automatic gain control threshold of the CI processor, and listeners were tested using the everyday setting of their clinical devices. There was no significant difference in STRIPES thresholds measured across conditions for the 10 CI listeners tested. One listener obtained higher (better) thresholds when listening with their clinical processor. An analysis of longitudinal results showed excellent test–retest reliability of STRIPES over multiple listening sessions with similar conditions. Overall, the results show that the STRIPES test is robust to extraneous cues, and that thresholds are reliable over time. It is sufficiently robust for use with different processing strategies, free-field presentation, and in non-research settings.
Collapse
Affiliation(s)
- Alan W Archer-Boyd
- Cambridge Hearing Group, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
| | - Tobias Goehring
- Cambridge Hearing Group, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
| | - Robert P Carlyon
- Cambridge Hearing Group, MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, United Kingdom
| |
Collapse
|
33
|
Hosseini M, Rodriguez G, Guo H, Lim HH, Plourde E. The effect of input noises on the activity of auditory neurons using GLM-based metrics. J Neural Eng 2021; 18. [PMID: 33626516 DOI: 10.1088/1741-2552/abe979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2020] [Accepted: 02/24/2021] [Indexed: 11/11/2022]
Abstract
CONTEXT The auditory system is extremely efficient in extracting auditory information in the presence of background noise. However, people with auditory implants have a hard time understanding speech in noisy conditions. Understanding the mechanisms of perception in noise could lead to better stimulation or preprocessing strategies for such implants. OBJECTIVE The neural mechanisms related to the processing of background noise, especially in the inferior colliculus (IC) where the auditory midbrain implant is located, are still not well understood. We thus wish to investigate if there is a difference in the activity of neurons in the IC when presenting noisy vocalizations with different types of noise (stationary vs. non-stationary), input signal-to-noise ratios (SNRs) and signal levels. APPROACH We developed novel metrics based on a generalized linear model (GLM) to investigate the effect of a given input noise on neural activity. We used these metrics to analyze neural data recorded from the IC in ketamine-anesthetized female Hartley guinea pigs while presenting noisy vocalizations. MAIN RESULTS We found that non-stationary noise clearly contributes to the multi-unit neural activity in the IC by causing excitation, regardless of the SNR, input level or vocalization type. However, when presenting white or natural stationary noises, a great diversity of responses was observed for the different conditions, where the multi-unit activity of some sites was affected by the presence of noise and the activity of others was not. SIGNIFICANCE The GLM-based metrics allowed the identification of a clear distinction between the effect of white or natural stationary noises and that of non-stationary noise on the multi-unit activity in the IC. This had not been observed before and indicates that the so-called noise invariance in the IC depends on the input noise conditions. This could suggest different preprocessing or stimulation approaches for auditory midbrain implants depending on the noisy conditions.
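The GLM-based analysis can be sketched with a Poisson regression in statsmodels, where the fitted coefficient on a noise regressor serves as a simple "effect of input noise" metric; the design matrix below is synthetic and purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
vocalization = rng.uniform(size=n)             # stimulus envelope regressor
noise_on = rng.integers(0, 2, size=n)          # noise present (1) / absent (0)
rate = np.exp(0.5 + 1.0 * vocalization + 0.8 * noise_on)
spikes = rng.poisson(rate)                     # simulated multi-unit counts

X = sm.add_constant(np.column_stack([vocalization, noise_on]))
fit = sm.GLM(spikes, X, family=sm.families.Poisson()).fit()
print(fit.params)   # the noise coefficient quantifies how much noise drives activity
```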
Collapse
Affiliation(s)
- Maryam Hosseini
- Electrical Engineering, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Quebec, J1K 2R1, Canada
| | - Gerardo Rodriguez
- Biomedical Engineering, University of Minnesota, 312 Church St SE, Minneapolis, Minnesota, 55455, USA
| | - Hongsun Guo
- Biomedical Engineering, University of Minnesota, 312 Church St SE, Minneapolis, Minnesota, 55455, USA
| | - Hubert H Lim
- Department of Biomedical Engineering, University of Minnesota, 7-105 Hasselmo Hall, 312 Church Street SE, Minneapolis, Minnesota, 55455, USA
| | - Eric Plourde
- Electrical Engineering, Université de Sherbrooke, 2500 Boulevard de l'Université, Sherbrooke, Quebec, J1K 2R1, Canada
| |
Collapse
|
34
|
Keshavarzi M, Reichenbach T, Moore BCJ. Transient Noise Reduction Using a Deep Recurrent Neural Network: Effects on Subjective Speech Intelligibility and Listening Comfort. Trends Hear 2021; 25:23312165211041475. [PMID: 34606381 PMCID: PMC8642050 DOI: 10.1177/23312165211041475] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2021] [Revised: 07/04/2021] [Accepted: 08/04/2021] [Indexed: 11/17/2022] Open
Abstract
A deep recurrent neural network (RNN) for reducing transient sounds was developed and its effects on subjective speech intelligibility and listening comfort were investigated. The RNN was trained using sentences spoken with different accents and corrupted by transient sounds, using the clean speech as the target. It was tested using sentences spoken by unseen talkers and corrupted by unseen transient sounds. A paired-comparison procedure was used to compare all possible combinations of three conditions for subjective speech intelligibility and listening comfort for two relative levels of the transients. The conditions were: no processing (NP); processing using the RNN; and processing using a multi-channel transient reduction method (MCTR). Ten participants with normal hearing and ten with mild-to-moderate hearing loss participated. For the latter, frequency-dependent linear amplification was applied to all stimuli to compensate for individual audibility losses. For the normal-hearing participants, processing using the RNN was significantly preferred over that for NP for subjective intelligibility and comfort, processing using the RNN was significantly preferred over that for MCTR for subjective intelligibility, and processing using the MCTR was significantly preferred over that for NP for comfort for the higher transient level only. For the hearing-impaired participants, processing using the RNN was significantly preferred over that for NP for both subjective intelligibility and comfort, processing using the RNN was significantly preferred over that for MCTR for comfort, and processing using the MCTR was significantly preferred over that for NP for comfort.
Collapse
Affiliation(s)
- Mahmoud Keshavarzi
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, London, UK
- Centre for Neuroscience in Education, Department of Psychology, University of Cambridge, Cambridge, UK
- Cambridge Hearing Group, Department of Psychology, University of Cambridge, Cambridge, UK
| | - Tobias Reichenbach
- Department of Bioengineering and Centre for Neurotechnology, Imperial College London, London, UK
- Department of Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany
| | - Brian C. J. Moore
- Cambridge Hearing Group, Department of Psychology, University of Cambridge, Cambridge, UK
| |
Collapse
|
35
|
Automated Detection of Sleep Stages Using Deep Learning Techniques: A Systematic Review of the Last Decade (2010–2020). APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10248963] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Sleep is vital for one’s general well-being, but it is often neglected, which has led to an increase in sleep disorders worldwide. Indicators of sleep disorders, such as sleep interruptions, extreme daytime drowsiness, or snoring, can be detected with sleep analysis. However, sleep analysis relies on visual scoring by experts and is susceptible to inter- and intra-observer variability. One way to overcome these limitations is to support experts with a programmed diagnostic tool (PDT) based on artificial intelligence for timely detection of sleep disturbances. Artificial intelligence technology, such as deep learning (DL), ensures that data are fully utilized with low to no information loss during training. This paper provides a comprehensive review of 36 studies, published between March 2013 and August 2020, which employed DL models to analyze overnight polysomnogram (PSG) recordings for the classification of sleep stages. Our analysis shows that more than half of the studies employed convolutional neural networks (CNNs) on electroencephalography (EEG) recordings for sleep stage classification and achieved high performance. Our study also underscores that CNN models, particularly one-dimensional CNN models, are advantageous in yielding higher accuracies for classification. More importantly, we noticed that EEG alone is not sufficient to achieve robust classification results. Future automated detection systems should consider other PSG recordings, such as electrooculogram (EOG) and electromyogram (EMG) signals, alongside EEG, together with input from human experts, to achieve the required sleep stage classification robustness. Hence, for DL methods to be fully realized as a practical PDT for sleep stage scoring in clinical applications, inclusion of other PSG recordings, besides EEG recordings, is necessary. In this respect, our report covers methods published in the last decade and underscores the use of DL models with other PSG recordings for the scoring of sleep stages.
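A minimal PyTorch sketch of the approach the review found most common, a one-dimensional CNN mapping a 30 s single-channel EEG epoch to one of five sleep stages; the sampling rate, filter counts, and depth are illustrative assumptions.

```python
import torch
import torch.nn as nn

fs = 100                                     # assumed EEG sampling rate (Hz)
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=50, stride=6), nn.ReLU(),   # learn waveform motifs
    nn.MaxPool1d(8),
    nn.Conv1d(16, 32, kernel_size=8), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 5),                        # W, N1, N2, N3, REM
)

epoch = torch.randn(1, 1, 30 * fs)           # one 30-second EEG epoch
stage_logits = model(epoch)                  # per-stage scores for this epoch
```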
Collapse
|
36
|
Wang NYH, Wang HLS, Wang TW, Fu SW, Lu X, Wang HM, Tsao Y. Improving the Intelligibility of Speech for Simulated Electric and Acoustic Stimulation Using Fully Convolutional Neural Networks. IEEE Trans Neural Syst Rehabil Eng 2020; 29:184-195. [PMID: 33275585 DOI: 10.1109/tnsre.2020.3042655] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Combined electric and acoustic stimulation (EAS) has demonstrated better speech recognition than a conventional cochlear implant (CI) and yielded satisfactory performance under quiet conditions. However, when noise signals are involved, both the electric signal and the acoustic signal may be distorted, thereby resulting in poor recognition performance. To suppress noise effects, speech enhancement (SE) is a necessary unit in EAS devices. Recently, a time-domain speech enhancement algorithm based on a fully convolutional neural network (FCN) with a short-time objective intelligibility (STOI)-based objective function (termed FCN(S) for short) has received increasing attention due to its simple structure and effectiveness in restoring clean speech signals from noisy counterparts. With evidence showing the benefits of FCN(S) for normal speech, this study sets out to assess its ability to improve the intelligibility of EAS simulated speech. Objective evaluations and listening tests were conducted to examine the performance of FCN(S) in improving the speech intelligibility of normal and vocoded speech in noisy environments. The experimental results show that, compared with the traditional minimum-mean square-error SE method and the deep denoising autoencoder SE method, FCN(S) can obtain better gains in speech intelligibility for normal as well as vocoded speech. This study, being the first to evaluate deep learning SE approaches for EAS, confirms that FCN(S) is an effective SE approach that may potentially be integrated into an EAS processor to benefit users in noisy environments.
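A hedged sketch of a time-domain FCN of this kind: raw waveform in, enhanced waveform out, with no fully connected layers. The STOI-based objective is only indicated as a placeholder, since optimizing it requires a differentiable STOI approximation that is not reproduced here; kernel sizes and channel counts are assumptions.

```python
import torch
import torch.nn as nn

fcn = nn.Sequential(                               # fully convolutional: works on
    nn.Conv1d(1, 30, kernel_size=55, padding=27),  # waveforms of any length
    nn.LeakyReLU(),
    nn.Conv1d(30, 30, kernel_size=55, padding=27),
    nn.LeakyReLU(),
    nn.Conv1d(30, 1, kernel_size=55, padding=27),
    nn.Tanh(),                                     # bounded waveform output
)

noisy = torch.randn(1, 1, 16000)        # 1 s of noisy speech at 16 kHz
enhanced = fcn(noisy)                   # same length out: (1, 1, 16000)
# loss = -differentiable_stoi(enhanced, clean)   # placeholder for the
#                                                # STOI-based objective
```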
Collapse
|
37
|
Shankar N, Bhat GS, Panahi IMS. Real-time single-channel deep neural network-based speech enhancement on edge devices. INTERSPEECH 2020; 2020:3281-3285. [PMID: 33898608 PMCID: PMC8064406 DOI: 10.21437/interspeech.2020-1901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
In this paper, we present a deep neural network architecture comprising both convolutional neural network (CNN) and recurrent neural network (RNN) layers for real-time single-channel speech enhancement (SE). The proposed neural network model focuses on enhancing the noisy speech magnitude spectrum on a frame-by-frame process. The developed model is implemented on the smartphone (edge device), to demonstrate the real-time usability of the proposed method. Perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) test results are used to compare the proposed algorithm to previously published conventional and deep learning-based SE methods. Subjective ratings show the performance improvement of the proposed model over the other baseline SE methods.
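A rough PyTorch sketch of the CNN-plus-RNN layout: a small convolution extracts spectral patterns within each noisy magnitude frame, and a unidirectional GRU carries context across frames so processing stays frame-by-frame; all dimensions are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CRNNEnhancer(nn.Module):
    def __init__(self, n_bins=161):
        super().__init__()
        self.conv = nn.Conv1d(1, 8, kernel_size=5, padding=2)   # spectral patterns
        self.rnn = nn.GRU(8 * n_bins, 128, batch_first=True)    # temporal context
        self.out = nn.Linear(128, n_bins)

    def forward(self, mag):                      # mag: (batch, frames, bins)
        b, t, f = mag.shape
        z = self.conv(mag.reshape(b * t, 1, f))  # conv along frequency, per frame
        z = z.reshape(b, t, -1)
        z, _ = self.rnn(z)                       # unidirectional GRU: causal
        mask = torch.sigmoid(self.out(z))        # per-bin suppression mask
        return mask * mag                        # enhanced magnitude spectrum

enhanced = CRNNEnhancer()(torch.rand(1, 200, 161))
```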
Collapse
Affiliation(s)
- Nikhil Shankar
- Department of Electrical and Computer Engineering, The University of Texas at Dallas, Richardson, TX-75080, USA
| | - Gautam Shreedhar Bhat
- Department of Electrical and Computer Engineering, The University of Texas at Dallas, Richardson, TX-75080, USA
| | - Issa M S Panahi
- Department of Electrical and Computer Engineering, The University of Texas at Dallas, Richardson, TX-75080, USA
| |
Collapse
|
38
|
Goehring T, Arenberg JG, Carlyon RP. Using Spectral Blurring to Assess Effects of Channel Interaction on Speech-in-Noise Perception with Cochlear Implants. J Assoc Res Otolaryngol 2020; 21:353-371. [PMID: 32519088 PMCID: PMC7445227 DOI: 10.1007/s10162-020-00758-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2019] [Accepted: 05/21/2020] [Indexed: 01/07/2023] Open
Abstract
Cochlear implant (CI) listeners struggle to understand speech in background noise. Interactions between electrode channels due to current spread increase the masking of speech by noise and lead to difficulties with speech perception. Strategies that reduce channel interaction therefore have the potential to improve speech-in-noise perception by CI listeners, but previous results have been mixed. We investigated the effects of channel interaction on speech-in-noise perception and its association with spectro-temporal acuity in a listening study with 12 experienced CI users. Instead of attempting to reduce channel interaction, we introduced spectral blurring to simulate some of the effects of channel interaction by adjusting the overlap between electrode channels at the input level of the analysis filters or at the output by using several simultaneously stimulated electrodes per channel. We measured speech reception thresholds in noise as a function of the amount of blurring applied to either all 15 electrode channels or to 5 evenly spaced channels. Performance remained roughly constant as the amount of blurring applied to all channels increased up to some knee point, above which it deteriorated. This knee point differed across listeners in a way that correlated with performance on a non-speech spectro-temporal task, and is proposed here as an individual measure of channel interaction. Surprisingly, even extreme amounts of blurring applied to 5 channels did not affect performance. The effects on speech perception in noise were similar for blurring at the input and at the output of the CI. The results are in line with the assumption that experienced CI users can make use of a limited number of effective channels of information and tolerate some deviations from their everyday settings when identifying speech in the presence of a masker. Furthermore, these findings may explain the mixed results by strategies that optimized or deactivated a small number of electrodes evenly distributed along the array by showing that blurring or deactivating one-third of the electrodes did not harm speech-in-noise performance.
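The input-side variant of blurring can be pictured as widening the analysis-filter overlap, here approximated by leaking a fraction of each channel's envelope into its neighbors; the leakage weights and channel count are assumptions chosen only to illustrate the idea.

```python
import numpy as np

n_channels, n_frames = 15, 100
env = np.abs(np.random.randn(n_channels, n_frames))   # analysis-filter envelopes

def widen_overlap(env, spread=0.5):
    """Leak a fraction of each channel's energy into adjacent channels,
    emulating broader, more overlapping analysis filters."""
    out = env.copy()
    out[1:] += spread * env[:-1]      # leakage from the lower neighbor
    out[:-1] += spread * env[1:]      # leakage from the upper neighbor
    return out / (1 + 2 * spread)     # keep the overall level roughly constant

blurred_input = widen_overlap(env)    # fed to the electrode mapping as usual
```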
Collapse
Affiliation(s)
- Tobias Goehring
- Cambridge Hearing Group, Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK.
| | - Julie G Arenberg
- Massachusetts Eye and Ear, Harvard Medical School, 243 Charles St, Boston, MA, 02114, USA
| | - Robert P Carlyon
- Cambridge Hearing Group, Medical Research Council Cognition and Brain Sciences Unit, University of Cambridge, 15 Chaucer Road, Cambridge, CB2 7EF, UK
| |
Collapse
|
39
|
Healy EW, Johnson EM, Delfarah M, Wang D. A talker-independent deep learning algorithm to increase intelligibility for hearing-impaired listeners in reverberant competing talker conditions. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2020; 147:4106. [PMID: 32611178 PMCID: PMC7314568 DOI: 10.1121/10.0001441] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/16/2019] [Revised: 05/28/2020] [Accepted: 05/29/2020] [Indexed: 05/20/2023]
Abstract
Deep learning based speech separation or noise reduction needs to generalize to voices not encountered during training and to operate under multiple corruptions. The current study provides such a demonstration for hearing-impaired (HI) listeners. Sentence intelligibility was assessed under conditions of a single interfering talker and substantial amounts of room reverberation. A talker-independent deep computational auditory scene analysis (CASA) algorithm was employed, in which talkers were separated and dereverberated in each time frame (simultaneous grouping stage), then the separated frames were organized to form two streams (sequential grouping stage). The deep neural networks consisted of specialized convolutional neural networks, one based on U-Net and the other a temporal convolutional network. It was found that every HI (and normal-hearing, NH) listener received algorithm benefit in every condition. Benefit averaged across all conditions ranged from 52 to 76 percentage points for individual HI listeners and averaged 65 points. Further, processed HI intelligibility significantly exceeded unprocessed NH intelligibility. Although the current utterance-based model was not implemented as a real-time system, a perspective on this important issue is provided. It is concluded that deep CASA represents a powerful framework capable of producing large increases in HI intelligibility for potentially any two voices.
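A heavily simplified sketch of the two-stage bookkeeping: a (stubbed) separator emits two per-frame source estimates in arbitrary order (simultaneous grouping), and a greedy continuity rule organizes them into two streams (sequential grouping). The real deep CASA system learns both stages with trained networks; only the grouping logic is illustrated here.

```python
import numpy as np

rng = np.random.default_rng(1)
frames = [rng.random((2, 64)) for _ in range(100)]   # per-frame pairs of estimates

streams = [[frames[0][0]], [frames[0][1]]]           # seed the two streams
for a, b in frames[1:]:
    # keep whichever assignment best continues each stream spectrally
    straight = (np.linalg.norm(a - streams[0][-1])
                + np.linalg.norm(b - streams[1][-1]))
    swapped = (np.linalg.norm(b - streams[0][-1])
               + np.linalg.norm(a - streams[1][-1]))
    first, second = (a, b) if straight <= swapped else (b, a)
    streams[0].append(first)
    streams[1].append(second)
```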
Collapse
Affiliation(s)
- Eric W Healy
- Department of Speech and Hearing Science, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA
| | - Eric M Johnson
- Department of Speech and Hearing Science, and Center for Cognitive and Brain Sciences, The Ohio State University, Columbus, Ohio 43210, USA
| | - Masood Delfarah
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
| | - DeLiang Wang
- Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio 43210, USA
| |
Collapse
|
40
|
Zhou H, Wang N, Zheng N, Yu G, Meng Q. A New Approach for Noise Suppression in Cochlear Implants: A Single-Channel Noise Reduction Algorithm. Front Neurosci 2020; 14:301. [PMID: 32372902 PMCID: PMC7186595 DOI: 10.3389/fnins.2020.00301] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2019] [Accepted: 03/16/2020] [Indexed: 12/11/2022] Open
Abstract
The cochlea “translates” the in-air vibrational acoustic “language” into the spikes of neural “language” that are then transmitted to the brain for auditory understanding and/or perception. During this intracochlear “translation” process, high resolution in time–frequency–intensity domains guarantees the high quality of the input neural information for the brain, which is vital for our outstanding hearing abilities. However, cochlear implants (CIs) have coarse artificial coding and interfaces, and CI users experience more challenges in common acoustic environments than their normal-hearing (NH) peers. Noise from sound sources that a listener has no interest in may be neglected by NH listeners, but it may distract a CI user. We discuss CI noise-suppression techniques and introduce noise management for a new implant system. The monaural signal-to-noise ratio estimation-based noise suppression algorithm “eVoice,” which is incorporated in the processors of Nurotron® Enduro™, was evaluated in two speech perception experiments. The results show that speech intelligibility in stationary speech-shaped noise can be significantly improved with eVoice. Similar results have been observed in other CI devices with single-channel noise reduction techniques. Specifically, the mean speech reception threshold decrease in the present study was 2.2 dB. The Nurotron user community already numbers more than 10,000, and eVoice is a first step toward noise management in the new system. Future steps on non-stationary-noise suppression, spatial-source separation, bilateral hearing, microphone configuration, and environment specification are warranted. The existing evidence, including our research, suggests that noise-suppression techniques should be applied in CI systems. The artificial hearing of CI listeners requires more advanced signal processing techniques to reduce brain effort and increase intelligibility in noisy settings.
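As a generic illustration of the family of monaural, SNR-estimation-based suppressors that eVoice belongs to, here is a textbook Wiener-style gain in numpy; it is not Nurotron's proprietary algorithm, and the noise-only lead-in assumption is made purely for the sketch.

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
noisy = np.random.randn(2 * fs)                   # stand-in noisy input

f, t, Z = stft(noisy, fs=fs, nperseg=512)
power = np.abs(Z) ** 2
noise_psd = power[:, :10].mean(axis=1, keepdims=True)  # assume a noise-only lead-in
snr = np.maximum(power / (noise_psd + 1e-12) - 1, 1e-3)  # per-bin a posteriori SNR
gain = snr / (1 + snr)                            # Wiener-like suppression gain
_, enhanced = istft(gain * Z, fs=fs, nperseg=512) # attenuates low-SNR bins
```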
Collapse
Affiliation(s)
- Huali Zhou
- Acoustics Lab, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China
| | | | - Nengheng Zheng
- The Guangdong Key Laboratory of Intelligent Information Processing, College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
| | - Guangzheng Yu
- Acoustics Lab, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China
| | - Qinglin Meng
- Acoustics Lab, School of Physics and Optoelectronics, South China University of Technology, Guangzhou, China
| |
Collapse
|
41
|
Vani HY, Anusuya MA. Improving speech recognition using bionic wavelet features. AIMS ELECTRONICS AND ELECTRICAL ENGINEERING 2020. [DOI: 10.3934/electreng.2020.2.200] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
|
42
|
Implementation of Artificial Intelligence for Classification of Frogs in Bioacoustics. Symmetry (Basel) 2019. [DOI: 10.3390/sym11121454] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022] Open
Abstract
This research presents the implementation of artificial intelligence (AI) for the classification of frogs from the symmetry of their bioacoustic spectra by using the feedforward neural network approach (FNNA) and the support vector machine (SVM). Recently, the symmetry concept has been applied in physics and in mathematics to help make mathematical models tractable and achieve the best learning performance. Owing to the symmetry of the bioacoustic spectra, feature extraction can be achieved by integrating the Mel-scale frequency cepstral coefficient (MFCC) technique with machine learning algorithms such as SVM and neural networks. The raw data for our experiment are taken from a website that collects many kinds of frog sounds, which spares us from collecting the raw data ourselves with digital signal processing equipment. The proposed system detects bioacoustic features by using a microphone sensor to record the sounds of different frogs. The data acquisition system uses an embedded controller and a dynamic signal module for making high-accuracy measurements. The bioacoustic features are filtered through the MFCC algorithm; once filtering is finished, all values from the cepstrum signals are collected to form the datasets. For classification and identification of frogs, we adopt the multi-layer FNNA and compare the results with those obtained by the SVM method. Additionally, two optimizer functions for the neural network are included: scaled conjugate gradient (SCG) and gradient descent with adaptive learning rate (GDA); both optimization methods are used to evaluate the classification results from the feature datasets during model training. Calculation results from a general central processing unit (CPU) and an Nvidia graphics processing unit (GPU) are also evaluated and discussed. The effectiveness of the experimental system on the filtered feature datasets is assessed with both the FNNA and the SVM scheme, and the fifteen frogs are successfully distinguished on the basis of their symmetric bioacoustic features.
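A minimal sketch of the MFCC-plus-classifier pipeline follows, using librosa for feature extraction and scikit-learn's SVM; the synthetic "calls" stand in for recorded frog audio, and the FNNA branch and data acquisition hardware are omitted.

```python
import numpy as np
import librosa
from sklearn.svm import SVC

rng = np.random.default_rng(2)
sr = 22050

def mfcc_features(call):
    m = librosa.feature.mfcc(y=call, sr=sr, n_mfcc=13)  # (13, frames) cepstra
    return m.mean(axis=1)                               # one 13-dim vector per call

calls = [rng.standard_normal(sr) for _ in range(30)]    # fake 1 s calls
labels = rng.integers(0, 3, size=30)                    # 3 frog classes (assumed)
X = np.stack([mfcc_features(c) for c in calls])

clf = SVC(kernel="rbf").fit(X, labels)                  # SVM branch of the pipeline
print(clf.predict(X[:5]))
```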
Collapse
|