1
Fatehifar M, Schlittenlacher J, Almufarrij I, Wong D, Cootes T, Munro KJ. Applications of automatic speech recognition and text-to-speech technologies for hearing assessment: a scoping review. Int J Audiol 2024:1-12. [PMID: 39530742 DOI: 10.1080/14992027.2024.2422390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Revised: 10/18/2024] [Accepted: 10/23/2024] [Indexed: 11/16/2024]
Abstract
OBJECTIVE To explore applications of automatic speech recognition and text-to-speech technologies in hearing assessment and in evaluations of hearing aids. DESIGN The review protocol was registered in the INPLASY database, and the review was performed following the PRISMA scoping-review guidelines. A search of ten databases was conducted in January 2023 and updated in June 2024. STUDY SAMPLE Studies that used automatic speech recognition or text-to-speech to assess measures of hearing ability (e.g. speech reception threshold), or to configure hearing aids, were retrieved. Of the 2942 records found, 28 met the inclusion criteria. RESULTS The results indicated that text-to-speech can effectively replace recorded stimuli in speech intelligibility tests, requiring less effort from experimenters without negatively impacting outcomes (n = 5). Automatic speech recognition captured verbal responses accurately, allowing reliable speech reception threshold measurements without human supervision (n = 7). Moreover, automatic speech recognition was employed to simulate participants' hearing, with high correlations between simulated and empirical data (n = 14). Finally, automatic speech recognition was used to optimise hearing aid configurations, leading to higher speech intelligibility for wearers than the original configurations (n = 3). CONCLUSIONS Automatic speech recognition and text-to-speech systems have the potential to enhance the accessibility and efficiency of hearing assessments, offering unsupervised testing options and facilitating hearing aid personalisation.
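The unsupervised, ASR-scored threshold measurements described in this abstract can be sketched as a simple adaptive staircase in which an ASR transcript replaces the human scorer. This is an illustrative sketch only; the function names, step size, and word-matching scoring rule are assumptions, not details taken from any of the reviewed studies.

```python
def score_response(asr_transcript: str, target_words: list[str]) -> float:
    """Fraction of target words present in the ASR transcript of the listener's spoken response."""
    heard = set(asr_transcript.lower().split())
    return sum(w.lower() in heard for w in target_words) / len(target_words)

def adaptive_srt(run_trial, n_trials=20, start_snr=0.0, step=2.0):
    """1-up/1-down staircase converging on the SNR giving ~50% intelligibility.
    `run_trial(snr)` is a stand-in that presents a sentence at the given SNR,
    records the listener, runs ASR, and returns the fraction of words correct."""
    snr, track = start_snr, []
    for _ in range(n_trials):
        track.append(snr)
        snr += -step if run_trial(snr) >= 0.5 else step  # harder if correct, easier if not
    half = len(track) // 2
    return sum(track[half:]) / (len(track) - half)  # average over the converged half
```

In a real test, `run_trial` would wrap stimulus playback, response recording, and an ASR back end; here it is deliberately left abstract.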
Affiliation(s)
- Mohsen Fatehifar
- Manchester Centre for Audiology and Deafness (ManCAD), School of Health Sciences, University of Manchester, Manchester, UK
- Josef Schlittenlacher
- Department of Speech, Hearing and Phonetic Sciences, University College London, London, UK
- Ibrahim Almufarrij
- Manchester Centre for Audiology and Deafness (ManCAD), School of Health Sciences, University of Manchester, Manchester, UK
- Department of Rehabilitation Sciences, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia
- David Wong
- Leeds Institute of Health Sciences, University of Leeds, Leeds, UK
- Tim Cootes
- Centre for Imaging Sciences, University of Manchester, Manchester, UK
- Kevin J Munro
- Manchester Centre for Audiology and Deafness (ManCAD), School of Health Sciences, University of Manchester, Manchester, UK
- Manchester Academic Health Science Centre, Manchester University Hospitals NHS Foundation Trust, Manchester, UK
2
Hohmann V. The future of hearing aid technology: Can technology turn us into superheroes? Z Gerontol Geriatr 2023:10.1007/s00391-023-02179-y. [PMID: 37103645 DOI: 10.1007/s00391-023-02179-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Accepted: 03/02/2023] [Indexed: 04/28/2023]
Abstract
BACKGROUND Hearing aid technology has proven to be successful in the rehabilitation of hearing loss, but its performance is still limited in difficult everyday conditions characterized by noise and reverberation. OBJECTIVE Introduction to the current state of hearing aid technology and presentation of the current state of research and future developments. METHODS The current literature was analyzed and several specific new developments are presented. RESULTS Both objective and subjective data from empirical studies show the limitations of the current technology. Examples of current research show the potential of machine learning-based algorithms and multimodal signal processing for improving speech processing and perception, of using virtual reality for improving hearing device fitting and of mobile health technology for improving hearing health services. CONCLUSION Hearing device technology will remain a key factor in the rehabilitation of hearing impairments. New technology, such as machine learning and multimodal signal processing, virtual reality and mobile health technology, will improve speech enhancement, individual fitting and communication training, thus providing better support for all hearing-impaired patients, including older patients with disabilities or declining cognitive skills.
Affiliation(s)
- Volker Hohmann
- Department of Medical Physics and Acoustics, University of Oldenburg, 26111, Oldenburg, Germany
- Hörzentrum Oldenburg gGmbH, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Oldenburg, Germany
3
Zhang J, Wu J, Qiu Y, Song A, Li W, Li X, Liu Y. Intelligent speech technologies for transcription, disease diagnosis, and medical equipment interactive control in smart hospitals: A review. Comput Biol Med 2023; 153:106517. [PMID: 36623438 PMCID: PMC9814440 DOI: 10.1016/j.compbiomed.2022.106517] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/17/2022] [Revised: 12/23/2022] [Accepted: 12/31/2022] [Indexed: 01/07/2023]
Abstract
The growth and aging of the world population have driven a shortage of medical resources in recent years, especially during the COVID-19 pandemic. Fortunately, the rapid development of robotics and artificial intelligence technologies is helping the healthcare field adapt to these challenges. Among them, intelligent speech technology (IST) has served doctors and patients, improving the efficiency of medical workflows and alleviating the medical burden. However, problems such as noise interference in complex medical scenarios and pronunciation differences between patients and healthy people hamper the broad application of IST in hospitals. In recent years, technologies such as machine learning have developed rapidly in intelligent speech recognition and are expected to solve these problems. This paper first introduces IST's procedure and system architecture and analyzes its application in medical scenarios. Secondly, we review existing IST applications in smart hospitals in detail, including electronic medical documentation, disease diagnosis and evaluation, and human-medical equipment interaction. In addition, we elaborate on an application case of IST in the early recognition, diagnosis, rehabilitation training, evaluation, and daily care of stroke patients. Finally, we discuss IST's limitations, challenges, and future directions in the medical field. Furthermore, we propose a novel medical voice analysis system architecture that employs active hardware, active software, and human-computer interaction to realize intelligent and evolvable speech recognition. This comprehensive review and the proposed architecture offer directions for future studies on IST and its applications in smart hospitals.
Affiliation(s)
- Jun Zhang
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China (Corresponding author)
- Jingyue Wu
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Yiyi Qiu
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Aiguo Song
- The State Key Laboratory of Bioelectronics, School of Instrument Science and Engineering, Southeast University, Nanjing, 210096, China
- Weifeng Li
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, 510080, China
- Xin Li
- Department of Emergency Medicine, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, 510080, China
- Yecheng Liu
- Emergency Department, State Key Laboratory of Complex Severe and Rare Diseases, Peking Union Medical College Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing, 100730, China
4
Karbasi M, Kolossa D. ASR-based speech intelligibility prediction: A review. Hear Res 2022; 426:108606. [PMID: 36154977 DOI: 10.1016/j.heares.2022.108606] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/01/2021] [Revised: 08/15/2022] [Accepted: 09/12/2022] [Indexed: 11/04/2022]
Abstract
Various methods are available to predict the intelligibility of speech signals, but many still suffer from two major problems: first, the prior knowledge they require, which can limit their applicability and lower their objectivity, and second, a low generalization capacity, e.g. across noise types, degradation conditions, and speech material. Automatic speech recognition (ASR) has been suggested as a machine-learning-based component of speech intelligibility prediction (SIP), aiming to ameliorate the shortcomings of other SIP methods. Since their first introduction, ASR-based SIP approaches have developed at an increasingly rapid pace, have been deployed in a range of contexts, and have shown promising performance in many scenarios. Our article provides an overview of this body of research. The main differences between competing methods are highlighted, and their benefits are explained alongside their limitations. We conclude with an outlook on future work and new related directions.
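The recipe shared by the ASR-based SIP methods this review covers is to pass degraded speech through a recogniser and use its word accuracy as a proxy for human intelligibility. A minimal sketch of the scoring half of that pipeline, assuming the ASR back end already produced a hypothesis transcript:

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - WER, computed via edit distance over word tokens."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # standard dynamic-programming edit distance (substitutions, insertions, deletions)
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return max(0.0, 1.0 - d[-1][-1] / len(ref))
```

The reviewed methods differ mainly in what feeds this score (which recogniser, which acoustic front end, whether a reference transcript is needed at all); the scoring itself is as simple as above.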
Affiliation(s)
- Mahdie Karbasi
- Cognitive Signal Processing Group, Faculty of Electrical Engineering and Information Technology, Ruhr University Bochum, 44801, NRW, Germany
- Dorothea Kolossa
- Cognitive Signal Processing Group, Faculty of Electrical Engineering and Information Technology, Ruhr University Bochum, 44801, NRW, Germany
5
Karbasi M, Zeiler S, Kolossa D. Microscopic and Blind Prediction of Speech Intelligibility: Theory and Practice. IEEE/ACM Trans Audio Speech Lang Process 2022; 30:2141-2155. [PMID: 37007458 PMCID: PMC10065470 DOI: 10.1109/taslp.2022.3184888] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/19/2023]
Abstract
Being able to estimate speech intelligibility without the need for listening tests would confer great benefits on a wide range of speech processing applications. Many attempts have therefore been made to introduce an objective, and ideally reference-free, measure for this purpose. Most works analyze speech intelligibility prediction (SIP) methods from a macroscopic point of view, averaging over longer time spans. This paper, in contrast, presents a theoretical framework for the microscopic evaluation of SIP methods. Within our framework, a Statistically estimated Accuracy based on Theory (StAT) is derived, which numerically quantifies the statistical limitations inherent in microscopic SIP. A state-of-the-art approach to microscopic SIP, namely the use of automatic speech recognition (ASR) to directly predict listening test results, is evaluated within this framework. The practical results are in good agreement with the theory. As the final contribution, a fully blind DIscriminative Speech intelligibility Predictor (DISP) is introduced and is also evaluated within the StAT framework. It is shown that this novel, blind estimator can predict intelligibility as well as, and often even more accurately than, the non-blind ASR-based approach, and that its results are again in good agreement with its theoretically derived performance potential.
Affiliation(s)
- Mahdie Karbasi
- Cognitive Signal Processing Group, Electrical Engineering Department, Ruhr-Universität Bochum, Universitätsstraße 150, 44801 Bochum, NRW, Germany
- Steffen Zeiler
- Cognitive Signal Processing Group, Electrical Engineering Department, Ruhr-Universität Bochum, Universitätsstraße 150, 44801 Bochum, NRW, Germany
- Dorothea Kolossa
- Cognitive Signal Processing Group, Electrical Engineering Department, Ruhr-Universität Bochum, Universitätsstraße 150, 44801 Bochum, NRW, Germany
6
Fontan L, Gonçalves Braz L, Pinquier J, Stone MA, Füllgrabe C. Using Automatic Speech Recognition to Optimize Hearing-Aid Time Constants. Front Neurosci 2022; 16:779062. [PMID: 35368250 PMCID: PMC8969748 DOI: 10.3389/fnins.2022.779062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2021] [Accepted: 02/14/2022] [Indexed: 12/03/2022] Open
Abstract
Automatic speech recognition (ASR), when combined with hearing-aid (HA) and hearing-loss (HL) simulations, can predict the aided speech-identification performance of persons with age-related hearing loss. ASR can thus be used to evaluate different HA configurations, such as combinations of insertion-gain functions and compression thresholds, in order to optimize the HA fitting for a given person. The present study investigated whether, after fixing compression thresholds and insertion gains, a random-search algorithm could be used to optimize time constants (i.e., attack and release times) for 12 audiometric profiles. The insertion gains were either those recommended by the CAM2 prescription rule or those optimized using ASR, while compression thresholds were always optimized using ASR. For each audiometric profile, the random-search algorithm was used to vary the time constants with the aim of maximizing ASR performance. A HA simulator and a HL simulator were used, respectively, to amplify and to degrade speech stimuli according to the input audiogram. The resulting speech signals were fed to an ASR system for recognition. For each audiogram, 1,000 iterations of the random-search algorithm were used to find the time-constant configuration yielding the highest ASR score. To assess the reproducibility of the results, the random-search algorithm was run twice. Optimizing the time constants significantly improved the ASR scores when CAM2 insertion gains were used, but not when using ASR-based gains. Repeating the random search yielded similar ASR scores, but different time-constant configurations.
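The random-search loop described in this abstract (sample time constants, run the HA/HL-simulation-plus-ASR pipeline, keep the best-scoring configuration) can be sketched as follows. The parameter ranges and function names are illustrative assumptions, not the study's actual values, and `asr_score` stands in for the full simulation pipeline.

```python
import random

def random_search_time_constants(asr_score, n_iter=1000, seed=0):
    """Random search over attack/release times (ms), keeping the configuration
    with the highest ASR score. `asr_score(attack, release)` is a stand-in for
    the HA-simulation + HL-simulation + ASR pipeline; the sampling ranges below
    are illustrative, not those used in the study."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_iter):
        attack = rng.uniform(0.5, 100.0)     # attack time in ms (assumed range)
        release = rng.uniform(5.0, 2000.0)   # release time in ms (assumed range)
        s = asr_score(attack, release)
        if s > best_score:
            best_cfg, best_score = (attack, release), s
    return best_cfg, best_score
```

Because each iteration is independent, rerunning with a different seed can land on a different configuration with a similar score, which is consistent with the abstract's observation that repeated searches yielded similar ASR scores but different time constants.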
Affiliation(s)
- Lionel Fontan
- Archean LABS, Montauban, France
- Michael A. Stone
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Christian Füllgrabe
- School of Sport, Exercise and Health Sciences, Loughborough University, Loughborough, United Kingdom
7
Gonçalves Braz L, Fontan L, Pinquier J, Stone MA, Füllgrabe C. OPRA-RS: A Hearing-Aid Fitting Method Based on Automatic Speech Recognition and Random Search. Front Neurosci 2022; 16:779048. [PMID: 35264922 PMCID: PMC8899657 DOI: 10.3389/fnins.2022.779048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/17/2021] [Accepted: 01/13/2022] [Indexed: 12/27/2022] Open
Abstract
Hearing-aid (HA) prescription rules (such as NAL-NL2, DSL-v5, and CAM2) are used by HA audiologists to define initial HA settings (e.g., insertion gains, IGs) for patients. This initial fitting is later adjusted individually for each patient to improve clinical outcomes in terms of speech intelligibility and listening comfort. During this fine-tuning stage, speech-intelligibility tests are often carried out with the patient to assess the benefits associated with different HA settings. As these tests tend to be time-consuming, and performance on them depends on the patient's level of fatigue and familiarity with the test material, only a limited number of HA settings can be explored. Consequently, it is likely that a suboptimal fitting is used for the patient. Recent studies have shown that automatic speech recognition (ASR) can be used to predict the effects of IGs on speech intelligibility for patients with age-related hearing loss (ARHL). The aim of the present study was to extend this approach by optimizing, in addition to IGs, compression thresholds (CTs). However, increasing the number of parameters to be fitted exponentially increases the number of configurations to be assessed. To limit the number of HA settings to be tested, three random-search (RS) genetic algorithms were used. The resulting new HA fitting method, combining ASR and RS, is referred to as the "objective prescription rule based on ASR and random search" (OPRA-RS). Optimal HA settings were computed for 12 audiograms, representing average and individual audiometric profiles typical of various levels of ARHL severity, and the associated ASR performances were compared to those obtained with the settings recommended by CAM2. Each RS algorithm was run twice to assess its reliability. For all RS algorithms, ASR scores obtained with OPRA-RS were significantly higher than those associated with CAM2. Each RS algorithm converged on similar optimal HA settings across repetitions. However, significant differences were observed between RS algorithms in terms of maximum ASR performance and processing costs. These promising results open the way to the use of ASR and RS algorithms for the fine-tuning of HAs, with potential speech-intelligibility benefits for the patient.
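The genetic-search idea invoked above (evolve a population of candidate HA parameter vectors under an ASR-based fitness) can be sketched minimally as follows. This is not the OPRA-RS algorithms themselves: the selection, crossover, and mutation choices, the normalised parameter encoding, and all constants are assumptions for illustration, and `fitness` stands in for the ASR-scoring pipeline.

```python
import random

def genetic_search(fitness, n_params, pop_size=20, n_gen=50, seed=1):
    """Minimal genetic algorithm over a vector of HA parameters normalised to
    [0, 1] (e.g. per-band insertion gains and compression thresholds)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(n_gen):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]            # truncation selection (elitist)
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_params)
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0.0, 0.1)))       # clamped Gaussian mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases, which keeps the number of (expensive) ASR evaluations spent on poor configurations low.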
Affiliation(s)
- Libio Gonçalves Braz
- IRIT, CNRS, Université Paul Sabatier, Toulouse, France
- Michael A. Stone
- Manchester Centre for Audiology and Deafness, School of Health Sciences, University of Manchester, Manchester, United Kingdom
- Christian Füllgrabe
- School of Sport, Exercise and Health Sciences, Loughborough University, Loughborough, United Kingdom
8
Hülsmeier D, Buhl M, Wardenga N, Warzybok A, Schädler MR, Kollmeier B. Inference of the distortion component of hearing impairment from speech recognition by predicting the effect of the attenuation component. Int J Audiol 2021; 61:205-219. [PMID: 34081564 DOI: 10.1080/14992027.2021.1929515] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/21/2022]
Abstract
OBJECTIVE A model-based determination of the average supra-threshold ("distortion") component of hearing impairment, which limits the benefit of hearing-aid amplification. DESIGN Published speech recognition thresholds (SRTs) were predicted with the framework for auditory discrimination experiments (FADE), which simulates recognition processes; the speech intelligibility index (SII), which exploits frequency-dependent signal-to-noise ratios (SNRs); and a modified SII with a hearing-loss-dependent band-importance function (PAV). Their attenuation-component-based prediction errors were interpreted as estimates of the distortion component. STUDY SAMPLE Unaided SRTs of 315 hearing-impaired ears measured with the German matrix sentence test in stationary noise. RESULTS Overall, the models showed root-mean-square errors (RMSEs) of 7 dB, but for steeply sloping hearing loss FADE and PAV were more accurate (RMSE = 9 dB) than the SII (RMSE = 23 dB). The prediction errors of FADE and PAV increased linearly with the average hearing loss. Taking the distortion-component estimate into account significantly improved the accuracy of FADE's and PAV's predictions. CONCLUSIONS The supra-threshold distortion component, estimated from the prediction errors of FADE and PAV, appears to increase with the average hearing loss. Accounting for a distortion component improves the model predictions and implies a need for effective compensation strategies for supra-threshold processing deficits with increasing audibility loss.
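The central bookkeeping in this abstract, taking the error of an attenuation-only SRT prediction as an estimate of the distortion component, reduces to a few lines. The predictors themselves (FADE, SII, PAV) are assumed to exist elsewhere; function names here are illustrative.

```python
def distortion_estimates(measured_srts, predicted_srts):
    """Per-ear distortion-component estimate: the part of each measured SRT (dB)
    that an attenuation-only model fails to explain."""
    return [m - p for m, p in zip(measured_srts, predicted_srts)]

def rmse(errors):
    """Root-mean-square prediction error across ears, as reported in the abstract."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5
```

Under this scheme, a model that captured attenuation perfectly would leave residuals that reflect only supra-threshold deficits, which is why the abstract can read the residuals' growth with average hearing loss as growth of the distortion component.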
Affiliation(s)
- David Hülsmeier
- Medical Physics, CvO University Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Oldenburg, Germany
- Mareike Buhl
- Medical Physics, CvO University Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Oldenburg, Germany
- Nina Wardenga
- Cluster of Excellence Hearing4all, Oldenburg, Germany
- Department of Otolaryngology, Hannover Medical School, Hannover, Germany
- Anna Warzybok
- Medical Physics, CvO University Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Oldenburg, Germany
- Marc René Schädler
- Medical Physics, CvO University Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Oldenburg, Germany
- Birger Kollmeier
- Medical Physics, CvO University Oldenburg, Oldenburg, Germany
- Cluster of Excellence Hearing4all, Oldenburg, Germany
9
Hülsmeier D, Schädler MR, Kollmeier B. DARF: A data-reduced FADE version for simulations of speech recognition thresholds with real hearing aids. Hear Res 2021; 404:108217. [PMID: 33706223 DOI: 10.1016/j.heares.2021.108217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2020] [Revised: 02/11/2021] [Accepted: 02/16/2021] [Indexed: 10/22/2022]
Abstract
Developing and selecting hearing aids is a time-consuming process that can be simplified by using objective models. Previously, the framework for auditory discrimination experiments (FADE) accurately simulated the benefits of hearing-aid algorithms, with root-mean-squared prediction errors below 3 dB. One FADE simulation requires several hours of (un)processed signals, which is impractical when the signals have to be recorded. We propose and evaluate a data-reduced FADE version (DARF) which facilitates simulations with signals that cannot be processed digitally but can only be recorded in real time. DARF simulates one speech recognition threshold (SRT) with about 30 min of recorded and processed signals of the (German) matrix sentence test. Benchmark experiments comparing DARF and standard FADE exhibited small differences for stationary maskers (1 dB) but larger differences with strongly fluctuating maskers (5 dB). Hearing impairment and hearing-aid algorithms seemed to reduce the differences. Hearing-aid benefits were simulated in terms of speech recognition with three pairs of real hearing aids in silence (≥8 dB), in stationary and fluctuating maskers with co-located (stat. 2 dB; fluct. 6 dB) and spatially separated speech and noise signals (stat. ≥8 dB; fluct. 8 dB). The simulations were plausible in comparison to data from the literature, but a comparison with empirical data is still open. DARF facilitates objective SRT simulations with real devices with unknown signal processing in real environments. A validation of DARF for devices with unknown signal processing is still pending, however, since it was only tested with three similar devices. Nonetheless, DARF could be used for the development, improvement, and model-based fitting of hearing aids.
Affiliation(s)
- David Hülsmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, CvO Universität Oldenburg, Oldenburg 26129, Germany
- Marc René Schädler
- Medizinische Physik and Cluster of Excellence Hearing4all, CvO Universität Oldenburg, Oldenburg 26129, Germany
- Birger Kollmeier
- Medizinische Physik and Cluster of Excellence Hearing4all, CvO Universität Oldenburg, Oldenburg 26129, Germany