Karbasi M, Kolossa D. ASR-based speech intelligibility prediction: A review. Hear Res 2022;426:108606. [PMID: 36154977 DOI: 10.1016/j.heares.2022.108606]
[Received: 12/01/2021] [Revised: 08/15/2022] [Accepted: 09/12/2022] [Indexed: 11/04/2022]
Abstract
Various methods and approaches are available to predict the intelligibility of speech signals, but many of these still suffer from two major problems: first, the prior knowledge they require, which can itself limit the applicability and lower the objectivity of the method, and second, a low generalization capacity, e.g., across noise types, degradation conditions, and speech material. Automatic speech recognition (ASR) has been suggested as a machine-learning-based component of speech intelligibility prediction (SIP), aiming to ameliorate the shortcomings of other SIP methods. Since their first introduction, ASR-based SIP approaches have developed at an increasingly rapid pace, have been deployed in a range of contexts, and have shown promising performance in many scenarios. Our article provides an overview of this body of research. The main differences between competing methods are highlighted, and their benefits are explained alongside their limitations. We conclude with an outlook on future work and new related directions.